The laboratory dataset is one of the core safety datasets and, at first glance, it could appear intimidating, with multiple tests and visits per patient. This article will illustrate checks that are worth applying at the very beginning of programming work – these could be in addition to the standardized process of domain validation.
Early identification of laboratory issues will allow you to:
The laboratory domain (LB) captures laboratory data collected in the case report form (CRF) or received from a central provider or vendor. Very often there are only two values to be summarized per subject and parameter: the baseline value and the maximum on-treatment value. Values outside the normal ranges and toxicities, as defined by the National Cancer Institute Common Terminology Criteria for Adverse Events (NCI CTCAE), may also be summarized. Other toxicity grading systems that could be considered include the Rheumatology Common Toxicity Criteria (RCTC) and the Division of AIDS (DAIDS), to name a few. Sometimes laboratory test values are summarized per treatment cycle or epoch; more complicated outputs include:
Before statistical programming commences, there are a few required actions to perform:
See table below for an example of specimen exclusions for potassium:
In addition to the study requirements discussed above, the following mapping checks are suggested:
Note: The example above is using CDISC Mapping.
Not all tests have continuous numeric results; examples are urinalysis tests such as ketones, glucose, and protein. Qualitative values can be mapped as follows:
>60 → 60.01
Simple mapping of collected terms to standardized terms can be done in the SDTM datasets. However, this does not include imputations such as >60 → 60.01, which should be done in the ADaM analysis dataset instead, as no imputations should be done in SDTM. Care should be taken to follow the CDISC Implementation Guides for how to populate the Original, Standard and Analysis results across the datasets. If there are no units associated with a result, it does not necessarily mean that the result is qualitative. Example tests with quantitative results and no unit are pH and Specific Gravity.
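The analysis-side imputation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a CDISC rule: the input dataset lb_example, the output variables aval and imputfl, and the 0.01 offset applied in the direction of the sign are all hypothetical and should be replaced by whatever the analysis plan specifies.

```sas
/* Hypothetical example data: character results, some with a comparison prefix */
data lb_example;
  length lbtestcd $8 lbstresc $20;
  infile datalines dlm=',' dsd truncover;
  input lbtestcd $ lbstresc $;
  datalines;
GLUC,>60
GLUC,<5
GLUC,12.3
KETONES,NEGATIVE
;
run;

data adlb_example;
  set lb_example;
  length prefix $1 imputfl $1;
  /* Look for a leading > or < in the standardized character result */
  prefix = substr(lbstresc, 1, 1);
  if prefix in ('>', '<') then do;
    num = input(substr(lbstresc, 2), ?? best32.);
    if not missing(num) then do;
      /* Assumed project rule: shift by 0.01 in the direction of the sign */
      if prefix = '>' then aval = num + 0.01;
      else aval = num - 0.01;
      imputfl = 'Y';
    end;
  end;
  /* Plain numeric results are converted as is; text results stay missing */
  else aval = input(lbstresc, ?? best32.);
  drop prefix num;
run;
```

Keeping the flag imputfl alongside the imputed value makes it easy to trace which analysis values were derived rather than collected.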
Many unit standards are available; which one to follow is up to your clinical team. Example standards are the Système International (SI) units, U.S. conventional units, or a client-specific standard. See the table below for examples of the differences between units:
It is strongly recommended to check that all units follow the agreed standard; if not, unit conversion should be applied. For example, the unit 10**6/MM*3 in CDISC SDTM should be mapped to its CDISC Controlled Terminology equivalent. This is a CDISC SDTM requirement, and all units should be mapped to the equivalent CDISC CT Submission Value where possible. Conversion of units is done by multiplying the original lab test value by the specified conversion factor. The clinical team will provide you with missing conversion factors; these are also needed to convert the lower and upper limits of the reference range where populated, unless new ranges are being applied for consistency across the dataset. Conversion may depend on the laboratory test – see tables below:
In addition, checking for outliers in the converted observations is recommended, as sometimes the initial unit is incorrectly assigned and the conversion was not in fact needed.
Conversion of units may be time-consuming, especially if each non-standard unit is handled separately. This would also significantly increase the size of the SAS program and make the code cumbersome to read. It is therefore worth considering writing a reusable program for unit conversion.
The initial step in the conversion process is to make sure that the laboratory dataset contains the standard unit for each test, i.e., the unit that will be used for reporting; exceptions are laboratory tests for which units are not required. It is good practice to keep a list of reporting units in an external file (.txt, .csv, .xls), which in this form can be easily read into SAS and transformed into a SAS dataset; moreover, any future change will require only an update to the external file and a rerun of the previously created code.
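As a rough sketch of reading such an external list, the example below writes a small CSV to a temporary file and reads it back with a data step. The file layout (a header row followed by lbtestcd and rep_unit columns), the fileref ru and the listed units are illustrative assumptions; in practice the fileref would point to the project's own file.

```sas
/* Write a small illustrative CSV to a temporary file.
   This stands in for the externally maintained list of reporting units. */
filename ru temp;

data _null_;
  file ru;
  put 'lbtestcd,rep_unit';
  put 'ALT,U/L';
  put 'K,mmol/L';
  put 'HGB,g/L';
run;

/* Read the CSV into a SAS dataset, skipping the header row.
   Explicit lengths avoid truncation of longer unit strings. */
data units;
  infile ru dlm=',' dsd truncover firstobs=2;
  length lbtestcd $8 rep_unit $40;
  input lbtestcd $ rep_unit $;
run;
```

Reading with an explicit data step rather than relying on guessed column attributes keeps variable lengths stable when the file is updated.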
If a dataset of conversion factors is not provided to the programming team, it can be created by programmers and submitted for clinical review and approval. For conversions which are not dependent on the test (for example g/dL to g/L), the dataset with conversion factors should contain at a minimum: original unit, conversion factor and reporting unit. For conversions specific to the laboratory tests the dataset should additionally contain variable(s) allowing identification of the test, such as the Lab Test Code or the Lab Test Name. The final dataset should contain unique records only, in order to avoid duplication of laboratory records.
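The uniqueness requirement can be checked before the conversion dataset is used. The sketch below assumes a conversion dataset named conv with the variables lbtestcd, org_unit, rep_unit and factor; the example rows, including the deliberate duplicate, are hypothetical.

```sas
/* Hypothetical conversion dataset with one accidental duplicate key */
data conv;
  length lbtestcd $8 org_unit rep_unit $40 factor 8;
  lbtestcd = ' '; org_unit = 'g/dL'; rep_unit = 'g/L'; factor = 10;    output;
  lbtestcd = ' '; org_unit = 'mg/L'; rep_unit = 'g/L'; factor = 0.001; output;
  lbtestcd = ' '; org_unit = 'G/DL'; rep_unit = 'g/L'; factor = 10;    output;
run;

/* List every key that occurs more than once, ignoring case in the units */
proc sql;
  create table conv_dups as
  select lbtestcd,
         upcase(org_unit) as org_unit_uc,
         upcase(rep_unit) as rep_unit_uc,
         count(*) as n_rec
  from conv
  group by lbtestcd, calculated org_unit_uc, calculated rep_unit_uc
  having count(*) > 1;
quit;
```

An empty conv_dups dataset means each combination of test, original unit and reporting unit has a single factor, so a later join on these keys cannot multiply laboratory records.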
The macro below can be used to merge the laboratory dataset with the dataset containing reporting units:
/* Macro for merging the laboratory dataset (in_ds) with the dataset containing reporting units (unit_ds) for each test.
Datasets are merged by common variables (byvars) which:
- are specific to the project,
- identify a unique lab test.
Example merge keys can be:
- Lab Test Name and Specimen Type, Lab Test Code, LOINC code.
The 'all' parameter specifies the content of the output dataset (out_ds):
- only records with a reporting unit found in the unit_ds dataset (all=N) or
- all records, irrespective of whether a corresponding reporting unit is found in the unit_ds dataset (all=Y) */
%macro std_units(in_ds=lb, byvars=lbtestcd, unit_ds=units, out_ds=lb_unit, all=Y);
  /* Sort both inputs by the merge key into temporary work datasets */
  proc sort data=&in_ds. out=_in_sorted;
    by &byvars.;
  run;
  proc sort data=&unit_ds. out=_unit_sorted;
    by &byvars.;
  run;
  data &out_ds.;
    merge _in_sorted(in=a) _unit_sorted(in=b);
    by &byvars.;
    %if %upcase(&all.)=N %then
    %do;
      if a and b;
    %end;
    %else
    %do;
      if a;
    %end;
  run;
%mend std_units;
/* conv - conversion dataset
factor - conversion factor variable
org_unit - original unit variable
rep_unit - reporting unit variable
lbtestcd - lab test code variable; in this example it is the identifying variable for the lab test. lbtestcd is used to assign factors for conversions dependent on the lab test (a.lbtestcd=b.lbtestcd); for conversions not dependent on the lab test, lbtestcd is missing in the conv dataset. */
proc sql;
  create table lb_conv as select a.*, b.factor
  from lb as a left join conv as b
  on upcase(a.org_unit)=upcase(b.org_unit)
  and upcase(a.rep_unit)=upcase(b.rep_unit)
  and (a.lbtestcd=b.lbtestcd or missing(b.lbtestcd))
  order by a.lbtestcd;
quit;
/* conv_ln - macro parameter, specifies whether lower and upper limits of the range should be converted (conv_ln=Y)
rep_unit - reporting unit
factor - conversion factor
convfl - flag for converted observations
standard CDISC variables:
lborres - result or finding in original units
lborresu - original units
lbstresu - standard units
lbstresn - numeric result/finding in standard units
lbstresc - character result/finding in standard format
lbornrlo - reference range lower limit in original unit
lbornrhi - reference range upper limit in original unit
lbstnrlo - reference range lower limit - standard units
lbstnrhi - reference range upper limit - standard units */
%macro conversion(conv_ln=Y);
  %let conv_ln=%upcase(&conv_ln.);
  /* Convert only when the units differ, a factor exists and
     lborres contains nothing but digits and decimal points */
  if upcase(lborresu) ne upcase(rep_unit) and cmiss(factor, lborres)=0
    and findc(lborres, '.', 'dkt')=0 then
  do;
    lbstresu=rep_unit;
    lbstresn=input(lborres, ?? best32.)*factor;
    lbstresc=strip(put(lbstresn, best.));
    %if &conv_ln. = Y %then
    %do;
      lbstnrlo=input(lbornrlo, ?? best32.)*factor;
      lbstnrhi=input(lbornrhi, ?? best32.)*factor;
    %end;
    convfl='Y';
  end;
%mend conversion;
After conversion of units, it is good practice to check the dataset for outliers, as they may indicate data issues in the laboratory dataset. An example of a data issue is incorrect recording of the prefix ‘micro’ in the unit, using the symbol ‘m’ instead of ‘u’. Micromole per litre should be written as ‘umol/L’; if it is wrongly recorded as ‘mmol/L’, the converted result is 1000 times higher than the actual value. Such cases should be reported to Clinical Data Management.
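One possible screening approach is sketched below: each converted result is compared with the median of the unconverted results for the same test, and large ratios are listed for review. The dataset name lb_final, the example values and the 100-fold threshold are assumptions chosen for illustration; convfl is the flag for converted observations, and the appropriate reference level and threshold are for the study team to decide.

```sas
/* Hypothetical post-conversion data: convfl='Y' marks converted records */
data lb_final;
  length lbtestcd $8 convfl $1;
  input lbtestcd $ lbstresn convfl $;
  datalines;
CREAT 70 N
CREAT 82 N
CREAT 91 N
CREAT 75000 Y
CREAT 88 Y
;
run;

/* Median of unconverted results per test serves as a rough reference level */
proc means data=lb_final noprint nway;
  where convfl ne 'Y' and not missing(lbstresn);
  class lbtestcd;
  var lbstresn;
  output out=ref_med(drop=_type_ _freq_) median=med_ref;
run;

proc sort data=lb_final out=lb_sorted;
  by lbtestcd;
run;

data conv_outliers;
  merge lb_sorted(in=a) ref_med(in=b);
  by lbtestcd;
  /* Keep converted, positive results for tests that have a reference level */
  if a and b and convfl = 'Y' and med_ref > 0 and lbstresn > 0;
  ratio = lbstresn / med_ref;
  /* Assumed screening threshold: 100-fold difference in either direction */
  if ratio >= 100 or ratio <= 0.01;
run;
```

In this example the record with 75000 is listed, which is the pattern expected when a value in umol/L was recorded with the unit mmol/L and then multiplied by 1000.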
Common issues with data from local laboratories are:
Data obtained from a combination of central and local laboratories may also cause issues, for example:
While central laboratories ensure a standard approach and provide values in standardised units, this is not always the case with local laboratories. The macro below may be used to separate units concatenated to the value in the result variable (i.e. the unit is included in the result variable and the unit variable is missing). The macro should be called within a data step for tests where a numeric result is expected. It is assumed that:
/* condition - macro parameter, used to specify the subset of tests for the macro
sepfl - flag for separated observations
old_lborres - holds the value of lborres before separation
lborres, lborresu - standard CDISC variables, see code above */
%macro separate(condition);
  if &condition. then
  do;
    lborres_=strip(lborres);
    /* Result starts with a digit or '.', contains letters and has no unit recorded */
    if anyalpha(lborres_)>0
      and (anydigit(substr(lborres_,1,1))=1 or substr(lborres_,1,1)='.')
      and missing(lborresu) then
    do;
      old_lborres=lborres;
      /* Everything from the first letter onwards is treated as the unit */
      lborresu=strip(substr(lborres_, anyalpha(lborres_)));
      lborres=strip(substr(lborres_, 1, anyalpha(lborres_)-1));
      sepfl='Y';
    end;
  end;
  drop lborres_;
%mend separate;
Example calls may be:
%separate (lbtestcd in ('ALT' 'CA' 'BILI' 'K'))
%separate (lbcat ne 'URINALYSIS')
The Study Data Tabulation Model (SDTM) baseline flag should be used only with the team's agreement; otherwise it may be necessary to ask for an appropriate baseline definition. The baseline definition can be a specific visit or the last non-missing result prior to first dose. While developing the baseline algorithm, consider the use or imputation of measurement times or time points, the imputation of missing or incomplete dates, and the inclusion of unplanned visits. Finally, it is important to clarify whether subjects with no baseline are expected to be summarised in post-baseline or shift tables.
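A minimal sketch of the "last non-missing result prior to first dose" definition is shown below. It assumes complete numeric dates (lbdt in the laboratory data, trtsdt as the first dose date in a subject-level dataset), treats results on the day of first dose as post-baseline, and ignores times and unplanned-visit rules; all of these are simplifying assumptions, and the datasets and the output names base and basedt are hypothetical.

```sas
/* Hypothetical inputs: first dose dates and laboratory results */
data adsl;
  length usubjid $10;
  input usubjid $ trtsdt :date9.;
  format trtsdt date9.;
  datalines;
S001 10JAN2024
S002 15JAN2024
;
run;

data lb;
  length usubjid $10 lbtestcd $8;
  input usubjid $ lbtestcd $ lbdt :date9. lbstresn;
  format lbdt date9.;
  datalines;
S001 ALT 02JAN2024 30
S001 ALT 09JAN2024 .
S001 ALT 08JAN2024 34
S001 ALT 17JAN2024 50
S002 ALT 20JAN2024 41
;
run;

/* Keep non-missing results dated strictly before first dose */
proc sql;
  create table lb_pre as
  select a.usubjid, a.lbtestcd, a.lbdt, a.lbstresn
  from lb as a inner join adsl as b
    on a.usubjid = b.usubjid
  where not missing(a.lbstresn) and not missing(a.lbdt)
    and not missing(b.trtsdt) and a.lbdt < b.trtsdt
  order by a.usubjid, a.lbtestcd, a.lbdt;
quit;

/* The last pre-dose record per subject and test is taken as baseline */
data lb_base;
  set lb_pre;
  by usubjid lbtestcd lbdt;
  if last.lbtestcd;
  rename lbstresn=base lbdt=basedt;
run;
```

Subject S002 has no pre-dose result and therefore no record in lb_base, which is exactly the situation where the handling in post-baseline and shift tables needs to be agreed. If several results share the same date, the choice among them is arbitrary in this sketch and would need a tie-breaking rule such as time or sequence number.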
How can you develop the processing of laboratory data within your organisation?
Develop standard macros to handle repeated steps, for example: read in and check the specification with a CDISC validator, perform the test/units mapping, read in/create codelists, convert units and derive baseline.
Quanticate's statistical programming team can support you with Laboratory dataset, CDISC Mapping and SDTM conversions and domains. Submit a Request for Information and a member of Quanticate's Business Development team will be in touch with you shortly.
© 2025 Quanticate