Creating ADSL

Introduction

This article describes creating an ADSL ADaM. Examples are currently presented and tested using the DM, EX, AE, LB and DS SDTM domains. However, other domains could be used.

Note: All examples assume CDISC SDTM and/or ADaM format as input unless otherwise specified.

Programming Flow

Read in Data

To start, all data frames needed for the creation of ADSL should be read into the environment. This will be a company-specific process. Some of the data frames needed may be DM, EX, DS, AE, and LB.

For example purposes, the CDISC Pilot SDTM datasets, which are included in {admiral.test}, are used.

library(admiral)
library(dplyr)
library(admiral.test)
library(lubridate)
library(stringr)

data("admiral_dm")
data("admiral_ds")
data("admiral_ex")
data("admiral_ae")
data("admiral_lb")

dm <- admiral_dm
ds <- admiral_ds
ex <- admiral_ex
ae <- admiral_ae
lb <- admiral_lb

The DM domain is used as the basis for ADSL:

adsl <- dm %>%
  select(-DOMAIN)

USUBJID RFSTDTC COUNTRY AGE SEX RACE ETHNIC ARM ACTARM
01-701-1015 2014-01-02 USA 63 F WHITE HISPANIC OR LATINO Placebo Placebo
01-701-1023 2012-08-05 USA 64 M WHITE HISPANIC OR LATINO Placebo Placebo
01-701-1028 2013-07-19 USA 71 M WHITE NOT HISPANIC OR LATINO Xanomeline High Dose Xanomeline High Dose
01-701-1033 2014-03-18 USA 74 M WHITE NOT HISPANIC OR LATINO Xanomeline Low Dose Xanomeline Low Dose
01-701-1034 2014-07-01 USA 77 F WHITE NOT HISPANIC OR LATINO Xanomeline High Dose Xanomeline High Dose
01-701-1047 2013-02-12 USA 85 F WHITE NOT HISPANIC OR LATINO Placebo Placebo
01-701-1057 USA 59 F WHITE HISPANIC OR LATINO Screen Failure Screen Failure
01-701-1097 2014-01-01 USA 68 M WHITE NOT HISPANIC OR LATINO Xanomeline Low Dose Xanomeline Low Dose
01-701-1111 2012-09-07 USA 81 F WHITE NOT HISPANIC OR LATINO Xanomeline Low Dose Xanomeline Low Dose
01-701-1115 2012-11-30 USA 84 M WHITE NOT HISPANIC OR LATINO Xanomeline Low Dose Xanomeline Low Dose

Derive Treatment Variables (TRT0xP, TRT0xA)

The mapping of the treatment variables is left to the ADaM programmer. An example mapping may be:

adsl <- adsl %>%
  mutate(TRT01P = ARM, TRT01A = ACTARM)

Derive/Impute Numeric Treatment Date/Time and Duration (TRTSDTM, TRTEDTM, TRTDURD)

The function derive_vars_merged() can be used to derive the treatment start and end date/times using the ex domain. A pre-processing step for ex is required to convert the variables EXSTDTC and EXENDTC to datetime variables and impute missing date or time components. Conversion and imputation are done by derive_vars_dtm().

Example calls:

# impute start and end time of exposure to first and last respectively, do not impute date
ex_ext <- ex %>%
  derive_vars_dtm(
    dtc = EXSTDTC,
    new_vars_prefix = "EXST"
  ) %>%
  derive_vars_dtm(
    dtc = EXENDTC,
    new_vars_prefix = "EXEN",
    time_imputation = "last"
  )

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXSTDTM),
    new_vars = vars(TRTSDTM = EXSTDTM, TRTSTMF = EXSTTMF),
    order = vars(EXSTDTM, EXSEQ),
    mode = "first",
    by_vars = vars(STUDYID, USUBJID)
  ) %>%
  derive_vars_merged(
    dataset_add = ex_ext,
    filter_add = (EXDOSE > 0 |
      (EXDOSE == 0 &
        str_detect(EXTRT, "PLACEBO"))) & !is.na(EXENDTM),
    new_vars = vars(TRTEDTM = EXENDTM, TRTETMF = EXENTMF),
    order = vars(EXENDTM, EXSEQ),
    mode = "last",
    by_vars = vars(STUDYID, USUBJID)
  )

This call returns the original data frame with the columns TRTSDTM, TRTSTMF, TRTEDTM, and TRTETMF added. Exposure observations with an incomplete date, and zero doses of non-placebo treatments, are ignored. Missing time parts are imputed as first or last for the start and end date respectively.
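To make the selection logic concrete, the following is a minimal sketch in plain {dplyr} of what the first of the two merges does conceptually: filter the valid doses, take the earliest record per subject, and join it to the subject-level data. It is not the implementation of derive_vars_merged(); the data, the subject identifiers and the names ex_demo, first_dose and adsl_demo are hypothetical, and grepl() stands in for str_detect().

```r
library(dplyr)

# Hypothetical exposure records (not taken from the pilot data)
ex_demo <- tibble(
  USUBJID = c("S1", "S1", "S1", "S2", "S2"),
  EXSEQ   = c(1, 2, 3, 1, 2),
  EXTRT   = c("DRUG A", "DRUG A", "DRUG A", "PLACEBO", "PLACEBO"),
  EXDOSE  = c(0, 54, 54, 0, 0),
  EXSTDTM = as.POSIXct(
    c("2014-01-01 00:00:00", "2014-01-02 00:00:00", "2014-01-15 00:00:00",
      "2014-02-01 00:00:00", NA),
    tz = "UTC"
  )
)

# Keep valid doses with a start datetime, then take the earliest per subject
first_dose <- ex_demo %>%
  filter(
    (EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) &
      !is.na(EXSTDTM)
  ) %>%
  arrange(USUBJID, EXSTDTM, EXSEQ) %>%
  group_by(USUBJID) %>%
  slice(1) %>%
  ungroup() %>%
  select(USUBJID, TRTSDTM = EXSTDTM)

# Subjects without a valid dose (here S3) keep a missing TRTSDTM
adsl_demo <- tibble(USUBJID = c("S1", "S2", "S3")) %>%
  left_join(first_dose, by = "USUBJID")
```

In this sketch S1 starts on 2014-01-02, because the zero dose of a non-placebo treatment on 2014-01-01 is filtered out.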

The datetime variables returned can be converted to dates using the derive_vars_dtm_to_dt() function.

adsl <- adsl %>%
  derive_vars_dtm_to_dt(source_vars = vars(TRTSDTM, TRTEDTM))

Now that TRTSDT and TRTEDT are derived, the function derive_var_trtdurd() can be used to calculate the treatment duration (TRTDURD).

adsl <- adsl %>%
  derive_var_trtdurd()

USUBJID RFSTDTC TRTSDTM TRTSDT TRTEDTM TRTEDT TRTDURD
01-701-1015 2014-01-02 2014-01-02 2014-01-02 2014-07-02 23:59:59 2014-07-02 182
01-701-1023 2012-08-05 2012-08-05 2012-08-05 2012-09-01 23:59:59 2012-09-01 28
01-701-1028 2013-07-19 2013-07-19 2013-07-19 2014-01-14 23:59:59 2014-01-14 180
01-701-1033 2014-03-18 2014-03-18 2014-03-18 2014-03-31 23:59:59 2014-03-31 14
01-701-1034 2014-07-01 2014-07-01 2014-07-01 2014-12-30 23:59:59 2014-12-30 183
01-701-1047 2013-02-12 2013-02-12 2013-02-12 2013-03-09 23:59:59 2013-03-09 26
01-701-1057 NA NA NA NA NA
01-701-1097 2014-01-01 2014-01-01 2014-01-01 2014-07-09 23:59:59 2014-07-09 190
01-701-1111 2012-09-07 2012-09-07 2012-09-07 2012-09-16 23:59:59 2012-09-16 10
01-701-1115 2012-11-30 2012-11-30 2012-11-30 2013-01-23 23:59:59 2013-01-23 55
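The values in the table are consistent with a duration that counts both the first and the last treatment day, that is, end date minus start date plus one. A small base R check on three of the subjects shown above (a sketch for illustration only, not the package code):

```r
# Treatment start and end dates for three subjects from the table above
trtsdt <- as.Date(c("2014-01-02", "2012-08-05", "2014-03-18"))
trtedt <- as.Date(c("2014-07-02", "2012-09-01", "2014-03-31"))

# Duration in days, counting both the first and the last treatment day
trtdurd <- as.numeric(trtedt - trtsdt) + 1
trtdurd
# 182  28  14
```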

Derive Disposition Variables

Disposition Dates (e.g. EOSDT)

The functions derive_vars_dt() and derive_vars_merged() can be used to derive a disposition date. First the character disposition date (DS.DSSTDTC) is converted to a numeric date (DSSTDT) calling derive_vars_dt(). Then the relevant disposition date is selected by adjusting the filter_add parameter.

To derive the End of Study date (EOSDT), a call could be:

# convert character date to numeric date without imputation
ds_ext <- derive_vars_dt(
  ds,
  dtc = DSSTDTC,
  new_vars_prefix = "DSST"
)

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ds_ext,
    by_vars = vars(STUDYID, USUBJID),
    new_vars = vars(EOSDT = DSSTDT),
    filter_add = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE"
  )

With DS:

USUBJID DSCAT DSDECOD DSTERM DSSTDTC
01-701-1015 PROTOCOL MILESTONE RANDOMIZED RANDOMIZED 2014-01-02
01-701-1015 DISPOSITION EVENT COMPLETED PROTOCOL COMPLETED 2014-07-02
01-701-1015 OTHER EVENT FINAL LAB VISIT FINAL LAB VISIT 2014-07-02
01-701-1023 PROTOCOL MILESTONE RANDOMIZED RANDOMIZED 2012-08-05
01-701-1023 DISPOSITION EVENT ADVERSE EVENT ADVERSE EVENT 2012-09-02
01-701-1023 OTHER EVENT FINAL LAB VISIT FINAL LAB VISIT 2012-09-02
01-701-1023 OTHER EVENT FINAL RETRIEVAL VISIT FINAL RETRIEVAL VISIT 2013-02-18
01-701-1028 PROTOCOL MILESTONE RANDOMIZED RANDOMIZED 2013-07-19
01-701-1028 DISPOSITION EVENT COMPLETED PROTOCOL COMPLETED 2014-01-14
01-701-1028 OTHER EVENT FINAL LAB VISIT FINAL LAB VISIT 2014-01-14

We would get:

USUBJID EOSDT
01-701-1015 2014-07-02
01-701-1023 2012-09-02
01-701-1028 2014-01-14
01-701-1033 2014-04-14
01-701-1034 2014-12-30
01-701-1047 2013-03-29
01-701-1057 NA
01-701-1097 2014-07-09
01-701-1111 2012-09-17
01-701-1115 2013-01-23

This call would return the input dataset with the column EOSDT added. derive_vars_dt() also allows the user to impute partial dates. If imputation is needed and the date is to be imputed to the first of the month, then set date_imputation = "first".
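As an illustration of what imputing to the first of the month means for a partial date, here is a small base R sketch. The helper impute_day_first() is hypothetical and much simpler than derive_vars_dt(): it assumes plain ISO 8601 dates without a time part and only handles a missing day.

```r
# Hypothetical helper: impute a missing day to the first of the month.
# Complete dates pass through unchanged; anything else becomes NA.
impute_day_first <- function(dtc) {
  is_year_month <- !is.na(dtc) & grepl("^[0-9]{4}-[0-9]{2}$", dtc)
  is_complete <- !is.na(dtc) & grepl("^[0-9]{4}-[0-9]{2}-[0-9]{2}$", dtc)
  out <- rep(NA_character_, length(dtc))
  out[is_complete] <- dtc[is_complete]
  out[is_year_month] <- paste0(dtc[is_year_month], "-01")
  as.Date(out)
}

imputed <- impute_day_first(c("2014-07-02", "2014-07", NA, ""))
imputed
# "2014-07-02" "2014-07-01" NA NA
```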

Disposition Status (e.g. EOSSTT)

The function derive_var_disposition_status() can be used to derive a disposition status at a specific timepoint. The relevant disposition variable (DS.DSDECOD) is selected by adjusting the filter_ds argument and used to derive EOSSTT.

To derive the End of Study status (EOSSTT), a call could be:

adsl <- adsl %>%
  derive_var_disposition_status(
    dataset_ds = ds,
    new_var = EOSSTT,
    status_var = DSDECOD,
    filter_ds = DSCAT == "DISPOSITION EVENT"
  )

USUBJID EOSDT EOSSTT
01-701-1015 2014-07-02 COMPLETED
01-701-1023 2012-09-02 DISCONTINUED
01-701-1028 2014-01-14 COMPLETED
01-701-1033 2014-04-14 DISCONTINUED
01-701-1034 2014-12-30 COMPLETED
01-701-1047 2013-03-29 DISCONTINUED
01-701-1057 NA NOT STARTED
01-701-1097 2014-07-09 COMPLETED
01-701-1111 2012-09-17 DISCONTINUED
01-701-1115 2013-01-23 DISCONTINUED


This call would return the input dataset with the column EOSSTT added.

By default, the function will derive EOSSTT as

  • "NOT STARTED" if DSDECOD is "SCREEN FAILURE" or "SCREENING NOT COMPLETED"
  • "COMPLETED" if DSDECOD == "COMPLETED"
  • "DISCONTINUED" if DSDECOD is neither "COMPLETED" nor NA
  • "ONGOING" otherwise
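The default rules listed above can be written down as a small mapping function. This is a hypothetical re-implementation for illustration (the name default_eosstt() is not part of {admiral}), evaluated in the order of the bullets:

```r
library(dplyr)

# Hypothetical re-implementation of the default mapping listed above
default_eosstt <- function(dsdecod) {
  case_when(
    dsdecod %in% c("SCREEN FAILURE", "SCREENING NOT COMPLETED") ~ "NOT STARTED",
    dsdecod == "COMPLETED" ~ "COMPLETED",
    !is.na(dsdecod) ~ "DISCONTINUED",
    TRUE ~ "ONGOING"
  )
}

default_eosstt(c("COMPLETED", "ADVERSE EVENT", "SCREEN FAILURE", NA))
# "COMPLETED" "DISCONTINUED" "NOT STARTED" "ONGOING"
```

Because case_when() uses the first matching condition, the screen failure rule must come before the general "not missing" rule.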

If the default derivation must be changed, users can create their own function and pass it to the format_new_var argument of the function (format_new_var = new_mapping) to map DSDECOD to a suitable EOSSTT value.

Example function format_eosstt():

format_eosstt <- function(DSDECOD) {
  case_when(
    DSDECOD %in% c("COMPLETED") ~ "COMPLETED",
    DSDECOD %in% c("SCREEN FAILURE") ~ NA_character_,
    !is.na(DSDECOD) ~ "DISCONTINUED",
    TRUE ~ "ONGOING"
  )
}

The customized mapping function format_eosstt() can now be passed to the main function:


adsl <- adsl %>%
  derive_var_disposition_status(
    dataset_ds = ds,
    new_var = EOSSTT,
    status_var = DSDECOD,
    format_new_var = format_eosstt,
    filter_ds = DSCAT == "DISPOSITION EVENT"
  )

This call would return the input dataset with the column EOSSTT added.

Disposition Reason(s) (e.g. DCSREAS, DCSREASP)

The main reason for discontinuation is usually stored in DSDECOD, while DSTERM provides additional details regarding the subject's discontinuation (e.g., a description of "OTHER").

The function derive_vars_disposition_reason() can be used to derive a disposition reason (along with the details, if required) at a specific timepoint. The relevant disposition variable(s) (DS.DSDECOD, DS.DSTERM) are selected by adjusting the filter_ds argument and used to derive the main reason (and details).

To derive the End of Study reason(s) (DCSREAS and DCSREASP), the call would be:

adsl <- adsl %>%
  derive_vars_disposition_reason(
    dataset_ds = ds,
    new_var = DCSREAS,
    reason_var = DSDECOD,
    new_var_spe = DCSREASP,
    reason_var_spe = DSTERM,
    filter_ds = DSCAT == "DISPOSITION EVENT" & DSDECOD != "SCREEN FAILURE"
  )

USUBJID EOSDT EOSSTT DCSREAS DCSREASP
01-701-1015 2014-07-02 COMPLETED NA NA
01-701-1023 2012-09-02 DISCONTINUED ADVERSE EVENT NA
01-701-1028 2014-01-14 COMPLETED NA NA
01-701-1033 2014-04-14 DISCONTINUED STUDY TERMINATED BY SPONSOR NA
01-701-1034 2014-12-30 COMPLETED NA NA
01-701-1047 2013-03-29 DISCONTINUED ADVERSE EVENT NA
01-701-1057 NA NOT STARTED NA NA
01-701-1097 2014-07-09 COMPLETED NA NA
01-701-1111 2012-09-17 DISCONTINUED ADVERSE EVENT NA
01-701-1115 2013-01-23 DISCONTINUED ADVERSE EVENT NA

This call would return the input dataset with the columns DCSREAS and DCSREASP added.

By default, the function will map

  • DCSREAS as DSDECOD if DSDECOD is neither "COMPLETED" nor NA, NA otherwise
  • DCSREASP as DSTERM if DSDECOD is equal to "OTHER", NA otherwise
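The two default rules above can be illustrated with plain {dplyr} on a few hypothetical disposition records. The data (including the term "MOVED ABROAD") and the names ds_demo and reasons are made up for this sketch; it mirrors the bullets rather than the package internals.

```r
library(dplyr)

# Hypothetical disposition records
ds_demo <- tibble(
  DSDECOD = c("COMPLETED", "ADVERSE EVENT", "OTHER", NA),
  DSTERM  = c("PROTOCOL COMPLETED", "ADVERSE EVENT", "MOVED ABROAD", NA)
)

reasons <- ds_demo %>%
  mutate(
    # main reason: DSDECOD unless it is "COMPLETED" or missing
    DCSREAS = if_else(!is.na(DSDECOD) & DSDECOD != "COMPLETED", DSDECOD, NA_character_),
    # details: DSTERM only when the main reason is "OTHER"
    DCSREASP = if_else(DSDECOD == "OTHER", DSTERM, NA_character_)
  )

reasons$DCSREAS
# NA "ADVERSE EVENT" "OTHER" NA
reasons$DCSREASP
# NA NA "MOVED ABROAD" NA
```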

If the default derivation must be changed, users can create their own function and pass it to the format_new_vars argument of the function (format_new_vars = new_mapping) to map DSDECOD and DSTERM to suitable DCSREAS/DCSREASP values.

Example function format_dcsreas():

format_dcsreas <- function(dsdecod, dsterm = NULL) {
  if (is.null(dsterm)) {
    if_else(dsdecod %notin% c("COMPLETED", "SCREEN FAILURE") & !is.na(dsdecod), dsdecod, NA_character_)
  } else {
    if_else(dsdecod == "OTHER", dsterm, NA_character_)
  }
}

The customized mapping function format_dcsreas() can now be passed to the main function:

adsl <- adsl %>%
  derive_vars_disposition_reason(
    dataset_ds = ds,
    new_var = DCSREAS,
    reason_var = DSDECOD,
    new_var_spe = DCSREASP,
    reason_var_spe = DSTERM,
    format_new_vars = format_dcsreas,
    filter_ds = DSCAT == "DISPOSITION EVENT"
  )

Randomization Date (RANDDT)

The function derive_vars_merged() can be used to derive the randomization date variable. To map the Randomization Date (RANDDT), the call would be:

adsl <- adsl %>%
  derive_vars_merged(
    dataset_add = ds_ext,
    filter_add = DSDECOD == "RANDOMIZED",
    by_vars = vars(STUDYID, USUBJID),
    new_vars = vars(RANDDT = DSSTDT)
  )

This call would return the input dataset with the column RANDDT added.

USUBJID RANDDT
01-701-1015 2014-01-02
01-701-1023 2012-08-05
01-701-1028 2013-07-19
01-701-1033 2014-03-18
01-701-1034 2014-07-01
01-701-1047 2013-02-12
01-701-1057 NA
01-701-1097 2014-01-01
01-701-1111 2012-09-07
01-701-1115 2012-11-30


Derive Death Variables

Death Date (DTHDT)

The function derive_vars_dt() can be used to derive DTHDT. This function allows the user to impute the date as well.

Example calls:

adsl <- adsl %>%
  derive_vars_dt(
    new_vars_prefix = "DTH",
    dtc = DTHDTC
  )

USUBJID TRTEDT DTHDTC DTHDT DTHFL
01-701-1015 2014-07-02 NA
01-701-1023 2012-09-01 NA
01-701-1028 2014-01-14 NA
01-701-1033 2014-03-31 NA
01-701-1034 2014-12-30 NA
01-701-1047 2013-03-09 NA
01-701-1057 NA NA
01-701-1097 2014-07-09 NA
01-701-1111 2012-09-16 NA
01-701-1115 2013-01-23 NA

This call would return the input dataset with the column DTHDT added and, by default, the associated date imputation flag (DTHDTF) populated with the controlled terminology outlined in the ADaM IG for date imputations. If the imputation flag is not required, the user must set the argument flag_imputation to "none".

If imputation is needed and the date is to be imputed to the first day of the month/year the call would be:

adsl <- adsl %>%
  derive_vars_dt(
    new_vars_prefix = "DTH",
    dtc = DTHDTC,
    date_imputation = "first"
  )

See also Date and Time Imputation.

Cause of Death (DTHCAUS)

The cause of death DTHCAUS can be derived using the function derive_var_dthcaus().

Since the cause of death could be collected/mapped in different domains (e.g. DS, AE, DD), it is important the user specifies the right source(s) to derive the cause of death from.

For example, if the date of death is collected in the AE form when the AE is fatal, the cause of death would be set to the preferred term (AEDECOD) of that fatal AE, while if the date of death is collected in the DS form, the cause of death would be set to the disposition term (DSTERM). To achieve this, the dthcaus_source() objects must be defined so that they fit the study requirements.

dthcaus_source() specifications:

  • dataset_name: the name of the dataset in which to search for death information,
  • filter: the condition that defines death,
  • date: the date of death,
  • mode: "first" or "last", to select the first/last date of death if multiple dates are collected,
  • dthcaus: the variable or text used to populate DTHCAUS,
  • traceability_vars: the traceability variables to add, if any (e.g. source domain, sequence, variable).

An example call to define the sources would be:

src_ae <- dthcaus_source(
  dataset_name = "ae",
  filter = AEOUT == "FATAL",
  date = AESTDTM,
  mode = "first",
  dthcaus = AEDECOD
)

USUBJID AESTDTC AEENDTC AEDECOD AEOUT
01-701-1211 2013-01-14 2013-01-14 SUDDEN DEATH FATAL
01-704-1445 2014-10-31 2014-10-31 COMPLETED SUICIDE FATAL
01-710-1083 2013-08-02 2013-08-02 MYOCARDIAL INFARCTION FATAL

src_ds <- dthcaus_source(
  dataset_name = "ds",
  filter = DSDECOD == "DEATH" & grepl("DEATH DUE TO", DSTERM),
  date = DSSTDT,
  mode = "first",
  dthcaus = "Death in DS"
)

USUBJID DSDECOD DSTERM DSSTDTC
01-701-1211 DEATH DEATH 2013-01-14
01-704-1445 DEATH DEATH 2014-11-01
01-710-1083 DEATH DEATH 2013-08-02

Once the sources are defined, the function derive_var_dthcaus() can be used to derive DTHCAUS:

ae_ext <- derive_vars_dtm(
  ae,
  dtc = AESTDTC,
  new_vars_prefix = "AEST",
  highest_imputation = "M",
  flag_imputation = "none"
)

adsl <- adsl %>%
  derive_var_dthcaus(src_ae, src_ds, source_datasets = list(ae = ae_ext, ds = ds_ext))

USUBJID EOSDT DTHDTC DTHDT DTHCAUS
01-701-1211 2013-01-14 2013-01-14 2013-01-14 SUDDEN DEATH
01-704-1445 2014-11-01 2014-11-01 2014-11-01 COMPLETED SUICIDE
01-710-1083 2013-08-02 2013-08-02 2013-08-02 MYOCARDIAL INFARCTION
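When a subject has death information in more than one source, one record has to be chosen. The sketch below shows one plausible selection rule in plain {dplyr}: the earliest date wins and ties go to the source listed first. This rule is an assumption made for illustration and should be checked against the documentation of derive_var_dthcaus(); the data and the names ae_deaths, ds_deaths and dthcaus_demo are hypothetical.

```r
library(dplyr)

# Hypothetical death records, already filtered from two sources
ae_deaths <- tibble(
  USUBJID = c("S1", "S2"),
  date    = as.Date(c("2013-01-14", "2014-10-31")),
  DTHCAUS = c("SUDDEN DEATH", "COMPLETED SUICIDE")
)
ds_deaths <- tibble(
  USUBJID = c("S2", "S3"),
  date    = as.Date(c("2014-11-01", "2013-08-02")),
  DTHCAUS = c("Death in DS", "Death in DS")
)

# Assumed rule: earliest date per subject, ties resolved by source order
dthcaus_demo <- bind_rows(ae_deaths, ds_deaths, .id = "source_order") %>%
  arrange(USUBJID, date, source_order) %>%
  group_by(USUBJID) %>%
  slice(1) %>%
  ungroup() %>%
  select(USUBJID, DTHCAUS)

dthcaus_demo$DTHCAUS
# "SUDDEN DEATH" "COMPLETED SUICIDE" "Death in DS"
```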

The function also offers the option to add some traceability variables (e.g. DTHDOM would store the domain where the date of death is collected, and DTHSEQ would store the xxSEQ value of that domain). To add them, the traceability_vars argument must be added to the dthcaus_source() arguments:

src_ae <- dthcaus_source(
  dataset_name = "ae",
  filter = AEOUT == "FATAL",
  date = AESTDTM,
  mode = "first",
  dthcaus = AEDECOD,
  traceability_vars = vars(DTHDOM = "AE", DTHSEQ = AESEQ)
)

src_ds <- dthcaus_source(
  dataset_name = "ds",
  filter = DSDECOD == "DEATH" & grepl("DEATH DUE TO", DSTERM),
  date = DSSTDT,
  mode = "first",
  dthcaus = DSTERM,
  traceability_vars = vars(DTHDOM = "DS", DTHSEQ = DSSEQ)
)

adsl <- adsl %>%
  select(-DTHCAUS) %>% # remove it before deriving it again
  derive_var_dthcaus(src_ae, src_ds, source_datasets = list(ae = ae_ext, ds = ds_ext))

USUBJID TRTEDT DTHDTC DTHDT DTHCAUS DTHDOM DTHSEQ
01-701-1211 2013-01-12 2013-01-14 2013-01-14 SUDDEN DEATH AE 9
01-704-1445 2014-11-01 2014-11-01 2014-11-01 COMPLETED SUICIDE AE 1
01-710-1083 2013-08-01 2013-08-02 2013-08-02 MYOCARDIAL INFARCTION AE 1

Duration Relative to Death

The function derive_vars_duration() can be used to derive durations relative to death, such as the Relative Day of Death (DTHADY) or the number of days from last dose to death (LDDTHELD).

Example calls:

  • Relative Day of Death

adsl <- adsl %>%
  derive_vars_duration(
    new_var = DTHADY,
    start_date = TRTSDT,
    end_date = DTHDT
  )

  • Elapsed Days from Last Dose to Death

adsl <- adsl %>%
  derive_vars_duration(
    new_var = LDDTHELD,
    start_date = TRTEDT,
    end_date = DTHDT,
    add_one = FALSE
  )

USUBJID TRTEDT DTHDTC DTHDT DTHCAUS DTHADY LDDTHELD
01-701-1211 2013-01-12 2013-01-14 2013-01-14 SUDDEN DEATH 61 2
01-704-1445 2014-11-01 2014-11-01 2014-11-01 COMPLETED SUICIDE 175 0
01-710-1083 2013-08-01 2013-08-02 2013-08-02 MYOCARDIAL INFARCTION 12 1
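The difference between the two calls is whether one day is added to the date difference. A base R sketch for subject 01-701-1211 follows; TRTEDT and DTHDT are taken from the table above, while the treatment start date is not shown there and is an assumed value chosen to be consistent with DTHADY = 61.

```r
trtsdt <- as.Date("2012-11-15") # assumed start date, not shown in the table
trtedt <- as.Date("2013-01-12")
dthdt  <- as.Date("2013-01-14")

# Relative day: the start date itself counts as day 1
dthady <- as.numeric(dthdt - trtsdt) + 1

# Elapsed days: plain difference, as with add_one = FALSE
lddtheld <- as.numeric(dthdt - trtedt)

c(DTHADY = dthady, LDDTHELD = lddtheld)
# DTHADY LDDTHELD
#     61        2
```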

Derive the Last Date Known Alive (LSTALVDT)

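As with the cause of death (DTHCAUS), the last known alive date (LSTALVDT) can be derived from multiple sources, and the user must ensure the sources (date_source()) are correctly defined.

date_source() specifications (arguments used in the examples below):

  • dataset_name: the name of the dataset in which to search for date information,
  • date: the date variable to consider,
  • filter: an optional condition restricting the observations considered,
  • traceability_vars: the traceability variables to add, if any (e.g. source domain, sequence, variable).

An example could be: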

ae_start_date <- date_source(
  dataset_name = "ae",
  date = AESTDT
)
ae_end_date <- date_source(
  dataset_name = "ae",
  date = AEENDT
)
lb_date <- date_source(
  dataset_name = "lb",
  date = LBDT,
  filter = !is.na(LBDT)
)
trt_end_date <- date_source(
  dataset_name = "adsl",
  date = TRTEDT
)

Once the sources are defined, the function derive_var_extreme_dt() can be used to derive LSTALVDT:

# impute AE start and end date to first
ae_ext <- ae %>%
  derive_vars_dt(
    dtc = AESTDTC,
    new_vars_prefix = "AEST",
    highest_imputation = "M"
  ) %>%
  derive_vars_dt(
    dtc = AEENDTC,
    new_vars_prefix = "AEEN",
    highest_imputation = "M"
  )

# impute LB date to first
lb_ext <- derive_vars_dt(
  lb,
  dtc = LBDTC,
  new_vars_prefix = "LB",
  highest_imputation = "M"
)

adsl <- adsl %>%
  derive_var_extreme_dt(
    new_var = LSTALVDT,
    ae_start_date, ae_end_date, lb_date, trt_end_date,
    source_datasets = list(ae = ae_ext, adsl = adsl, lb = lb_ext),
    mode = "last"
  )

USUBJID TRTEDT DTHDTC LSTALVDT
01-701-1015 2014-07-02 2014-07-02
01-701-1023 2012-09-01 2012-09-02
01-701-1028 2014-01-14 2014-01-14
01-701-1033 2014-03-31 2014-04-14
01-701-1034 2014-12-30 2014-12-30
01-701-1047 2013-03-09 2013-04-07
01-701-1097 2014-07-09 2014-07-09
01-701-1111 2012-09-16 2012-09-17
01-701-1115 2013-01-23 2013-01-23
01-701-1118 2014-09-09 2014-09-09
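Conceptually, with mode = "last" the result is the latest non-missing date per subject across all sources. A minimal {dplyr} sketch of that idea, using hypothetical candidate dates stacked into one long data frame (the names candidates and lstalvdt_demo are made up, and this is not the implementation of derive_var_extreme_dt()):

```r
library(dplyr)

# Hypothetical candidate dates, one row per subject and source
candidates <- tibble(
  USUBJID = c("S1", "S1", "S1", "S2", "S2"),
  source  = c("AESTDT", "LBDT", "TRTEDT", "LBDT", "TRTEDT"),
  date    = as.Date(c("2012-08-20", "2012-09-02", "2012-09-01", NA, "2014-07-02"))
)

# Latest non-missing date per subject
lstalvdt_demo <- candidates %>%
  filter(!is.na(date)) %>%
  group_by(USUBJID) %>%
  summarise(LSTALVDT = max(date), .groups = "drop")

format(lstalvdt_demo$LSTALVDT)
# "2012-09-02" "2014-07-02"
```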

Similarly to dthcaus_source(), the traceability variables can be added by specifying the traceability_vars argument in date_source().

ae_start_date <- date_source(
  dataset_name = "ae",
  date = AESTDT,
  traceability_vars = vars(LALVDOM = "AE", LALVSEQ = AESEQ, LALVVAR = "AESTDTC")
)
ae_end_date <- date_source(
  dataset_name = "ae",
  date = AEENDT,
  traceability_vars = vars(LALVDOM = "AE", LALVSEQ = AESEQ, LALVVAR = "AEENDTC")
)
lb_date <- date_source(
  dataset_name = "lb",
  date = LBDT,
  filter = !is.na(LBDT),
  traceability_vars = vars(LALVDOM = "LB", LALVSEQ = LBSEQ, LALVVAR = "LBDTC")
)
trt_end_date <- date_source(
  dataset_name = "adsl",
  date = TRTEDTM,
  traceability_vars = vars(LALVDOM = "ADSL", LALVSEQ = NA_integer_, LALVVAR = "TRTEDTM")
)

adsl <- adsl %>%
  select(-LSTALVDT) %>% # created in the previous call
  derive_var_extreme_dt(
    new_var = LSTALVDT,
    ae_start_date, ae_end_date, lb_date, trt_end_date,
    source_datasets = list(ae = ae_ext, adsl = adsl, lb = lb_ext),
    mode = "last"
  )

USUBJID TRTEDT DTHDTC LSTALVDT LALVDOM LALVSEQ LALVVAR
01-701-1015 2014-07-02 2014-07-02 ADSL NA TRTEDTM
01-701-1023 2012-09-01 2012-09-02 LB 107 LBDTC
01-701-1028 2014-01-14 2014-01-14 ADSL NA TRTEDTM
01-701-1033 2014-03-31 2014-04-14 LB 107 LBDTC
01-701-1034 2014-12-30 2014-12-30 ADSL NA TRTEDTM
01-701-1047 2013-03-09 2013-04-07 LB 134 LBDTC
01-701-1097 2014-07-09 2014-07-09 ADSL NA TRTEDTM
01-701-1111 2012-09-16 2012-09-17 LB 73 LBDTC
01-701-1115 2013-01-23 2013-01-23 ADSL NA TRTEDTM
01-701-1118 2014-09-09 2014-09-09 ADSL NA TRTEDTM

Derive Groupings and Populations

Grouping (e.g. AGEGR1 or REGION1)

Numeric and categorical variables (AGE, RACE, COUNTRY, etc.) may need to be grouped to perform the required analysis. {admiral} does not currently have functionality to assist with all required groupings. Some functions exist for age grouping according to FDA or EMA conventions. For others, users can create their own functions to meet the study requirements.

To derive AGEGR1 as categorized AGE in < 18, 18-64, >= 65 (FDA convention):

adsl <- adsl %>%
  derive_var_agegr_fda(
    age_var = AGE,
    new_var = AGEGR1
  )
#> Warning: `derive_var_agegr_ema()` was deprecated in admiral 0.8.0.
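For readers who want to see the cut points explicitly, the same three categories can be sketched in base R with cut(). This is a stand-in for illustration, not the package function; the labels "18-64" and ">=65" follow the output shown in this article, and "<18" is assumed by analogy.

```r
age <- c(17, 18, 63, 64, 65, 85, NA)

# Left-closed intervals: [-Inf, 18), [18, 65), [65, Inf)
agegr1 <- cut(
  age,
  breaks = c(-Inf, 18, 65, Inf),
  right = FALSE,
  labels = c("<18", "18-64", ">=65")
)

as.character(agegr1)
# "<18" "18-64" "18-64" "18-64" ">=65" ">=65" NA
```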

However, if, for example,

  • AGEGR2 would categorize AGE in < 65, >= 65,
  • REGION1 would categorize COUNTRY in North America, Rest of the World,

the user-defined functions could look like:

format_agegr2 <- function(var_input) {
  case_when(
    var_input < 65 ~ "< 65",
    var_input >= 65 ~ ">= 65",
    TRUE ~ NA_character_
  )
}

format_region1 <- function(var_input) {
  case_when(
    var_input %in% c("CAN", "USA") ~ "North America",
    !is.na(var_input) ~ "Rest of the World",
    TRUE ~ "Missing"
  )
}

These functions are then used in a mutate() statement to derive the required grouping variables:

adsl <- adsl %>%
  mutate(
    AGEGR2 = format_agegr2(AGE),
    REGION1 = format_region1(COUNTRY)
  )

USUBJID AGE SEX COUNTRY AGEGR1 AGEGR2 REGION1
01-701-1015 63 F USA 18-64 < 65 North America
01-701-1023 64 M USA 18-64 < 65 North America
01-701-1028 71 M USA >=65 >= 65 North America
01-701-1033 74 M USA >=65 >= 65 North America
01-701-1034 77 F USA >=65 >= 65 North America
01-701-1047 85 F USA >=65 >= 65 North America
01-701-1057 59 F USA 18-64 < 65 North America
01-701-1097 68 M USA >=65 >= 65 North America
01-701-1111 81 F USA >=65 >= 65 North America
01-701-1115 84 M USA >=65 >= 65 North America
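Since the pilot data contain only USA subjects, it can be worth checking the user-defined grouping functions on boundary and missing values before applying them. A small self-contained check, repeating the two functions from above ("DEU" is just an illustrative country code):

```r
library(dplyr)

format_agegr2 <- function(var_input) {
  case_when(
    var_input < 65 ~ "< 65",
    var_input >= 65 ~ ">= 65",
    TRUE ~ NA_character_
  )
}

format_region1 <- function(var_input) {
  case_when(
    var_input %in% c("CAN", "USA") ~ "North America",
    !is.na(var_input) ~ "Rest of the World",
    TRUE ~ "Missing"
  )
}

format_agegr2(c(64, 65, NA))
# "< 65" ">= 65" NA
format_region1(c("USA", "DEU", NA))
# "North America" "Rest of the World" "Missing"
```

Note that the two functions treat missing input differently: a missing AGE stays NA, while a missing COUNTRY is mapped to "Missing".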

Population Flags (e.g. SAFFL)

Since population flags are mainly company- or study-specific, no dedicated functions are provided, but in most cases they can easily be derived using derive_var_merged_exist_flag().

An example of an implementation could be:

adsl <- adsl %>%
  derive_var_merged_exist_flag(
    dataset_add = ex,
    by_vars = vars(STUDYID, USUBJID),
    new_var = SAFFL,
    condition = (EXDOSE > 0 | (EXDOSE == 0 & str_detect(EXTRT, "PLACEBO")))
  )

USUBJID TRTSDT ARM ACTARM SAFFL
01-701-1015 2014-01-02 Placebo Placebo Y
01-701-1023 2012-08-05 Placebo Placebo Y
01-701-1028 2013-07-19 Xanomeline High Dose Xanomeline High Dose Y
01-701-1033 2014-03-18 Xanomeline Low Dose Xanomeline Low Dose Y
01-701-1034 2014-07-01 Xanomeline High Dose Xanomeline High Dose Y
01-701-1047 2013-02-12 Placebo Placebo Y
01-701-1057 NA Screen Failure Screen Failure NA
01-701-1097 2014-01-01 Xanomeline Low Dose Xanomeline Low Dose Y
01-701-1111 2012-09-07 Xanomeline Low Dose Xanomeline Low Dose Y
01-701-1115 2012-11-30 Xanomeline Low Dose Xanomeline Low Dose Y
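The idea behind this kind of flag is an existence check: a subject is flagged "Y" if at least one record in the additional dataset fulfils the condition, and is left missing otherwise (as for the screen failure in the table above). A plain {dplyr} sketch with hypothetical data follows; the names adsl_demo, ex_demo and dosed are made up, grepl() stands in for str_detect(), and this is not the implementation of derive_var_merged_exist_flag().

```r
library(dplyr)

# Hypothetical subjects and exposure records
adsl_demo <- tibble(USUBJID = c("S1", "S2", "S3"))
ex_demo <- tibble(
  USUBJID = c("S1", "S1", "S2"),
  EXTRT   = c("PLACEBO", "PLACEBO", "DRUG A"),
  EXDOSE  = c(0, 0, 0)
)

# Subjects with at least one valid dose
dosed <- ex_demo %>%
  filter(EXDOSE > 0 | (EXDOSE == 0 & grepl("PLACEBO", EXTRT))) %>%
  distinct(USUBJID) %>%
  mutate(SAFFL = "Y")

# Subjects without a valid dose keep a missing flag
adsl_demo <- left_join(adsl_demo, dosed, by = "USUBJID")

adsl_demo$SAFFL
# "Y" NA NA
```

Here S2 has exposure records but only zero doses of a non-placebo treatment, so it is not flagged; S3 has no exposure records at all.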

Derive Other Variables

Users can add study-specific code to cover any further analysis needs.

The following functions are helpful for many ADSL derivations:

Add Labels and Attributes

Adding labels and attributes for SAS transport files is supported by the following packages:

NOTE: All these packages are in the experimental phase, but the vision is to have them associated with an End to End pipeline under the umbrella of the pharmaverse.

Example Script

ADaM  Sample Code
ADSL  ad_adsl.R