This is a data package with 15 medical datasets for teaching Reproducible Medical Research with R. Links to the pkgdown reference website for {medicaldata} are at the right. The package should be useful for anyone teaching R to medical professionals, including doctors, nurses, trainees, and students.

The datasets range from reconstructed versions of James Lind’s scurvy dataset (1757) and the original Streptomycin for Tuberculosis trial (1948), through a 2012 RCT of indomethacin to prevent post-ERCP pancreatitis (in which I was involved), to cohort data on SARS-CoV-2 testing results (2020). Many of the datasets come, with permission, from the American Statistical Association’s TSHS (Teaching Statistics in the Health Sciences) Resources Portal, maintained by Carol Bigelow at the University of Massachusetts.

How to Install and Use {medicaldata} Datasets

  1. Install with: remotes::install_github("higgi13425/medicaldata")

  2. Then load the package with library(medicaldata).

  3. Then list the available datasets with data(package = "medicaldata").

  4. Then assign a particular dataset to a named object in your environment with:
    covid <- medicaldata::covid_testing
    where covid is the name of the new object, and covid_testing is the name of the dataset.

  5. Articles (vignettes) on how to use the datasets are on the pkgdown website under the Articles tab.

  6. You can click on the links below to view the codebook and/or description document for each dataset. This information is also available under the Reference tab above, or within R by using help(dataset_name).
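Steps 1 to 4 and 6 above can be combined into a single script. This is a minimal sketch rather than official setup code for the package: the check that skips reinstalling when {medicaldata} (or {remotes}) is already present, and the dim() and str() calls for a first look at the data, are additions for illustration and not part of the steps listed above.

```r
# Install {medicaldata} from GitHub only if it is not already installed.
# Illustrative guard: {remotes} is installed from CRAN first if it is missing.
if (!requireNamespace("medicaldata", quietly = TRUE)) {
  if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
  remotes::install_github("higgi13425/medicaldata")
}

# Step 2: load the package
library(medicaldata)

# Step 3: list the datasets shipped with the package
data(package = "medicaldata")

# Step 4: copy one dataset into a named object in your environment
covid <- medicaldata::covid_testing

# Illustrative first look: number of rows and columns, then variable types
dim(covid)
str(covid)

# Step 6: open the help file for the dataset from within R
help("covid_testing", package = "medicaldata")
```

Swap covid_testing for any other dataset name from the list below to explore a different dataset.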

Data Donations

If you have access to data from a randomized controlled clinical trial, a prospective cohort study, or even a case-control study, please consider obtaining the appropriate permissions, anonymizing the data, and donating the dataset to this package for teaching purposes. To start a discussion about a data donation, open an issue on the GitHub repository.

List of Datasets

Click the links below for details about each dataset in its Description document, and about the variables it contains in its Codebook. Each dataset also has a help file, which you can open within R or RStudio by entering help("dataset_name") in the Console pane.

Dataset          Description document  Codebook
-------          --------------------  --------
strep_tb         strep_tb_desc         strep_tb_codebook
scurvy           scurvy_desc           scurvy_codebook
indo_rct         indo_rct_desc         indo_rct_codebook
polyps           polyps_desc           polyps_codebook
covid_testing    covid_desc            covid_codebook
blood_storage    blood_storage_desc    blood_storage_codebook
cytomegalovirus  cytomegalovirus_desc  cytomegalovirus_codebook
esoph_ca         esoph_ca_desc         esoph_ca_codebook
laryngoscope     laryngoscope_desc     laryngoscope_codebook
licorice_gargle  licorice_gargle_desc  licorice_gargle_codebook
opt              opt_desc              opt_codebook
smartpill        smartpill_desc        smartpill_codebook
supraclavicular  supraclavicular_desc  supraclavicular_codebook
indometh         indometh_desc         indometh_codebook
theoph           theoph_desc           theoph_codebook
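To get a quick overview of all 15 datasets before choosing one for a lesson, you can load each one and tabulate its size. This is a sketch under two assumptions: that every name in the Dataset column above is a data object in {medicaldata}, and that each one is a rectangular data frame. The names vector is copied by hand from that column, not generated by the package.

```r
library(medicaldata)

# Dataset names copied by hand from the Dataset column of the list above
# (assumption: each name is a data object shipped with {medicaldata})
dataset_names <- c(
  "strep_tb", "scurvy", "indo_rct", "polyps", "covid_testing",
  "blood_storage", "cytomegalovirus", "esoph_ca", "laryngoscope",
  "licorice_gargle", "opt", "smartpill", "supraclavicular",
  "indometh", "theoph"
)

# Load each dataset into a throwaway environment (so nothing is added to
# the global environment) and record its number of rows and columns
overview <- do.call(rbind, lapply(dataset_names, function(name) {
  env <- new.env()
  data(list = name, package = "medicaldata", envir = env)
  df <- env[[name]]
  data.frame(dataset = name, rows = nrow(df), columns = ncol(df))
}))

print(overview)
```

Loading into a fresh environment inside lapply() keeps the overview from cluttering your workspace; use help("dataset_name") on any row that looks useful.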