In this vignette, we introduce how to explore recurrent event data by nonparametric sample mean function (also called Nelson-Aalen estimator) and modeling the event counts of interest by gamma frailty model with the reda package through examples. Most functions in the package are S4 methods that produce S4 class objects. The details of function syntax and the produced objects are available in the package manual, which will thus not be covered in this vignette.

Introduction

Simulated recurrent event data

library(reda)

First of all, the sample recurrent event data we are going to use in the following examples is called simuDat, which contains totally 500 observations of 6 variables.

head(simuDat)
##   ID time event group    x1 gender
## 1  1    1     1 Contr -1.93 female
## 2  1   22     1 Contr -1.93 female
## 3  1   23     1 Contr -1.93 female
## 4  1   57     1 Contr -1.93 female
## 5  1  112     0 Contr -1.93 female
## 6  2  140     0 Treat -0.11 female
str(simuDat)
## 'data.frame':    500 obs. of  6 variables:
##  $ ID    : num  1 1 1 1 1 2 3 3 4 4 ...
##  $ time  : num  1 22 23 57 112 140 40 168 14 112 ...
##  $ event : int  1 1 1 1 0 0 1 0 1 0 ...
##  $ group : Factor w/ 2 levels "Contr","Treat": 1 1 1 1 1 2 1 1 1 1 ...
##  $ x1    : num  -1.93 -1.93 -1.93 -1.93 -1.93 -0.11 0.2 0.2 -0.43 -0.43 ...
##  $ gender: Factor w/ 2 levels "female","male": 1 1 1 1 1 1 1 1 1 1 ...

where

  • ID: Subjects identification (ID).
  • time: Event or censoring time.
  • event: Event indicator, 1 = event; 0 = censored.
  • group: Treatment group indicator.
  • x1: Continuous variable.
  • gender: Gender of subjects.

The dataset was simulated by thinning method (Lewis and Shedler 1979) and further processed for a better demonstration purpose. (Note that reda also provides functions for simulating survival data, recurrent event data, or multiple state data. See another package vignette for details.)

Data checking

The subjects’ ID, event or censoring times, event indicators or costs, and time origins of the follow-up is specified through the corresponding arguments, ID, time, event, and origin in function Survr, where the argument event takes negative values as censoring, and positive values as costs or event indicators. The function Survr serves as the formula response and contains a considerate data checking procedure for recurrent event data modeled by methods based on counts and rate function. The observations of the covariates specified in the formula will be checked first, where the checking rules include

  • Subjects’ ID, event times or censoring times cannot be missing;
  • For every subject, the event time cannot not be later than censoring time;
  • Each subject must have one and only one censoring time;
  • Subjects may have different time origins. But one subject must has only one time origin.

The subject’s ID will be pinpointed (in most cases) if its observation violates any given checking rule.

Exploratory analysis

Computing sample MCF

The nonparametric sample mean cumulative function (MCF) is also called Nelson-Aalen Estimator (Nelson 2003). The point estimate of MCF at each time point does not assume any particular underlying model. The variance estimates at each time point is computed by the Lawless and Nadeau method (Lawless and Nadeau 1995), Poisson process method or based on the well-known bootstrap method (Efron 1979) with subjects as resampling units. The approximate confidence intervals are provided as well, which are constructed based on the asymptotic normality of the MCF estimates itself (the default) or logarithm of the MCF estimates.

The function mcf is a generic function for the MCF estimates from a sample data or a fitted gamma frailty model (as demonstrated later). If a formula with Survr as response is specified in function mcf, the formula method for estimating the sample MCF will be called. The covariate specified at the right hand side of the formula should be either 1 or any “linear” combination of factor variables in the data. The former computes the overall sample MCF. The latter computes the sample MCF for each level of the combination of the factor variable(s) specified, respectively.

The valve-seat dataset in Nelson (1995) and the simulated sample data are used for demonstration as follows:

## Example 1. valve-seat data
valveMcf0 <- mcf(Survr(ID, Days, No.) ~ 1, data = valveSeats)

## Example 2. the simulated data
simuMcf <- mcf(Survr(ID, time, event) ~ group + gender,
               data = simuDat, subset = ID %in% 1 : 50)

After estimation, we may plot the sample MCF by function plot, which actually returns a ggplot object so that the plot produced can be easily further customized by ggplot2. The legendname and legendLevels can be specified to easily customize the legend in the plot. Two examples are given as follows:

## overall sample MCF for valve-seat data in Nelson (1995)
plot(valveMcf0, conf.int = TRUE, mark.time = TRUE, addOrigin = TRUE, col = 2) +
    ggplot2::xlab("Days") + ggplot2::theme_bw()

## sample MCF for different groups (the default theme)
plot(simuMcf, conf.int = TRUE, lty = 1 : 4, legendName = "Treatment & Gender")