Working With Custom Attributes and Summary Statistics

Introduction

This vignette discusses some recent updates in EpiModel on working with attributes and summary statistics within the stochastic network model class. This material is oriented towards custom models with extension modules and functions. Further details are provided in the “New Network Models with EpiModel” section of the EpiModel tutorials.

In network simulations with netsim, we store all data in the Master List Object (referred to as dat). In this vignette we will discuss with three types of data on dat:

Current Nodal Attributes

Attributes are characteristics of the nodes (e.g., persons) in the model at the current time-step. By default, every node has a unique_id and an active attribute used to track each node, as well as three attributes used in the epidemic modules: status for disease status, entrTime for the time the node entered the population, and exitTime for the time the node left the population. Other attributes can be added of any name and value, like age, but must be stored as a scalar values.

We work with attribute vectors that are all of the same size of the number of nodes in the model. In the attribute vectors, a node is identified by its Positional ID or posit_id (i.e., the position in vector). The default attribute unique_id created for each node gives a unique identification number allowing us to refer to nodes no longer in the model. Deaths or other forms of exit from the network disrupt the positional ID but do not impact the unique ID.

Working With Attributes

Accessing Attributes

The EpiModel::get_attr function will extract the vector of a given attribute. In its simplest form, we can pull a complete attribute vector from dat like so:

active <- EpiModel::get_attr(dat, "active")

The above call will pull from dat the active attribute for all nodes. active is a vector of size current number of nodes. With values being either 1 or 0 depending on whether a node is active or not.

Trying to extract an attribute that does not exist will cause EpiModel::get_attr to throw an error.

Modifying Attributes

In custom extension modules, we usually want to modify some of the attributes. Below is a minimal module that increments the age of all the nodes by 1.

aging_module <- function(dat, at) {
  
  # Extract current attribute
  age <- EpiModel::get_attr(dat, "age")

  # Aging process
  new_age <- age + 1

  # Output updated attributes
  dat <- EpiModel::set_attr(dat, "age", new_age)

  return(dat)
}

Let’s break down this very simple yet perfectly valid module.

  1. Pull the age attribute vector as we did in the previous section.
  2. Create a vector new_age incrementing all ages by one.
  3. Update the dat object with the EpiModel::set_attr function.
  4. Return the Updated dat object.

We can see that EpiModel::set_attr takes as arguments:

  • The dat object to update.
  • The name of the attribute vector of interest (here “age”).
  • The new values for this vector (here new_age).

When using EpiModel::set_attr, there are several things to note:

  • The function does not modify the dat object, it merely returns a modified version of it to be assigned back to dat. (Nicely, this does not cause performance issues due to the way R handles shallow copies since version 3.1)
  • If new_age size is not equal to the number of nodes in the network, the function will throw an error.

Summary

The above example describes the recommended way to work with attribute:

  1. Extract the attributes into local vectors.
  2. Modify the local vectors with some dynamic process (like aging).
  3. Update the dat object with the revised local vectors.
  4. Return the dat object at the end of the function.

These functions have other arguments that are described in the documentation: see help("net-accessor", package = "EpiModel") for further details.

Historical Nodal Attributes

The Attributes described above refer to the state of each node in the network at the current time-step. However, it is sometimes useful to track of what happened to nodes over time. Because saving the full history of the attributes would consume too much memory and is rarely necessary for full-scale research models, EpiModel offers a way to record specific attribute for specific nodes at different time-steps. These may be useful for diagnostic purposes (e.g., to see that a dynamic process is coded correctly) or for tracking a limited set of individual-level outcomes for further analysis.

The Attribute History is an efficient collection of recorded attributes at different time-steps. EpiModel has functions to record these elements and to access them in a convenient manner once the netsim simulation is complete.

Working with Attribute Histories

Recording Attribute Histories

The following is a module that records the viral load of infected nodes every 10 time steps. We assume that this module is part of a model that defines and updates two attributes over time:

  • status with possible values being “infected” or “susceptible”.
  • viral_load as a continuous number for “infected” nodes and NA otherwise.
viral_load_logger_module <- function(dat, at) {
  
  # Run every 10 time steps
  if (at %% 10 == 0) {
    
    # Attributes
    status <- EpiModel::get_attr(dat, "status")
    viral_load <- EpiModel::get_attr(dat, "viral_load")

    infected <- which(status == "infected")

    dat <- EpiModel::record_attr_history(
      dat, at,
      "viral_load",
      infected,
      viral_load[infected]
    )
  }

  # Output
  return(dat)
}

The steps in the code are as follows:

  1. Check that the current time step is a multiple of ten. If yes, go to step 2, otherwise skip to step 5.
  2. Pull the 2 attribute vectors of interest (“status” and “viral_load”).
  3. Store in infected the posit_ids of the infected nodes.
  4. Record the viral_load of infected nodes at time at under the label “viral_load”.
  5. Return the dat object.

EpiModel::record_attr_history takes five arguments:

  1. The dat object.
  2. The time step to be associated with the recording (here at, the current time-step).
  3. The label to be used for the attribute (here “viral_load”).
  4. A vector of posit_ids referring to the nodes of interest (here infected).
  5. The values to be recorded for those IDs (here viral_load[infected])

Note that EpiModel::record_attr_history requires a set of posit_ids. Internally, the function will convert them to unique_ids so the Attribute History will not be affected by nodes entering or leaving the population over time.

When recording some attribute histories, one must ensure that we record as many values as there are posit_ids. Otherwise, the function will throw an error. It however possible to use only one value for that attribute even if we record a value for multiple nodes. This last situation actually uses less RAM. In any case, we would want to record attribute histories sparingly as the storage can be large, especially for open population models with many nodes that run over many time steps.

Accessing the Attribute Histories

The Attribute History is meant to be accessed once the netsim simulation is complete. At that point, we can use the EpiModel::get_attr_history function to access the histories that we have recorded, like so:

sim <- netsim(est, param, init, control)
attr_history <- EpiModel::get_attr_history(sim)

The attr_history object is a list of data.frames. One for each attribute history that was recorded (we use this flexible list structure because each history may be of different dimension). Assume that we were running two simulations, using the module defined above and another one (not shown) recording when a node switched from infected to recovered.

get_attr_history(sim)

# $viral_load
#    sim step attribute    uids values
# 1  1   10   viral_load   1001   2000
# 2  1   10   viral_load   1002   1878
# 3  1   20   viral_load   1001   1500
# 4  1   20   viral_load   1002    300
# 5  2   10   viral_load    401   2500
# 6  2   10   viral_load    402   1378
# 7  2   20   viral_load    401   1200
# 8  2   20   viral_load    402    100
# ...
#
# $switch_status
#    sim step attribute      uids     values
# 1  1   22   switch_status  1001   infected
# 2  1   64   switch_status  1002   infected
# 3  1  110   switch_status  1001  recovered
# 4  1  220   switch_status  1002  recovered
# 5  2    7   switch_status   401   infected
# 6  2   15   switch_status   402   infected
# 7  2   20   switch_status   401  recovered
# 8  2  120   switch_status   402  recovered
# ...

We would get a named list of two data.frame’s:

  • “viral_load” with the value column being the viral loads.
  • “switch_status” with the value column being whether the node became “infected” or “recovered” at the given time-step.

Epidemic Trackers

The next type of data stored in dat is called an Epidemic Tracker. This refers to some summary information about the population at a specific time step. This information is created and stored for each time step; therefore epidemic trackers are historical summary statistics.

One example of an Epidemic Trackers is num, present in any model, which stores the size of the population at each time step.

Working With Epidemic Trackers in Modules

Inside a module, Epidemic Trackers are accessed and modified with the functions EpiModel::get_epi and EpiModel::set_epi. Below is an updated new version of our aging_module above with the addition of epidemic trackers.

aging_track_module <- function(dat, at) {
  
  # Attributes
  age <- EpiModel::get_attr(dat, "age")

  # Aging process 
  new_age <- age + 1
  
  # Calculate summary statistics
  mean_age <- mean(new_age)
  prev_mean_age <- EpiModel::get_epi(dat, at - 1, "mean_age")
  age_change <- mean_age - prev_mean_age

  # Update nodal attributes
  dat <- EpiModel::set_attr(dat, "age", new_age)

  # Update epidemic trackers
  dat <- set_epi(dat, "mean_age", at, mean_age)
  dat <- set_epi(dat, "age_change", at, age_change)

  return(dat)
}

In this new module, in addition to incrementing the age of each node by one, we also record two values as Epidemic Trackers: the mean age of the population and the change in mean age compared to the previous step (we could have calculated the latter after the simulation was complete with mutate_epi, but here we do it on the fly to demonstrate get_epi).

We extract the mean age at the previous time step using EpiModel::get_epi and set the second argument as at - 1. After all the calculations are complete, we store mean_age and age_change in dat using EpiModel::set_epi.

Accessing Epidemic Trackers After a Simulation

Epidemic Trackers are usually the main quantities of interest after a simulation has completed. We access them simply by calling as.data.frame on a netsim object or by using the plot or summary functions within by EpiModel. Note also that you can perform derived summary statistic calculations after a netsim call is complete with EpiModel::mutate_epi.

Custom Epidemic Trackers

It can be useful to create small Epidemic Trackers outside of the modules and use them only when we need them. EpiModel contains an optional module called EpiModel::trackers.net allowing such functionality.

Writing Tracker Functions

We call a tracker function a function that takes dat and at as arguments and outputs a scalar value. Every tracker function is run by the trackers.net module at each time step.

epi_s_num <- function(dat, at) {
  needed_attributes <- c("status")
  output <- with(get_attr_list(dat, needed_attributes), {
    out <- sum(status == "susceptible", na.rm = TRUE)
    out
  })
  return(output)
}

The epi_s_num function defined above is a tracker function. It calculates at each time step the number of susceptible nodes in the network.

The next example is a tracker function that calculates the proportion of the population infected at each time step. Let’s look what each element does:

epi_prop_infected <- function(dat, at) {
  # we need two attributes for our calculation: `status` and `active`
  needed_attributes <- c("status", "active")

  # we use `with` to simplify code
  output <- with(EpiModel::get_attr_list(dat, needed_attributes), {
    pop <- active == 1             # we only look at active nodes
    cond <- status == "infected"   # which are infected 

    # how many are `infected` among the `active`
    out <- sum(cond & pop, na.rm = TRUE) / sum(pop, na.rm = TRUE)
    out
  })

  return(output)
}

We recommend that you write your tracker functions using this format:

  1. Define a needed_attributes variable containing a vector of attribute names: in this example: “status” and “active”.
  2. Use with and EpiModel::get_attr_list(dat, needed_attributes) to work in an environment with only the objects you need. This helps clarify what the tracker does and simplify any debugging. We save the result of this call into output.
  3. Define out as the calculated outcome inside with. This must be a scalar. This is the last element evaluated by the with expression.
  4. return(output), what was calculated inside the with construct.

Using A trackers.net Module

This functionality is part of an optional module provided by EpiModel, trackers.net. This module must be enabled in control.net. The tracker functions themselves must be declared as a named list in control.net: tracker.list. Let’s make a simple SI epidemic model with some added trackers:

# Create the `tracker.list` list
some.trackers <- list(
  prop_infected = epi_prop_infected,
  s_num         = epi_s_num
)

control <- EpiModel::control.net(
  type = NULL, # must be NULL as we use custom extension modules
  nsims = 1,
  nsteps = 50,
  verbose = FALSE,
  infection.FUN = EpiModel::infection.net,
  trackers.FUN = EpiModel::trackers.net,
  tracker.list = some.trackers
)

param <- EpiModel::param.net(
  inf.prob = 0.3,
  act.rate = 0.1
)

nw <- network_initialize(n = 50)
nw <- set_vertex_attribute(nw, "race", rbinom(50, 1, 0.5))
est <- EpiModel::netest(
  nw,
  formation = ~edges,
  target.stats = 25,
  coef.diss = dissolution_coefs(~offset(edges), 10, 0),
  verbose = FALSE
)
#> Starting maximum pseudolikelihood estimation (MPLE):
#> Evaluating the predictor and response matrix.
#> Maximizing the pseudolikelihood.
#> Finished MPLE.
#> Stopping at the initial estimate.

init <- EpiModel::init.net(i.num = 10)
sim <- EpiModel::netsim(est, param, init, control)

d <- as.data.frame(sim)

knitr::kable(tail(d, n = 15))
sim time s.num i.num num prop_infected s_num si.flow
36 1 36 30 20 50 0 0 0
37 1 37 30 20 50 0 0 0
38 1 38 30 20 50 0 0 0
39 1 39 30 20 50 0 0 0
40 1 40 30 20 50 0 0 0
41 1 41 30 20 50 0 0 0
42 1 42 30 20 50 0 0 0
43 1 43 30 20 50 0 0 0
44 1 44 30 20 50 0 0 0
45 1 45 30 20 50 0 0 0
46 1 46 30 20 50 0 0 0
47 1 47 29 21 50 0 0 1
48 1 48 29 21 50 0 0 0
49 1 49 29 21 50 0 0 0
50 1 50 29 21 50 0 0 0

Each function must be named in the tracker.list list. The name given there will be used to identify the tracker function in the epi list and will be the name of the corresponding column of the data.frame produced by as.data.frame(sim) where sim is a netsim object.

Note: in the some.trackers list, we put epi_prop_infected and epi_s_num without parentheses at the end. This is because we store the function and not the result of calling the function.

This is the minimal control.net to enable the module. Usually, in research-level models, we will want to add other extension modules. See the “Advanced Extension Models” section in the EpiModel Tutorials site for further examples.

Important: when enabling the trackers.net module, one must pay particular attention to the order the modules are run. It is recommended to have the trackers.net module run last so the tracked values will reflect the state of the model at the end of the time step.

Record Any Object (Advanced Debugging)

When working with complex research-level models, we sometimes want to inspect the state of an object without stopping the simulation. The function EpiModel::record_raw_object allows the user to save any object during the simulation.

introspect_module <- function(dat, at) {
  # Attributes
  age <- get_attr(dat, "age")
  
  if (mean(age, na.rm = TRUE) > 50) {
    obj <- data.frame(
        age = age,
        status = EpiModel::get_attr(dat, "status")
    )
    dat <- EpiModel::record_raw_object(dat, at, "old pop", obj)
  }

  return(dat)
}

In this module, we look at the age of the population and if the mean age is more than 50, we create a data.frame called obj containing the age and status attribute vectors and store it in a Raw Record.

This Raw Record can be accessed in the final netsim object for debugging purposes.

sim <- netsim(est, param, init, control)
sim[["raw.records"]][[1]]