# policy_data

library(polle)

This vignette is a guide to policy_data(). As the name suggests, the function creates a policy_data object with a specific data structure making it easy to use in combination with policy_def(), policy_learn(), and policy_eval(). The vignette is also a guide to some of the associated S3 functions which transform or access parts of the data, see ?policy_data and methods(class="policy_data").

We will start by looking at a simple single-stage example, then consider a fixed two-stage example with varying actions sets and data in wide format, and finally we will look at an example with a stochastic number of stages and data in long format.

# Single-stage: wide data

Consider a simple single-stage problem with covariates/state variables $$(Z, L, B)$$, binary action variable $$A$$, and utility outcome $$U$$. We use sim_single_stage() to simulate data:

(d <- sim_single_stage(n = 5e2, seed=1)) |> head()
#>            Z          L B A          U
#> 1  1.2879704 -1.4795962 0 1 -0.9337648
#> 2  1.6184181  1.2966436 0 1  6.7506026
#> 3  1.2710352 -1.0431352 0 1 -0.3377580
#> 4 -0.2157605  0.1198224 1 0  1.4993427
#> 5 -1.0671588 -1.3663727 0 1 -9.1718727
#> 6 -1.4469746 -0.4018530 0 0 -2.6692961

We give instructions to policy_data() which variables define the action, the state covariates, and the utility variable:

pd <- policy_data(d, action="A", covariates=list("Z", "B", "L"), utility="U")
pd
#> Policy data with n = 500 observations and maximal K = 1 stages.
#>
#>      action
#> stage   0   1   n
#>     1 278 222 500
#>
#> Baseline covariates:
#> State covariates: Z, B, L
#> Average utility: -0.98

In the single-stage case the history $$H$$ is just $$(B, Z, L)$$. We access the history and actions using get_history():

get_history(pd)$H |> head()
#> Key: <id, stage>
#>       id stage          Z     B          L
#>    <int> <int>      <num> <num>      <num>
#> 1:     1     1  1.2879704     0 -1.4795962
#> 2:     2     1  1.6184181     0  1.2966436
#> 3:     3     1  1.2710352     0 -1.0431352
#> 4:     4     1 -0.2157605     1  0.1198224
#> 5:     5     1 -1.0671588     0 -1.3663727
#> 6:     6     1 -1.4469746     0 -0.4018530
get_history(pd)$A |> head()
#> Key: <id, stage>
#>       id stage      A
#>    <int> <int> <char>
#> 1:     1     1      1
#> 2:     2     1      1
#> 3:     3     1      1
#> 4:     4     1      0
#> 5:     5     1      1
#> 6:     6     1      0

Similarly, we access the utility outcomes $$U$$ using get_utility():

get_utility(pd) |> head()
#> Key: <id>
#>       id          U
#>    <int>      <num>
#> 1:     1 -0.9337648
#> 2:     2  6.7506026
#> 3:     3 -0.3377580
#> 4:     4  1.4993427
#> 5:     5 -9.1718727
#> 6:     6 -2.6692961

# Two-stage: wide data

Consider a two-stage problem with observations $$O = (B, BB, L_{1}, C_{1}, U_{1}, A_{1}, L_{2}, C_{2}, U_{2}, A_{2}, U_{3})$$. Following the general notation introduced in Section 3.1 of (Nordland and Holst 2023), $$(B, BB)$$ are the baseline covariates, $$S_k = (L_{k}, C_{k})$$ are the state covariates at stage $$k$$, $$A_{k}$$ is the action at stage $$k$$, and $$U_{k}$$ is the reward at stage $$k$$. The utility is the sum of the rewards, $$U = U_{1} + U_{2} + U_{3}$$.

We use sim_two_stage_multi_actions() to simulate data:

d <- sim_two_stage_multi_actions(n=2e3, seed = 1)
colnames(d)
#>  [1] "B"   "BB"  "L_1" "C_1" "A_1" "L_2" "C_2" "A_2" "L_3" "U_1" "U_2" "U_3"

Note that the data is in wide format. The data is transformed using policy_data() with instructions on which variables define the actions, baseline covariates, state covariates, and the rewards:

pd <- policy_data(d,
                  action = c("A_1", "A_2"),
                  baseline = c("B", "BB"),
                  covariates = list(L = c("L_1", "L_2"),
                                    C = c("C_1", "C_2")),
                  utility = c("U_1", "U_2", "U_3"))
pd
#> Policy data with n = 2000 observations and maximal K = 2 stages.
#>
#>      action
#> stage default   no  yes    n
#>     1       0 1017  983 2000
#>     2     769  826  405 2000
#>
#> Baseline covariates: B, BB
#> State covariates: L, C
#> Average utility: 0.39

The length of the character vector action determines the number of stages K (in this case 2). If the number of stages is 2 or more, the covariates argument must be a named list, where each element is a character vector with length equal to the number of stages. If a covariate is not available at a given stage, we insert an NA value, e.g., L = c(NA, "L_2").
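As a sketch, if the covariate L were only measured at the second stage, the call could look as follows. This is a hypothetical variant of the call above (the object name pd_na is our own choice); it assumes the same wide-format data frame d:

```r
# Sketch: covariate L unavailable at stage 1, marked with NA.
# Hypothetical variant of the policy_data() call above; assumes the
# wide-format data frame d from sim_two_stage_multi_actions().
pd_na <- policy_data(d,
                     action = c("A_1", "A_2"),
                     baseline = c("B", "BB"),
                     covariates = list(L = c(NA, "L_2"),  # L missing at stage 1
                                       C = c("C_1", "C_2")),
                     utility = c("U_1", "U_2", "U_3"))
```

The resulting state covariate L is then NA in all stage-1 histories.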

Finally, the utility argument must be a single character string (the utility is observed after stage K) or a character vector of length K+1 with the names of the rewards.
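A sketch of the single-string form, where we first collapse the stage-wise rewards into one utility column. The column name U is our own choice for illustration; the simulated data only contains U_1, U_2, U_3:

```r
# Sketch: pass the utility as a single column observed after stage K.
# The column U is constructed here for illustration only.
d$U <- d$U_1 + d$U_2 + d$U_3
pd_u <- policy_data(d,
                    action = c("A_1", "A_2"),
                    baseline = c("B", "BB"),
                    covariates = list(L = c("L_1", "L_2"),
                                      C = c("C_1", "C_2")),
                    utility = "U")
```

Since the utility is defined as the sum of the rewards, both specifications should yield the same utility values.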

In this example, the observed action sets vary for each stage. get_action_set() returns the global action set and get_stage_action_sets() returns the action set for each stage:

get_action_set(pd)
#> [1] "default" "no"      "yes"
get_stage_action_sets(pd)
#> $stage_1
#> [1] "no"  "yes"
#>
#> $stage_2
#> [1] "default" "no"      "yes"

The full histories $$H_1 = (B, BB, L_{1}, C_{1})$$ and $$H_2=(B, BB, L_{1}, C_{1}, A_{1}, L_{2}, C_{2})$$ are available using get_history() and full_history = TRUE:

get_history(pd, stage = 1, full_history = TRUE)$H |> head()
#> Key: <id, stage>
#>       id stage        L_1        C_1          B     BB
#>    <int> <num>      <num>      <num>      <num> <char>
#> 1:     1     1  0.9696772  1.7112790 -0.6264538 group2
#> 2:     2     1 -2.1994065 -2.6431237  0.1836433 group1
#> 3:     3     1  1.9480938  2.0619342 -0.8356286 group2
#> 4:     4     1  0.1798532  1.0066957  1.5952808 group2
#> 5:     5     1  0.4150568  0.1538534  0.3295078 group2
#> 6:     6     1  0.6468405 -0.0982121 -0.8204684 group3
get_history(pd, stage = 2, full_history = TRUE)$H |> head()
#> Key: <id, stage>
#>       id stage    A_1        L_1        L_2        C_1        C_2          B
#>    <int> <num> <char>      <num>      <num>      <num>      <num>      <num>
#> 1:     1     2    yes  0.9696772 -0.7393434  1.7112790  2.4243702 -0.6264538
#> 2:     2     2     no -2.1994065  0.4828756 -2.6431237 -2.6647281  0.1836433
#> 3:     3     2     no  1.9480938  0.4803055  2.0619342  2.4747615 -0.8356286
#> 4:     4     2    yes  0.1798532 -0.3574497  1.0066957  2.0571959  1.5952808
#> 5:     5     2     no  0.4150568  2.0473541  0.1538534 -0.9649004  0.3295078
#> 6:     6     2    yes  0.6468405 -2.3701135 -0.0982121  1.0989523 -0.8204684
#>        BB
#>    <char>
#> 1: group2
#> 2: group1
#> 3: group2
#> 4: group2
#> 5: group2
#> 6: group3

Similarly, we access the associated actions at each stage via list element A:

get_history(pd, stage = 1, full_history = TRUE)$A |> head()
#> Key: <id, stage>
#>       id stage    A_1
#>    <int> <num> <char>
#> 1:     1     1    yes
#> 2:     2     1     no
#> 3:     3     1     no
#> 4:     4     1    yes
#> 5:     5     1     no
#> 6:     6     1    yes
get_history(pd, stage = 2, full_history = TRUE)$A |> head()
#> Key: <id, stage>
#>       id stage     A_2
#>    <int> <num>  <char>
#> 1:     1     2      no
#> 2:     2     2      no
#> 3:     3     2 default
#> 4:     4     2     yes
#> 5:     5     2     yes
#> 6:     6     2      no

Alternatively, the state/Markov type history and actions are available using full_history = FALSE:

get_history(pd, full_history = FALSE)$H |> head()
#> Key: <id, stage>
#>       id stage          L         C          B     BB
#>    <int> <int>      <num>     <num>      <num> <char>
#> 1:     1     1  0.9696772  1.711279 -0.6264538 group2
#> 2:     1     2 -0.7393434  2.424370 -0.6264538 group2
#> 3:     2     1 -2.1994065 -2.643124  0.1836433 group1
#> 4:     2     2  0.4828756 -2.664728  0.1836433 group1
#> 5:     3     1  1.9480938  2.061934 -0.8356286 group2
#> 6:     3     2  0.4803055  2.474761 -0.8356286 group2
get_history(pd, full_history = FALSE)$A |> head()
#> Key: <id, stage>
#>       id stage       A
#>    <int> <int>  <char>
#> 1:     1     1     yes
#> 2:     1     2      no
#> 3:     2     1      no
#> 4:     2     2      no
#> 5:     3     1      no
#> 6:     3     2 default

Note that policy_data() overrides the action variable names to A_1, A_2, … in the full history case and A in the state/Markov history case.

As in the single-stage case we access the utility, i.e. the sum of the rewards, using get_utility():

get_utility(pd) |> head()
#> Key: <id>
#>       id         U
#>    <int>     <num>
#> 1:     1  1.110369
#> 2:     2 -1.788041
#> 3:     3  2.836251
#> 4:     4  3.173743
#> 5:     5  1.891312
#> 6:     6 -1.120837

# Multi-stage: long data

In this example we illustrate how polle handles decision processes with a stochastic number of stages, see Section 3.5 in (Nordland and Holst 2023). The data is simulated using sim_multi_stage(). Detailed information on the simulation is available in ?sim_multi_stage. We simulate data from 2000 iid subjects:

d <- sim_multi_stage(2e3, seed = 1)

As described, the stage data is in long format:

d$stage_data[, -(9:10)] |> head()
#>       id stage event        t      A          X     X_lead         U
#>    <num> <num> <num>    <num> <char>      <num>      <num>     <num>
#> 1:     1     1     0 0.000000      1  1.3297993  0.0000000 0.0000000
#> 2:     1     2     0 1.686561      1 -0.7926711  1.3297993 0.3567621
#> 3:     1     3     0 3.071768      0  3.5246509 -0.7926711 2.1778778
#> 4:     1     4     1 3.071768   <NA>         NA         NA 0.0000000
#> 5:     2     1     0 0.000000      1  0.7635935  0.0000000 0.0000000
#> 6:     2     2     0 1.297336      1 -0.5441694  0.7635935 0.5337427

The id variable is important for identifying which rows belong to each subject. The baseline data uses the same id variable:

d$baseline_data |> head()
#>       id     B
#>    <num> <int>
#> 1:     1     0
#> 2:     2     0
#> 3:     3     1
#> 4:     4     1
#> 5:     5     1
#> 6:     6     0

The data is transformed using policy_data() with type = "long". The names of the id, stage, event, action, and utility variables must be specified. The event variable, inspired by the event variable in survival::Surv(), is 0 whenever an action occurs and 1 for a terminal event.

pd <- policy_data(data = d$stage_data,
                  baseline_data = d$baseline_data,
                  type = "long",
                  id = "id",
                  stage = "stage",
                  event = "event",
                  action = "A",
                  utility = "U")
pd
pd
#> Policy data with n = 2000 observations and maximal K = 4 stages.
#>
#>      action
#> stage    0    1    n
#>     1  113 1887 2000
#>     2  844 1039 1883
#>     3  956   74 1030
#>     4   72    0   72
#>
#> Baseline covariates: B
#> State covariates: t, X, X_lead
#> Average utility: 2.46

In some cases we are only interested in analyzing a subset of the decision stages. partial() trims the maximum number of decision stages:

pd3 <- partial(pd, K = 3)
pd3
#> Policy data with n = 2000 observations and maximal K = 3 stages.
#>
#>      action
#> stage    0    1    n
#>     1  113 1887 2000
#>     2  844 1039 1883
#>     3  956   74 1030
#>
#> Baseline covariates: B
#> State covariates: t, X, X_lead
#> Average utility: 2.46

# SessionInfo

sessionInfo()
#> R version 4.3.2 (2023-10-31)
#> Platform: aarch64-apple-darwin22.6.0 (64-bit)
#> Running under: macOS Sonoma 14.4.1
#>
#> Matrix products: default
#> BLAS:   /Users/oano/.asdf/installs/R/4.3.2/lib/R/lib/libRblas.dylib
#> LAPACK: /Users/oano/.asdf/installs/R/4.3.2/lib/R/lib/libRlapack.dylib;  LAPACK version 3.11.0
#>
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#>
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#>
#> attached base packages:
#> [1] splines   stats     graphics  grDevices utils     datasets  methods
#> [8] base
#>
#> other attached packages:
#> [1] polle_1.4           SuperLearner_2.0-29 gam_1.22-3
#> [4] foreach_1.5.2       nnls_1.5
#>
#> loaded via a namespace (and not attached):
#>  [1] progressr_0.14.0    cli_3.6.2           knitr_1.45
#>  [4] rlang_1.1.3         xfun_0.41           jsonlite_1.8.8
#>  [7] data.table_1.15.4   listenv_0.9.1       future.apply_1.11.2
#> [10] lava_1.8.0          htmltools_0.5.7     sass_0.4.7
#> [13] rmarkdown_2.25      grid_4.3.2          evaluate_0.23
#> [16] jquerylib_0.1.4     fastmap_1.1.1       yaml_2.3.7
#> [19] compiler_4.3.2      codetools_0.2-19    future_1.33.2
#> [22] lattice_0.21-9      digest_0.6.35       R6_2.5.1
#> [25] parallelly_1.37.1   parallel_4.3.2      Matrix_1.6-1.1
#> [28] bslib_0.5.1         tools_4.3.2         iterators_1.0.14
#> [31] globals_0.16.3      survival_3.5-7      cachem_1.0.8

# References

Nordland, Andreas, and Klaus K. Holst. 2023. “Policy Learning with the Polle Package.” https://doi.org/10.48550/arXiv.2212.02335.