Example: Hidden Markov Model

This vignette¹ describes the workflow of the {ino} package for the likelihood optimization of a hidden Markov model (HMM). For more technical details about HMMs, see, e.g., Zucchini et al. (2016).

Data

The example data set considered throughout this vignette covers a time series of share returns of the ‘Deutsche Bank’ stock. The data is freely accessible via Yahoo Finance.

library("fHMM")
file <- tempfile()
fHMM::download_data(symbol = "DBK.DE", from = "2000-01-01", file = file)
#> Download successful.
#> * symbol: DBK.DE 
#> * from: 2000-01-03 
#> * to: 2022-09-28 
#> * path: C:\Users\loelschlaeger\AppData\Local\Temp\RtmpcpEkAy\filee6c56577e4f
library("dplyr")
db_data <- read.csv(file) %>%
  as_tibble() %>%
  summarize(date = as.Date(Date, format = "%Y-%m-%d"),
            obs = c(NA, diff(log(Close), lag=1) * 100)) %>%
  filter(!is.na(obs)) %>%
  print()
#> # A tibble: 5,815 × 2
#>    date          obs
#>    <date>      <dbl>
#>  1 2000-01-04 -5.16 
#>  2 2000-01-05 -1.30 
#>  3 2000-01-06  4.97 
#>  4 2000-01-07  2.99 
#>  5 2000-01-10 -2.61 
#>  6 2000-01-11  0.350
#>  7 2000-01-12 -3.05 
#>  8 2000-01-13  3.59 
#>  9 2000-01-14  0.656
#> 10 2000-01-17  2.62 
#> # … with 5,805 more rows
library("ggplot2")
ggplot(db_data, aes(x = date, y = obs)) +
  geom_point() +
  geom_line() +
  ylab("log-returns [%]")

As the share returns are continuous and can take both negative and positive values, we consider an HMM with Gaussian state-dependent distributions. The (log-)likelihood of a Gaussian-HMM is implemented in the function f_ll_hmm().
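
To make the structure of this likelihood explicit, the following minimal sketch computes the log-likelihood of a 2-state Gaussian-HMM via the scaled forward algorithm. It is only illustrative: the parameter layout (off-diagonal entries of the transition probability matrix on the logit scale, state-dependent means, and log standard deviations) mirrors the description in the next section, while the actual implementation of f_ll_hmm() may differ in its details.

# minimal sketch (not the f_ll_hmm() implementation) of a 2-state Gaussian-HMM log-likelihood
ll_hmm_sketch <- function(theta, data, N = 2, neg = FALSE) {
  obs <- data$obs
  # transition probability matrix from the logit-scale parameters
  Gamma <- diag(N)
  Gamma[!Gamma] <- exp(theta[1:2])
  Gamma <- Gamma / rowSums(Gamma)
  mu <- theta[3:4]          # state-dependent means
  sigma <- exp(theta[5:6])  # state-dependent standard deviations
  # stationary distribution used as initial distribution
  delta <- solve(t(diag(N) - Gamma + 1), rep(1, N))
  # state-dependent densities for all observations
  allprobs <- sapply(1:N, function(j) stats::dnorm(obs, mu[j], sigma[j]))
  # scaled forward recursion to avoid numerical underflow
  foo <- delta * allprobs[1, ]
  ll <- log(sum(foo))
  phi <- foo / sum(foo)
  for (t in 2:length(obs)) {
    foo <- phi %*% Gamma * allprobs[t, ]
    ll <- ll + log(sum(foo))
    phi <- foo / sum(foo)
  }
  if (neg) -ll else ll
}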

Model fitting

We consider a 2-state (N = 2) Gaussian-HMM with six parameters (npar = 6) to be estimated: the two off-diagonal entries of the transition probability matrix (on the logit scale), the two state-dependent means, and the two state-dependent standard deviations (on the log scale).

The argument neg = TRUE indicates that we minimize the negative log-likelihood, and opt = set_optimizer_nlm() selects stats::nlm() as the optimizer:

hmm_ino <- setup_ino(
  f = f_ll_hmm,
  npar = 6,
  data = db_data,
  N = 2,
  neg = TRUE,
  opt = set_optimizer_nlm()
)
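
Before running the initialization strategies, it can be helpful to verify that the likelihood evaluates at a test point. The following call is only illustrative and assumes that f_ll_hmm() takes the parameter vector as its first argument, with data, N, and neg passed as further arguments (as in the setup_ino() call above):

theta_test <- c(-1.5, -1.5, 0, 0, log(1), log(1))  # arbitrary test point in the ranges discussed below
f_ll_hmm(theta_test, data = db_data, N = 2, neg = TRUE)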

Random initialization

We first use randomly chosen starting values. The first two starting values belong to the transition probability matrix (tpm): we sample them from a normal distribution with mean -1.5 and standard deviation 0.5. As we use the multinomial logit link to ensure that the probabilities lie between 0 and 1, a mean of -1.5 corresponds to probabilities of staying in state 1 and 2 of about 0.81. For the two means, we draw two random numbers from the standard normal distribution, as the time series above indicates that most of the returns are fairly close to zero. The starting values for the standard deviations are drawn from a uniform distribution between 0.5 and 2 (note that we exponentiate the standard deviations in the likelihood as they are constrained to be positive, and hence we log-transform the starting values).

sampler <- function() c(stats::rnorm(2, mean = -1.5, sd = 0.5),
                        stats::rnorm(2),
                        log(stats::runif(2, 0.5, 2)))
hmm_ino <- random_initialization(hmm_ino, runs = 50, sampler = sampler)
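
As a quick illustration (not part of the original workflow), we can draw one set of starting values and inspect the quantities it implies, assuming that the off-diagonal tpm entries enter the likelihood via the inverse logit and the standard deviations via the exponential:

set.seed(1)
theta0 <- sampler()
theta0
1 - stats::plogis(theta0[1:2])  # implied staying probabilities (assumption: inverse logit link)
exp(theta0[5:6])                # implied state-dependent standard deviations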

Fixed initialization

For the fixed initialization strategy, we select starting values that lie within the ranges considered above:

starting_values <- list(c(-2, -2, 0, 0, log(2), log(3)), 
                        c(-1.5, -1.5, -2, 2, log(1), log(2)),
                        c(-1, -1, -3, 3, log(2), log(2)))
for (val in starting_values) {
  hmm_ino <- fixed_initialization(hmm_ino, at = val)
}

Subset initialization

To illustrate the subset initialization strategy, we fit our HMM to the first 50% of the observations. The starting values for the subset runs are chosen randomly from the same distributions as considered above. The function subset_initialization() then fits the HMM again to the entire sample, using the estimates obtained from the subset as initial values.

hmm_ino <- subset_initialization(
  hmm_ino, arg = "data", how = "first", prop = 0.5,
  initialization =  random_initialization(runs = 50, sampler = sampler)
)
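
Conceptually, each subset run proceeds in two stages, as in the following simplified sketch (not the {ino} internals; it assumes that f_ll_hmm() can be passed to stats::nlm() directly, with data, N, and neg as additional arguments):

# stage 1: optimize on the first 50% of the observations
db_sub <- db_data[seq_len(floor(0.5 * nrow(db_data))), ]
fit_sub <- stats::nlm(f_ll_hmm, p = sampler(), data = db_sub, N = 2, neg = TRUE)
# stage 2: re-optimize on the full data, starting from the subset estimates
fit_full <- stats::nlm(f_ll_hmm, p = fit_sub$estimate, data = db_data, N = 2, neg = TRUE)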

As the time series of share returns consists of 5,815 observations, we can even try to use fewer observations to fit our model. In the next step, we consider only the first 5% of the observations.

hmm_ino <- subset_initialization(
  hmm_ino, arg = "data", how = "first", prop = 0.05,
  initialization =  random_initialization(runs = 50, sampler = sampler)
)

Evaluating the optimization runs

Local optima

Selecting the starting values for HMMs is a well-known issue, as poor starting values can easily lead to convergence to a local rather than the global maximum of the likelihood. Other R packages designed to fit HMMs discuss this topic in more detail (see, e.g., https://cran.r-project.org/package=moveHMM). We thus first evaluate the optimization runs by comparing the likelihood values at convergence, which can be displayed using overview_optima():

overview_optima(hmm_ino)
#>    optimum frequency
#> 1 12877.85       142
#> 2 13774.87        10
#> 3 13454.33         1

The frequency table indicates that 142 out of 153 runs converged to the same value of 12877.85, which appears to be the global optimum (note that these are negative log-likelihood values).

Using summary(), we can investigate the computation time and the resulting optimum for all runs:

summary(hmm_ino)
#> # A tibble: 153 × 4
#>    .strategy .time          .optimum .optimizer
#>    <chr>     <drtn>            <dbl> <chr>     
#>  1 random    11.772112 secs   12878. stats::nlm
#>  2 random     9.919681 secs   12878. stats::nlm
#>  3 random    10.021004 secs   12878. stats::nlm
#>  4 random     9.735524 secs   12878. stats::nlm
#>  5 random    19.883488 secs   12878. stats::nlm
#>  6 random    11.398910 secs   12878. stats::nlm
#>  7 random    25.223363 secs   12878. stats::nlm
#>  8 random    20.918431 secs   12878. stats::nlm
#>  9 random    34.802511 secs   12878. stats::nlm
#> 10 random     8.948022 secs   13775. stats::nlm
#> # … with 143 more rows

To obtain the starting values that lead to the global optimum, we first check which of the optimization runs actually reached the global optimum:

which(summary(hmm_ino)$.optimum < 12878)
#>   [1]   1   2   3   4   5   6   7   8   9  11  12  13  14  16  17  19  20  21
#>  [19]  22  23  24  25  26  27  28  29  30  31  32  34  35  36  37  38  39  40
#>  [37]  41  42  43  44  45  46  47  48  49  51  52  53  54  55  56  57  59  60
#>  [55]  61  62  63  64  65  66  67  68  70  71  72  73  74  75  76  77  78  79
#>  [73]  80  81  82  83  84  85  86  87  88  89  91  92  94  95  96  97  98 100
#>  [91] 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118
#> [109] 119 120 121 122 123 124 125 126 127 129 130 131 132 133 134 135 136 137
#> [127] 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153

We can access the corresponding initial values of, for example, the first of these runs via:

get_vars(hmm_ino, runs = which(summary(hmm_ino)$.optimum < 12878)[1])[[1]]$.init
#> [1] -1.2160764 -1.5387768 -0.7074952  0.3645820  0.5118483  0.5282509

Optimization time

We use the plot() function to investigate the computation time. Intuitively, optimization runs with fixed starting values should be faster than with random starting values, since we have carefully chosen the fixed starting values by investigating the data. The boxplots confirm this intuition:

plot(hmm_ino, by = ".strategy", nrow = 1)

Using the output of summary() and some data manipulation functions from {dplyr}, we can also compute the average computation time per strategy:

summary(hmm_ino) %>% 
  group_by(.strategy) %>% 
  summarise(avg_time = mean(.time))
#> # A tibble: 4 × 2
#>   .strategy          avg_time      
#>   <chr>              <drtn>        
#> 1 fixed               8.693801 secs
#> 2 random             14.082786 secs
#> 3 subset(first,0.05)  5.418156 secs
#> 4 subset(first,0.5)   8.008285 secs

This comparison can, however, be considered somewhat ‘unfair’, as not all optimization runs lead to the global optimum: runs that end up in a local optimum may need fewer iterations and hence less computation time. We therefore first check which runs lead to the apparent global optimum of 12877.85, again using functions from {dplyr}.

summary(hmm_ino) %>% 
  mutate(global_optimum = ifelse(.optimum < 12878, 1, 0)) %>% 
  group_by(.strategy) %>% 
  summarise(proportion_global_optimum = mean(global_optimum))
#> # A tibble: 4 × 2
#>   .strategy          proportion_global_optimum
#>   <chr>                                  <dbl>
#> 1 fixed                                   1   
#> 2 random                                  0.9 
#> 3 subset(first,0.05)                      0.98
#> 4 subset(first,0.5)                       0.9

While for the fixed starting values all runs converge to the global optimum, the random initialization and subset initialization strategies do sometimes get stuck in local maxima.

Let’s again compare the average computation time, but only for those runs that lead to the global optimum:

summary(hmm_ino) %>% 
  filter(.optimum < 12878) %>% 
  group_by(.strategy) %>% 
  summarise(mean_time = mean(.time))
#> # A tibble: 4 × 2
#>   .strategy          mean_time     
#>   <chr>              <drtn>        
#> 1 fixed               8.693801 secs
#> 2 random             14.833138 secs
#> 3 subset(first,0.05)  5.488308 secs
#> 4 subset(first,0.5)   8.375048 secs

For those runs that converged to the global optimum, we can again compare the computation time via boxplots:

summary(hmm_ino) %>% 
  filter(.optimum < 12878) %>% 
  ggplot(aes(x = "", y = .time)) +
    scale_y_continuous() +
    geom_boxplot() +
    facet_wrap(".strategy", labeller = "label_both", nrow = 1) +
    theme(
      axis.title.x = element_blank(),
      axis.text.x = element_blank(),
      axis.ticks.x = element_blank()
    ) +
    ylab("optimization time")

While computation time is lowest for the fixed initialization strategy, subset initialization appears to be a promising approach. In particular, when only the first 5% of the data are used in the first step, the computation time is typically clearly lower than with the random initialization approach.


  1. This vignette was built using R 4.2.1 and {ino} version 0.2.0.