`library(gips)`

` ?find_MAP`

The goal of the `find_MAP()`

is to find the permutation
\(\sigma\) that maximizes the a
posteriori probability (MAP - Maximum A Posteriori). Such a permutation
represents the most plausible symmetry given the data.

This a posteriori probability function is described more in-depth in
the **Bayesian model selection** section of the
`vignette("Theory")`

, also available as a pkgdown
page. `gips`

can calculate the logarithm of it by
`log_posteriori_of_gips()`

function. In the following
paragraphs, we will refer to this a posteriori probability function as
\(f(\sigma)\).

The space of permutations is enormous - for the permutation of size
\(p\), the space of all permutations is
of size \(p!\) (\(p\) factorial). Even for \(p=19\), this space is practically
impossible to browse. This is why `find_MAP()`

implements
multiple optimizers to choose from:

`"Metropolis_Hastings"`

,`"MH"`

`"hill_climbing"`

,`"HC"`

`"brute_force"`

,`"BF"`

,`"full"`

This optimizer is implementation of the *Second approach* from
[1, Sec 4.1.2].

This uses the Metropolis-Hastings algorithm to optimize the space; see Wikipedia. This algorithm used in this context is a special case of the Simulated Annealing the reader may be more familiar with; see Wikipedia.

In every iteration \(i\), an algorithm is in a permutation, say, \(\sigma_i\). Then a random transposition is drawn uniformly \(t_i = (j,k)\) and the value of \(f(\sigma_i \circ t_i)\) is computed.

- If new value is bigger than the previous one, (i.e. \(f(\sigma_i \circ t_i) \ge f(\sigma_i)\)), then we set \(\sigma_{i+1} = \sigma_i \circ t_i\).
- If new value is smaller (\(f(\sigma_i \circ t_i) < f(\sigma_i)\)), then we will choose \(\sigma_{i+1} = \sigma_i \circ t_i\) with probability \(\frac{f(\sigma_i \circ t_i)}{f(\sigma_i)}\). Otherwise, we set \(\sigma_{i+1} = \sigma_i\) with complementary probability \(1 - \frac{f(\sigma_i \circ t_i)}{f(\sigma_i)}\).

The final value is the best \(\sigma\) that was ever computed.

This algorithm was tested in multiple settings and turned out to be an outstanding optimizer. Especially given it does not need any hyperparameters tuned.

The only parameter it depends on is `max_iter`

, which
determines the number of steps described above. One should choose this
number rationally. When decided too small, there is a missed opportunity
to find potentially a much better permutation. When decided too big,
there is a lost time and computational power that does not lead to
growth. Our recommendation is to plot the convergence plot with a
logarithmic OX scale:
`plot(g_map, type = "both", logarithmic_x = TRUE)`

. Then
decide if the line has flattened already. Keep in mind that the OY scale
is also logarithmic. For example, a small change on the OY scale could
mean \(10000\) **times**
the change in A Posteriori.

This algorithm has been analyzed extensively by statisticians. Thanks
to the ergodic theorem, the frequency of visits of a given state
converges almost surely to the probability of that state. This is the
approach explained in the [1,
Sec.4.1.2] and shown in [1, Sec. 5.2]. One can
obtain estimates of posterior probabilities by setting
`return_probabilities = TRUE`

.

Let’s say we have the data `Z`

from the unknown
process:

```
dim(Z)
#> [1] 50 70
<- nrow(Z) # 50
number_of_observations <- ncol(Z) # 70
perm_size <- cov(Z) # Assume we have to estimate the mean
S
<- gips(S, number_of_observations)
g suppressMessages( # message from ggplot2
plot(g, type = "heatmap") +
::scale_x_continuous(breaks = c(1,10,20,30,40,50,60,70)) +
ggplot2::scale_y_reverse(breaks = c(1,10,20,30,40,50,60,70))
ggplot2 )
```

```
<- find_MAP(g, max_iter = 10, optimizer = "Metropolis_Hastings")
g_map #> ========================================================================
#> Warning: The found permutation has n0 = 67 which is bigger than the number_of_observations = 50.
#> ℹ The covariance matrix invariant under the found permutation does not have the likelihood properly defined.
#> ℹ For more in-depth explanation, see 'Project Matrix - Equation (6)' section in `vignette('Theory', package = 'gips')` or its pkgdown page: https://przechoj.github.io/gips/articles/Theory.html.
g_map#> The permutation (5,25)(9,45,60)(37,70)
#> - was found after 10 log_posteriori calculations
#> - is 15208559032166170453082112 times more likely than the starting, () permutation.
```

Only after ten iterations the found permutation is unimaginably more
likely than the original, `()`

permutation.

`plot(g_map, type = "both", logarithmic_x = TRUE)`

It uses the Hill climbing algorithm to optimize the space; see Wikipedia.

It is performing the local optimization iteratively.

In every iteration \(i\), an algorithm is in a permutation; call it \(\sigma_i\). Then all the values of \(f(\sigma_i \circ t)\) are computed for every possible transposition \(t = (j,k)\). Then the next \(\sigma_{i+1}\) will be the one with the biggest value:

\[\sigma_{i+1} = argmax_{\text{perm} \in \text{neighbors}(\sigma_{i})}\{\text{posteriori}(perm)\}\]

Where: \[\text{neighbors}(\sigma) = \{\sigma \circ (j,k) : 1 \le j < k \le \text{p}\}\]

The algorithm ends when all neighbors are less likely, or the
`max_iter`

was achieved. In the first case, the algorithm
will end in a local maximum, but there is no guarantee that this is also
the global maximum.

The `max_iter`

parameter works differently for this
optimizer and Metropolis-Hastings. The Metropolis-Hastings will compute
a posteriori of `max_iter`

permutations. The Brute Force
optimizer, \({p\choose 2} \cdot\)
`max_iter`

.

Let’s say we have the data Z from the unknown process:

```
dim(Z)
#> [1] 20 25
<- nrow(Z) # 20
number_of_observations <- ncol(Z) # 25
perm_size <- cov(Z) # Assume we have to estimate the mean
S
<- gips(S, number_of_observations)
g plot(g, type = "heatmap")
```

```
<- find_MAP(g, max_iter = 2, optimizer = "hill_climbing")
g_map #> ================================================================================
#> Warning: Hill Climbing algorithm did not converge in 2 iterations!
#> ℹ We recommend to run the `find_MAP(optimizer = 'continue')` on the acquired output.
#> Warning: The found permutation has n0 = 24 which is bigger than the number_of_observations = 20.
#> ℹ The covariance matrix invariant under the found permutation does not have the likelihood properly defined.
#> ℹ For more in-depth explanation, see 'Project Matrix - Equation (6)' section in `vignette('Theory', package = 'gips')` or its pkgdown page: https://przechoj.github.io/gips/articles/Theory.html.
g_map#> The permutation (2,16)(14,20)
#> - was found after 601 log_posteriori calculations
#> - is 4414155601613037260524331466752 times more likely than the starting, () permutation.
plot(g_map, type = "both")
```

It searches through the whole space at once.

This is the only optimizer that will certainly find the actual MAP Estimator.

This is **only recommended** for small spaces (\(p \le 8\)). It can also browse bigger
spaces, but the required time is probably too long.

Let’s say we have the data Z from the unknown process:

```
dim(Z)
#> [1] 13 6
<- nrow(Z) # 13
number_of_observations <- ncol(Z) # 6
perm_size <- cov(Z) # Assume we have to estimate the mean
S
<- gips(S, number_of_observations)
g
<- find_MAP(g, optimizer = "brute_force")
g_map #> ================================================================================
g_map#> The permutation (1,2,3,4,5,6)
#> - was found after 720 log_posteriori calculations
#> - is 80063.152438581 times more likely than the starting, () permutation.
```

The `find_MAP()`

function also has two additional
parameters, namely `show_progress_bar`

and
`save_all_perms`

. Both can be set to `TRUE`

or
`FALSE`

.

The `show_progress_bar = TRUE`

means `gips`

will print the “=” characters on the console as the optimization run.
Keep in mind that when one sets the
`return_probabilities = TRUE`

, there will be a second
progress bar indicating calculating the probabilities after
optimization.

The `save_all_perms = TRUE`

means that `gips`

will save all visited permutations in the outputted object. This will
significantly increase the RAM needed for this object (for example, for
\(p=150\) and
`max_perm = 150000`

, we needed 400 MB to store it, while with
`save_all_perms = FALSE`

, the 2 MB was enough). However, this
is necessary for `return_probabilities = TRUE`

or more
complex path analysis.

We are considering implementing the **First approach**
from [1] in the future as well. In this approach, the Markov chain
travels along cyclic groups rather than permutations.

We encourage everyone to leave a comment on available and potential
new optimizers on the ISSUE#21. There,
one can also find the implemented optimizers but not yet added to
`gips`

.

[1] Piotr Graczyk, Hideyuki Ishi, Bartosz Kołodziejek, Hélène Massam. “Model selection in the space of Gaussian models invariant by symmetry.” The Annals of Statistics, 50(3) 1747-1774 June 2022. arXiv link; DOI: 10.1214/22-AOS2174