A quick tour of mclustAddons

Luca Scrucca

20 Sep 2024

Introduction

mclustAddons is a contributed R package that extends the functionality available in the mclust package (Scrucca et al. 2016, Scrucca et al. 2023).
In particular, the following methods are included:

This document gives a quick tour of mclustAddons (version 0.9). It was written in R Markdown, using the knitr package for production.

References on the methodologies implemented are provided at the end of this document.

library(mclustAddons)
## Loading required package: mclust
## Package 'mclust' version 6.1.2
## Type 'citation("mclust")' for citing this R package in publications.
## Loaded package 'mclustAddons' version 0.9

Density estimation for data with bounded support

Univariate case with lower bound

x <- rchisq(200, 3)
xgrid <- seq(-2, max(x), length=1000)
f <- dchisq(xgrid, 3)  # true density
dens <- densityMclustBounded(x, lbound = 0)
summary(dens, parameters = TRUE)
## ── Density estimation for bounded data via GMMs ─────────── 
##            
## Boundaries:   x
##       lower   0
##       upper Inf
## 
## Model E (univariate, equal variance) model with 1 component
## on the transformation scale:
## 
##  log-likelihood   n df       BIC       ICL
##       -390.0517 200  3 -795.9983 -795.9983
## 
##                                     x
## Range-power transformation: 0.3715163
## 
## Mixing probabilities:
## 1 
## 1 
## 
## Means:
##         1 
## 0.9191207 
## 
## Variances:
##        1 
## 1.309037
plot(dens, what = "density")
lines(xgrid, f, lty = 2)

plot(dens, what = "density", data = x, breaks = 15)

Univariate case with lower & upper bounds

x <- rbeta(200, 5, 1.5)
xgrid <- seq(-0.1, 1.1, length=1000)
f <- dbeta(xgrid, 5, 1.5)  # true density
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
summary(dens, parameters = TRUE)
## ── Density estimation for bounded data via GMMs ─────────── 
##            
## Boundaries: x
##       lower 0
##       upper 1
## 
## Model E (univariate, equal variance) model with 1 component
## on the transformation scale:
## 
##  log-likelihood   n df      BIC      ICL
##        120.4011 200  3 224.9072 224.9072
## 
##                                      x
## Range-power transformation: -0.2200992
## 
## Mixing probabilities:
## 1 
## 1 
## 
## Means:
##        1 
## 1.152107 
## 
## Variances:
##         1 
## 0.5212129
plot(dens, what = "density")

plot(dens, what = "density", data = x, breaks = 11)

Bivariate case with lower bounds

x1 <- rchisq(200, 3)
x2 <- 0.5*x1 + sqrt(1-0.5^2)*rchisq(200, 5)
x <- cbind(x1, x2)
dens <- densityMclustBounded(x, lbound = c(0,0))
summary(dens, parameters = TRUE)
## ── Density estimation for bounded data via GMMs ─────────── 
##            
## Boundaries:  x1  x2
##       lower   0   0
##       upper Inf Inf
## 
## Model VEE (ellipsoidal, equal shape and orientation) model with 1 component
## on the transformation scale:
## 
##  log-likelihood   n df      BIC      ICL
##       -889.9509 200  7 -1816.99 -1816.99
## 
##                                    x1        x2
## Range-power transformation: 0.2957409 0.3172159
## 
## Mixing probabilities:
## 1 
## 1 
## 
## Means:
##        [,1]
## x1 1.041532
## x2 2.209400
## 
## Variances:
## [,,1]
##           x1        x2
## x1 1.2671762 0.3912456
## x2 0.3912456 0.8249777
plot(dens, what = "BIC")

plot(dens, what = "density")
points(x, cex = 0.3)
abline(h = 0, v = 0, lty = 3)

plot(dens, what = "density", type = "hdr")
abline(h = 0, v = 0, lty = 3)

plot(dens, what = "density", type = "persp")

Suicide data

The data consist in the lengths of 86 spells of psychiatric treatment undergone by control patients in a suicide study (Silverman, 1986).

data("suicide")
dens <- densityMclustBounded(suicide, lbound = 0)
summary(dens, parameters = TRUE)
## ── Density estimation for bounded data via GMMs ─────────── 
##            
## Boundaries: suicide
##       lower       0
##       upper     Inf
## 
## Model E (univariate, equal variance) model with 1 component
## on the transformation scale:
## 
##  log-likelihood  n df       BIC       ICL
##       -497.8204 86  3 -1009.004 -1009.004
## 
##                               suicide
## Range-power transformation: 0.1929267
## 
## Mixing probabilities:
## 1 
## 1 
## 
## Means:
##        1 
## 6.700073 
## 
## Variances:
##        1 
## 7.788326
plot(dens, what = "density", 
     lwd = 2, col = "dodgerblue2",
     data = suicide, breaks = 15, 
     xlab = "Length of psychiatric treatment")
rug(suicide)

Racial data

This dataset provides the proportion of white student enrollment in 56 school districts in Nassau County (Long Island, New York), for the 1992-1993 school year (Simonoff 1996, Sec. 3.2).

data("racial")
x <- racial$PropWhite
dens <- densityMclustBounded(x, lbound = 0, ubound = 1)
summary(dens, parameters = TRUE)
## ── Density estimation for bounded data via GMMs ─────────── 
##            
## Boundaries: x
##       lower 0
##       upper 1
## 
## Model E (univariate, equal variance) model with 1 component
## on the transformation scale:
## 
##  log-likelihood  n df      BIC      ICL
##         42.4598 56  3 72.84355 72.84355
## 
##                                     x
## Range-power transformation: 0.3869476
## 
## Mixing probabilities:
## 1 
## 1 
## 
## Means:
##        1 
## 2.795429 
## 
## Variances:
##        1 
## 5.253254
plot(dens, what = "density", 
     lwd = 2, col = "dodgerblue2",
     data = x, breaks = 15, 
     xlab = "Proportion of white student enrolled in schools")
rug(x)