title: "qgam: quantile non-parametric additive models"
author: "Matteo Fasiolo, Simon N. Wood, Yannig Goude, and Raphael Nedellec"
date: "2019-01-24"
vignette: >
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteIndexEntry{qgam_vignette}
  %\VignetteEncoding{UTF-8}

This R package offers methods for fitting spline-based additive quantile regression models, using the approach described in Fasiolo et al. (2017).
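Throughout, it helps to keep in mind the loss being minimised: the pinball (or check) loss, whose expectation is minimised by the corresponding conditional quantile. Here is a minimal base-R sketch of that loss (illustrative only, not part of qgam's API):

```r
set.seed(123)

# Pinball (check) loss for quantile qu: residuals above the fit are
# weighted by qu, residuals below it by (1 - qu), so the empirical
# loss is minimised near the qu-th sample quantile.
pinball <- function(y, mu, qu) {
  r <- y - mu
  mean(ifelse(r >= 0, qu * r, (qu - 1) * r))
}

y <- rnorm(1000)
# Loss for quantile 0.8, evaluated at the sample median and at the
# sample 0.8 quantile: the latter should be smaller.
losses <- sapply(quantile(y, c(0.5, 0.8)),
                 function(m) pinball(y, m, qu = 0.8))
```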

The main fitting functions are:

- qgam() fits an additive quantile regression model to a single quantile. It is very similar to mgcv::gam().
- mqgam() fits the same additive model to several quantiles, more efficiently than calling qgam() several times.
- tuneLearn() evaluates the calibration loss used to select the learning rate on a grid of values provided by the user.
- tuneLearnFast() selects the learning rate automatically, by minimizing the calibration loss.

A first example: smoothing the motorcycle dataset

Let's start with a simple example. Here we are fitting a regression model with an adaptive spline basis to quantile 0.8 of the motorcycle dataset.

library(qgam); library(MASS)
if( suppressWarnings(require(RhpcBLASctl)) ){ blas_set_num_threads(1) } # Optional

set.seed(6436)
fit <- qgam(accel~s(times, k=20, bs="ad"), 
            data = mcycle, 
            qu = 0.8, 
            err = 0.1)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.8..........done
# Plot the fit
xSeq <- data.frame(cbind("accel" = rep(0, 1e3), "times" = seq(2, 58, length.out = 1e3)))
pred <- predict(fit, newdata = xSeq, se=TRUE)
plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80))
lines(xSeq$times, pred$fit, lwd = 1)
lines(xSeq$times, pred$fit + 2*pred$se.fit, lwd = 1, col = 2)
lines(xSeq$times, pred$fit - 2*pred$se.fit, lwd = 1, col = 2)   

plot of chunk 1

qgam automatically calls tuneLearnFast to select the learning rate. The results of the calibration are stored in fit$calibr. We can check whether the optimization succeeded as follows:

check(fit$calibr, 2)

plot of chunk 2

The plot suggests that the calibration criterion has a single minimum, and that the optimizer has converged to its neighbourhood. Alternatively, we could have selected the learning rate by evaluating the loss function on a grid.

set.seed(6436)
cal <- tuneLearn(accel~s(times, k=20, bs="ad"), 
                 data = mcycle, 
                 qu = 0.8,
                 err = 0.1,
                 lsig = seq(1, 3, length.out = 20), # <- candidate values for the log learning rate
                 control = list("progress" = "none"))

check(cal)

plot of chunk 3

Here the generic check function produces a different output. The first plot shows the calibration criterion as a function of \(\log(\sigma)\), which should look fairly smooth. The second plot shows how the effective degrees of freedom (EDF) vary with \(\log(\sigma)\). Notice that here we are using an adaptive smoother, which includes five smoothing parameters.

We might want to fit several quantiles at once. This can be done with mqgam.

quSeq <- c(0.2, 0.4, 0.6, 0.8)
set.seed(6436)
fit <- mqgam(accel~s(times, k=20, bs="ad"), 
             data = mcycle, 
             err = 0.1,
             qu = quSeq)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.4..............done 
## qu = 0.6.........done 
## qu = 0.2.........done 
## qu = 0.8...............done

To save memory, mqgam does not return a full mgcv::gamObject for each quantile: it avoids storing redundant data, such as several copies of the design matrix. The output of mqgam can be manipulated using the qdo function.

# Plot the data
xSeq <- data.frame(cbind("accel" = rep(0, 1e3), "times" = seq(2, 58, length.out = 1e3)))
plot(mcycle$times, mcycle$accel, xlab = "Times", ylab = "Acceleration", ylim = c(-150, 80))

# Predict each quantile curve and plot
for(iq in quSeq){
  pred <- qdo(fit, iq, predict, newdata = xSeq)
  lines(xSeq$times, pred, col = 2)
}

plot of chunk 5

Using qdo we can print out the summary for each quantile, for instance:

# Summary for quantile 0.4
qdo(fit, qu = 0.4, summary)
## 
## Family: elf 
## Link function: identity 
## 
## Formula:
## accel ~ s(times, k = 20, bs = "ad")
## 
## Parametric coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)  -30.140      1.868  -16.14   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Approximate significance of smooth terms:
##            edf Ref.df Chi.sq p-value    
## s(times) 9.144  10.51  820.5  <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## R-sq.(adj) =   0.78   Deviance explained = 67.2%
## -REML = 609.48  Scale est. = 1         n = 133

Notice that here the generic function summary is calling summary.gam, because summary.qgam has not been implemented yet. Hence one cannot fully rely on the p-values provided by this function, because they are calculated using results that apply to parametric, not quantile, regression.

Dealing with heteroscedasticity

Let us simulate some data from a heteroscedastic model.

set.seed(651)
n <- 5000
x <- seq(-4, 3, length.out = n)
X <- cbind(1, x, x^2)
beta <- c(0, 1, 1)
sigma <- 1.2 + sin(2*x)
f <- drop(X %*% beta)
dat <- f + rnorm(n, 0, sigma)
dataf <- data.frame(cbind(dat, x))
names(dataf) <- c("y", "x")

qus <- seq(0.05, 0.95, length.out = 10)
plot(x, dat, col = "grey", ylab = "y")
for(iq in qus){ lines(x, qnorm(iq, f, sigma)) }

plot of chunk h1

We now fit ten quantiles between 0.05 and 0.95, using a quantile GAM with scalar learning rate.

fit <- mqgam(y~s(x, k = 30, bs = "cr"), 
             data = dataf,
             qu = qus, err = 0.05)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.55........done 
## qu = 0.45...........done 
## qu = 0.35........done 
## qu = 0.65..........done 
## qu = 0.25............done 
## qu = 0.75..........done 
## qu = 0.15.........done 
## qu = 0.85..........done 
## qu = 0.95...................done 
## qu = 0.05..............done
qus <- seq(0.05, 0.95, length.out = 10)
plot(x, dat, col = "grey", ylab = "y")
for(iq in qus){ 
 lines(x, qnorm(iq, f, sigma), col = 2)
 lines(x, qdo(fit, iq, predict))
}
legend("top", c("truth", "fitted"), col = 2:1, lty = rep(1, 2))

plot of chunk h2

The fitted quantiles are close to the true ones, but their credible intervals don't vary much with x. Indeed, let's look at the intervals for quantile 0.95.

plot(x, dat, col = "grey", ylab = "y")
tmp <- qdo(fit, 0.95, predict, se = TRUE)
lines(x, tmp$fit)
lines(x, tmp$fit + 3 * tmp$se.fit, col = 2)
lines(x, tmp$fit - 3 * tmp$se.fit, col = 2)

plot of chunk h3

We can do better by letting the learning rate vary with the covariate. In particular, we can use one additive model for the quantile location and another for the scale, or learning rate. Here we fix the intercept of the additive model for the learning rate, in order to avoid calibrating it. Just remove the argument lsig = 0.44 if you want to re-estimate it.

fit <- qgam(list(y~s(x, k = 30, bs = "cr"), ~ s(x, k = 30, bs = "cr")), 
            data = dataf, qu = 0.95, err = 0.05, lsig = 0.44)

plot(x, dat, col = "grey", ylab = "y")
tmp <- predict(fit, se = TRUE)
lines(x, tmp$fit[ , 1])
lines(x, tmp$fit[ , 1] + 3 * tmp$se.fit[ , 1], col = 2)
lines(x, tmp$fit[ , 1] - 3 * tmp$se.fit[ , 1], col = 2)

plot of chunk h4

Now the credible intervals correctly represent the underlying uncertainty.
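As a further informal check, continuing from the fit above, the empirical proportion of observations falling below the fitted 0.95 quantile should be close to 0.95. This mirrors the "proportion of negative residuals" reported by check.qgam, and is only a sketch, assuming the list-formula fit from the previous chunk is in scope:

```r
# Fraction of responses below the fitted 0.95 quantile; for the
# list-formula fit above, the first column of the predictions is
# the quantile location.
mean(dataf$y < predict(fit)[ , 1])  # should be close to 0.95
```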

Model checking

The qgam package provides some functions that can be useful for model checking. In particular, we have:

We start by illustrating the cqcheck function. In particular, let us consider the additive model \[ y = x + x^2 + z + \frac{xz}{2} + e, \;\;\; e \sim N(0, 1), \] and simulate some data from it.

library(qgam)
set.seed(15560)
n <- 1000
x <- rnorm(n, 0, 1); z <- rnorm(n)
X <- cbind(1, x, x^2, z, x*z)
beta <- c(0, 1, 1, 1, 0.5)
y <- drop(X %*% beta) + rnorm(n) 
dataf <- data.frame(cbind(y, x, z))
names(dataf) <- c("y", "x", "z")

We fit a linear model to the median and we use cqcheck to produce a diagnostic plot.

qu <- 0.5
fit <- qgam(y~x, qu = qu, data = dataf)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done
cqcheck(obj = fit, v = c("x"), X = dataf, y = y) 

plot of chunk c2

The cqcheck function takes a qgam object as input and predicts the conditional quantile using the data in X. It then bins the responses y using the corresponding values of v and calculates, for every bin, what fraction of responses falls below the fitted quantile. Given that we are fitting the median, we would expect around \(50\%\) of the points to fall below the fit. But, as the plot shows, this fraction varies widely along x. There is clearly a non-linear relation between the quantile location and x, hence we add a smooth for x.
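The binning logic can be mimicked with a few lines of base R. The self-contained sketch below is illustrative only (it is not how cqcheck is implemented), and uses the true conditional median of a simulated model in place of a fitted quantile:

```r
set.seed(42)
n  <- 2000
x  <- runif(n, -2, 2)
mu <- x + x^2               # true conditional median of y given x
y  <- mu + rnorm(n)         # symmetric noise, so P(y < mu | x) = 0.5

# Bin the observations by x and compute, for every bin, the fraction
# of responses falling below the (here, true) median.
bins <- cut(x, breaks = 10)
frac <- tapply(y < mu, bins, mean)
round(frac, 2)   # all entries should be close to 0.5
```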

fit <- qgam(y~s(x), qu = qu, data = dataf)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done
cqcheck(obj = fit, v = c("x"), X = dataf, y = y)

plot of chunk c3

The deviations from the theoretical quantile (\(0.5\)) are much reduced, but let's look across both x and z.

cqcheck(obj = fit, v = c("x", "z"), X = dataf, y = y, nbin = c(5, 5))

plot of chunk c4

This plot uses binning as before: if a bin is red (green), the fraction of responses falling below the fit is smaller (larger) than 0.5. Bright colours mean that the deviation is statistically significant. As we move along z (x2 in the plot) the colour changes from green to red, so it makes sense to draw a marginal plot for z:

cqcheck(obj = fit, v = c("z"), X = dataf, y = y, nbin = c(10))

plot of chunk c5

We are clearly missing an effect here. Given that the effect looks pretty linear, we simply add a parametric term to the fit, which seems to solve the problem:

fit <- qgam(y~s(x)+z, qu = qu, data = dataf)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.......done
cqcheck(obj = fit, v = c("z"))

plot of chunk c6

But if we look again across both x and z we see that green prevails on the top-left to bottom-right diagonal, while the other diagonal is mainly red.

cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))

plot of chunk c7

This suggests that adding an interaction between x and z might be a good idea. Indeed, now cqcheck does not signal any problem:

fit <- qgam(y~s(x)+z+I(x*z), qu = qu, data = dataf)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5.........done
cqcheck(obj = fit, v = c("x", "z"), nbin = c(5, 5))

plot of chunk c8

Now that we are fairly satisfied with the model structure, we can, for instance, fit several quantiles by doing:

fit <- mqgam(y~s(x)+z+I(x*z), qu = c(0.2, 0.4, 0.6, 0.8), data = dataf)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.4.........done 
## qu = 0.6................done 
## qu = 0.2........done 
## qu = 0.8........done

We can then check whether the learning rate was selected correctly. Recall that mqgam calls tuneLearnFast internally, hence we can look at how the calibration went by doing:

check.learnFast(fit$calibr, 2:5)

plot of chunk c10

For each quantile, the calibration loss seems to have a unique minimum, which is what one would hope. Objects of class qgam can also be checked using the generic function check, which defaults to check.qgam. To use this function on the output of mqgam, we must use the qdo function:

qdo(fit, 0.2, check)

plot of chunk c11

## Theor. proportion of neg. resid.: 0.2   Actual proportion: 0.204
## Integrated absolute bias |F(mu) - F(mu0)| = 0.009412813
## Method: REML   Optimizer: outer newton
## full convergence after 4 iterations.
## Gradient range [4.826909e-05,4.826909e-05]
## (score 1701.547 & scale 1).
## Hessian positive definite, eigenvalue range [2.549539,2.549539].
## Model rank =  12 / 12 
## 
## Basis dimension (k) check: if edf is close to k' (maximum possible edf) 
## it might be worth increasing k. 
## 
##      k'  edf
## s(x)  9 7.42
## NULL

The printed output gives some information about the optimizer used to estimate the smoothing parameters, for fixed learning rate. See ?check.qgam for more information. The plot has been obtained using cqcheck, where each data point has been binned using the fitted values. On the right side of the plot there seem to be some large deviations, but the rug shows that there are very few data points there.

Setting the loss-smoothing parameter and checking convergence

Let's simulate some data:

set.seed(5235)
n <- 1000
x <- seq(-3, 3, length.out = n)
X <- cbind(1, x, x^2)
beta <- c(0, 1, 1)
f <- drop(X %*% beta)
dat <- f + rgamma(n, 4, 1)
dataf <- data.frame(cbind(dat, x))
names(dataf) <- c("y", "x")

Assume that we want to estimate quantiles 0.05, 0.5 and 0.95:

qus <- c(0.05, 0.5, 0.95)
fit <- mqgam(y ~ s(x), data = dataf, qu = qus, err = 0.05)
## Estimating learning rate. Each dot corresponds to a loss evaluation. 
## qu = 0.5........done 
## qu = 0.95.......done 
## qu = 0.05.................done
plot(x, dat, col = "grey", ylab = "y")
lines(x, f + qgamma(0.95, 4, 1), lty = 2)
lines(x, f + qgamma(0.5, 4, 1), lty = 2)
lines(x, f + qgamma(0.05, 4, 1), lty = 2)
lines(x, qdo(fit, qus[1], predict), col = 2)
lines(x, qdo(fit, qus[2], predict), col = 2)
lines(x, qdo(fit, qus[3], predict), col = 2)

plot of chunk check2

Let's try to use several values of err:

lfit <- lapply(c(0.01, 0.05, 0.1, 0.2, 0.3, 0.5),
               function(.inp){
                 mqgam(y ~ s(x), data = dataf, qu = qus, err = .inp, 
                       control = list("progress" = FALSE))
               })

plot(x, dat, col = "grey", ylab = "y", ylim = c(-2, 20))
colss <- rainbow(length(lfit))
for(ii in 1:length(lfit)){
  lines(x, qdo(lfit[[ii]], qus[1], predict), col = colss[ii])
  lines(x, qdo(lfit[[ii]], qus[2], predict), col = colss[ii])
  lines(x, qdo(lfit[[ii]], qus[3], predict), col = colss[ii])
}
lines(x, f + qgamma(0.95, 4, 1), lty = 2)
lines(x, f + qgamma(0.5, 4, 1), lty = 2)
lines(x, f + qgamma(0.05, 4, 1), lty = 2)
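The role of err above can be understood through the smoothed version of the pinball loss that qgam uses internally: a log-exp approximation whose bandwidth controls how closely it matches the exact pinball loss, with err determining that bandwidth. The sketch below is only illustrative; the exact parametrisation used by qgam (the ELF loss of Fasiolo et al., 2017) differs in its details:

```r
# Pinball loss and a log-exp smoothed version with bandwidth lambda.
pinball <- function(z, qu) ifelse(z >= 0, qu * z, (qu - 1) * z)

smooth_pinball <- function(z, qu, lambda) {
  u <- z / lambda
  # numerically stable lambda * log(1 + exp(z / lambda))
  (qu - 1) * z + lambda * (pmax(u, 0) + log1p(exp(-abs(u))))
}

z <- seq(-2, 2, length.out = 5)
# As lambda shrinks, the smoothed loss approaches the exact one.
rbind(exact    = pinball(z, qu = 0.8),
      lam_0.5  = smooth_pinball(z, qu = 0.8, lambda = 0.5),
      lam_0.01 = smooth_pinball(z, qu = 0.8, lambda = 0.01))
```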