# 1 Introduction

In applied work in order to evaluate the effect of a set of exogenous variables on an outcome it is very common to estimate a parametric model such as the linear model with ordinary least squares (OLS). But such parametric specifications may not capture the true relationship between outcome and exogenous variables. In fact if the chosen parametric model is a bad approximation of the true model then counterfactual analysis will be flawed. For this reason in the past forty years a literature on specification tests has developed in order to know if a parametric specification is right or wrong. SpeTestNP is a package which implements heteroskedasticity-robust specification tests of parametric models from Bierens (1982), Zheng (1996), Escanciano (2006), Lavergne and Patilea (2008), and Lavergne and Patilea (2012).

Hippolyte Boucher (Hippolyte.Boucher@outlook.com) is the author of SpeTestNP and Pascal Lavergne (lavergnetse@gmail.com) is a contributor. Both Hippolyte Boucher and Pascal Lavergne are maintainers and any question or bug should be reported to one of them. This vignette describes the principle behind each test available in SpeTestNP, then how to use SpeTestNP to test a parametric specification in practice with an illustration using the expected earnings conditional on education and age.

# 2 Testing for a parametric specification

In order to present the specification tests available in SpeTestNP we first describe the model being considered and define the null and alternative hypothesis, second we highlight the principle behind each test, third we derive the test statistics and their rejection rules (based on either the bootstrap or Gaussian asymptotics), and fourth we briefly discuss and compare the tests size and power performances.

## 2.1 Model

Consider a sample $$(y_j,x_j')_{j=1}^{n}$$ of independent observations with $$y_j$$ the scalar outcome and $$x_j$$ a $$k\times 1$$ vector of exogenous explanatory variables. Then as long as $$\mathbb{E}(|y_j|)<+\infty$$ there exists some Borel-measurable regression function $$g(\cdot)$$ such that $$g(x_j)=\mathbb{E}(y_j|x_j) \ \ a.s$$. That is the true model linking $$y_j$$ and $$x_j$$ writes

$y_j=g(x_j)+\varepsilon_j, \qquad \mathbb{E}(\varepsilon_j|x_j)=0 \quad a.s$

for $$j=1,2,\dots,n$$ and where $$\varepsilon_j$$ denotes the part of $$y_j$$ which is unexplained by $$x_j$$ in terms of the mean. But instead in practice some parametric model characterized by a parametric family of functions $$\mathcal{F}=\{f(\cdot,\tilde{\theta}):\tilde{\theta}\in\Theta\subset\mathbb{R}^k\}$$ is considered

$y_j=f(x_j,\theta)+u_j$

where $$\theta= \ \underset{\tilde{\theta}\in\Theta}{Argmin} \ \mathbb{E}((y_j-f(x_j,\tilde{\theta}))^2)$$ is the parameter which yields the best mean square error fit for this parametric model, and where $$u_j$$ is the error induced by this parametric model. A typical estimator of $$\theta$$ is the non-linear least squares (NLS) estimator denoted by $$\hat{\theta}$$, thus when $$\mathcal{F}$$ is the family of linear functions then $$\hat{\theta}$$ is the OLS estimator. Next notice that if $$g(\cdot)\in\mathcal{F}$$ then $$\mathbb{E}(u_j|x_j)=0 \ a.s$$ or equivalently $$\mathbb{E}(y_j|x_j)=f(x_j,\theta)$$. Indeed if $$g(\cdot)\in\mathcal{F}$$ then by properties of projections

$g(\cdot)= \ \underset{\tilde{g}}{Argmin} \ \mathbb{E}((y_j-\tilde{g}(x_j))^2) =\ \underset{\tilde{g}\in\mathcal{F}}{Argmin} \ \mathbb{E}((y_j-\tilde{g}(x_j))^2)= \ \underset{\tilde{\theta}\in\Theta}{Argmin} \ \mathbb{E}((y_j-f(x_j,\tilde{\theta}))^2)=f(\cdot,\theta)$

Consequently when modeling the true relationship between $$y$$ and $$x$$ with a parametric model, the implicit null hypothesis is

$H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s$

And the alternative hypothesis is

$H_1:\mathbb{P}(\mathbb{E}(u_j|x_j)=0)<1$

Equivalently the null and alternative hypothesis write

$H_0: g(x_j)=f(x_j,\theta) \quad a.s, \qquad H_1:\mathbb{P}(g(x_j)=f(x_j,\theta))<1$

## 2.2 Tests principle

Next to construct specification tests the null hypothesis is reformulated into moments conditions from which statistics can be derived. The five reformulations of the null hypothesis are in order.

### Bierens (1982)

Bierens (1982) proves that the conditional moment condition of the null hypothesis is equivalent to an infinite number of moment conditions which is equivalent to an integrated conditional moment condition

$H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow \mathbb{E}(u_j exp(i\beta' x_j))=0 \ \ \forall \beta\in\mathbb{R}^{k}\Leftrightarrow \int_{\mathbb{R}^k}\left|\mathbb{E}(u_j exp(i\beta' x_j))\right|^2d\mu(\beta)=0$

where $$\mu(\cdot)$$ is any positive almost everywhere measure, $$|\cdot|$$ denotes the modulus, and $$i$$ is the imaginary unit.

### Zheng (1996)

Instead Zheng (1996) finds an equivalence between the conditional moment condition and an unconditional one

$H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow\mathbb{E}(u_j\mathbb{E}(u_j|x_j)f(x_j))=0$

where $$f(\cdot)$$ denotes the probability density function of $$x_j$$.

### Escanciano (2006)

Escanciano (2006) proves the equivalence between the null hypothesis, an infinite number of moment conditions which differ from Bierens (1982), and an integrated moment condition

$H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow\mathbb{E}(u_j1\{\beta' x_j\leqslant l\})=0 \ \ \forall(t,l)\in\mathbb{S}^{k}\times \mathbb{R}\\ \Leftrightarrow \int_{\mathbb{S}^k\times\mathbb{R}}\mathbb{E}^2(u_j1\{\beta' x_j\leqslant l\})f_\beta(l)d\beta dl=0$

where $$1\{\cdot\}$$ denotes the indicator function, $$\mathbb{S}^{k}=\{\beta\in\mathbb{R}^k:|\beta|=1\}$$ denotes the unit sphere, and $$f_\beta(\cdot)$$ denotes the probability density function of $$\beta' x_j$$.

### Lavergne and Patilea (2008)

Lavergne and Patilea (2008) show that the null hypothesis is equivalent to an infinite number of unconditional moment conditions

$H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow \ \underset{||\beta||=1}{max} \ \mathbb{E}(u_j\mathbb{E}(u_j|\beta' x_j)\omega(\beta' x_j))=0$

for any $$\omega(\cdot)$$ such that $$\forall \beta\in\mathbb{R}^k$$, $$\omega(\beta' x_j)>0$$ on the support of $$\mathbb{E}(u_j|\beta' x_j)$$. This condition resembles that of Zheng (1996) with $$\beta' x_j$$ replacing $$x_j$$ in an effort to remove the curse of dimensionality.

### Lavergne and Patilea (2012)

Finally Lavergne and Patilea (2012) prove the equivalence between the null and an integrated moment condition

$H_0:\mathbb{E}(u_j|x_j)=0 \quad a.s \ \Leftrightarrow \int_B\mathbb{E}(\mathbb{E}^2(u_j|\beta' x_j)f_\beta(\beta' x_j))d\beta=0$

where $$B\subseteq \mathbb{S}^k$$ and $$f_\beta(\cdot)$$ denotes the density of $$\beta' x_j$$. This moment condition combines the integrated moments approaches of Bierens (1982) and Escanciano (2006) and the dimension reduction devise used in Lavergne and Patilea (2008).

## 2.3 Test statistics

Each test relies on reformulating the null hypothesis into a moment condition for which an empirical counterpart exist. Thus the test statistics are sample analogs of the moments defining the null hypothesis, possibly multiplied by the sample size in order to obtain variation at the limit. Denote by $$\hat{\theta}$$ a consistent estimator of $$\theta$$ and let $$\hat{u}_j=y_j-f(x_j,\hat{\theta})$$ denote the residual for individual $$j$$. The five test statistics are derived in order.

### Bierens (1982)

An empirical counterpart of the integrated conditional moment $$\int_{\mathbb{R}^k}\left|\mathbb{E}(u_j exp(i\beta' x_j))\right|^2d\mu(\beta)$$ of Bierens (1982) is

$T_{icm}=\int_{\mathbb{R}^k}\left|\frac{1}{\sqrt{n}}\sum_{j=1}^n\hat{u}_j exp(i\beta' x_j)\right|^2d\mu(\beta)$

with some positive almost everywhere measure $$\mu(\cdot)$$ and where $$|\cdot|$$ denotes the modulus. Using properties of the modulus and of the Fourier transform it can then be shown that

$T_{icm}=\frac{1}{n}\sum_{j,j'}\hat{u}_j\hat{u}_{j'}K(x_j-x_{j'})=\frac{1}{n}\hat{u}'W_{icm}\hat{u}$

where $$K(\cdot)$$ is the Fourier transform of $$\mu(\cdot)$$, $$\hat{u}=(\hat{u}_1,\dots,\hat{u}_n)'$$ is the $$n\times 1$$ vector of stacked residuals, and $$W_{icm}$$ is the matrix with entries $$K(x_j-x_{j'})$$ for any row $$j$$ and column $$j'$$. Although this statistic can be used as is, $$\mu(\cdot)$$ is typically assumed to be a symmetric probability measure which is strictly positive almost everywhere. This simplifies the asymptotic theory and the derivation of the test statistic in practice. Indeed as a consequence the Fourier transform of $$\mu(\cdot)$$ denoted as $$K(\cdot)$$ is a symmetric bounded density. Hence candidates for $$K(\cdot)$$ include logistic, triangular, normal, student, or Cauchy densities, see Johnson, Kotz and Balakrishnan (1995, section 23.3) and Dreier and Kotz (2002). Furthermore to control for scale, we impose that either the integral of $$K(\cdot)$$ to the square equals one or that the distribution associated to $$K(\cdot)$$ has variance one.

### Zheng (1996)

Zheng (1996) test statistic is the sample analog of $$\mathbb{E}(u_j\mathbb{E}(u_j|x_j)f(x_j))$$ which is derived by estimating both the density $$f(\cdot)$$ of $$x_j$$ and the conditional mean $$\mathbb{E}(u_j|x_j=\cdot)$$ with Kernels. For any $$\tilde{x}\in\mathbb{R}^k$$ define

$\hat{f}(\tilde{x})=\frac{1}{nh^k}\sum_j K\left(\frac{\tilde{x}-x_j}{h}\right), \qquad \hat{\mathbb{E}}(u_j|x_j=\tilde{x})=\frac{1}{nh^k}\sum_j \frac{u_j}{\hat{f}(\tilde{x})}K\left(\frac{\tilde{x}-x_j}{h}\right)$

where $$K(\cdot)$$ is a Kernel function which is nonnegative, symmetric, bounded, continuous and which integrates to one, and $$h$$ a bandwidth such that $$h\underset{n\rightarrow+\infty}{\rightarrow}0$$ and $$nh^k\underset{n\rightarrow+\infty}{\rightarrow}+\infty$$. Then the test statistic is the sample analog of the moment $$\mathbb{E}(u_j\mathbb{E}(u_j|x_j)f(x_j))$$

$T_{zheng}=\frac{1}{n}\sum_j \hat{u}_j\hat{\mathbb{E}}(u_{j'}|x_{j'}=x_j)\hat{f}(x_j)$

It can be rewritten as

$T_{zheng}=\frac{1}{n(n−1)h^k}\sum_{j,j′\neq j}\hat{u}_j\hat{u}_{j′}K\left(\frac{x_j−x_{j′}}{h}\right)=\frac{1}{n(n−1)h^k}\hat{u}^′W_{zheng}\hat{u}$

where $$W_{zheng}$$ is a matrix whose diagonal elements are equal to zero and its other entries are equal to $$K\left(\frac{x_j−x_{j′}}{h}\right)$$ for any row $$j$$ any column $$j′$$ such that $$j\neq j′$$.

### Escanciano (2006)

Escanciano (2006) test statistic is the sample analog of $$\int_{\mathbb{S}^k\times\mathbb{R}} \mathbb{E}^2(u_j1\{\beta' x_j\leqslant l\})f_\beta(l)d\beta dl$$ times $$n$$ which is derived by approximating the density $$f_\beta(\cdot)$$ by a probability mass function. Let $$\hat{f}_\beta(l)=\frac{1}{n}\sum_r1\{\beta' x_r=l\}$$ then the statistic is

$T_{esca}=\int_{\mathbb{S}^k\times\mathbb{R}}\left(\frac{1}{\sqrt{n}}\sum_j \hat{u}_j1\{\beta' x_j\leqslant l\}\right)^2\hat{f}_\beta(l)d\beta dl$

It can be proven that it has the same form as the other test statistics

$T_{esca} = \frac{1}{n}\sum_{j,j'}\hat{u}_j\hat{u}_{j'}\frac{1}{n}\sum_r\int_{\mathbb{S}^k}1\{\beta' x_j\leqslant \beta' x_r,\beta' x_{j'}\leqslant \beta' x_r\}d\beta=\frac{1}{n}\hat{u}'W_{esca}\hat{u}$

where $$W_{esca}$$ has elements $$\frac{1}{n}\sum_r W_{esca,j,j',r}$$ with $$W_{esca,j,j',r}=\int_{\mathbb{S}^k}1\{\beta' x_j\leqslant \beta' x_r,\beta' x_{j'}\leqslant \beta' x_r\}d\beta$$ for any row $$j$$ and column $$j'$$. Approximating the integrals in $$W_{esca}$$ is unnecessary because

$W_{esca,j,j',r}=W_{esca,j,j',r}^{(0)}\frac{\pi^{k/2}-1}{\Gamma(k/2+1)}, \qquad W_{esca,j,j',r}^{(0)}=\left|\pi-arccos\left(\frac{(x_j-x_{r})'(x_{j'}-x_r)}{|x_j-x_{r}||x_{j'}-x_r|}\right)\right|$

See appendix B in Escanciano (2006) for more details. Note that $$n^3$$ operations are necessary to compute $$W_{esca}$$ which means that this statistic takes much more time to compute.

### Lavergne and Patilea (2008)

Lavergne and Patilea (2008) consider a sample analog of the moment $$\mathbb{E}(u_j\mathbb{E}(u_j|x_j)\omega(\beta' x_j))$$ and replace $$\omega(\cdot)$$ by $$f_\beta(\cdot)$$ the density of $$\beta' x_j$$. In addition they replace $$\beta$$ by the value in the unit hypersphere which maximizes the moment taken to the square. This way the test is given the direction which best reject the null hypothesis under the alternative. Thus first define for any $$t\in\mathbb{S}^k$$

$\mathcal{Q}(\beta)=\frac{1}{n(n-1)h}\sum_{j,j'\neq j}\hat{u}_j\hat{u}_{j'} K\left(\frac{\beta'(x_j-x_{j'})}{h}\right)$

where $$K(\cdot)$$ is a bounded symmetric density with bounded variation, $$h$$ is a bandwidth such that $$h\underset{n\rightarrow +\infty}{\longrightarrow}0$$ and $$\frac{(nh^2)^{\delta}}{log(n)}\underset{n\rightarrow+\infty}{\longrightarrow}+\infty$$ for some $$\delta\in(0;1)$$. $$\mathcal{Q}(\beta)$$ cannot be directly used, instead define $$\hat{\beta}$$ the direction which best captures the correlation between the residuals and the explanatory variables

$\hat{\beta}= \ \underset{\beta\in\mathbb{S}^k}{Argmax} \ |n\sqrt{h}\mathcal{Q}(\beta)−\alpha_n 1\{\beta\neq \beta^*\}|$

where $$\beta^*$$ represents a favored direction chosen a priori, and $$\alpha_n\underset{n\rightarrow +\infty}{\rightarrow}0$$ is the weight given to this favored direction. $$\beta^*$$ and $$\alpha_n$$ improve significantly the power properties of the test in small sample. Note that in practice the unit hypersphere $$\mathbb{S}^k$$ is approximated by a finite number of points. Thus the test statistic is the criterion evaluated at $$\hat{\beta}$$

$T_{pala}=\mathcal{Q}(\hat{\beta})=\frac{1}{n(n−1)h}\sum_{j,j′\neq j}\hat{u}_j\hat{u}_jK\left(\frac{\hat{\beta}′(x_j−x_{j′})}{h}\right)=\frac{1}{n(n−1)h}\hat{u}^′W_{pala}\hat{u}$

where $$W_{pala}$$ is a matrix with diagonal elements equal to zero and its other entries equal to $$K\left(\frac{\hat{β}′(x_j−x_j)}{h}\right)$$ for any row $$j$$ and column $$j′$$ such that $$j\neq j′$$.

### Lavergne and Patilea (2012)

Finally Lavergne and Patilea (2012) use the sample analog of $$\int_B\mathbb{E}(\mathbb{E}^2(u_j|\beta' x_j)f_\beta(\beta' x_j))d\beta=0$$ for some $$B\subseteq\mathbb{S}^k$$ as a test statistic. To derive it notice that an empirical counterpart of $$\mathbb{E}(\mathbb{E}^2(u_j|\beta' x_j)f_\beta(\beta' x_j))$$ is $$\mathcal{Q}(\beta)$$ as defined in previously. Hence their test statistic which they call smooth integrated conditional moment statistic writes

$T_{sicm}=\int_B\mathcal{Q}(\beta)d\beta=\int_B\frac{1}{n(n-1)h}\sum_{j,j'\neq j}\hat{u}_j\hat{u}_{j'}K\left(\frac{\beta'(x_j-x_{j'})}{h}\right)d\beta=\frac{1}{n(n-1)h}\hat{u}'W_{sicm}\hat{u}$

where $$W_{sicm}$$ has diagonal elements equal to zero and its other elements are equal to $$\int_BK\left(\frac{\beta'(x_j-x_{j'})}{h}\right)d\beta$$ for any row $$j$$ and any column $$j'\neq j$$. Clearly $$T_{sicm}$$ is a smooth version of $$T_{icm}$$ because of the bandwidth $$h$$. Furthermore it is also a smooth version of $$T_{pala}$$ in the sense that instead of being based on the squared error in the worst direction of $$\beta' x_j$$, it is based on a continuum of directions. In practice to compute the integral a finite number of points are drawn randomly from $$B$$ and $$B$$ doesn’t have to be the whole unit hypersphere $$\mathbb{S}^k$$. For instance half hyperspheres can be considered such as $$\{\beta\in\mathbb{R}^k:\beta_m\geqslant 0,||\beta||=1\}$$ where $$\beta_m$$ denotes the $$m$$-th element of the vector $$\beta$$.

## 2.4 Normalization

The five test statistics can be normalized. Not only does this improve the finite sample properties of the tests, but it allows to use Gaussian asymptotics when deciding to reject the null hypothesis with the tests of Zheng (1996), Lavergne and Patilea (2008), and Lavergne and Patilea (2012). This is extremely useful in large samples instead of using the bootstrap.

The normalized test statistics are of the following form:

$\hat{T}_{icm}=\hat{u}^′\hat{W}_ {icm}\hat{u}, \qquad \hat{W}_{icm}=W_{icm}\sqrt{2\sum_{j,j′}\hat{\sigma}^2_j\hat{\sigma}^2_{j′}K^2(x_j−x_{j′})}$

$\hat{T}_{zheng}=\hat{u}^′\hat{W}_{zheng}\hat{u},\qquad \hat{W}_{zheng}=W_{zheng}\sqrt{2\sum_{j,j′\neq j}\hat{\sigma}^2_j\hat{\sigma}^2_{j'}K^2\left(\frac{x_j−x_{j′}}{h}\right)}$

$\hat{T}_{esca}=\hat{u}′\hat{W}_{esca}\hat{u}, \qquad \hat{W}_{esca}=W_{esca}\sqrt{2\sum_{j,j′}\hat{\sigma}_j^2\hat{\sigma}^2_{j′}\left(\frac{1}{n}\sum_r\int_{\mathbb{S}^k}1\{\beta′x_j\leqslant \beta′x_r,\beta′x_{j′}⩽\beta′x_r\}d\beta\right)^2}$

$\hat{T}_{pala}=\hat{u}^′\hat{W}_{pala}\hat{u}, \qquad \hat{W}_{pala}=W_{pala}\sqrt{2\sum_{j,j′\neq j}\hat{\sigma}^2_j\hat{\sigma}_{j'}^2K^2\left(\frac{\hat{β}′(x_j−x_{j′})}{h}\right)}$

$\hat{T}_{sicm}=\hat{u}^′\hat{W}_{sicm}\hat{u}, \qquad \hat{W}_{sicm}=W_{sicm}\sqrt{2\sum_{j,j′\neq j}\hat{\sigma}_j^2\hat{\sigma}^2_{j'}\left(\int_BK\left(\frac{\beta′(x_j−x_{j′})}{h}\right)d\beta\right)^2}$

where $$\hat{\sigma}_j^2$$ controls for the conditional variance of the error uj. A naive approach to the normalization which works very well in large sample is to directly replace $$\hat{\sigma}_j^2$$ by the squared residuals $$\hat{u}_j^2$$. Another approach to the normalization is to replace $$\hat{\sigma}_j^2$$ by an estimator such the as the nonparametric kernel variance estimator of Yin, Geng, Li and Wang (2010) which writes

$\hat{\sigma}^2(\tilde{x})=\frac{\frac{1}{nh_v}\sum_j(y_j−\overline{y}(\tilde{x}))^2K\left(\frac{\tilde{x}−x_j}{h_v}\right)}{\frac{1}{nh_v}\sum_jK\left(\frac{\tilde{x}−x_j}{h_v}\right)}, \qquad \overline{y}(\tilde{x})=\frac{\frac{1}{nh_v}\sum_j y_jK\left(\frac{\tilde{x}−x_j}{h_v}\right)}{\frac{1}{nh_v}\sum_j K\left(\frac{\tilde{x}−x_j}{h_v}\right)}$

where $$K$$ is a Kernel function and $$h_v$$ is a bandwidth which can be different from $$h$$.

Both the naive and nonparametric approaches to the normalization are implemented.

## 2.5 Rejection rules

To decide whether to reject or not the null hypothesis we need to compute quantiles of the distribution of each statistic under the null conditional on $$x≡=(x_1,\dots,x_n)′$$. Then $$H_0$$ is rejected at level 5% if the test statistic is above the quantile 95% of its distribution under the null. To compute these quantiles we propose two solutions.

First we consider computing the quantiles using the fixed design bootstrap. $$x$$ is held fixed so for each test statistic their central W is held fixed, and a n×1 vector of residuals ˆub is drawn using the fixed design wild bootstrap of Wu (1986) or the smooth conditional moment bootstrap of Gozalo (1997). It will also control for potential heteroskedasticity. Using this bootstrapped vector of residuals and the maintained central matrix $$W$$ a bootstrapped statistic can be computed. After repeating this operation many times we obtain a vector of bootstrapped statistics. The quantiles of this vector can then be used to reject or not $$H_0$$. As an example if the test we consider is that of Bierens (1982) a bootstrapped statistic is

$T_{icm,b}=\frac{1}{n}\hat{u}^′_bW_{icm}\hat{u}_b$

By repeating this operation B times we obtain B bootstrapped statistics $$(T_{icm,b})_{b=1}^B$$ which mimic the behavior of $$T_{icm}$$ under the null hypothesis. Consequently the parametric specification will be rejected at level 5% if $$T_{icm}>q_{95\%}$$ where $$q_{95\%}$$ is the 95% quantile of $$(T_{icm,b})^B_{b=1}$$. The same procedure can be applied to other tests and their normalized versions to decide whether or not to reject the null hypothesis.

Second we consider using the quantiles of the standard normal. As mentioned, the normalized versions of the statistics of Zheng (1996), Lavergne and Patilea (2008), and Lavergne and Patilea (2012) are asymptotically standard normal. Thus if one of these normalized test statistics are used, we can use the quantiles of a standard normal to reject or not $$H_0$$. As an example if the test we consider is that of Zheng (1996) with a normalization then the parametric specification will be rejected at level 5% if $$|\hat{T}_{zheng}|>1.96$$.

## 2.6 Validity, consistency and power properties

Each test can be proven to be valid, as in under the null hypothesis the probability to reject the null converges to nominal level, and to be consistent, as in under any fixed alternative the probability to reject the null converges to one.

But these five tests differ significantly in terms of power in practice. The test of Zheng (1996) seem to be the least powerful test in practice, it has no power against Pitman alternatives and has difficulty rejecting the null when the number $$k$$ of exogenous variables is large. The test of Bierens (1982) possesses more than trivial power against Pitman alternatives but it also has trouble rejecting the null when $$k$$ is large. The test of Escanciano (2006) does not depend on a choice of weighting function and does not require numerical integration however to derive its statistic it requires $$n^3$$ operations making it very slow and hard to apply in practice. In addition its power however largely depends on the true alternative and is low when $$k$$ is large. The tests of Lavergne and Patilea (2008), and Lavergne and Patilea (2012) are more powerful than the other two when $$k$$ is large because of their use of a continuum of single index $$\beta^′x_j$$ to summarize the correlation between $$u_j$$ and $$x_j$$. At the same time when $$k$$ is small the two tests are at least as powerful as the others. As mentioned the power of Lavergne and Patilea (2008) test comes from the “worst” single-index alternative whereas the power of Lavergne and Patilea (2012) test comes from a continuum of single-index alternatives. Thus in practice under the alternative the nature of the correlation between $$u_j$$ and $$x_j$$ will determine which of these two tests is more powerful.

See the references for more details.

# 3 Using SpeTestNP

Previously we have described the principle behind the five nonparametric specification tests, how to derive the test statistics and the rejection rules, and discussed their properties. Next we show how to use SpeTestNP to test parametric models in practice, with first the installation, second a description of how to use the test, third a thorough description of the arguments of the package main function SpeTest, and fourth an illustration to determine the true shape of expected wages conditional on years of education and age.

## 3.1 Installation

To install SpeTestNP from CRAN simply run the following command:

install.packages("SpeTestNP")

To install SpeTestNP from Github the package devtools should be installed and the following commands should be run:

install.packages("devtools")

library("devtools")

install_github("HippolyteBoucher/SpeTestNP")

To choose where and how the package is installed check help(install_github) and help(install.packages). Alternatively users can download the package and directly install it with the CMD. SpeTestNP requires the packages stats (already installed and loaded by default in Rstudio), foreach, parallel and doParallel (if parallel computing is used to generate the vector) to be installed.

## 3.2 Testing with SpeTestNP

Recall the true model and the model induced by the parametric specification characterized by $$\mathcal{F}=\{f(\cdot,\tilde{\theta}):\tilde{\theta}\in\Theta\subset\mathbb{R}^k\}$$

$y_j=g(x_j)+\varepsilon_j, \qquad y_j=f(x_j,\theta)+u_j$ where $$\mathbb{E}(y_j|x_j)=g(x_j) \ a.s$$ and $$\theta= \ \underset{\tilde{\theta}\in\Theta}{Argmin} \ \mathbb{E}((y_j-f(x_j,\tilde{\theta}))^2)$$.

Then to test the parametric specification or equivalently to test $$H_0:\mathbb{E}(u_j|x_j)=0 \ a.s$$ the function SpeTest of the package SpeTestNP can be directly used by filling the first argument eq with a fitted model of class lm or nls. In case the parametric specification is linear or can be rewritten in a linear form eq should be an object of class lm. In case of non-linear models eq should be an object of class nls which stands for non-linear least squares (from the package stats). Note that in order to perform the specification test by feeding SpeTest with an nls model then the arguments in nls must be given in the right order. Then by running the following command the parametric specification characterized by $$\mathcal{F}$$ is tested

  SpeTest(eq)

The function returns an object of class STNP which when printed with print or print.STNP returns the test statistic and its p-value. An object of type STNP is a list which not only contains the test statistic stat and its p-value pval but also the type of the test type, the rejection rule rejection, the test statistic normalization norma, the Kernel function denoted as $$K(\cdot)$$ used to compute the test statistic central matrix ker, the standardization method of test the statistic central matrix knorm, the type of bootstrap used to compute the p-value boot, the number of bootstrap samples used to compute the p-value nboot, the bandwidths cch and hv, etc… To obtain a summary of the test and its options the method summary or summary.STNP can be used on objects of class STNP.

By default the test of Bierens (1982) with the standard normal density as the central matrix function is applied and the test p-value is obtained using 50 wild bootstrap samples with a naive estimator of the conditional variance of the errors. Among many options, by changing the argument rejection from bootstrap (the default) to asymptotics if type = "zheng" or type = "pala" or type = "sicm" the test p-value is then based on the asymptotic normality of these normalized test statistics under the null. In addition by default the test statistic is not normalized as in by default the denominator in $$T_{zheng}$$, $$T_{pala}$$ and $$T_{sicm}$$ is set to one. This can be changed by setting norma = "naive" to normalize the statistic using a naive estimator of the errors conditional variance, or by setting norma = "np" to normalize the statistic using a nonparametric estimator of the errors conditional variance. If rejection = "bootstrap" setting para to TRUE greatly speeds up the computation of the p-value by deriving bootstrapped statistics in parallel. For more details refer to the next section or help(SpeTest).

Note that the functions SpeTest_Stat and SpeTest_Dist are also available. Both functions take similar arguments to SpeTest. SpeTest_Stat computes the specification test statistic, while SpeTest_Dist generates a vector of size nboot from the specification test statistic distribution under the null hypothesis using the bootstrap. The argument para is also available to SpeTest_Dist. SpeTest_Stat and SpeTest_Dist allow to easily perform simulation exercises.

## 3.3 Arguments description and additional features

To be more specific about the arguments of the function SpeTest:

• Argument eq should be the fitted parametric model of class lm or nlsof the parametric specification of interest $$\mathcal{F}$$

• Argument type refers to the type of the test

If type = "icm" the test of Bierens (1982) is performed (default)

If type = "zheng" the test of Zheng (1996) is performed

If type = "esca" the test of Escanciano (2006) is performed, significantly increases computing time

If type = "pala" the test of Lavergne and Patilea (2008) is performed

If type = "sicm" the test of Lavergne and Patilea (2012) is performed

• Argument rejection refers to the rejection rule

If rejection = "bootstrap" the p-value of the test is based on the bootstrap (default)

If rejection = "asymptotics" and type = "zheng" or type = "esca" or type = "sicm" the p-value of the test is based on asymptotic normality of the normalized version of one of these test statistic under the null hypothesis

If type = "icm" or type = "esca" the argument rejection is ignored and the p-value is based on the bootstrap

• Argument norma refers to the normalization of the test statistic

If norma = "no" the test statistic is not normalized (default)

If norma = "naive" the test statistic is normalized with a naive estimator of the errors variance

If norma = "np" the test statistic is normalized with a nonparametric estimator of the errors variance

• Argument boot refers to the bootstrap method used to compute the test p-value when rejection = "bootstrap"

If boot = "wild" the wild bootstrap of Wu (1986) is used (default)

If boot = "smooth" the smooth conditional moments bootstrap of Gozalo (1997) is used

• Argument nboot is the number of bootstraps used to compute the test p-value, by default nboot = 50}

• Argument para determines if parallel computing is used or not when rejection = "bootstrap"

If para = FALSE parallel computing is not used to generate the bootstrap samples to compute the test p-value (default)

If para = TRUE parallel computing is used to generate the bootstrap samples to compute the test p-value, significantly decreases computing time, makes use of all CPU cores except one

• Argument ker refers to the Kernel function used in the central matrix and for the nonparametric covariance estimator if there is any

If ker = "normal" the central matrix Kernel function is the normal p.d.f (default)

If ker = "triangle" the central matrix Kernel function is the triangular p.d.f

If ker = "logistic" the central matrix Kernel function is the logistic p.d.f

If ker = "sinc" the central matrix Kernel function is the sine cardinal function

• Argument knorm refers to the normalization of the Kernel function

If knorm = "sd" then the standard deviation using the Kernel function equals 1 (default)

If knorm ="sq" then the integral of the squared Kernel function equals 1

• Argument cch is the central matrix Kernel bandwidth

If type = "icm" or type = "esca" then cch always equals 1

If type = "zheng" the "default" bandwidth is the scaled rule of thumb: cch = 1.06*n^(-1/5)

If type = "sicm" and type = "pala" the "default" bandwidth is the scaled rule of thumb: cch = 1.06*n^(-1/(4+k)) where k is the number of regressors

The user may change the bandwidth when type = "zheng", type = "sicm" or type = "pala".

• Argument hv is the bandwidth the nonparametric errors covariance estimator when norma = "np" or rejection = "bootstrap" and boot = "smooth"

By "default" the bandwidth is the scaled rule of thumb hv = 1.06*n^(-1/(4+k))

• Argument nbeta refers to the number of elements $$\beta$$ used to represent the unit hypersphere $$\mathcal{S}^k$$ when type = "pala" or type = "sicm"

Computing time increases as nbeta gets larger

By "default" it is equal to 20 times the square root of the number of exogenous control variables

• Argument direct refers to the default “directions” for the tests of Lavergne and Patilea (2008) and Lavergne and Patilea (2012)

If type = "pala", direct is the favored direction for $$\beta$$, by "default" it is the OLS estimator if class(eq) = "lm"

If type = "sicm", direct is the initial direction for $$\beta$$. This direction should be a vector of 0 (for no direction), 1 (for positive direction) and -1 (for negative direction)

For example, c(1,-1,0) indicates that the user thinks that the 1st regressor has a positive effect on the dependent variable, that the 2nd regressor has a negative effect on the dependent variable, and that he has no idea about the effect of the 3rd regressor

By "default" no direction is given to the hypersphere

• Argument alphan refers to the weight given to the favored direction for $$\beta$$ when type = "pala"

By "default" it is equal to log(n)*n^(-3/2)

Before changing the default options of arguments norma, direct and alphan we strongly advise the user to read the tests references.

## 3.4 Illustration

To finish we use data on 1,000 individuals from the Current Population Survey as in Stock and Watson (2007) to find the true shape of their expected earnings conditional on their years of education and their age using the test of Bierens (1982).

    library(SpeTestNP)
library(AER)

### Loading the data and taking a first look

data( CPSSW8 )

summary ( CPSSW8 )
#>     earnings         gender           age              region
#>  Min.   : 2.003   male  :34348   Min.   :21.00   Northeast:12371
#>  1st Qu.:11.058   female:27047   1st Qu.:33.00   Midwest  :15136
#>  Median :16.250                  Median :41.00   South    :18963
#>  Mean   :18.435                  Mean   :41.23   West     :14925
#>  3rd Qu.:23.558                  3rd Qu.:49.00
#>  Max.   :72.115                  Max.   :64.00
#>    education
#>  Min.   : 6.00
#>  1st Qu.:12.00
#>  Median :13.00
#>  Mean   :13.64
#>  3rd Qu.:16.00
#>  Max.   :20.00

Thus the dependent variable we consider is earnings and the explanatory variables we use to build the conditional expectation are education and age. First we fit a linear specification of conditional earnings.

    lm_lin <- lm( earnings ~ age + education,
data = CPSSW8[1:1000,] )

summary ( lm_lin )
#>
#> Call:
#> lm(formula = earnings ~ age + education, data = CPSSW8[1:1000,
#>     ])
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -27.313  -6.464  -1.445   4.804  42.092
#>
#> Coefficients:
#>              Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -14.18639    2.10661  -6.734 2.78e-11 ***
#> age           0.15846    0.02747   5.767 1.07e-08 ***
#> education     1.93904    0.12286  15.782  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 9.465 on 997 degrees of freedom
#> Multiple R-squared:  0.2176, Adjusted R-squared:  0.216
#> F-statistic: 138.7 on 2 and 997 DF,  p-value: < 2.2e-16

Both variables are very significant. Then we perform two tests of the linear specification, the bootstrap test of Bierens (1982) using the bootstrap decision rule, and the asymptotic test of Zheng (1996) with a naive normalization.

    SpeTest( lm_lin , type = "icm" , rejection = "bootstrap" )
#>
#>   Bierens (1982) integrated conditional moment test
#>
#>   Test statistic :  27.31333
#>   Bootstrap p-value :  0
#> 
    SpeTest( lm_lin , type = "zheng" , rejection = "asymptotics" )
#>
#>   Zheng (1996) test
#>
#>   Normalized test statistic :  1.47353
#>   Asymptotic p-value :  0.0703
#> 

The linear specification is rejected at level below 1% for the test of Bierens (1982) and at level below 10% for the test of Zheng (1996). So we fit a quadratic specification and perform the same tests.

    lm_quad <- lm( earnings ~ age + I(age^2) + education + I(education^2),
data = CPSSW8[1:1000,] )

summary( lm_quad )
#>
#> Call:
#> lm(formula = earnings ~ age + I(age^2) + education + I(education^2),
#>     data = CPSSW8[1:1000, ])
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -32.167  -6.242  -1.412   4.665  41.753
#>
#> Coefficients:
#>                 Estimate Std. Error t value Pr(>|t|)
#> (Intercept)    -3.353005   8.633125  -0.388  0.69781
#> age             1.011953   0.212083   4.772  2.1e-06 ***
#> I(age^2)       -0.010051   0.002456  -4.093  4.6e-05 ***
#> education      -2.079218   1.041245  -1.997  0.04611 *
#> I(education^2)  0.140968   0.036501   3.862  0.00012 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 9.323 on 995 degrees of freedom
#> Multiple R-squared:  0.2424, Adjusted R-squared:  0.2393
#> F-statistic: 79.58 on 4 and 995 DF,  p-value: < 2.2e-16
    SpeTest( lm_quad , type = "icm" , rejection = "bootstrap" )
#>
#>   Bierens (1982) integrated conditional moment test
#>
#>   Test statistic :  1.45746
#>   Bootstrap p-value :  0.1
#> 
    SpeTest( lm_quad , type = "zheng" , rejection = "asymptotics")
#>
#>   Zheng (1996) test
#>
#>   Normalized test statistic :  -0.98736
#>   Asymptotic p-value :  0.16173
#> 

Both age and education to the square are very significant. In addition the p-values of both tests are above 15% so we cannot reject the quadratic specification. Finally we test a highly non-linear specification with age, age to the square, education, education to the square, and their products included as controls:

    lm_nlin <- lm( earnings ~ age + I(age^2) + education + I(education^2)
+ I(education*age) + I(education^2*age)
+ I(education*age^2) + I(education^2*age^2),
data= CPSSW8[1:1000,] )

summary( lm_nlin )
#>
#> Call:
#> lm(formula = earnings ~ age + I(age^2) + education + I(education^2) +
#>     I(education * age) + I(education^2 * age) + I(education *
#>     age^2) + I(education^2 * age^2), data = CPSSW8[1:1000, ])
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -33.135  -6.212  -1.485   4.515  41.920
#>
#> Coefficients:
#>                          Estimate Std. Error t value Pr(>|t|)
#> (Intercept)             6.006e+01  1.334e+02   0.450    0.653
#> age                    -3.545e-01  6.060e+00  -0.058    0.953
#> I(age^2)               -1.043e-02  6.707e-02  -0.155    0.876
#> education              -9.404e+00  1.924e+01  -0.489    0.625
#> I(education^2)          3.335e-01  6.815e-01   0.489    0.625
#> I(education * age)      1.277e-01  8.738e-01   0.146    0.884
#> I(education^2 * age)   -2.053e-03  3.093e-02  -0.066    0.947
#> I(education * age^2)    6.633e-04  9.669e-03   0.069    0.945
#> I(education^2 * age^2) -4.410e-05  3.420e-04  -0.129    0.897
#>
#> Residual standard error: 9.316 on 991 degrees of freedom
#> Multiple R-squared:  0.2467, Adjusted R-squared:  0.2406
#> F-statistic: 40.56 on 8 and 991 DF,  p-value: < 2.2e-16
    SpeTest( lm_nlin , type = "icm" , rejection = "bootstrap" )
#>
#>   Bierens (1982) integrated conditional moment test
#>
#>   Test statistic :  0.02541
#>   Bootstrap p-value :  0.78
#> 
    SpeTest( lm_nlin , type = "zheng" , rejection = "asymptotics")
#>
#>   Zheng (1996) test
#>
#>   Normalized test statistic :  -1.8227
#>   Asymptotic p-value :  0.03417
#> 

This time none of the variables are considered (individually) significant. This does not mean that this specification is wrong, in fact it nests the quadratic specification. Note that the p-value of the test of Bierens (1982) is very high while the p-value of asymptotic test of Zheng (1996) is 3%. This difference can be explained by the fact that both tests have important size distortions when the number of explanatory variables is “large”. Thus we perform a final check with the asymptotic tests of Lavergne and Patilea (2008) and Lavergne and Patilea (2012).

    SpeTest( lm_nlin, type = "pala", rejection = "asymptotics", nbeta = 40 )
#>
#>   Lavergne and Patilea (2008) test
#>
#>   Normalized test statistic :  -0.8568
#>   Asymptotic p-value :  0.19578
#> 
    SpeTest( lm_nlin, type = "pala", rejection = "bootstrap" , nboot = 10 , nbeta = 10 )
#>
#>   Lavergne and Patilea (2008) test
#>
#>   Test statistic :  -96.46926
#>   Bootstrap p-value :  0.3
#> `

Both p-values are high so we cannot reject this highly non-linear specification.

# 4 References

H.J. Bierens (1982), “Consistent Model Specification Test”, Journal of Econometrics, 20 (1), 105-134

I. Dreier and S. Kotz (2002), “A note on the characteristic function of the t-distribution”, Statistics & Probability Letters, 57 (3), 221-224

J.C. Escanciano (2006), “A Consistent Diagnostic Test for Regression Models Using Projections”, Econometric Theory, 22 (6), 1030-1051

P.L. Gozalo (1997), “Nonparametric Bootstrap Analysis with Applications to Demographic Effects in Demand Functions”, Journal of Econometrics, 81 (2), 357-393

Johnson, Kotz and Balakrishnan (1995), “Continuous Univariate Distributions”, volume 2, Wiley Series in Probability and Statistics: Applied Probability and Statistics, Wiley & Sons

P. Lavergne and V. Patilea (2008), “Breaking the Curse of Dimensionality in Nonparametric Testing”, Journal of Econometrics, 143 (1), 103-122

P. Lavergne and V. Patilea (2012), “One for All and All for One: Regression Checks with Many Regressors”, Journal of Business & Economic Statistics, 30 (1), 41-52

J.H. Stock and M.W. Watson (2006), “Why Has U.S. Inflation Become Harder to Forecast?”, Journal of Money, Credit and Banking, 39 (1), 3-33

C.F.J. Wu (1986) “Jackknife, bootstrap and other resampling methods in regression analysis (with discussion)”, National Bureau of Economic Research Working Paper

J. Yin, Z. Geng, R. Li, H. Wang (2010), “Nonparametric covariance model”, Statistica Sinica, 20 (1), 469-479

J.X. Zheng (1996), “A Consistent Test of Functional Form via Nonparametric Estimation Techniques”, Journal of Econometrics, 75 (2), 263-289