# Getting Started with NNS: Clustering and Regression

library(NNS)
library(data.table)
require(knitr)
require(rgl)
require(meboot)
require(tdigest)
require(dtw)

# Clustering and Regression

Below are some examples demonstrating unsupervised learning with NNS clustering and nonlinear regression using the resulting clusters. As always, for a more thorough description and definition, please view the References.

## NNS Partitioning NNS.part()

NNS.part is both a partitional and hierarchical clustering method. NNS iteratively partitions the joint distribution into partial moment quadrants, and then assigns a quadrant identification (1:4) at each partition.

NNS.part returns a data.table of observations along with their final quadrant identification. It also returns the regression points, which are the quadrant means used in NNS.reg.
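As a rough illustration of a single partitioning step, the base R sketch below splits the data at the means of x and y, labels each observation with a quadrant, and takes the quadrant means as candidate regression points. The split at the means, the quadrant numbering, and the helper name `quadrant_step` are assumptions made for this sketch; NNS.part itself may differ in all three.

```r
# Hypothetical helper: one quadrant split at the means of x and y.
# The quadrant numbering is illustrative, not necessarily NNS.part's own.
quadrant_step <- function(x, y) {
  right <- x > mean(x)
  upper <- y > mean(y)
  id <- ifelse(right & upper, 1L,
        ifelse(!right & upper, 2L,
        ifelse(!right & !upper, 3L, 4L)))
  # Quadrant means serve as candidate regression points
  pts <- aggregate(cbind(x, y) ~ id, data = data.frame(x, y, id), FUN = mean)
  list(id = id, points = pts)
}

x_demo <- seq(-5, 5, .05); y_demo <- x_demo ^ 3
step <- quadrant_step(x_demo, y_demo)
table(step$id)   # observations per quadrant
step$points      # one mean point per occupied quadrant
```

Repeating the same split inside each quadrant would give the iterative, hierarchical behaviour described above.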

x = seq(-5, 5, .05); y = x ^ 3

for(i in 1 : 4){NNS.part(x, y, order = i, Voronoi = TRUE, obs.req = 0)}

### X-only Partitioning

NNS.part also offers partitioning based on $$x$$ values only, via NNS.part(x, y, type = "XONLY", ...). This mode uses the entire bandwidth in its regression point derivation and shares the same limit condition as partitioning via both $$x$$ and $$y$$ values.

for(i in 1 : 4){NNS.part(x, y, order = i, type = "XONLY", Voronoi = TRUE)}

Note the partition identifications are limited to 1’s and 2’s (left and right of the partition respectively), not the 4 values per the $$x$$ and $$y$$ partitioning.
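The way 1's and 2's accumulate into an identifier can be sketched in base R. The helper `xonly_ids` below is hypothetical and assumes each split is made at the mean of x within the current partition; the actual split rule in NNS.part may differ, so the partition boundaries are not expected to match its output.

```r
# Hypothetical helper: recursive X-only labelling, splitting at the mean of x.
# "1" marks the left side of each split and "2" the right side.
xonly_ids <- function(x, order) {
  id <- rep("q", length(x))
  for (k in seq_len(order)) {
    # Mean of x within each current partition
    centre <- ave(x, id, FUN = mean)
    id <- paste0(id, ifelse(x <= centre, "1", "2"))
  }
  id
}

x_demo <- seq(-5, 5, .05)
ids <- xonly_ids(x_demo, order = 3)
unique(ids)   # identifiers built only from 1's and 2's
```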

NNS.part(x, y, order = 4, type = "XONLY")

## $order
## [1] 4
##
## $dt
##          x         y quadrant prior.quadrant
##   1: -5.00 -125.0000    q1111           q111
##   2: -4.95 -121.2874    q1111           q111
##   3: -4.90 -117.6490    q1111           q111
##   4: -4.85 -114.0841    q1111           q111
##   5: -4.80 -110.5920    q1111           q111
##  ---
## 197:  4.80  110.5920    q2222           q222
## 198:  4.85  114.0841    q2222           q222
## 199:  4.90  117.6490    q2222           q222
## 200:  4.95  121.2874    q2222           q222
## 201:  5.00  125.0000    q2222           q222
##
## $regression.points
##    quadrant          x           y
## 1:     q111 -4.3733596 -79.5965469
## 2:     q112 -3.0984252 -27.8292362
## 3:     q121 -1.8484252  -5.7529911
## 4:     q122 -0.5984252  -0.2162835
## 5:     q211  0.6515748   0.2682589
## 6:     q212  1.9015748   6.2296142
## 7:     q221  3.1515748  29.2102884
## 8:     q222  4.3984252  81.2340157

## Clusters Used in Regression

The right column of plots shows the corresponding regression for the order of NNS partitioning.

for(i in 1 : 3){NNS.part(x, y, order = i, obs.req = 0, Voronoi = TRUE) ; NNS.reg(x, y, order = i, ncores = 1)}

# NNS Regression NNS.reg()

NNS.reg can fit any $$f(x)$$, for both uni- and multivariate cases. NNS.reg returns a self-evident list of values provided below.

## Univariate:

NNS.reg(x, y, ncores = 1)

## $R2
## [1] 1
##
## $SE
## [1] 0
##
## $Prediction.Accuracy
## NULL
##
## $equation
## NULL
##
## $x.star
## NULL
##
## $derivative
##      Coefficient X.Lower.Range X.Upper.Range
##   1:     74.2525         -5.00         -4.95
##   2:     72.7675         -4.95         -4.90
##   3:     71.2975         -4.90         -4.85
##   4:     69.8425         -4.85         -4.80
##   5:     68.4025         -4.80         -4.75
##  ---
## 196:     68.4025          4.75          4.80
## 197:     69.8425          4.80          4.85
## 198:     71.2975          4.85          4.90
## 199:     72.7675          4.90          4.95
## 200:     74.2525          4.95          5.00
##
## $Point.est
## NULL
##
## $regression.points
##          x         y
##   1: -5.00 -125.0000
##   2: -4.95 -121.2874
##   3: -4.90 -117.6490
##   4: -4.85 -114.0841
##   5: -4.80 -110.5920
##  ---
## 197:  4.80  110.5920
## 198:  4.85  114.0841
## 199:  4.90  117.6490
## 200:  4.95  121.2874
## 201:  5.00  125.0000
##
## $Fitted.xy
##          x         y     y.hat      NNS.ID gradient residuals
##   1: -5.00 -125.0000 -125.0000 q4444444444  74.2525         0
##   2: -4.95 -121.2874 -121.2874 q4444441444  72.7675         0
##   3: -4.90 -117.6490 -117.6490 q4444432222  71.2975         0
##   4: -4.85 -114.0841 -114.0841 q4444414444  69.8425         0
##   5: -4.80 -110.5920 -110.5920 q4444411444  68.4025         0
##  ---
## 197:  4.80  110.5920  110.5920 q1111142444  69.8425         0
## 198:  4.85  114.0841  114.0841 q1111141444  71.2975         0
## 199:  4.90  117.6490  117.6490 q1111114444  72.7675         0
## 200:  4.95  121.2874  121.2874 q1111112222  74.2525         0
## 201:  5.00  125.0000  125.0000 q1111111444  74.2525         0
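
The $derivative coefficients printed above are consistent with the slopes of the line segments joining consecutive regression points. That reading is inferred from the printed values rather than from the NNS source, and the variable names below are chosen for this sketch only.

```r
x_demo <- seq(-5, 5, .05)
y_demo <- x_demo ^ 3

# Slope of each segment joining consecutive points
slopes <- diff(y_demo) / diff(x_demo)

# Arrange the slopes alongside the x range each one covers
head(data.frame(Coefficient   = slopes,
                X.Lower.Range = head(x_demo, -1),
                X.Upper.Range = tail(x_demo, -1)), 5)
```

With 201 points there are 200 segments, which agrees with the 200 rows in the $derivative table.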

## Multivariate:

Multivariate regressions return a plot of $$y$$ and $$\hat{y}$$, as well as the regression points ($RPM) and partitions ($rhs.partitions) for each regressor.

f = function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x
y = x ; z = expand.grid(x, y)
g = f(z[ , 1], z[ , 2])
NNS.reg(z, g, order = "max", ncores = 1)

## $R2
## [1] 1
##
## $rhs.partitions
##         Var1 Var2
##     1: -5.00   -5
##     2: -4.95   -5
##     3: -4.90   -5
##     4: -4.85   -5
##     5: -4.80   -5
##    ---
## 40397:  4.80    5
## 40398:  4.85    5
## 40399:  4.90    5
## 40400:  4.95    5
## 40401:  5.00    5
##
## $RPM
##        Var1  Var2         y.hat
##     1: -4.8 -4.80 -7.105427e-15
##     2: -4.8 -2.55 -8.726063e+01
##     3: -4.8 -2.50 -8.806700e+01
##     4: -4.8 -2.45 -8.883587e+01
##     5: -4.8 -2.40 -8.956800e+01
##    ---
## 40397: -2.6 -2.80  3.776000e+00
## 40398: -2.6 -2.75  2.770875e+00
## 40399: -2.6 -2.70  1.807000e+00
## 40400: -2.6 -2.65  8.836250e-01
## 40401: -2.6 -2.60  1.776357e-15
##
## $Point.est
## NULL
##
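
Two features of the output above can be checked in base R without NNS: the grid has 201 × 201 = 40401 rows, and the example surface vanishes wherever the two regressors are equal, which is why y.hat is zero up to floating-point error at points such as (-4.8, -4.8). The `_demo` names are used only to keep this sketch separate from the objects defined earlier.

```r
f_demo <- function(x, y) x ^ 3 + 3 * y - y ^ 3 - 3 * x

grid_x <- seq(-5, 5, .05)
z_demo <- expand.grid(grid_x, grid_x)
g_demo <- f_demo(z_demo[ , 1], z_demo[ , 2])

nrow(z_demo)                               # number of grid points
on_diag <- z_demo[ , 1] == z_demo[ , 2]    # rows where both regressors agree
max(abs(g_demo[on_diag]))                  # numerically zero on the diagonal
```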

## NNS Dimension Reduction Regression

NNS.reg also provides a dimension reduction regression via the parameter NNS.reg(x, y, dim.red.method = "cor", ...), which reduces all regressors to a single dimension using the returned equation NNS.reg(..., dim.red.method = "cor", ...)$equation.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", location = "topleft", ncores = 1)$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7980781
## 2:  Sepal.Width  -0.4402896
## 3: Petal.Length   0.9354305
## 4:  Petal.Width   0.9381792
## 5:  DENOMINATOR   4.0000000

Thus, our model for this regression would be: $Species = \frac{0.798*Sepal.Length -0.44*Sepal.Width +0.935*Petal.Length +0.938*Petal.Width}{4}$
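
To make the formula concrete, the reduced single regressor can be computed by hand in base R from the printed coefficients. The coefficients are copied from the output above; the name `synthetic_x` is chosen for this sketch and is not an NNS object.

```r
# Coefficients copied from the printed $equation output
coefs <- c(Sepal.Length = 0.7980781, Sepal.Width = -0.4402896,
           Petal.Length = 0.9354305, Petal.Width = 0.9381792)
denominator <- 4

# Weighted sum of the four regressors, divided by the denominator
synthetic_x <- as.vector(as.matrix(iris[ , 1 : 4]) %*% coefs) / denominator
head(synthetic_x)
```

Each of the 150 observations is thereby mapped to one value, which is the single dimension the regression is then run on.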

### Threshold

NNS.reg(x, y, dim.red.method = "cor", threshold = ...) offers a method of reducing regressors further by controlling the absolute value of required correlation.
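
The effect of the threshold can be sketched in base R using the unthresholded coefficients printed earlier: coefficients whose absolute value falls below the threshold are set to zero, and the denominator becomes the number of regressors that remain. Whether NNS compares with `>=` or `>` is an assumption here; it makes no difference for these values.

```r
# Coefficients copied from the unthresholded $equation output
coefs <- c(Sepal.Length = 0.7980781, Sepal.Width = -0.4402896,
           Petal.Length = 0.9354305, Petal.Width = 0.9381792)
threshold <- 0.75

# Zero out regressors that do not meet the required absolute correlation
kept <- ifelse(abs(coefs) >= threshold, coefs, 0)
denominator <- sum(kept != 0)

kept
denominator
```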

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, location = "topleft", ncores = 1)$equation

##        Variable Coefficient
## 1: Sepal.Length   0.7980781
## 2:  Sepal.Width   0.0000000
## 3: Petal.Length   0.9354305
## 4:  Petal.Width   0.9381792
## 5:  DENOMINATOR   3.0000000

Thus, our model for this further reduced dimension regression would be: $Species = \frac{0.798*Sepal.Length + 0*Sepal.Width +0.935*Petal.Length +0.938*Petal.Width}{3}$

The point.est = (...) argument operates in the same manner as the full regression above, again called with NNS.reg(...)$Point.est.

NNS.reg(iris[ , 1 : 4], iris[ , 5], dim.red.method = "cor", threshold = .75, point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est

## [1] 1 1 1 1 1 1 1 1 1 1

# Classification

For a classification problem, we simply set NNS.reg(x, y, type = "CLASS", ...).

NOTE: Base category of response variable should be 1, not 0 for classification problems.

NNS.reg(iris[ , 1 : 4], iris[ , 5], type = "CLASS", point.est = iris[1 : 10, 1 : 4], location = "topleft", ncores = 1)$Point.est

##  [1] 1 1 1 1 1 1 1 1 1 1
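
The base-category note is naturally satisfied when the response is an R factor, since factor levels are coded from 1. The base R sketch below shows that coding for iris, together with rounding a continuous prediction to the nearest class label; the prediction values are made up, and how NNS.reg converts predictions to classes internally is not assumed here.

```r
# Factor levels are coded 1, 2, 3, so the base category is 1
species_code <- as.integer(iris$Species)
table(iris$Species, species_code)

# Rounding continuous predictions to the nearest class label
demo_pred <- c(1.1, 1.8, 2.7)   # made-up values for illustration
round(demo_pred)
```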

# Cross-Validation NNS.stack()

For a given objective function, the NNS.stack routine cross-validates the n.best parameter in the multivariate NNS.reg function as well as the threshold parameter in the dimension reduction version of NNS.reg. NNS.stack can be used for classification:

NNS.stack(..., type = "CLASS", ...)

or continuous dependent variables:

NNS.stack(..., type = NULL, ...).

Any objective function obj.fn can be called using expression() with the terms predicted and actual.
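
An objective function written this way is an unevaluated R expression, so it can be evaluated against any pair of vectors named predicted and actual. The base R sketch below shows the mechanism with made-up values; how NNS.stack evaluates obj.fn internally is an assumption, not something stated in this document.

```r
# An accuracy-style objective, written as an unevaluated expression
demo_obj <- expression( mean(round(predicted) == actual) )

# Made-up predictions and labels for illustration
demo_predicted <- c(1.2, 1.9, 3.4, 2.6)
demo_actual    <- c(1, 2, 3, 2)

# Evaluate the expression with `predicted` and `actual` bound to the vectors
score <- eval(demo_obj, list(predicted = demo_predicted, actual = demo_actual))
score
```

Because this score is a proportion of correct labels, it would be paired with objective = "max"; an error measure such as a mean squared error would instead be minimized.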

NNS.stack(IVs.train = iris[ , 1 : 4],
DV.train = iris[ , 5],
IVs.test = iris[1 : 10, 1 : 4],
obj.fn = expression( mean(round(predicted) == actual) ),
objective = "max", type = "CLASS",
folds = 1, ncores = 1)
## $OBJfn.reg
## [1] 0.9733333
##
## $NNS.reg.n.best
## [1] 1
##
## $probability.threshold
## [1] 0.43875
##
## $OBJfn.dim.red
## [1] 0.9533333
##
## $NNS.dim.red.threshold
## [1] 0.78
##
## $reg
##  [1] 1 1 1 1 1 1 1 1 1 1
##
## $dim.red
##  [1] 1 1 1 1 1 1 1 1 1 1
##
## $stack
##  [1] 1 1 1 1 1 1 1 1 1 1

## Increasing Dimensions

Because multicollinearity is not an issue for nonparametric regressions as it is for OLS, a better option for an ill-fit univariate model may be to increase the dimensionality by adding a copy of the regressor and to cross-validate the number of clusters n.best via:

NNS.stack(IVs.train = cbind(x, x), DV.train = y, method = 1, ...).

set.seed(123)
x <- rnorm(100); y <- rnorm(100)

nns.params <- NNS.stack(IVs.train = cbind(x, x),
DV.train = y,
method = 1, ncores = 1)

NNS.reg(cbind(x, x), y,
n.best = nns.params$NNS.reg.n.best,
point.est = cbind(x, x), ncores = 1)
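
The contrast with OLS can be seen in base R. With a duplicated regressor the design matrix is rank deficient, so lm cannot estimate both columns and reports NA for the aliased one. The `_dup` names and the column labels x1 and x2 are chosen for this sketch only.

```r
set.seed(123)
x_dup <- rnorm(100); y_dup <- rnorm(100)

# Two identical columns: perfectly collinear regressors
X <- cbind(x1 = x_dup, x2 = x_dup)

ols <- lm(y_dup ~ X)
coef(ols)   # the coefficient on the duplicated column is NA
```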

# References

If the user is so motivated, detailed arguments and further examples are provided within the following: