# A short introduction to TensorPreAve

library(TensorPreAve)

## Data of tensor time series

This package provides functions to estimate the rank and factor loading spaces of time series tensor factor models. A $$K$$-th order tensor time series can be stored as a ‘Tensor’ object with $$K+1$$ modes, with mode-1 corresponds to time. See package $$rTensor$$ for the operations of ‘Tensor’ objects. As an example, value weighted Fama-French portfolio returns data formed on 10 levels of size and 10 levels of operating profitability can be shaped as a tensor time series, with monthly data from July 1973 to June 2021 so that T = 576. (See Weilin Chen (2023) for more details.) The data tensor thus has size 576 × 10 × 10, and can be stored in the following ‘Tensor’ object.

value_weight_tensor@modes
#>  576  10  10

We also provide a function ‘tensor_data_gen’ to generate random tensor time series based on econometric factor model assumptions. (See Weilin Chen (2023) for more details on the assumptions.)

# Setting tensor dimensions and parameters related to data generation
K = 2                     # The number of modes for the tensor time series
n = 100                   # Length of time series.
d = c(10,10)              # Dimensions of each mode of the tensor
r = c(2,2)                # Rank of core tensors
re = c(2,2)               # Rank of the cross-sectional common error core tensors
eta = list(c(0,0),c(0,0)) # Control factor strengths in each factor loading matrix
u = list(c(-2,2),c(-2,2)) # Control the range of elements in each factor loading matrix

set.seed(10)
Data_test = tensor_data_gen(K,n,d,r,re,eta,u)

X = Data_test$X A = Data_test$A
F_ts = Data_test$F_ts E_ts = Data_test$E_ts

X@modes
#>  100  10  10
F_ts@modes
#>  100   2   2
dim(A[])
#>  10  2

## Pre-averaging procedure to estimate the strongest factor

Given a time series tensor factor model, ‘pre_est’ estimates the factor loading matrices by setting $$z$$ to be the rank of core tensors if it is known.

X = value_weight_tensor
set.seed(10)
Q_PRE_2 = pre_est(X, z = c(2,2))
Q_PRE_2
#> []
#>               [,1]        [,2]
#>  [1,]  0.899541445  0.11899170
#>  [2,]  0.116345254  0.45020724
#>  [3,] -0.003945559  0.55331589
#>  [4,] -0.067410116  0.43390894
#>  [5,] -0.131462950  0.35249401
#>  [6,] -0.172117014  0.28424491
#>  [7,] -0.166848184  0.22955540
#>  [8,] -0.194329983  0.07537761
#>  [9,] -0.187656164  0.02461164
#> [10,] -0.158130363 -0.15712601
#>
#> []
#>              [,1]        [,2]
#>  [1,] -0.49323432  0.65960017
#>  [2,] -0.39909521 -0.03918812
#>  [3,] -0.23058147 -0.28040576
#>  [4,] -0.14495219 -0.35494734
#>  [5,] -0.04484750 -0.35694299
#>  [6,]  0.08343645 -0.24838616
#>  [7,]  0.18881603 -0.08997255
#>  [8,]  0.28671500 -0.06472708
#>  [9,]  0.37141027  0.10650837
#> [10,]  0.50841760  0.38215735

For most cases we do not know the true number of factors. In Weilin Chen (2023), pre-averaging procedure is used to estimate the directions corresponding to the strongest factors by setting $$z$$ to be a vector of ones of length $$K$$, which is also the default choice in the function. The estimated directions will be used to feed the iterative projection procedure in function ‘iter_proj’, which will be shown in the next section.

set.seed(10)
Q_PRE_1 = pre_est(X)
Q_PRE_1
#> []
#>               [,1]
#>  [1,]  0.899541445
#>  [2,]  0.116345254
#>  [3,] -0.003945559
#>  [4,] -0.067410116
#>  [5,] -0.131462950
#>  [6,] -0.172117014
#>  [7,] -0.166848184
#>  [8,] -0.194329983
#>  [9,] -0.187656164
#> [10,] -0.158130363
#>
#> []
#>              [,1]
#>  [1,] -0.49323432
#>  [2,] -0.39909521
#>  [3,] -0.23058147
#>  [4,] -0.14495219
#>  [5,] -0.04484750
#>  [6,]  0.08343645
#>  [7,]  0.18881603
#>  [8,]  0.28671500
#>  [9,]  0.37141027
#> [10,]  0.50841760

The default choice of other parameters in ‘pre_est’ are set to be applied properly in most scenarios and dimensions. For parameter $$eigen_j$$, we provide an alternative way to tune it manually using function ‘pre_eigenplot’, which plots the eigenvalues of the sample covariance matrix for a randomly chosen sample. A large dip should be observed at the ($$r_k+1$$)-th position of the plot, where $$r_k$$ is the true number of factor of mode-$$k$$. Ideally, $$eigen_j$$ should be chosen to be greater than $$r_k$$, so we suggest setting $$eigen_j$$ to be a bit larger than the position of dip observed to avoid missing potential weak factors. If such a dip is not observed, try to run the function for a few times until it can be observed. See Weilin Chen (2023) for more details.

set.seed(7)
pre_eigenplot(X, k = 2) In the above example, a large dip is observed in the 2nd position, but there is also a dip in the 3rd position. To avoid missing potential weak factors, we can choose $$eigen_j$$ to be 3 (or 4), and run ‘pre_est’ using self-defined $$eigen_j$$. In this case, it gives the same result as using the default choice.

set.seed(10)
pre_est(X, eigen_j = c(3,3))
#> []
#>               [,1]
#>  [1,]  0.899541445
#>  [2,]  0.116345254
#>  [3,] -0.003945559
#>  [4,] -0.067410116
#>  [5,] -0.131462950
#>  [6,] -0.172117014
#>  [7,] -0.166848184
#>  [8,] -0.194329983
#>  [9,] -0.187656164
#> [10,] -0.158130363
#>
#> []
#>              [,1]
#>  [1,] -0.49323432
#>  [2,] -0.39909521
#>  [3,] -0.23058147
#>  [4,] -0.14495219
#>  [5,] -0.04484750
#>  [6,]  0.08343645
#>  [7,]  0.18881603
#>  [8,]  0.28671500
#>  [9,]  0.37141027
#> [10,]  0.50841760

## Re-estimation by iterative projection

Running ‘pre_est’ gives the estimated directions corresponding to the strongest factors. Using these directions as the initial directions, we can apply the Algorithm for Iterative Projection Direction Refinement to estimate the factor loading spaces, which can be computed by the function ‘iter_proj’. (See Weilin Chen (2023) for more details of the algorithm.) Similarly, we can set $$z$$ to be the number of factors if it is known.

set.seed(10)
Q_PROJ_2 = iter_proj(X, initial_direction = Q_PRE_1, z = c(2,2))
Q_PROJ_2
#> []
#>             [,1]        [,2]
#>  [1,] -0.4186526  0.81958342
#>  [2,] -0.3034959  0.15608438
#>  [3,] -0.2897870  0.03418513
#>  [4,] -0.2822386 -0.02447667
#>  [5,] -0.2884712 -0.06993895
#>  [6,] -0.2819975 -0.14283911
#>  [7,] -0.3100956 -0.25977356
#>  [8,] -0.2982880 -0.19637954
#>  [9,] -0.3101129 -0.25107977
#> [10,] -0.3534712 -0.32828374
#>
#> []
#>             [,1]        [,2]
#>  [1,] -0.2310763 -0.20931155
#>  [2,] -0.2982651 -0.17583427
#>  [3,] -0.3100153 -0.20709533
#>  [4,] -0.3272582 -0.09977626
#>  [5,] -0.3673419 -0.13197027
#>  [6,] -0.3693822 -0.03268300
#>  [7,] -0.3526410 -0.05158860
#>  [8,] -0.3380189  0.05593774
#>  [9,] -0.3091934  0.18851720
#> [10,] -0.2209928  0.90145090

Similar to the use of ‘pre_est’, if we do not know the number of factors, we can use the default $$z$$, which is a vector of ones of length $$K$$. In this case, ‘iter_proj’ outputs estimated directions of the strongest factors after iterative projection, which can be used to estimate the rank of core tensors by function ‘bs_cor_rank’, as will be introduced in the next section.

set.seed(10)
Q_PROJ_1 = iter_proj(X, initial_direction = Q_PRE_1)
Q_PROJ_1
#> []
#>   -0.4186526 -0.3034959 -0.2897870 -0.2822386 -0.2884712 -0.2819975
#>   -0.3100956 -0.2982880 -0.3101129 -0.3534712
#>
#> []
#>   -0.2310763 -0.2982651 -0.3100153 -0.3272582 -0.3673419 -0.3693822
#>   -0.3526410 -0.3380189 -0.3091934 -0.2209928

## Core tensor rank estimation using projected data

With the estimated directions given by ‘iter_proj’, we can run ‘bs_cor_rank’ to estimate the rank of core tensors by Bootstrapped Correlation Thresholding proposed by Weilin Chen (2023).

set.seed(10)
bs_rank = bs_cor_rank(X, initial_direction = Q_PROJ_1)
bs_rank
#>  2 2

‘bs_cor_rank’ may sometimes take too long to compute if the dimension of the tensor is large. In this case, we can reduce the parameter $$B$$, which is the number of bootstrapped samples. The default value for $$B$$ is 50, and empirically, in most scenarios, reducing $$B$$ from 50 to 10 does not significantly change the results. Other parameters of ‘bs_cor_rank’ are set to be data-driven and should apply well in general.

set.seed(10)
bs_rank = bs_cor_rank(X, initial_direction = Q_PROJ_1, B = 10)
bs_rank
#>  2 2

## The whole rank and factor loading estimations in a single step

To simplify the whole estimation procedure, we provide a unified function ‘rank_factors_est’ to integrate all functions mentioned before, which is able to estimate both the rank of the core tensors and factor loading matrices simultaneously for a tensor time series.

set.seed(10)
results = rank_factors_est(X)
results
#> $rank #>  2 2 #> #>$loadings
#> $loadings[] #> [,1] [,2] #> [1,] -0.4186526 0.81958342 #> [2,] -0.3034959 0.15608438 #> [3,] -0.2897870 0.03418513 #> [4,] -0.2822386 -0.02447667 #> [5,] -0.2884712 -0.06993895 #> [6,] -0.2819975 -0.14283911 #> [7,] -0.3100956 -0.25977356 #> [8,] -0.2982880 -0.19637954 #> [9,] -0.3101129 -0.25107977 #> [10,] -0.3534712 -0.32828374 #> #>$loadings[]
#>             [,1]        [,2]
#>  [1,] -0.2310763 -0.20931155
#>  [2,] -0.2982651 -0.17583427
#>  [3,] -0.3100153 -0.20709533
#>  [4,] -0.3272582 -0.09977626
#>  [5,] -0.3673419 -0.13197027
#>  [6,] -0.3693822 -0.03268300
#>  [7,] -0.3526410 -0.05158860
#>  [8,] -0.3380189  0.05593774
#>  [9,] -0.3091934  0.18851720
#> [10,] -0.2209928  0.90145090

If the rank of core tensors is already known, we can set it as $$input_r$$ in ‘rank_factors_est’, then the function outputs the estimator of factor loading matrices accordingly.

set.seed(10)
results = rank_factors_est(X, input_r = c(2,2))
results
#> $rank #>  2 2 #> #>$loadings
#> $loadings[] #> [,1] [,2] #> [1,] -0.4186526 0.81958342 #> [2,] -0.3034959 0.15608438 #> [3,] -0.2897870 0.03418513 #> [4,] -0.2822386 -0.02447667 #> [5,] -0.2884712 -0.06993895 #> [6,] -0.2819975 -0.14283911 #> [7,] -0.3100956 -0.25977356 #> [8,] -0.2982880 -0.19637954 #> [9,] -0.3101129 -0.25107977 #> [10,] -0.3534712 -0.32828374 #> #>$loadings[]
#>             [,1]        [,2]
#>  [1,] -0.2310763 -0.20931155
#>  [2,] -0.2982651 -0.17583427
#>  [3,] -0.3100153 -0.20709533
#>  [4,] -0.3272582 -0.09977626
#>  [5,] -0.3673419 -0.13197027
#>  [6,] -0.3693822 -0.03268300
#>  [7,] -0.3526410 -0.05158860
#>  [8,] -0.3380189  0.05593774
#>  [9,] -0.3091934  0.18851720
#> [10,] -0.2209928  0.90145090