This document describes the utsf package, which offers a meta engine for applying different regression models to univariate time series forecasting using an autoregressive approach.
A univariate time series forecasting method is one in which the future values of a series are predicted using only information from the series itself, for example, using its historical mean as the forecast. An advantage of this type of prediction is that, apart from the series being forecast, no further information needs to be collected in order to train the forecasting model.
An autoregressive model is a kind of univariate time series forecasting model in which a value of a time series is expressed as a function of some of its past values. That is, an autoregressive model is a regression model in which the independent variables are lagged values (previous values) of the response variable. For example, given a time series with the following historical values: \(t = \{1, 3, 6, 7, 9, 11, 16\}\), suppose that we want to develop an autoregressive model in which a target “is explained” by its first, second and fourth past values (in this context, a previous value is also called a lag, so lag 1 is the value immediately preceding a given value in the series). Given this series and lags (1, 2 and 4), the training set would be:
| Lag 4 | Lag 2 | Lag 1 | Target |
|---|---|---|---|
| 1 | 6 | 7 | 9 |
| 3 | 7 | 9 | 11 |
| 6 | 9 | 11 | 16 |
In this model the next future value of the series is predicted as \(f(Lag4, Lag2, Lag1)\), where \(f\) is the regression function and \(Lag4\), \(Lag2\) and \(Lag1\) are the fourth, second and first lagged values of the next future value. So, the next future value of series \(t\) is predicted as \(f(7, 11, 16)\), producing a value that will be called \(F1\).
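As an illustrative sketch (not part of the utsf API), this training set can be reproduced in base R with the embed() function:
t <- c(1, 3, 6, 7, 9, 11, 16)
e <- embed(t, 5)   # each row contains a value followed by its lags 1 to 4
train <- data.frame(Lag4 = e[, 5], Lag2 = e[, 3], Lag1 = e[, 2], Target = e[, 1])
train               # reproduces the table shown above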
Suppose that the forecast horizon (the number of future values to be forecast) is greater than 1. If the regression function only predicts the next future value of the series, a recursive approach can be applied to forecast all the values of the forecast horizon: the regression function is applied recursively until all horizons are forecast. For instance, following the previous example, suppose that the forecast horizon is 3. As we have explained, to forecast the next future value of the series (horizon 1) the regression function is fed with the vector \([7, 11, 16]\), producing \(F1\). To forecast horizon 2 the regression function is fed with the vector \([9, 16, F1]\). The forecast for horizon 1, \(F1\), is used as the first lag for horizon 2 because the actual value is unknown. Finally, to predict horizon 3 the regression function is fed with the vector \([11, F1, F2]\). This example of recursive forecasting is summarized in the following table:
| Horizon | Autoregressive values | Forecast |
|---|---|---|
| 1 | 7, 11, 16 | F1 |
| 2 | 9, 16, F1 | F2 |
| 3 | 11, F1, F2 | F3 |
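The recursive strategy can also be sketched in a few lines of base R. This is only an illustration (assuming a generic one-step-ahead regression function reg), not the package's implementation:
# Recursive forecasting with lags 1, 2 and 4 and a one-step-ahead regression function
recursive_forecast <- function(series, reg, lags = c(1, 2, 4), h = 3) {
  values <- as.numeric(series)
  for (i in seq_len(h)) {
    feats <- rev(values)[rev(lags)]      # lag 4, lag 2 and lag 1 of the next value
    values <- c(values, reg(feats))      # the forecast becomes a known value
  }
  tail(values, h)                        # forecasts F1, F2 and F3
}
# For example, using the mean of the lagged values as regression function:
recursive_forecast(c(1, 3, 6, 7, 9, 11, 16), reg = mean)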
The recursive approach for forecasting several values into the future is applied in classical statistical models such as ARIMA or exponential smoothing.
The utsf package makes it easy to use classical regression models for univariate time series forecasting, employing the autoregressive approach and the recursive prediction strategy explained in the previous section. All the supported models are applied through a uniform interface: the create_model() function builds the forecasting model and the forecast() function uses the model to predict future values. Let us see an example in which a regression tree model is used to forecast the next future values of a time series:
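A sketch of the code for this example (assuming the package has been attached; the resulting objects m and f are used below):
library(utsf)
m <- create_model(AirPassengers, lags = 1:12, method = "rt")
f <- forecast(m, h = 12)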
In this example, an autoregressive tree model (method = "rt") is trained using the historical values of the AirPassengers time series and a forecast of its 12 next future values (h = 12) is made. The create_model() function returns an S3 object of class utsf with information about the trained model. The information about the forecast is included in the object returned by the forecast() function as a component named pred of class ts (a time series):
f$pred
#> Jan Feb Mar Apr May Jun Jul Aug
#> 1961 460.2980 428.4915 467.0304 496.9833 499.9819 554.7891 627.5849 628.0503
#> Sep Oct Nov Dec
#> 1961 533.2803 482.4221 448.7926 453.6920
library(ggplot2)
autoplot(f)
The training set used to fit the model is built from the historical values of the time series using the autoregressive approach explained in the previous section. The lags parameter of the create_model() function is used to specify the autoregressive lags. In the example, lags = 1:12, so a target is a function of its 12 previous values. Next, we consult the first targets (and their associated features) with which the regression model has been trained:
head(m$targets) # first targets
#> [1] -11.6666667 -0.9166667 13.4166667 6.6666667 -3.8333333 19.8333333
head(m$features) # and their associated features
#> Lag12 Lag11 Lag10 Lag9 Lag8 Lag7 Lag6
#> 1 -14.6666667 -8.666667 5.333333 2.333333 -5.666667 8.333333 21.333333
#> 2 -8.9166667 5.083333 2.083333 -5.916667 8.083333 21.083333 21.083333
#> 3 4.4166667 1.416667 -6.583333 7.416667 20.416667 20.416667 8.416667
#> 4 0.6666667 -7.333333 6.666667 19.666667 19.666667 7.666667 -9.333333
#> 5 -7.8333333 6.166667 19.166667 19.166667 7.166667 -9.833333 -24.833333
#> 6 5.8333333 18.833333 18.833333 6.833333 -10.166667 -25.166667 -11.166667
#> Lag5 Lag4 Lag3 Lag2 Lag1
#> 1 21.333333 9.333333 -7.666667 -22.666667 -8.666667
#> 2 9.083333 -7.916667 -22.916667 -8.916667 -11.916667
#> 3 -8.583333 -23.583333 -9.583333 -12.583333 -1.583333
#> 4 -24.333333 -10.333333 -13.333333 -2.333333 12.666667
#> 5 -10.833333 -13.833333 -2.833333 12.166667 6.166667
#> 6 -14.166667 -3.166667 11.833333 5.833333 -4.166667
Prediction intervals for the forecast can be computed. Let us see an example:
m <- create_model(USAccDeaths, method = "mt")
f <- forecast(m, h = 12, PI = TRUE, level = 90)
f
#> Point Forecast Lo 90 Hi 90
#> Jan 1979 8162.724 7637.168 8734.505
#> Feb 1979 7185.788 6599.334 7756.071
#> Mar 1979 7475.744 6906.230 8074.220
#> Apr 1979 7836.936 7286.239 8393.013
#> May 1979 8570.632 8009.798 9195.543
#> Jun 1979 9018.691 8506.675 9607.082
#> Jul 1979 9865.224 9301.012 10450.300
#> Aug 1979 9695.409 9121.060 10282.676
#> Sep 1979 9160.569 8573.021 9741.959
#> Oct 1979 8962.662 8390.299 9564.492
#> Nov 1979 8606.452 8016.662 9185.467
#> Dec 1979 8899.133 8330.878 9496.445
library(ggplot2)
autoplot(f)
Prediction intervals are calculated following the guidelines in (https://otexts.com/fpp3/nnetar.html#prediction-intervals-5). Random errors are assumed to follow a normal distribution.
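Conceptually, this simulation-based approach can be sketched as follows. This is only an illustration, not the package's actual implementation; the regression function reg, the autoregressive lags and the residual standard deviation sd_resid are assumed to be available:
# Simulate future sample paths: at each horizon a one-step forecast is produced
# recursively and a normally distributed error is added; interval bounds are
# quantiles of the simulated paths per horizon.
simulate_intervals <- function(values, reg, lags, h, sd_resid,
                               n_paths = 1000, level = 90) {
  paths <- replicate(n_paths, {
    v <- values
    for (i in seq_len(h)) {
      v <- c(v, reg(rev(v)[lags]) + rnorm(1, mean = 0, sd = sd_resid))
    }
    tail(v, h)
  })
  alpha <- (100 - level) / 200
  list(lower = apply(paths, 1, quantile, probs = alpha),
       upper = apply(paths, 1, quantile, probs = 1 - alpha))
}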
The create_model() and forecast() functions provide a common interface for applying an autoregressive approach to time series forecasting using different regression models. These models are implemented in several R packages. Currently, our project is mainly focused on regression tree models, supporting the following approaches:
- k-nearest neighbors: FNN::knn.reg() is used, as the regression function, to recursively predict the future values of the time series.
- Linear models: stats::lm() and its associated method stats::predict.lm() are applied recursively for the forecasts, i.e., as the regression function.
- Regression trees: rpart::rpart() and its associated method rpart::predict.rpart() are used for the forecasts.
- Model trees: Cubist::cubist() and its associated method Cubist::predict.cubist() are used for predictions.
- Bagging: ipred::bagging() and its associated method ipred::predict.regbagg() are used for forecasting.
- Random forests: ranger::ranger() and its associated method ranger::predict.ranger() are used for predictions.
The S3 object of class utsf returned by the create_model() function contains a component with the trained autoregressive model:
m <- create_model(fdeaths, lags = 1:12, method = "rt")
m$model
#> n= 60
#>
#> node), split, n, deviance, yval
#> * denotes terminal node
#>
#> 1) root 60 1967124.00 -6.465278
#> 2) Lag12< 73 38 212851.90 -124.414500
#> 4) Lag6>=-66.45833 30 57355.07 -153.847200
#> 8) Lag12< -170.7083 10 13293.09 -195.991700 *
#> 9) Lag12>=-170.7083 20 17419.67 -132.775000 *
#> 5) Lag6< -66.45833 8 32051.01 -14.041670 *
#> 3) Lag12>=73 22 312482.00 197.265200
#> 6) Lag5>=-131.7917 7 24738.12 114.500000 *
#> 7) Lag5< -131.7917 15 217416.40 235.888900 *
In this case, the model is the result of training a regression tree using the function rpart::rpart() with the training set consisting of the features m$features and the targets m$targets. Once the model is trained, the rpart::predict.rpart() function is used recursively by the forecast() function to predict the future values of the time series.
An interesting feature of the utsf package is that you can use it to apply your own regression models for time series forecasting in an autoregressive way. Thus, your regression models can benefit from the features implemented in the package, such as pre-processing, parameter tuning, the building of the training set, the implementation of recursive forecasts or the estimation of the forecast accuracy of a model.
To apply your own regression model you have to use the method parameter of the create_model() function, providing as its argument a function that trains your model. This function should return an object with the trained regression model and must have at least two input parameters:
- X: a data frame with the features of the training examples. This data frame is built from the time series taking into account the autoregressive lags, as explained in a previous section. It is the same object as the features component of the object returned by the create_model() function.
- y: a vector with the targets of the training examples. It is built as explained in a previous section. It is the same object as the targets component of the object returned by the create_model() function.
Furthermore, if the function that trains the model (the function provided for the method parameter) returns a model of class model_class, a method with the signature predict.model_class(object, new_value) should be implemented. This method uses your model to predict a new value, that is, it is the regression function associated with the model.
Let us see an example in which a time series is forecast using the k-nearest neighbors regression model implemented in the package FNN:
# Function to train the regression model
my_knn_model <- function(X, y, k = 3) {
structure(list(X = X, y = y, k = k), class = "my_knn")
}
# Function to predict a new example
predict.my_knn <- function(object, new_value) {
FNN::knn.reg(train = object$X, test = new_value,
y = object$y, k = object$k)$pred
}
m <- create_model(AirPassengers, lags = 1:12, method = my_knn_model)
f <- forecast(m, h = 12)
f$pred
#> Jan Feb Mar Apr May Jun Jul Aug
#> 1961 455.9167 434.3264 480.7703 490.1678 506.1262 568.0534 640.6689 640.8636
#> Sep Oct Nov Dec
#> 1961 549.5467 495.4255 441.6554 476.7934
autoplot(f)
The new_value parameter of the predict method receives a data frame with the same structure as the X parameter of the function for building the model. The new_value data frame has only one row, containing the features of the example to be predicted.
The k-nearest neighbors algorithm is so simple that it can be easily implemented without using functionality from any R package:
# Function to train the regression model
my_knn_model2 <- function(X, y, k = 3) {
structure(list(X = X, y = y, k = k), class = "my_knn2")
}
# Function to predict a new example
predict.my_knn2 <- function(object, new_value) {
distances <- sapply(1:nrow(object$X), function(i) sum((object$X[i, ] - new_value)^2))
k_nearest <- order(distances)[1:object$k]
mean(object$y[k_nearest])
}
m2 <- create_model(AirPassengers, lags = 1:12, method = my_knn_model2)
forecast(m2, h = 12)$pred
#> Jan Feb Mar Apr May Jun Jul Aug
#> 1961 455.9167 434.3264 480.7703 490.1678 506.1262 568.0534 640.6689 640.8636
#> Sep Oct Nov Dec
#> 1961 549.5467 495.4255 441.6554 476.7934
Normally, a regression model can be adjusted using different parameters. By default, the models supported by our package are set using some specific parameters, usually the default values of the functions used to train the models (these functions are listed in a previous section). However, the user can set the parameters used to train the regression models with the param argument of the create_model() function. The param argument must be a list with the names and values of the parameters to be set. Let us see an example:
# A bagging model set with default parameters
m <- create_model(AirPassengers, lags = 1:12, method = "bagging")
length(m$model$mtrees) # number of regression trees (25 by default)
#> [1] 25
# A bagging model set with 3 regression trees
m2 <- create_model(AirPassengers,
lags = 1:12,
method = "bagging",
param = list(nbagg = 3)
)
length(m2$model$mtrees) # number of regression trees
#> [1] 3
In the previous example, two bagging models (using regression trees) are trained with the create_model() function. In the first model the number of trees is 25, the default value of the function ipred::ipredbagg() used to train the model. In the second model the number of trees is set to 3. Of course, in order to set some specific parameters the user must consult the arguments of the function used internally by the create_model() function to train the model, in this example ipred::ipredbagg().
In the following example, the user sets the parameters of a regression model they have implemented themselves:
# Function to train the model
my_knn_model <- function(X, y, k = 3) {
structure(list(X = X, y = y, k = k), class = "my_knn")
}
# Regression function for object of class my_knn
predict.my_knn <- function(object, new_value) {
FNN::knn.reg(train = object$X, test = new_value,
y = object$y, k = object$k)$pred
}
# The model is trained with default parameters (k = 3)
m <- create_model(AirPassengers, lags = 1:12, method = my_knn_model)
print(m$model$k)
#> [1] 3
# The model is trained with k = 5
m2 <- create_model(AirPassengers, method = my_knn_model, param = list(k = 5))
print(m2$model$k)
#> [1] 5
This section explains how to estimate the forecast accuracy of a regression model predicting a time series with the utsf package. Let us see an example:
m <- create_model(UKgas, lags = 1:4, method = "knn")
r <- efa(m, h = 4, type = "normal", size = 8)
r$per_horizon
#> Horizon 1 Horizon 2 Horizon 3 Horizon 4
#> MAE 40.956667 37.205000 42.730417 51.424167
#> MAPE 4.118140 5.030128 7.468098 7.434864
#> sMAPE 4.261377 5.208551 7.814417 7.732037
#> RMSE 58.920772 47.660996 49.733955 55.023038
r$global
#> MAE MAPE sMAPE RMSE
#> 43.079063 6.012808 6.254095 52.834690
To assess the forecast accuracy of a forecasting model you should use the efa() function on the model. In this case, the forecasting method is the k-nearest neighbors algorithm (method = "knn") with the autoregressive lags 1 to 4, applied on the UKgas time series. An estimation of its forecast accuracy on the series for a forecasting horizon of 4 (h = 4) is obtained according to different forecast accuracy measures. The efa() function returns a list with two components. The component named per_horizon contains the forecast accuracy estimations per forecasting horizon. The component named global is computed as the means of the rows of the per_horizon component. Currently, the following forecast accuracy measures are computed: MAE (mean absolute error), MAPE (mean absolute percentage error), sMAPE (symmetric mean absolute percentage error) and RMSE (root mean squared error).
Next, we describe how the forecasting accuracy measures are computed for a forecasting horizon \(h\) (\(y_t\) and \(\hat{y}_t\) are the actual future value and its forecast for horizon \(t\) respectively):
\[ MAE = \frac{1}{h}\sum_{t=1}^{h} |y_t-\hat{y}_t| \]
\[ MAPE = \frac{1}{h}\sum_{t=1}^{h} 100\frac{|y_t-\hat{y}_t|}{y_t} \] \[ sMAPE = \frac{1}{h}\sum_{t=1}^{h} 200\frac{\left|y_{t}-\hat{y}_{t}\right|}{|y_t|+|\hat{y}_t|} \]
\[ RMSE = \sqrt{\frac{1}{h}\sum_{t=1}^{h} (y_t-\hat{y}_t)^2} \]
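As a quick illustration, these measures can be computed for a single forecasting origin with a few lines of R (a sketch, not the package's code), given a vector of actual values y and a vector of forecasts y_hat over the horizon:
# Forecast accuracy measures over one horizon, following the formulas above
accuracy_measures <- function(y, y_hat) {
  c(MAE   = mean(abs(y - y_hat)),
    MAPE  = mean(100 * abs(y - y_hat) / y),
    sMAPE = mean(200 * abs(y - y_hat) / (abs(y) + abs(y_hat))),
    RMSE  = sqrt(mean((y - y_hat)^2)))
}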
The estimation of forecast accuracy is done with the well-known rolling origin evaluation. This technique builds several training and test sets from the series, as shown below for a forecasting horizon of 4:
In this case the last nine observations of the series are used to build the test sets. Six models are trained using the different training sets and their forecasts are compared with their corresponding test sets. Thus, the estimation of the forecast error for every forecasting horizon is based on six errors (it is computed as their mean).
You can specify how many of the last observations of the series are used to build the test sets with the size and prop parameters. For example:
m <- create_model(UKgas, lags = 1:4, method = "knn")
# Use the last 9 observations of the series to build test sets
r1 <- efa(m, h = 4, type = "normal", size = 9)
# Use the last 20% observations of the series to build test sets
r2 <- efa(m, h = 4, type = "normal", prop = 0.2)
If the series is short, the training sets might be too small to fit a meaningful model. In that case, a special rolling origin evaluation can be done with the type parameter set to "minimum":
m <- create_model(UKgas, lags = 1:4, method = "knn")
r <- efa(m, h = 4, type = "minimum")
r$per_horizon
#> Horizon 1 Horizon 2 Horizon 3 Horizon 4
#> MAE 47.893750 44.934722 44.918229 26.961198
#> MAPE 5.745458 6.757300 8.250563 3.444200
#> sMAPE 5.761843 6.791762 8.329877 3.385892
#> RMSE 56.558584 52.858868 46.912818 26.961198
r$global
#> MAE MAPE sMAPE RMSE
#> 41.176975 6.049380 6.067343 45.822867
When this option is chosen, the last h observations (in the example, 4) are used to build the test sets, as shown below for h = 4. In this case the estimated forecast error for horizon 1 is based on 4 errors, for horizon 2 on 3 errors, for horizon 3 on 2 errors and for horizon 4 on 1 error.
The efa() function also returns the test sets (and the predictions associated with them) used when assessing the forecast accuracy:
timeS <- ts(1:25)
m <- create_model(timeS, lags = 1:3, method = "mt")
r <- efa(m, h = 5, size = 7)
r$test_sets
#> h=1 h=2 h=3 h=4 h=5
#> [1,] 19 20 21 22 23
#> [2,] 20 21 22 23 24
#> [3,] 21 22 23 24 25
r$predictions
#> h=1 h=2 h=3 h=4 h=5
#> [1,] 19 20 21 22 23
#> [2,] 20 21 22 23 24
#> [3,] 21 22 23 24 25
Each row in these tables represents a test set or prediction. Let us see the test sets when the "minimum" evaluation is done:
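A sketch of how this might be inspected, reusing the model from the previous example (output omitted):
r2 <- efa(m, h = 5, type = "minimum")
r2$test_sets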
Another useful feature of the utsf package is parameter tuning. The tune_grid() function allows you to estimate the forecast accuracy of a model using different combinations of parameters. Furthermore, the best combination of parameters is used to train a model with all the historical values of the series and this model is applied for forecasting the future values of the series. Let us see an example:
m <- create_model(UKgas, lags = 1:4, method = "knn")
r <- tune_grid(m, h = 4, tuneGrid = expand.grid(k = 1:7), type = "normal", size = 8)
# To see the estimated forecast accuracy with the different configurations
r$tuneGrid
#> k MAE MAPE sMAPE RMSE
#> 1 1 27.50551 4.195273 4.344649 38.72080
#> 2 2 41.67751 5.782108 5.995799 50.67769
#> 3 3 43.07906 6.012808 6.254095 52.83469
#> 4 4 43.58916 6.206692 6.382526 55.14836
#> 5 5 46.03185 6.332209 6.503322 58.41501
#> 6 6 47.64410 6.441446 6.583134 62.46201
#> 7 7 48.95917 6.841679 6.886563 63.75997
In this example, the tuneGrid parameter is used to specify (using a data frame) the set of parameters to assess. The forecast accuracy of the model for the different combinations of parameters is estimated as explained in the previous section, using the last observations of the time series as validation set. The size parameter is used to specify the size of the test set. The tuneGrid component of the list returned by the tune_grid() function contains the result of the estimation. In this case, the k-nearest neighbors method using \(k=1\) obtains the best results for all the forecast accuracy measures. The best combination of parameters according to RMSE is used to forecast the time series:
r$best
#> $k
#> [1] 1
r$forecast
#> Qtr1 Qtr2 Qtr3 Qtr4
#> 1987 1217.9250 661.4063 388.1828 817.3785
Let us plot the values of \(k\) against their estimated accuracy using RMSE:
plot(r$tuneGrid$k, r$tuneGrid$RMSE, type = "o", pch = 19, xlab = "k (number of nearest neighbors)", ylab = "RMSE", main = "Estimated accuracy")
Sometimes, the forecast accuracy of a model can be improved by applying some transformations and/or preprocessings to the time series being forecast. Currently, the utsf package is focused on preprocessings related to forecasting time series with a trend. This is due to the fact that, at present, the models incorporated in our package (such as regression trees and k-nearest neighbors) predict averages of the targets in the training set (i.e., an average of historical values of the series). In a trended series these averages are probably outside the range of the future values of the series. Let us see an example in which a trended series (the yearly revenue passenger miles flown by commercial airlines in the United States) is forecast using a random forest model:
m <- create_model(airmiles, lags = 1:4, method = "rf", preProcess = list(trend("none")))
f <- forecast(m, h = 4)
autoplot(f)
In this case, no preprocessing is applied (preProcess = list(trend("none"))). It can be observed how the forecast does not capture the trending behavior of the series, because regression tree models predict averaged values of the training targets.
In the next subsections we explain how to deal with trended series using three different transformations.
A common way of dealing with trended series is to transform the original series taking first differences (computing the differences between consecutive observations). Then, the model is trained with the differenced series and the forecasts are back-transformed. Let us see an example:
m <- create_model(airmiles, lags = 1:4, method = "rf", preProcess = list(trend("differences", 1)))
f <- forecast(m, h = 4)
autoplot(f)
In this case, the preProcess parameter has been used to specify first-order differences. The preProcess parameter consists of a list of preprocessings or transformations; currently, only one preprocessing can be specified. The order of first differences is specified with the second parameter of the trend() function (normally 1). It is also possible to specify the value -1, in which case the order of differences is estimated using the ndiffs() function from the forecast package (in that case the chosen order of first differences could be 0, that is, no differences are applied).
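To make the idea concrete, here is a small base-R sketch (not the package's code) of differencing and back-transforming; fd stands for hypothetical forecasts of the differenced series:
x  <- as.numeric(airmiles)
d  <- diff(x, differences = 1)   # the model is trained on this differenced series
fd <- rep(mean(tail(d, 4)), 4)   # placeholder forecasts of the differenced series
f  <- cumsum(fd) + tail(x, 1)    # back-transformed forecasts on the original scale
f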
This transformation has been used to deal with trending series in other packages, such as tsfknn and tsfgrnn. The additive transformation modifies each training example by subtracting the mean of its features (its autoregressive values) from both the features and the target; when predicting, the mean of the features of the new example is added back to the output of the regression model.
It is easy to check how the training examples are modified by the additive transformation using the API of the package. For example, let us see the training examples of a model with no transformation:
timeS <- ts(c(1, 3, 7, 9, 10, 12))
m <- create_model(timeS, lags = 1:2, preProcess = list(trend("none")))
cbind(m$features, Targets = m$targets)
#> Lag2 Lag1 Targets
#> 1 1 3 7
#> 2 3 7 9
#> 3 7 9 10
#> 4 9 10 12
Now, let us see the effect of the additive transformation in the training examples:
timeS <- ts(c(1, 3, 7, 9, 10, 12))
m <- create_model(timeS, lags = 1:2, preProcess = list(trend("additive")))
cbind(m$features, Targets = m$targets)
#> Lag2 Lag1 Targets
#> 1 -1.0 1.0 5.0
#> 2 -2.0 2.0 4.0
#> 3 -1.0 1.0 2.0
#> 4 -0.5 0.5 2.5
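The same transformation can be checked by hand with a short base-R sketch: subtracting the row means of the features from both the features and the targets reproduces the output above:
timeS <- c(1, 3, 7, 9, 10, 12)
X <- data.frame(Lag2 = timeS[1:4], Lag1 = timeS[2:5])  # features of the 4 examples
y <- timeS[3:6]                                        # their targets
means <- rowMeans(X)                                   # mean of the features of each example
cbind(X - means, Targets = y - means)                  # matches the transformed training set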
Finally, we forecast the airmiles series using the additive transformation and a random forest model:
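A sketch of that code (the preProcess argument is deliberately omitted, so its default value is used):
m <- create_model(airmiles, lags = 1:4, method = "rf")
f <- forecast(m, h = 4)
autoplot(f)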
In this last code snippet we have used the default value of preProcess, because it applies the additive transformation.
The multiplicative transformation is similar to the additive transformation, but it is intended for series with an exponential trend (the additive transformation is suited for series with a linear trend). In the multiplicative transformation a training example is transformed by dividing its features and its target by the mean of its features (and the prediction is back-transformed by multiplying by that mean).
Let us see an example of an artificial time series with a multiplicative trend and its forecast using the additive and the multiplicative transformation:
t <- ts(10 * 1.05^(1:20))
m_m <- create_model(t, lags = 1:3, method = "rf", preProcess = list(trend("multiplicative")))
f_m <- forecast(m_m, h = 4)
m_a <- create_model(t, lags = 1:3, method = "rf", preProcess = list(trend("additive")))
f_a <- forecast(m_a, h = 4)
library(vctsfr)
plot_predictions(t, predictions = list(Multiplicative = f_m$pred, Additive = f_a$pred))
In this case, the forecast with the multiplicative transformation captures perfectly the exponential trend.
The create_model() function has only one compulsory parameter: the time series with which the model is trained. Next, we discuss the default values for the rest of its parameters:
- lags: an integer vector (in increasing order) with the autoregressive lags. If frequency(ts) == f, where ts is the time series being forecast and \(f > 1\), then the lags used as autoregressive features are 1:f. For example, the lags for quarterly data are 1:4 and for monthly data 1:12. This is useful to capture a possible seasonal behavior of the series. If frequency(ts) == 1, then the lags with significant values in the partial autocorrelation function of the series (computed with stats::pacf()) are selected.
- method: by default, the k-nearest neighbors algorithm is applied.
- param: by default, the model is trained using some sensible parameters, normally the default values of the function used to train the model.
- preProcess: the additive transformation is applied. This transformation has proved its effectiveness for forecasting trending series and, in general, its application seems beneficial on any kind of series.
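For instance, relying on these defaults for a monthly series (a sketch: according to the defaults described above, the k-nearest neighbors method, lags 1:12 and the additive transformation are chosen automatically):
m <- create_model(AirPassengers)   # defaults: k-nearest neighbors, lags 1:12, additive transformation
f <- forecast(m, h = 12)
f$pred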