Time Series Analysis: Model Selection

Published 2023-10-13 10:37:34 · Author: 长歌不采薇

Model Selection

Step 1. OLS evaluation metrics and their limitations

First, a review of the commonly used estimators:

  • Sample mean of the dependent variable

\[\bar{y}=\frac{1}{T} \sum_{t=1}^{T} y_{t} \]

  • Sample variance of the dependent variable

\[\widehat{\sigma}_{y}^{2}=\frac{1}{T-1} \sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2} \]

  • Sample standard deviation of the dependent variable (SD)

\[\widehat{\sigma}_{y}=\sqrt{\frac{1}{T-1} \sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}=S D \]
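As a quick sanity check, the three estimators can be computed directly from a series; a minimal sketch (the series `y` here is illustrative, not from the text):

```python
import numpy as np

# Illustrative dependent variable
y = np.array([2.0, 3.5, 1.8, 4.2, 3.1, 2.9])
T = len(y)

y_bar = y.sum() / T                           # sample mean
var_hat = ((y - y_bar) ** 2).sum() / (T - 1)  # sample variance, 1/(T-1) divisor
sd = np.sqrt(var_hat)                         # sample standard deviation (SD)

# These match NumPy's unbiased (ddof=1) versions
assert np.isclose(var_hat, y.var(ddof=1))
assert np.isclose(sd, y.std(ddof=1))
```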

For a linear regression model, there is a corresponding set of statistics:

\[y_{t}=\beta_{0}+\beta_{1} x_{1 t}+\beta_{2} x_{2 t}+\cdots+\beta_{k} x_{k t}+\varepsilon_{t} \]

  • Sum of squared residuals (SSR):

\[\begin{aligned} S S R & =\sum_{t=1}^{T}\left(y_{t}-\widehat{y}_{t}\right)^{2}=\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2} \\ & =\sum_{t=1}^{T}\left(y_{t}-\widehat{\beta}_{0}-\widehat{\beta}_{1} x_{1 t}-\widehat{\beta}_{2} x_{2 t}-\cdots-\widehat{\beta}_{k} x_{k t}\right)^{2} \end{aligned} \]

  • Regression variance (the estimated error variance):

\[\frac{1}{T-k-1} \sum_{t=1}^{T}\left(y_{t}-\widehat{y}_{t}\right)^{2}=\frac{1}{T-k-1} \sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2} \]

  • Standard error of the regression (SE)

\[S E=\sqrt{\frac{1}{T-k-1} \sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}} \]

  • \(R^{2}\), which always favors the model with the most variables:

\[R^{2}=\frac{\sum_{t=1}^{T}\left(\widehat{y}_{t}-\bar{y}\right)^{2}}{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}=1-\frac{\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}} \]

  • Adjusted \(R^{2}\), which adds a penalty based on the number of parameters and thus controls for the number of variables:

\[\text { Adjusted- } R^{2}=1-\frac{\frac{1}{T-k-1} \sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{\frac{1}{T-1} \sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}} \]

Focusing on the second factor of the Adjusted \(R^{2}\) formula leads to:

  • Akaike Information Criterion (AIC), which imposes a heavier penalty and therefore selects simpler models; smaller is better:

\[A I C=e^{2(k+1) / T} \frac{\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{T} \]

  • Schwarz Information Criterion (SIC), which imposes an even heavier penalty than AIC; smaller is better:

\[S I C=T^{(k+1) / T} \frac{\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{T} \]
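All four criteria above can be computed from the residuals of a fitted model. A minimal sketch (the data and the linear trend fit are illustrative assumptions):

```python
import numpy as np

def selection_criteria(y, y_hat, k):
    """R^2, adjusted R^2, AIC, and SIC as defined in the formulas above.
    k = number of regressors excluding the intercept."""
    T = len(y)
    ssr = np.sum((y - y_hat) ** 2)         # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)      # total sum of squares
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (ssr / (T - k - 1)) / (tss / (T - 1))
    aic = np.exp(2 * (k + 1) / T) * ssr / T
    sic = T ** ((k + 1) / T) * ssr / T
    return r2, adj_r2, aic, sic

# Illustrative: a linear trend fitted to a noisy trending series
rng = np.random.default_rng(0)
t = np.arange(1, 51)
y = 1.0 + 0.5 * t + rng.normal(0, 1, 50)
b1, b0 = np.polyfit(t, y, 1)               # OLS slope and intercept
r2, adj_r2, aic, sic = selection_criteria(y, b0 + b1 * t, k=1)
```

Since \(\ln T > 2\) once \(T > e^{2} \approx 7.4\), the SIC penalty exceeds the AIC penalty for any reasonably long sample, which is why SIC tends to pick the more parsimonious model.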

These criteria are only meaningful when comparing two different models.

Different series can produce identical values of these criteria, so plotting the data (a time-series plot, for instance) is necessary and often suggests candidate models.

Plotting is an art: each element of the plot deserves appropriate tuning.

Step 2. Model building: choosing the trend component

The basic components of a time series are:

\[y_{t}=\text { Trend }+ \text { Seasonal }+ \text { Cycle } \]

  • Trend is slow, long-run evolution in the variables that we want to model and forecast

  • Deterministic trend (upward or downward):

\[y_{t}=\beta t+\varepsilon_{t} \]

  • Stochastic trend (a random walk with drift):

\[y_{t}=\beta+y_{t-1}+\varepsilon_{t} \]
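The two kinds of trend behave very differently, and a short simulation makes the contrast concrete (parameter values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(42)
T, beta = 200, 0.5
eps = rng.normal(0, 1, T)
t = np.arange(1, T + 1)

# Deterministic trend: y_t = beta * t + eps_t
# (shocks are transitory: the series always reverts to the line beta * t)
y_det = beta * t + eps

# Stochastic trend (random walk with drift): y_t = beta + y_{t-1} + eps_t
# (shocks are permanent: every eps_t is carried forward forever)
y_sto = np.cumsum(beta + eps)

# Deviations from the drift line beta * t:
dev_det = y_det - beta * t   # = eps_t, bounded noise with variance sigma^2
dev_sto = y_sto - beta * t   # = cumulative sum of eps, variance grows with t
```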

Linear Trend

\[y_{t}^{*}=\beta_{0}+\beta_{1} TIME_{t} \]

Define a time dummy \(TIME=\{1,2,3, \cdots, T-1, T\}\), which gives:

\[y_{t}^{*}=\beta_{0}+\beta_{1} t \]

The resulting regression model is:

\[y_{t}=y_{t}^{*}+\varepsilon_{t}=\beta_{0}+\beta_{1} t+\varepsilon_{t} \]

Quadratic Trend

\[y_{t}^{*}=\beta_{0}+\beta_{1} TIME_{t}+\beta_{2}\left(TIME_{t}\right)^{2}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2} \]

The regression model is:

\[y_{t}=y_{t}^{*}+\varepsilon_{t}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2}+\varepsilon_{t} \]

A quadratic trend serves as a local approximation.

Exponential Trend (log-linear trend)

Ignoring the error term, the exponential trend can be written as:

\[y_{t}^{*}=\beta_{0} e^{\beta_{1} t} \]

or, in log-linear form:

\[\log \left(y_{t}^{*}\right)=\log \left(\beta_{0}\right)+\beta_{1} t=c_{0}+\beta_{1} t \]

  • Regression model 1: exponential form

\[y_{t}=y_{t}^{*}+\varepsilon_{t}=\beta_{0} e^{\beta_{1} t}+\varepsilon_{t} \]

  • Regression model 2: log-linear form

\[\log \left(y_{t}\right)=\log \left(y_{t}^{*}\right)+\varepsilon_{t}=c_{0}+\beta_{1} t+\varepsilon_{t} \]

Step 3. Estimating the trend

The overall idea is the same in every case: minimize the sum of squared residuals.

  • Linear Trend: \(y_{t}=\beta_{0}+\beta_{1} t+\varepsilon_{t}\)

\[\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(y_{t}-\beta_{0}-\beta_{1} t\right)^{2} \]

  • Quadratic Trend: \(y_{t}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2}+\varepsilon_{t}\)

\[\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}, \widehat{\beta}_{2}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(y_{t}-\beta_{0}-\beta_{1} t-\beta_{2} t^{2}\right)^{2} \]

  • Exponential Trend: \(y_{t}=\beta_{0} e^{\beta_{1} t}+\varepsilon_{t}\)

\[\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(y_{t}-\beta_{0} e^{\beta_{1} t}\right)^{2} \]

  • Log-linear Trend: \(\log \left(y_{t}\right)=c_{0}+\beta_{1} t+\varepsilon_{t}\)

\[\left(\widehat{c}_{0}, \widehat{\beta}_{1}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(\log \left(y_{t}\right)-c_{0}-\beta_{1} t\right)^{2} \]
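The linear, quadratic, and log-linear problems above are all linear in the parameters, so ordinary least squares applies directly; only the pure exponential form requires nonlinear least squares. A sketch with NumPy (the series is synthetic, generated from a log-linear trend):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 100
t = np.arange(1, T + 1)
# Synthetic exponentially trending data: log(y) = 0.5 + 0.02 t + noise
y = np.exp(0.5 + 0.02 * t + rng.normal(0, 0.05, T))

# Linear trend: regress y on [1, t]
X_lin = np.column_stack([np.ones(T), t])
b_lin, *_ = np.linalg.lstsq(X_lin, y, rcond=None)

# Quadratic trend: regress y on [1, t, t^2]
X_quad = np.column_stack([np.ones(T), t, t ** 2])
b_quad, *_ = np.linalg.lstsq(X_quad, y, rcond=None)

# Log-linear trend: regress log(y) on [1, t]
c0_hat, b1_hat = np.linalg.lstsq(X_lin, np.log(y), rcond=None)[0]
beta0_hat = np.exp(c0_hat)   # recover beta_0 = exp(c_0)
```

Fitting the exponential form \(y_{t}=\beta_{0} e^{\beta_{1} t}+\varepsilon_{t}\) with additive errors would instead require a nonlinear optimizer (e.g. `scipy.optimize.curve_fit`).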

What is the optimal forecast of the trend?

Take the linear trend as an example: \(y_{t}=\beta_{0}+\beta_{1} t+\varepsilon_{t}\), with \(\left\{\varepsilon_{t}\right\} \sim\) iid \(N\left(0, \sigma^{2}\right)\) and information set \(\Omega_{T}=\left\{\varepsilon_{T}, \varepsilon_{T-1}, \cdots, \varepsilon_{1}, \cdots\right\}\). Then

\[y_{T+h}=\beta_{0}+\beta_{1}(T+h)+\varepsilon_{T+h} \]

The conditional expectation is the optimal point forecast, in the sense of minimizing the expected (squared) forecast error. Let \(y_{T+h, T}\) denote the point forecast of \(y_{T+h}\) made at time \(T\); computing the conditional expectation gives:

\[\begin{aligned} y_{T+h, T} & =E\left(y_{T+h} \mid \Omega_{T}\right) \\ & =E\left(\beta_{0}+\beta_{1}(T+h)+\varepsilon_{T+h} \mid \Omega_{T}\right)=\beta_{0}+\beta_{1}(T+h)+E\left(\varepsilon_{T+h} \mid \Omega_{T}\right) \\ & =\beta_{0}+\beta_{1}(T+h) \end{aligned} \]

The \(h\)-step-ahead forecast error:

\[e_{T+h, T}=y_{T+h}-y_{T+h, T}=\varepsilon_{T+h} \]

If we further assume a distribution for \(\varepsilon_{t}\), we also obtain interval and density forecasts (put differently, the point forecast does not require knowing the distribution of the error term). Assuming \(\left\{\varepsilon_{t}\right\} \sim N\left(0, \sigma^{2}\right)\), we get:

\[y_{T+h} \mid \Omega_{T} \sim N\left(y_{T+h, T}, \sigma^{2}\right) \]

Point forecast:

\[\widehat{y}_{T+h, T}=\widehat{\beta}_{0}+\widehat{\beta}_{1}(T+h) \]

Density forecast:

\[y_{T+h} \mid \Omega_{T} \sim N\left(\widehat{y}_{T+h, T}, \widehat{\sigma}^{2}\right) \]

Interval forecast (95%, two-sided):

\[\left(\widehat{y}_{T+h, T}-1.96 \widehat{\sigma}, \widehat{y}_{T+h, T}+1.96 \widehat{\sigma}\right) \]
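Putting the pieces together, the point and 95% interval forecasts for a fitted linear trend can be sketched as follows (the data are synthetic; \(\widehat{\sigma}\) is the regression standard error with \(k=1\)):

```python
import numpy as np

rng = np.random.default_rng(7)
T, h = 80, 5
t = np.arange(1, T + 1)
y = 2.0 + 0.3 * t + rng.normal(0, 1.0, T)   # synthetic linear-trend series

# Fit y_t = b0 + b1 * t + eps_t by OLS
X = np.column_stack([np.ones(T), t])
(b0_hat, b1_hat), *_ = np.linalg.lstsq(X, y, rcond=None)

# Regression standard error: sqrt(SSR / (T - k - 1)) with k = 1
resid = y - (b0_hat + b1_hat * t)
sigma_hat = np.sqrt(np.sum(resid ** 2) / (T - 1 - 1))

# Point forecast at horizon h
y_point = b0_hat + b1_hat * (T + h)

# 95% two-sided interval forecast
# (ignores parameter-estimation error, as the note below explains)
lower = y_point - 1.96 * sigma_hat
upper = y_point + 1.96 * sigma_hat
```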

Note that the forecast error is not just the theoretical error \(\varepsilon_{t}\); it also includes parameter-estimation error. In most cases, however, we focus only on the theoretical error \(\varepsilon_{t}\).

\[\begin{aligned} y_{T+h}-\widehat{y}_{T+h, T} & =\left(y_{T+h}-y_{T+h, T}\right)+\left(y_{T+h, T}-\widehat{y}_{T+h, T}\right) \\ & =\varepsilon_{T+h}+\left[\beta_{0}+\beta_{1}(T+h)\right]-\left[\widehat{\beta}_{0}+\widehat{\beta}_{1}(T+h)\right] \\ & =\varepsilon_{T+h}+\left(\beta_{0}-\widehat{\beta}_{0}\right)+\left(\beta_{1}-\widehat{\beta}_{1}\right)(T+h) \end{aligned} \]

Hypothesis Testing

Take the quadratic trend \(y_{t}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2}+\varepsilon_{t}\) as an example: