Model Selection
Step 1. OLS evaluation metrics and their limitations
First, let us review some common estimators:
- Sample mean of the dependent variable:
\[\bar{y}=\frac{1}{T} \sum_{t=1}^{T} y_{t}
\]
- Sample variance of the dependent variable:
\[\widehat{\sigma}_{y}^{2}=\frac{1}{T-1} \sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}
\]
- Sample standard deviation (SD) of the dependent variable:
\[\widehat{\sigma}_{y}=\sqrt{\frac{1}{T-1} \sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}=S D
\]
For the linear regression model below, we have a series of fit measures:
\[y_{t}=\beta_{0}+\beta_{1} x_{1 t}+\beta_{2} x_{2 t}+\cdots+\beta_{k} x_{k t}+\varepsilon_{t}
\]
- Sum of squared residuals (SSR):
\[\begin{aligned}
S S R & =\sum_{t=1}^{T}\left(y_{t}-\widehat{y}_{t}\right)^{2}=\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2} \\
& =\sum_{t=1}^{T}\left(y_{t}-\widehat{\beta}_{0}-\widehat{\beta}_{1} x_{1 t}-\widehat{\beta}_{2} x_{2 t}-\cdots-\widehat{\beta}_{k} x_{k t}\right)^{2}
\end{aligned}
\]
- Estimated error variance:
\[\widehat{\sigma}^{2}=\frac{1}{T-k-1} \sum_{t=1}^{T}\left(y_{t}-\widehat{y}_{t}\right)^{2}=\frac{1}{T-k-1} \sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}
\]
- Standard error of the regression (SE):
\[S E=\sqrt{\frac{1}{T-k-1} \sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}
\]
- Coefficient of determination \(R^{2}\):
\[R^{2}=\frac{\sum_{t=1}^{T}\left(\widehat{y}_{t}-\bar{y}\right)^{2}}{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}=1-\frac{\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{\sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}
\]
- Adjusted \(R^{2}\), which adds a penalty based on the number of parameters, controlling for the number of regressors:
\[\text { Adjusted- } R^{2}=1-\frac{\frac{1}{T-k-1} \sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{\frac{1}{T-1} \sum_{t=1}^{T}\left(y_{t}-\bar{y}\right)^{2}}
\]
Focusing on the ratio term in the adjusted \(R^{2}\), we obtain two further criteria:
- Akaike Information Criterion (AIC), which imposes a heavier penalty and thus selects simpler models; smaller is better:
\[A I C=e^{2(k+1) / T} \frac{\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{T}
\]
- Schwarz Information Criterion (SIC), which imposes an even heavier penalty than the AIC; smaller is better:
\[S I C=T^{(k+1) / T} \frac{\sum_{t=1}^{T} \widehat{\varepsilon}_{t}^{2}}{T}
\]
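The criteria above can be computed directly from a fitted polynomial trend regression. Below is a minimal sketch with simulated data; the function name `trend_fit_criteria` and all parameter values are illustrative, not from the original notes.

```python
import numpy as np

def trend_fit_criteria(y, k=1):
    """Fit a degree-k polynomial trend by OLS and report the
    fit measures defined above (SSR, R^2, adjusted R^2, AIC, SIC)."""
    T = len(y)
    t = np.arange(1, T + 1)
    coefs = np.polyfit(t, y, k)          # least-squares polynomial fit
    resid = y - np.polyval(coefs, t)     # residuals epsilon_hat
    ssr = np.sum(resid ** 2)             # sum of squared residuals
    tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
    r2 = 1 - ssr / tss
    adj_r2 = 1 - (ssr / (T - k - 1)) / (tss / (T - 1))
    aic = np.exp(2 * (k + 1) / T) * ssr / T       # AIC as defined above
    sic = T ** ((k + 1) / T) * ssr / T            # SIC as defined above
    return {"SSR": ssr, "R2": r2, "adjR2": adj_r2, "AIC": aic, "SIC": sic}

rng = np.random.default_rng(0)
t = np.arange(1, 101)
y = 2.0 + 0.5 * t + rng.normal(0, 3, size=100)   # simulated linear trend
print(trend_fit_criteria(y, k=1))
print(trend_fit_criteria(y, k=2))   # R^2 never falls, but AIC/SIC penalize k
```

Note that \(R^2\) can only rise as regressors are added, which is exactly why the penalized criteria are needed for comparison.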
These criteria are meaningful only when comparing two different models.
Different series can yield identical values of these metrics, so plotting the time series and related graphics is essential; such plots often suggest directions for model selection.
Plotting is an art, and each display needs to be tuned appropriately.
Step 2. Model building: choosing the trend component
The basic components of a time series are:
\[y_{t}=\text { Trend }+ \text { Seasonal }+ \text { Cycle }
\]
A trend may be deterministic, e.g. a linear time trend:
\[y_{t}=\beta t+\varepsilon_{t}
\]
or stochastic, e.g. a random walk with drift:
\[y_{t}=\beta+y_{t-1}+\varepsilon_{t}
\]
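The two trend mechanisms can be simulated side by side to see how similar they look in finite samples; a minimal sketch with illustrative drift and noise values:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 200
beta = 0.5

# Deterministic trend: y_t = beta * t + eps_t
eps = rng.normal(0, 1, size=T)
y_det = beta * np.arange(1, T + 1) + eps

# Stochastic trend (random walk with drift): y_t = beta + y_{t-1} + eps_t
shocks = rng.normal(0, 1, size=T)
y_rw = np.cumsum(beta + shocks)

# Both series drift upward at roughly the same average rate,
# even though only the first has a deterministic trend.
print(y_det[-1], y_rw[-1])
```

This is one reason the earlier warning about plotting matters: two mechanically different processes can produce visually similar upward-drifting paths.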
Linear Trend
\[y_{t}^{*}=\beta_{0}+\beta_{1} T I M E_{t}
\]
Construct a time dummy \(T I M E=\{1,2,3, \cdots, T-1, T\}\) to obtain:
\[y_{t}^{*}=\beta_{0}+\beta_{1} t
\]
The resulting regression model is:
\[y_{t}=y_{t}^{*}+\varepsilon_{t}=\beta_{0}+\beta_{1} t+\varepsilon_{t}
\]
Quadratic Trend
\[y_{t}^{*}=\beta_{0}+\beta_{1} T I M E_{t}+\beta_{2}\left(T I M E_{t}\right)^{2}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2}
\]
The regression model is:
\[y_{t}=y_{t}^{*}+\varepsilon_{t}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2}+\varepsilon_{t}
\]
A quadratic trend is used to provide a local approximation to the trend.
Exponential Trend (log-linear trend)
Ignoring the error term, an exponential trend can be written as:
\[y_{t}^{*}=\beta_{0} e^{\beta_{1} t}
\]
or in log-linear form:
\[\log \left(y_{t}^{*}\right)=\log \left(\beta_{0}\right)+\beta_{1} t=c_{0}+\beta_{1} t
\]
The corresponding regression models are:
\[y_{t}=y_{t}^{*}+\varepsilon_{t}=\beta_{0} e^{\beta_{1} t}+\varepsilon_{t}
\]
\[\log \left(y_{t}\right)=\log \left(y_{t}^{*}\right)+\varepsilon_{t}=c_{0}+\beta_{1} t+\varepsilon_{t}
\]
Step 3. Estimating the trend
The overall idea is the same in every case: minimize the sum of squared residuals.
- Linear Trend: \(y_{t}=\beta_{0}+\beta_{1} t+\varepsilon_{t}\)
\[\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(y_{t}-\beta_{0}-\beta_{1} t\right)^{2}
\]
- Quadratic Trend: \(y_{t}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2}+\varepsilon_{t}\)
\[\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}, \widehat{\beta}_{2}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(y_{t}-\beta_{0}-\beta_{1} t-\beta_{2} t^{2}\right)^{2}
\]
- Exponential Trend: \(y_{t}=\beta_{0} e^{\beta_{1} t}+\varepsilon_{t}\)
\[\left(\widehat{\beta}_{0}, \widehat{\beta}_{1}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(y_{t}-\beta_{0} e^{\beta_{1} t}\right)^{2}
\]
- Log-linear Trend: \(\log \left(y_{t}\right)=c_{0}+\beta_{1} t+\varepsilon_{t}\)
\[\left(\widehat{c}_{0}, \widehat{\beta}_{1}\right)_{L S}=\arg \min \sum_{t=1}^{T}\left(\log \left(y_{t}\right)-c_{0}-\beta_{1} t\right)^{2}
\]
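The linear, quadratic, and log-linear problems above are ordinary least squares and can be solved in closed form; a sketch on simulated exponential-trend data (all parameter values illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
T = 120
t = np.arange(1, T + 1)
# Simulated data with an exponential trend and multiplicative noise
y = 50 * np.exp(0.02 * t) * np.exp(rng.normal(0, 0.05, size=T))

# Linear trend: argmin sum (y_t - b0 - b1 t)^2
b1_lin, b0_lin = np.polyfit(t, y, 1)

# Quadratic trend: argmin sum (y_t - b0 - b1 t - b2 t^2)^2
b2_q, b1_q, b0_q = np.polyfit(t, y, 2)

# Log-linear trend: regress log(y_t) on t,
# i.e. argmin sum (log(y_t) - c0 - b1 t)^2
beta1_log, c0 = np.polyfit(t, np.log(y), 1)

# (The exponential trend y_t = b0 exp(b1 t) + eps_t has no closed-form
#  OLS solution and would require nonlinear least squares.)
print(c0, beta1_log)   # roughly log(50) and 0.02
```

Since the data were generated with \(c_0 = \log 50\) and \(\beta_1 = 0.02\), the log-linear fit recovers both closely.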
What is the optimal forecast of a trend?
Take the linear trend as an example: \(y_{t}=\beta_{0}+\beta_{1} t+\varepsilon_{t}\), and assume \(\left\{\varepsilon_{t}\right\} \sim\) iid \(N\left(0, \sigma^{2}\right)\) with information set \(\Omega_{T}=\left\{\varepsilon_{T}, \varepsilon_{T-1}, \cdots, \varepsilon_{1}, \cdots\right\}\). Then:
\[y_{T+h}=\beta_{0}+\beta_{1}(T+h)+\varepsilon_{T+h}
\]
The conditional expectation is the optimal point forecast in the sense of minimizing the expected squared forecast error. Writing \(y_{T+h, T}\) for the time-\(T\) point forecast of \(y_{T+h}\), we compute the conditional expectation:
\[\begin{aligned}
y_{T+h, T} & =E\left(y_{T+h} \mid \Omega_{T}\right) \\
& =E\left(\beta_{0}+\beta_{1}(T+h)+\varepsilon_{T+h} \mid \Omega_{T}\right)=\beta_{0}+\beta_{1}(T+h)+E\left(\varepsilon_{T+h} \mid \Omega_{T}\right) \\
& =\beta_{0}+\beta_{1}(T+h)
\end{aligned}
\]
The \(h\)-step-ahead forecast error is:
\[e_{T+h, T}=y_{T+h}-y_{T+h, T}=\varepsilon_{T+h}
\]
If we assume a distribution for \(\varepsilon_{t}\), we can also obtain interval and density forecasts (in other words, the point forecast does not require knowing the error distribution). Assuming \(\left\{\varepsilon_{t}\right\} \sim N\left(0, \sigma^{2}\right)\), we get:
\[y_{T+h} \mid \Omega_{T} \sim N\left(y_{T+h, T}, \sigma^{2}\right)
\]
Point forecast:
\[\widehat{y}_{T+h, T}=\widehat{\beta}_{0}+\widehat{\beta}_{1}(T+h)
\]
Density forecast:
\[y_{T+h} \mid \Omega_{T} \sim N\left(\widehat{y}_{T+h, T}, \widehat{\sigma}^{2}\right)
\]
Interval forecast (95%, two-sided):
\[\left(\widehat{y}_{T+h, T}-1.96 \widehat{\sigma}, \widehat{y}_{T+h, T}+1.96 \widehat{\sigma}\right)
\]
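Putting the pieces together, the point and 95% interval forecasts for an estimated linear trend can be sketched as follows (simulated data, illustrative parameters):

```python
import numpy as np

rng = np.random.default_rng(7)
T, h = 100, 12
t = np.arange(1, T + 1)
y = 10 + 0.3 * t + rng.normal(0, 2, size=T)   # linear trend, N(0, 4) noise

# Estimate the trend by OLS
b1_hat, b0_hat = np.polyfit(t, y, 1)
resid = y - (b0_hat + b1_hat * t)
sigma_hat = np.sqrt(np.sum(resid ** 2) / (T - 2))  # SE of the regression (k=1)

# Point forecast at T + h
y_hat = b0_hat + b1_hat * (T + h)

# 95% two-sided interval forecast
lo, hi = y_hat - 1.96 * sigma_hat, y_hat + 1.96 * sigma_hat
print(f"point {y_hat:.2f}, interval ({lo:.2f}, {hi:.2f})")
```

As in the formulas above, this interval uses only \(\widehat{\sigma}\) and therefore ignores parameter-estimation error.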
Note that the forecast error includes not only the theoretical error \(\varepsilon_{t}\) but also parameter-estimation error. In general, however, we focus only on the theoretical error \(\varepsilon_{t}\):
\[\begin{aligned}
y_{T+h}-\widehat{y}_{T+h, T} & =\left(y_{T+h}-y_{T+h, T}\right)+\left(y_{T+h, T}-\widehat{y}_{T+h, T}\right) \\
& =\varepsilon_{T+h}+\left[\beta_{0}+\beta_{1}(T+h)\right]-\left[\widehat{\beta}_{0}+\widehat{\beta}_{1}(T+h)\right] \\
& =\varepsilon_{T+h}+\left(\beta_{0}-\widehat{\beta}_{0}\right)+\left(\beta_{1}-\widehat{\beta}_{1}\right)(T+h)
\end{aligned}
\]
Hypothesis testing
Take a quadratic trend \(y_{t}=\beta_{0}+\beta_{1} t+\beta_{2} t^{2}+\varepsilon_{t}\) as an example: