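For data that vary across space, when the relationships among the features or their meaning are unknown, fitting a spatial Durbin model directly is a fairly general approach. It does, however, require a number of follow-up tests to establish that the results are interpretable and statistically valid.
If the meaning of the features is known, we can use it. In data on firms' economic development, for instance, employee research ability and company culture are roughly stable within a firm, while policy reforms and external economic changes hit all firms in the same period. The former can plausibly be treated as individual effects (characteristics that do not change over time) and the latter as time effects (shocks or trends shared by all individuals), which gives a quick and direct way to analyze how firms in different regions are developing.
Here is an example: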
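The split into individual and time effects described above can be written as a two-way fixed effects regression. The following is a sketch of the standard specification, not something stated in the original text. The symbols $\alpha_i$, $\gamma_t$, $\beta$, and $\varepsilon_{it}$ are notation chosen for illustration, and the demeaning formula assumes a balanced panel.

```latex
% Two-way fixed effects model for individual i in period t
Y_{it} = \beta X_{it} + \alpha_i + \gamma_t + \varepsilon_{it},
\qquad i = 1,\dots,N,\quad t = 1,\dots,T

% Within transformation (balanced panel): remove individual and time means
\tilde{Y}_{it} = Y_{it} - \bar{Y}_{i\cdot} - \bar{Y}_{\cdot t} + \bar{Y}_{\cdot\cdot},
\qquad
\tilde{X}_{it} = X_{it} - \bar{X}_{i\cdot} - \bar{X}_{\cdot t} + \bar{X}_{\cdot\cdot}

% alpha_i and gamma_t drop out, so beta is estimated by OLS on the transformed data
\tilde{Y}_{it} = \beta \tilde{X}_{it} + \tilde{\varepsilon}_{it}
```

Since $\alpha_i$ and $\gamma_t$ are differenced away, any variable that is constant within an individual or common to all individuals in a period cannot receive its own coefficient in this specification.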
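# Load the required packages
library(plm)
library(lmtest)
library(dplyr)
# Generate a simulated data set
set.seed(123)
n <- 100 # number of individuals
t <- 5 # number of time periods
# Build the panel structure: one row per (id, time) pair
data <- expand.grid(id = 1:n, time = 1:t) %>%
  mutate(
    # Individual fixed effect (constant over time): one draw per id
    alpha_i = rnorm(n, mean = 0, sd = 2)[id],
    # Time fixed effect (constant across individuals): one draw per period
    gamma_t = rnorm(t, mean = 0, sd = 1)[time],
    # Explanatory variable
    X = rnorm(n * t, mean = 5, sd = 2),
    # Error term
    epsilon = rnorm(n * t, mean = 0, sd = 1),
    # Dependent variable (true coefficient beta = 0.8)
    Y = 0.8 * X + alpha_i + gamma_t + epsilon
  )
# Inspect the first few rows
head(data)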
# Two-way fixed effects model (individual and time effects)
twoway_model <- plm(Y ~ X,
                    data = data,
                    index = c("id", "time"),
                    model = "within",
                    effect = "twoways")
# Pooled model (no fixed effects)
pooled_model <- plm(Y ~ X,
                    data = data,
                    index = c("id", "time"),
                    model = "pooling")
# Individual fixed effects model
individual_model <- plm(Y ~ X,
                        data = data,
                        index = c("id", "time"),
                        model = "within",
                        effect = "individual")
# Time fixed effects model
time_model <- plm(Y ~ X,
                  data = data,
                  index = c("id", "time"),
                  model = "within",
                  effect = "time")
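# Show the two-way fixed effects results
summary(twoway_model)
# F tests comparing nested specifications (unrestricted model first)
# 1. Two-way fixed effects vs. pooled model
pFtest(twoway_model, pooled_model)
# 2. Individual fixed effects vs. pooled model
pFtest(individual_model, pooled_model)
# 3. Time fixed effects vs. pooled model
pFtest(time_model, pooled_model)
# 4. Two-way fixed effects vs. individual fixed effects only
pFtest(twoway_model, individual_model)
# 5. Two-way fixed effects vs. time fixed effects only
pFtest(twoway_model, time_model)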
Output:
Twoways effects Within Model
Call:
plm(formula = Y ~ X, data = data, effect = "twoways", model = "within",
index = c("id", "time"))
Balanced Panel: n = 100, T = 5, N = 500
Residuals:
Min. 1st Qu. Median 3rd Qu. Max.
-3.224723 -0.583125 -0.010202 0.599678 2.960869
Coefficients:
Estimate Std. Error t-value Pr(>|t|)
X 0.778466 0.026107 29.818 < 2.2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Total Sum of Squares: 1329.9
Residual Sum of Squares: 409.08
R-Squared: 0.6924
Adj. R-Squared: 0.61141
F-statistic: 889.127 on 1 and 395 DF, p-value: < 2.22e-16
F test for twoways effects
data: Y ~ X
F = 17.185, df1 = 103, df2 = 395, p-value < 2.2e-16
alternative hypothesis: significant effects
F test for individual effects
data: Y ~ X
F = 13.23, df1 = 99, df2 = 399, p-value < 2.2e-16
alternative hypothesis: significant effects
F test for time effects
data: Y ~ X
F = 6.673, df1 = 4, df2 = 494, p-value = 3.094e-05
alternative hypothesis: significant effects
F test for twoways effects
data: Y ~ X
F = 27.637, df1 = 4, df2 = 395, p-value < 2.2e-16
alternative hypothesis: significant effects
F test for twoways effects
data: Y ~ X
F = 16.759, df1 = 99, df2 = 395, p-value < 2.2e-16
alternative hypothesis: significant effects
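The output shows that fixed effects belong in the model. Tests 1 to 3 reject the pooled model against the two-way, individual-only, and time-only specifications (individual effects: F = 13.23, p < 2.2e-16; time effects: F = 6.673, p = 3.094e-05). Tests 4 and 5 are the ones that justify the two-way specification: adding time effects to the individual-effects model (F = 27.637) and adding individual effects to the time-effects model (F = 16.759) are both significant improvements, so individual and time fixed effects should be controlled for together. The estimated coefficient on X is 0.778 with a standard error of 0.026, so the estimate is precise and lies within one standard error of the true value of 0.8 used in the simulation. The estimate recovers the true coefficient here because X was generated independently of the error term. With real data, a fixed effects coefficient removes only time-invariant and individual-invariant confounders, so it supports a causal reading only if no time-varying confounders remain.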
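As a sanity check on the reported numbers, the degrees of freedom and the model's overall F-statistic can be reproduced by hand. This is a sketch using the standard nested-model F statistic. The symbols $\mathrm{RSS}_r$, $\mathrm{RSS}_u$, $q$, and $k$ are notation introduced here, and the values are taken from the rounded output above, so the last digits may differ slightly.

```latex
% Generic F statistic: restricted (r) vs. unrestricted (u) model, q restrictions
F = \frac{(\mathrm{RSS}_r - \mathrm{RSS}_u)/q}{\mathrm{RSS}_u/\mathrm{df}_2}

% Residual degrees of freedom of the two-way model (N = 100, T = 5, k = 1 regressor)
\mathrm{df}_2 = NT - N - (T-1) - k = 500 - 100 - 4 - 1 = 395

% Numerator degrees of freedom in tests 1, 4 and 5
q_1 = (N-1) + (T-1) = 103, \qquad q_4 = T-1 = 4, \qquad q_5 = N-1 = 99

% Overall F-statistic and R^2 of the two-way model from the within sums of squares
F = \frac{(1329.9 - 409.08)/1}{409.08/395} \approx 889.1,
\qquad
R^2 = 1 - \frac{409.08}{1329.9} \approx 0.6924
```

These values agree with the reported `F-statistic: 889.127 on 1 and 395 DF`, `R-Squared: 0.6924`, and the `df1` values 103, 4, and 99 in the corresponding F tests.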