格兰杰因果分析

最新推荐文章于 2024-09-28 20:56:59 发布

qq_39992456

最新推荐文章于 2024-09-28 20:56:59 发布

阅读量1.4k

点赞数 7

文章标签： python 数据结构

本文链接：https://blog.youkuaiyun.com/qq_39992456/article/details/139094707

版权

定义：如果变量X无助于预测Y，则说明X不是Y的原因；相反，若X是Y的原因，则需要满足2个条件：（1）X应该有助于预测Y，即在Y关于Y的过去值的回归中，添加X的过去值作为独立变量应当显著增加回归的解释能力；（2）Y不应当有助于预测X，因为X有助于预测Y，Y也有助于预测X，则很有可能存在一个或多个其他变量是引起X变化的原因也是引起Y变化的原因。

Granger因果分析适用于平稳性的时间序列数据模型的检验。

对于变量X和Y，granger因果检验要求估计以下回归：

$Y_t=\sum_{i=1}^{m}\alpha_i X_{t-i}+\sum_{i=1}^{m}\beta_i Y_{t-i}+\mu _{1t}\left ( 1 \right )$

$X_t=\sum_{i=1}^{m}\lambda _i Y_{t-i}+\sum_{i=1}^{m}\delta _i X_{t-i}+\mu _{2t}\left ( 2 \right )$

假设白噪声 $\mu _{1t}$ 和 $\mu _{2t}$ 是互不相关的，式（1）假定y取值与y和x过去值有关，式（2）假定x取值与y和x过去值有关。

式（1）的零假设为α1=α2=…=αm=0，式（2）的假设为β1=β2=…=βm=0。

在式（1）中若x的滞后项系数显著不为0，在式（2）中若y的滞后项系数显著为0，就说明：x是y的granger原因。同理，式（1）和式（2）中若x的滞后项系数和y的滞后项系数显著不为0，说明x和y互为granger因果，当它们显著为0时，说明x和y互为独立，彼此不存在grange因果关系。

步骤：

（1）平稳性检验，一般采用单位根检验。

（2）若（1）通过检验后，进行赤池信息准则AIC或贝叶斯信息准则BIC求阶数，granger因果检验。

（3）若（1）未通过，一阶差分或协整性检验，进行赤池信息准则AIC或贝叶斯信息准则BIC求阶数，granger因果检验

def grangercausality2_feature(readpath, savepath):
    """
    该方法接收一个包含2列的2维的数组作为主要参数：
    第一列是当前要预测未来值的序列A，第二列是另一个序列B,该方法就是看B对A的预测是否有帮助。
    该方法的零假设是：B对A没有帮助。
    如果所有检验下的P-Values都小于显著水平0.05，则可以拒绝零假设，并推断出B确实对A的预测有用。
    第二个参数maxlag是设定测试用的lags的最大值。
    """
    filename = os.listdir(readpath)
    f = open(savepath + 'log_GrangerCausalityRelation.txt',
             'a+', encoding='utf-8')
    for name in filename[:1]:
        df = pd.read_csv(readpath + name)
        data1 = df[['NOX', 'erci_wind']]
        data2 = df[['NOX', 'yici_wind']]
        data3 = df[['NOX', 'anshui']]
        data4 = df[['NOX', 'fuhe']]
        data5 = df[['NOX', 'in_Omean']]

        result_1 = grangercausalitytests(
            data1, maxlag=10, addconst=True, verbose=True)
        result_list_1 = []
        for lag in result_1:
            for test in result_1[lag][0]:
                lag_info = {}
                lag_info["lag"] = lag
                lag_info["p_value"] = result_1[lag][0][test][1]
                lag_info["test_type"] = test
                result_list_1.append(lag_info)
        test_result_1 = pd.DataFrame(result_list_1)

        result_2 = grangercausalitytests(
            data2, maxlag=10, addconst=True, verbose=True)
        result_list_2 = []
        for lag in result_2:
            for test in result_2[lag][0]:
                lag_info = {}
                lag_info["lag"] = lag
                lag_info["p_value"] = result_2[lag][0][test][1]
                lag_info["test_type"] = test
                result_list_2.append(lag_info)
        test_result_2 = pd.DataFrame(result_list_2)

        result_3 = grangercausalitytests(
            data3, maxlag=10, addconst=True, verbose=True)
        result_list_3 = []
        for lag in result_3:
            for test in result_3[lag][0]:
                lag_info = {}
                lag_info["lag"] = lag
                lag_info["p_value"] = result_3[lag][0][test][1]
                lag_info["test_type"] = test
                result_list_3.append(lag_info)
        test_result_3 = pd.DataFrame(result_list_3)

        result_4 = grangercausalitytests(
            data4, maxlag=10, addconst=True, verbose=True)
        result_list_4 = []
        for lag in result_4:
            for test in result_4[lag][0]:
                lag_info = {}
                lag_info["lag"] = lag
                lag_info["p_value"] = result_4[lag][0][test][1]
                lag_info["test_type"] = test
                result_list_4.append(lag_info)
        test_result_4 = pd.DataFrame(result_list_4)

        result_5 = grangercausalitytests(
            data5, maxlag=10, addconst=True, verbose=True)
        result_list_5 = []
        for lag in result_5:
            for test in result_5[lag][0]:
                lag_info = {}
                lag_info["lag"] = lag
                lag_info["p_value"] = result_5[lag][0][test][1]
                lag_info["test_type"] = test
                result_list_5.append(lag_info)
        test_result_5 = pd.DataFrame(result_list_5)

        print(result_1, file=f)
        print(result_2, file=f)
        print(result_3, file=f)
        print(result_4, file=f)
        print(result_5, file=f)

        # 制图
        plt.figure(figsize=(25, 20), dpi=200)
        ax1 = plt.subplot(2, 3, 1)
        for t in test_result_1['test_type'].unique():
            t_data = test_result_1[test_result_1['test_type'] == t]
            ax1.plot(t_data['lag'],  # x轴数据
                     t_data['p_value'],  # y轴数据
                     linestyle='-',  # 折线类型
                     linewidth=2,  # 折线宽度
                     color='b',  # 折线颜色
                     marker='o',  # 点的形状
                     markersize=6,  # 点的大小
                     markeredgecolor='r',  # 点的边框色
                     markerfacecolor='r')  # 点的填充色
        ax1.axhline(y=0.05, color='c', linestyle='--')
        ax1.set_title('GrangerCausalityResults1', fontsize=15)
        ax1.set_xlabel('lag', fontsize=15)
        ax1.set_ylabel('p_value', fontsize=15)

        ax2 = plt.subplot(2, 3, 2)
        for t in test_result_2['test_type'].unique():
            t_data = test_result_2[test_result_2['test_type'] == t]
            ax2.plot(t_data['lag'],
                     t_data['p_value'],
                     linestyle='-',
                     linewidth=2,
                     color='b',
                     marker='o',
                     markersize=6,
                     markeredgecolor='r',
                     markerfacecolor='r')
        ax2.axhline(y=0.05, color='c', linestyle='--')
        ax2.set_title('GrangerCausalityResults2', fontsize=15)
        ax2.set_xlabel('lag', fontsize=15)
        ax2.set_ylabel('p_value', fontsize=15)

        ax3 = plt.subplot(2, 3, 3)
        for t in test_result_3['test_type'].unique():
            t_data = test_result_3[test_result_3['test_type'] == t]
            ax3.plot(t_data['lag'],
                     t_data['p_value'],
                     linestyle='-',
                     linewidth=2,
                     color='b',
                     marker='o',
                     markersize=6,
                     markeredgecolor='r',
                     markerfacecolor='r')
        ax3.axhline(y=0.05, color='c', linestyle='--')
        ax3.set_title('GrangerCausalityResults3', fontsize=15)
        ax3.set_xlabel('lag', fontsize=15)
        ax3.set_ylabel('p_value', fontsize=15)

        ax4 = plt.subplot(2, 3, 4)
        for t in test_result_4['test_type'].unique():
            t_data = test_result_4[test_result_4['test_type'] == t]
            ax4.plot(t_data['lag'],
                     t_data['p_value'],
                     linestyle='-',
                     linewidth=2,
                     color='b',
                     marker='o',
                     markersize=6,
                     markeredgecolor='r',
                     markerfacecolor='r')
        ax4.axhline(y=0.05, color='c', linestyle='--')
        ax4.set_title('GrangerCausalityResults4')
        ax4.set_xlabel('lag', fontsize=15)
        ax4.set_ylabel('p_value')

        ax5 = plt.subplot(2, 3, 5)
        for t in test_result_5['test_type'].unique():
            t_data = test_result_5[test_result_5['test_type'] == t]
            ax5.plot(t_data['lag'],  # x轴数据
                     t_data['p_value'],  # y轴数据
                     linestyle='-',  # 折线类型
                     linewidth=2,  # 折线宽度
                     color='b',  # 折线颜色
                     marker='o',  # 点的形状
                     markersize=6,  # 点的大小
                     markeredgecolor='r',  # 点的边框色
                     markerfacecolor='r')  # 点的填充色
        ax5.axhline(y=0.05, color='c', linestyle='--')
        ax5.set_title('GrangerCausalityResults5', fontsize=15)
        ax5.set_xlabel('lag', fontsize=15)
        ax5.set_ylabel('p_value', fontsize=15)

        plt.savefig(save_path + 'grangercausality.png')


if __name__ == '__main__':

    read_path = "path1"
    save_path = "path2"
    grangercausality2_feature(read_path, save_path)

验证结果

grangercausalitytests(df[[‘b’, ‘a’]], maxlag=2)

Granger Causaliy
Number of lags （no zero）1
Ssr based F test：   F=0.5019， p=0.6541， df_denom=3, df_num=1
Ssr based chi2 test：chi2=2.7152, p=0.2583, df=1
Likelihood ratio test: chi2=2.3377, p=0.3327, df=1
Parameter F test: F=0.5019, p=0.6541, df_denom=3, df_num=1

Granger Causaliy
Number of lags （no zero）2
Ssr based F test：   F=0.5019， p=0.6541， df_denom=3, df_num=2
Ssr based chi2 test：chi2=2.7152, p=0.2583, df=2
Likelihood ratio test: chi2=2.3377, p=0.3327, df=2
Parameter F test: F=0.5019, p=0.6541, df_denom=3, df_num=2

说明：

主要看p值，所有p值小于0.05才能证明b对a有效。

Number of lags （no zero）2：当lags=2时的检测结果

Ssr based F test：残差平方和F检验

Ssr based chi2 test：残差平方和卡方检验

Likelihood ratio test：似然估计检验结果

Parameter F test：参数F检验结果