Regression Analysis: Simple and Multiple Linear Regression (Implemented in Python)

Chapter 10: Regression Analysis

# 10.1 Relationships Between Variables

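【Example 10-1】 Relationships among the financial indicators of listed companies. The data columns are 样本编号 (sample ID), 每股收益 (earnings per share), 每股净资产 (net assets per share), 每股现金流量 (cash flow per share) and 总股本 (total share capital).

【Code Box 10-1】 Plotting a scatterplot matrix

# Code for plotting Figure 10-1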
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

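g = sns.pairplot(df[['每股收益','每股净资产','每股现金流量','总股本']],
                 height=1.5, diag_kind='kde', markers='.', kind='reg')

(Figure 10-1: scatterplot matrix of the four variables, with fitted regression lines off the diagonal and kernel density estimates on the diagonal)

g.savefig('./图9-2.jpg', dpi=200)  # pairplot returns a PairGrid, which provides savefig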

Note: with kind='reg', seaborn draws a 95% confidence band around each fitted regression line.

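【Example 10-2】 Computing the correlation matrix and testing the coefficients for significance

【Code Box 10-2】 Computing and testing correlation coefficients

# Compute the correlation matrix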
import pandas as pd
from scipy.stats import pearsonr
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

corr = df.iloc[:,1:].corr()
corr

# df.iloc[:, 1:] keeps all rows (the ':' before the comma) and every column after the first; the first column, 样本编号 (sample ID), is left out
              每股收益  每股净资产  每股现金流量    总股本
每股收益      1.000000  0.886292  0.598971  0.254539
每股净资产    0.886292  1.000000  0.482134  0.521195
每股现金流量  0.598971  0.482134  1.000000  0.147115
总股本        0.254539  0.521195  0.147115  1.000000
# Test the correlation coefficients for significance
import pandas as pd
from scipy.stats import pearsonr
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

col = ['每股收益', '每股净资产', '每股现金流量', '总股本']
df_pvalue = pd.DataFrame(index=col, columns=col)
for i in range(1, 5):
    for j in range(1, 5):
        cor, p_value = pearsonr(df.iloc[:,i], df.iloc[:,j]) # Pearson correlation coefficient and its p-value
        df_pvalue.iloc[i-1, j-1] = p_value    # store the two-sided p-value

df_pvalue
              每股收益  每股净资产  每股现金流量    总股本
每股收益      0.0       0.0       0.001558  0.21949
每股净资产    0.0       0.0       0.01466   0.007548
每股现金流量  0.001558  0.01466   0.0       0.482835
总股本        0.21949   0.007548  0.482835  0.0

Note: pearsonr(df['每股收益'], df['每股净资产']) returns the correlation coefficient of two variables together with its two-sided p-value.
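The p-values in the matrix above correspond to a t test on each correlation coefficient: under the null hypothesis of zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2) follows a t distribution with n - 2 degrees of freedom. The following is a minimal sketch of that calculation, assuming n = 25 (the number of observations shown in the OLS output of this chapter) and taking r from the correlation matrix. The helper name `corr_p_value` is hypothetical and not part of the original code.

```python
from math import sqrt
from scipy import stats

def corr_p_value(r, n):
    """Two-sided p-value for H0: rho = 0, using the t statistic with n - 2 df."""
    t = r * sqrt(n - 2) / sqrt(1 - r ** 2)
    return 2 * stats.t.sf(abs(t), df=n - 2)

# r values copied from the correlation matrix; n = 25 is assumed from the OLS output
p_eps_shares = corr_p_value(0.254539, 25)    # 每股收益 vs 总股本
p_bvps_shares = corr_p_value(0.521195, 25)   # 每股净资产 vs 总股本
print(round(p_eps_shares, 4), round(p_bvps_shares, 4))
```

The two results should agree with the corresponding entries of the p-value matrix (about 0.219 and 0.0075), so only the second correlation is significant at the 5% level.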

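# 10.2 Simple Linear Regression

【Example 10-3】 Regressing 每股收益 (earnings per share) on 每股净资产 (net assets per share)

【Code Box 10-3】 每股收益 on 每股净资产: fitting a simple linear regression model

# Fit the regression model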
import pandas as pd
from statsmodels.formula.api import ols
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model1 = ols("每股收益~每股净资产",data=df).fit()
print(model1.summary())
                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   每股收益   R-squared:                       0.786
Model:                            OLS   Adj. R-squared:                  0.776
Method:                 Least Squares   F-statistic:                     84.23
Date:                Thu, 14 Nov 2024   Prob (F-statistic):           3.76e-09
Time:                        16:26:22   Log-Likelihood:                -49.130
No. Observations:                  25   AIC:                             102.3
Df Residuals:                      23   BIC:                             104.7
Df Model:                           1                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.6700      0.698     -2.392      0.025      -3.115      -0.225
每股净资产          0.4675      0.051      9.178      0.000       0.362       0.573
==============================================================================
Omnibus:                        1.134   Durbin-Watson:                   1.993
Prob(Omnibus):                  0.567   Jarque-Bera (JB):                1.082
Skew:                           0.419   Prob(JB):                        0.582
Kurtosis:                       2.420   Cond. No.                         26.7
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# Print the analysis of variance (ANOVA) table
from statsmodels.stats.anova import anova_lm
anova_lm(model1, typ=1)
              df      sum_sq     mean_sq          F        PR(>F)
每股净资产   1.0  272.995423  272.995423  84.233159  3.760409e-09
Residual    23.0   74.541841    3.240950        NaN           NaN
# Residual standard error: the square root of the residual mean square
pow(3.240950,1/2)
1.800263869548017
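The entries of the ANOVA table are linked to the summary output by simple arithmetic. As a small worked check, using only numbers copied from the table above (the variable names are illustrative), the F statistic, R-squared and the residual standard error can be recomputed as follows:

```python
from math import sqrt

ssr, sse = 272.995423, 74.541841   # regression and residual sums of squares
df_model, df_resid = 1, 23         # degrees of freedom from the ANOVA table

mse = sse / df_resid               # residual mean square
f_stat = (ssr / df_model) / mse    # overall F statistic
r_squared = ssr / (ssr + sse)      # coefficient of determination
s_e = sqrt(mse)                    # residual standard error

print(round(mse, 4), round(f_stat, 2), round(r_squared, 3), round(s_e, 4))
```

These values should match the F-statistic (84.23) and R-squared (0.786) in the regression summary. statsmodels also exposes the residual mean square directly as `model1.mse_resid`, which avoids typing the number by hand.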

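【Example 10-4】 Predicting with the regression equation

【Code Box 10-4】 Regression prediction

# Compute point predictions, confidence intervals and prediction intervals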
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.outliers_influence import summary_table
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model1 = ols("每股收益~每股净资产",data=df).fit()  # fit the model

conf_level = 0.95
st, _, _ = summary_table(model1, alpha=1-conf_level)
columns = [x +' ' + y for (x, y) in zip(st.data[0], st.data[1])]
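df_res = pd.DataFrame()  # convert the SimpleTable to a DataFrame
for i in range(len(st.data) - 2):
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead of:
    # df_res = df_res.append(pd.DataFrame(st.data[i+2], index=columns).T)
    df_res = pd.concat([df_res, pd.DataFrame(st.data[i+2], index=columns).T], ignore_index=True)
df_res.reset_index(drop=True, inplace=True)

round(df_res,2).head()   # show the first 5 rows

# Columns: observation number, observed 每股收益, point prediction, standard error of the mean prediction,
# lower and upper confidence limits, lower and upper prediction limits,
# residual, standard error of the residual, Student residual, Cook's distance
# Optionally drop the columns that are not needed:
#df_res.drop(columns=['Std Error Mean Predict','Student Residual','Std Error Residual'],inplace=True)

#round(df_res,2)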
   Obs  Dep Var Population  Predicted Value  Std Error Mean Predict  Mean ci 95% low  Mean ci 95% upp  Predict ci 95% low  Predict ci 95% upp  Residual  Std Error Residual  Student Residual  Cook's D
0  1.0                0.88             1.07                    0.47             0.10             2.04               -2.78                4.92     -0.19                1.74             -0.11      0.00
1  2.0                1.14             3.53                    0.36             2.79             4.28               -0.27                7.33     -2.39                1.76             -1.36      0.04
2  3.0                4.88             6.42                    0.46             5.47             7.36                2.57               10.26     -1.54                1.74             -0.88      0.03
3  4.0                3.23             2.11                    0.41             1.27             2.95               -1.71                5.92      1.12                1.75              0.64      0.01
4  5.0                7.83             7.67                    0.55             6.52             8.81                3.77               11.56      0.16                1.71              0.10      0.00

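Note: summary_table reports intervals only for the points in the sample, so the confidence and prediction bands below are drawn from the in-sample points. For a point outside the sample, the fitted model's get_prediction(...).summary_frame(alpha=0.05) in statsmodels returns both the confidence interval and the prediction interval.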

# Plot the confidence and prediction bands (Figure 10-5)

import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']

df_res['每股净资产'] = df['每股净资产']
df_plot = df_res.sort_values(by='每股净资产')
df_plot.reset_index(drop=True, inplace=True)

plt.figure(figsize=(7, 4.8))
plt.scatter(df_plot['每股净资产'], df_plot['Dep Var Population'])
p1, = plt.plot(df_plot['每股净资产'], df_plot['Predicted Value'],linewidth=2)
p2, = plt.plot(df_plot['每股净资产'], df_plot['Mean ci 95% low'], 'r:')
p3, = plt.plot(df_plot['每股净资产'], df_plot['Mean ci 95% upp'], 'r:')
p4, = plt.plot(df_plot['每股净资产'], df_plot['Predict ci 95% low'], 'g--')
p5, = plt.plot(df_plot['每股净资产'], df_plot['Predict ci 95% upp'], 'g--')
plt.xlabel('每股净资产',size=12)
plt.ylabel('每股收益',size=12)
plt.legend([p1, p3, p5], ['Regression line', 'Confidence interval', 'Prediction interval'])

<matplotlib.legend.Legend at 0x1a319956810>

(Figure 10-5: fitted regression line with 95% confidence and prediction bands)

# Point prediction of 每股收益 at x0 = 10
model1.predict(exog=dict(每股净资产=10))

0    3.004632
dtype: float64
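A point prediction for a new x value can be complemented with interval estimates through the `get_prediction` method of a fitted statsmodels model. The sketch below is only an illustration: the data are synthetic (generated with a fixed seed, not the example10_1.csv data), and the names `demo`, `x` and `y` are hypothetical stand-ins for the chapter's columns.

```python
import numpy as np
import pandas as pd
from statsmodels.formula.api import ols

# Synthetic stand-in data (assumption: NOT the chapter's data set)
rng = np.random.default_rng(0)
x = rng.uniform(5, 20, size=25)
demo = pd.DataFrame({"x": x, "y": -1.7 + 0.47 * x + rng.normal(0, 1.8, size=25)})

model = ols("y ~ x", data=demo).fit()

# Predict at a new point x0 = 10 that need not be in the sample
new_points = pd.DataFrame({"x": [10.0]})
frame = model.get_prediction(new_points).summary_frame(alpha=0.05)

# mean_ci_* is the confidence interval for the mean response,
# obs_ci_* is the prediction interval for an individual observation
print(frame.round(3))
```

The prediction interval is always wider than the confidence interval at the same point, because it also accounts for the variance of an individual observation around the regression line. With the chapter's model, the equivalent call would pass a DataFrame containing a 每股净资产 column to `model1.get_prediction`.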

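# Diagnostics for the Regression Model

【Code Box 10-5】 Residuals and standardized residuals

# Print the residuals and standardized residuals for Example 10-3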
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model1 = ols("每股收益~每股净资产",data=df).fit()
df=pd.DataFrame({"样本编号": df['样本编号'],"每股收益": df['每股收益'],
                 "点预测值":model1.fittedvalues,"残差": model1.resid, 
                 "标准化残差": np.array(model1.resid_pearson)})

# Columns: 点预测值 = fitted value, 残差 = residual, 标准化残差 = standardized residual
round(df,4).head()  # show the first 5 rows
   样本编号  每股收益  点预测值     残差  标准化残差
0         1      0.88    1.0693  -0.1893     -0.1052
1         2      1.14    3.5329  -2.3929     -1.3292
2         3      4.88    6.4171  -1.5371     -0.8538
3         4      3.23    2.1071   1.1229      0.6237
4         5      7.83    7.6653   0.1647      0.0915
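The standardized residuals returned by `resid_pearson` are the raw residuals divided by the residual standard error, the square root of the residual mean square. A short check using only numbers copied from the outputs above (variable names are illustrative):

```python
import numpy as np

s_e = np.sqrt(3.240950)  # residual standard error, from the ANOVA table
resid = np.array([-0.1893, -2.3929, -1.5371, 1.1229, 0.1647])  # first five residuals

std_resid = resid / s_e  # same scaling for every observation
print(np.round(std_resid, 4))
```

Because every residual is divided by the same constant, these values differ slightly from the "Student Residual" column produced by summary_table in Code Box 10-4, which divides each residual by its own standard error.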

【Code Box 10-6】 Diagnostic plots for the model in Example 10-3

# Draw the diagnostic plots (Figure 10-7)
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus']=False
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model1 = ols("每股收益~每股净资产",data=df).fit()   # fit the model

# Panel (a): residuals versus fitted values
plt.subplots(1, 2, figsize=(8, 3.5))
plt.subplot(121)
plt.scatter(model1.fittedvalues, model1.resid)
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('(a) Residuals vs. fitted values', fontsize=12)
plt.axhline(0, ls='--')

# Panel (b): normal Q-Q plot
ax2 = plt.subplot(122)
pplot = sm.ProbPlot(model1.resid, fit=True)
pplot.qqplot(line='r', ax=ax2, xlabel='Theoretical normal quantiles', ylabel='Standardized residuals (observed)')
ax2.set_title('(b) Normal Q-Q plot', fontsize=12)

plt.tight_layout()
plt.show()

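# Note: if a fitted curve is added to the residual plot, it has to be computed manually, which means choosing the polynomial degree used for the fit.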

(Figure 10-7: (a) residuals versus fitted values; (b) normal Q-Q plot)

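# 10.3 Multiple Linear Regression

【Example 10-5】 Estimating the parameters of a multiple linear regression model

【Code Box 10-7】 Multiple linear regression analysis

# Fit the multiple regression model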
from statsmodels.formula.api import ols
import pandas as pd
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model_m = ols("每股收益~每股净资产+每股现金流量+总股本",data=df).fit()
print(model_m.summary())

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                   每股收益   R-squared:                       0.871
Model:                            OLS   Adj. R-squared:                  0.853
Method:                 Least Squares   F-statistic:                     47.41
Date:                Thu, 14 Nov 2024   Prob (F-statistic):           1.58e-09
Time:                        16:26:24   Log-Likelihood:                -42.740
No. Observations:                  25   AIC:                             93.48
Df Residuals:                      21   BIC:                             98.35
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept     -1.1167      0.597     -1.870      0.076      -2.359       0.125
每股净资产          0.4903      0.055      8.891      0.000       0.376       0.605
每股现金流量         0.1505      0.072      2.091      0.049       0.001       0.300
总股本           -0.2381      0.086     -2.783      0.011      -0.416      -0.060
==============================================================================
Omnibus:                        0.628   Durbin-Watson:                   1.992
Prob(Omnibus):                  0.730   Jarque-Bera (JB):                0.697
Skew:                           0.213   Prob(JB):                        0.706
Kurtosis:                       2.301   Cond. No.                         32.1
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
# Print the analysis of variance (ANOVA) table
from statsmodels.stats.anova import anova_lm
anova_lm(model_m, typ=1)
                df      sum_sq     mean_sq           F        PR(>F)
每股净资产     1.0  272.995423  272.995423  128.229092  2.105681e-10
每股现金流量   1.0   13.342307   13.342307    6.267035  2.062595e-02
总股本         1.0   16.491240   16.491240    7.746125  1.114149e-02
Residual      21.0   44.708294    2.128966         NaN           NaN
# Residual standard error: the square root of the residual mean square
pow(2.128966,1/2)
1.459097666367814
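With typ=1 the ANOVA table lists sequential sums of squares, one per predictor in the order they enter the formula, and these add up to the model sum of squares. The sketch below, using only numbers copied from the table above and n = 25 from the summary, shows how the overall F statistic and the adjusted R-squared follow from them (variable names are illustrative):

```python
ss_terms = [272.995423, 13.342307, 16.491240]  # sequential (Type I) sums of squares
sse, df_resid = 44.708294, 21                  # residual sum of squares and its df
n = 25                                         # number of observations

k = len(ss_terms)                              # number of predictors
ssr = sum(ss_terms)                            # model sum of squares
sst = ssr + sse                                # total sum of squares

r_squared = ssr / sst
adj_r_squared = 1 - (sse / df_resid) / (sst / (n - 1))
f_stat = (ssr / k) / (sse / df_resid)

print(round(r_squared, 3), round(adj_r_squared, 3), round(f_stat, 2))
```

The adjusted R-squared penalizes each added predictor through the degrees of freedom, which is why it is lower than R-squared in the summary output.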

【Code Box 10-8】 Computing standardized regression coefficients

# Compute the standardized regression coefficients for Example 10-5
import pandas as pd
from statsmodels.formula.api import ols

# Load the data
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

# Drop the '样本编号' (sample ID) column
df = df.drop(columns=['样本编号'])
# Standardize every column (sample standard deviation, ddof=1)
z = (df - df.mean()) / df.std(ddof=1)
# Rename the columns to mark them as standardized
z.columns = ['z每股收益', 'z每股净资产', 'z每股现金流量', 'z总股本']
print(z.head())
# Fit the model on the standardized data
model_z = ols("z每股收益~z每股净资产+z每股现金流量+z总股本",data=z).fit()
print('====================================================================')
# Print the summary
print(model_z.summary())

Note: because standardizing is only a linear rescaling of each variable, R-squared (0.871), the F statistic (47.41) and the slope t statistics are the same as in Code Box 10-7. The intercept becomes zero up to floating-point error, and the slopes become the standardized regression coefficients.
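Standardized regression coefficients can also be cross-checked without refitting the model: they solve the linear system R_xx * beta = r_xy, where R_xx holds the correlations among the predictors and r_xy the correlations between each predictor and the response. A minimal sketch using the correlations printed in Code Box 10-2 (the array names are illustrative):

```python
import numpy as np

# Correlations copied from the matrix in Code Box 10-2
# Predictor order: 每股净资产, 每股现金流量, 总股本
R_xx = np.array([[1.0,      0.482134, 0.521195],
                 [0.482134, 1.0,      0.147115],
                 [0.521195, 0.147115, 1.0]])
r_xy = np.array([0.886292, 0.598971, 0.254539])  # each predictor vs 每股收益

beta_std = np.linalg.solve(R_xx, r_xy)  # standardized regression coefficients
r_squared = float(beta_std @ r_xy)      # R-squared implied by the correlations

print(np.round(beta_std, 3), round(r_squared, 3))
```

This gives standardized coefficients of roughly 0.93, 0.19 and -0.26, so 每股净资产 has by far the largest standardized effect, and the implied R-squared agrees with the 0.871 reported for the multiple regression.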

【Code Box 10-9】 Collinearity analysis: computing tolerance and the variance inflation factor (VIF)

import pandas as pd
from statsmodels.formula.api import ols
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model_m = ols("每股收益~每股净资产+每股现金流量+总股本",data=df).fit()

def vif(df_exog, exog_name):
    # Regress one predictor on the remaining predictors and return 1 / (1 - R^2)
    exog_use = list(df_exog.columns)
    exog_use.remove(exog_name)
    aux_model = ols(f"{exog_name}~{'+'.join(exog_use)}", data=df_exog).fit()
    rsq = aux_model.rsquared
    return 1. / (1. - rsq)

df_vif = pd.DataFrame()
for x in ['每股净资产', '每股现金流量', '总股本']:
    vif_i = vif(df.iloc[:, 2:], x)
    df_vif.loc['VIF', x] = vif_i

df_vif.loc["tolerance"] = 1 / df_vif.loc['VIF']
df_vif
           每股净资产  每股现金流量    总股本
VIF         1.784684    1.328641  1.400132
tolerance   0.560323    0.752649  0.714218
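The VIF values above can be verified from the predictor correlations alone, since the VIF of each predictor equals the corresponding diagonal element of the inverse of the predictor correlation matrix. A short check using the correlations from Code Box 10-2 (array names are illustrative):

```python
import numpy as np

# Correlations among the predictors, copied from Code Box 10-2
# Order: 每股净资产, 每股现金流量, 总股本
R_xx = np.array([[1.0,      0.482134, 0.521195],
                 [0.482134, 1.0,      0.147115],
                 [0.521195, 0.147115, 1.0]])

vif_values = np.diag(np.linalg.inv(R_xx))  # variance inflation factors
tolerance = 1 / vif_values                 # tolerance is the reciprocal of VIF

print(np.round(vif_values, 4), np.round(tolerance, 4))
```

All three VIF values are well below the commonly used warning threshold of 10, so collinearity among the predictors is mild. statsmodels also provides `variance_inflation_factor` in `statsmodels.stats.outliers_influence` for computing VIF from a design matrix.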

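【Example 10-7】 Predicting with the multiple linear regression model

【Code Box 10-10】 Multiple linear regression prediction

# Compute point predictions, confidence intervals and prediction intervals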
import pandas as pd
from statsmodels.formula.api import ols
from statsmodels.stats.outliers_influence import summary_table
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model_m = ols("每股收益~每股净资产+每股现金流量+总股本",data=df).fit()

conf_level = 0.95
st, _, _ = summary_table(model_m, alpha=1-conf_level)
columns = [x +' ' + y for (x, y) in zip(st.data[0], st.data[1])]
df_res = pd.DataFrame()            # convert the SimpleTable to a DataFrame
for i in range(len(st.data) - 2):
    # DataFrame.append was removed in pandas 2.0, so pd.concat is used instead of:
    #df_res = df_res.append(pd.DataFrame(st.data[i+2], index=columns).T)
    df_res = pd.concat([df_res, pd.DataFrame(st.data[i+2], index=columns).T], ignore_index=True)
df_res.reset_index(drop=True, inplace=True)

df_res.drop(columns=['Std Error Mean Predict',
                     'Student Residual','Std Error Residual'],inplace=True) # drop the columns that are not needed
round(df_res,2).head() # show the first 5 rows

   Obs  Dep Var Population  Predicted Value  Mean ci 95% low  Mean ci 95% upp  Predict ci 95% low  Predict ci 95% upp  Residual  Cook's D
0  1.0                0.88             1.44             0.59             2.29               -1.71                4.59     -0.56      0.00
1  2.0                1.14             2.92             1.99             3.85               -0.25                6.09     -1.78      0.04
2  3.0                4.88             5.90             4.97             6.83                2.73                9.08     -1.02      0.01
3  4.0                3.23             2.77             1.87             3.67               -0.40                5.94      0.46      0.00
4  5.0                7.83             7.79             6.74             8.84                4.58               11.00      0.04      0.00
# Point prediction of 每股收益 when 每股净资产 = 5, 每股现金流量 = 5 and 总股本 = 5
model_m.predict(exog=dict(每股净资产=5,每股现金流量=5,总股本=5))
0    0.896996
dtype: float64

【Code Box 10-11】 Diagnostics for the model in Example 10-5

# Draw residual plots to diagnose the model (Figure 10-8)
import pandas as pd
import numpy as np
from statsmodels.formula.api import ols
import statsmodels.api as sm
import matplotlib.pyplot as plt
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus']=False
df = pd.read_csv('./pydata/example/chap10/example10_1.csv')

model_m = ols("每股收益~每股净资产+每股现金流量+总股本",data=df).fit()
x = model_m.fittedvalues; y = model_m.resid

plt.subplots(1, 2, figsize=(8, 3.5))
plt.subplot(121)
plt.scatter(model_m.fittedvalues, model_m.resid)
plt.xlabel('Fitted values')
plt.ylabel('Residuals')
plt.title('(a) Residuals vs. fitted values', fontsize=12)
plt.axhline(0, ls='--')

ax2 = plt.subplot(122)
pplot = sm.ProbPlot(model_m.resid, fit=True)
pplot.qqplot(line='r', ax=ax2, xlabel='Theoretical normal quantiles', ylabel='Standardized observed values')
ax2.set_title('(b) Normal Q-Q plot', fontsize=12)

plt.tight_layout()
plt.show()

(Figure 10-8: (a) residuals versus fitted values; (b) normal Q-Q plot)
