Both Lasso and ElasticNet belong to sklearn's linear regression family (linear_model); the difference between them is the regularization term added to the squared loss.
Background: the coefficient of determination R²
regression sum of squares (SSR) + residual sum of squares (SSE) = total sum of squares (SST)
SSE = sum_i (y_pred_i - y_obs_i)^2
SST = sum_i (y_obs_i - y_obs_mean)^2
SSR = sum_i (y_pred_i - y_obs_mean)^2
R² = 1 - SSE / SST
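A minimal sketch of the formulas above, computed by hand and checked against sklearn's r2_score (the arrays y_obs and y_pred are made-up toy values, only for illustration):
import numpy as np
from sklearn.metrics import r2_score
y_obs = np.array([3.0, -0.5, 2.0, 7.0])      # toy observed values
y_pred = np.array([2.5, 0.0, 2.0, 8.0])      # toy predicted values
sse = np.sum((y_pred - y_obs) ** 2)          # residual sum of squares
sst = np.sum((y_obs - y_obs.mean()) ** 2)    # total sum of squares
print(1 - sse / sst)             # ~0.9486
print(r2_score(y_obs, y_pred))   # same value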
import numpy as np                      # array library
import matplotlib.pyplot as plt        # plotting library
from sklearn.metrics import r2_score   # R² score used as the model metric (sklearn.metrics is the metrics module)
# Construct the dataset
np.random.seed(42)                             # random seed for reproducibility
n_samples, n_features = 50, 200                # number of samples, number of features
X = np.random.randn(n_samples, n_features)     # feature matrix for training
coef = 3 * np.random.randn(n_features)         # true coefficient vector
inds = np.arange(n_features)                   # feature indices 0..n_features-1
np.random.shuffle(inds)                        # shuffle the indices
coef[inds[10:]] = 0                            # sparsify coef: keep only 10 non-zero coefficients
y = np.dot(X, coef)                            # targets from the true linear model
# add noise
y += 0.01 * np.random.normal(size=n_samples)   # add Gaussian noise to the targets
# Split data in train set and test set
n_samples = X.shape[0]
X_train, y_train = X[:n_samples // 2], y[:n_samples // 2]   # training set
X_test, y_test = X[n_samples // 2:], y[n_samples // 2:]     # test set
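A quick sanity check (a small sketch, assuming the arrays built above are in scope): the split leaves 25 training and 25 test samples, while only 10 of the 200 true coefficients are non-zero, so the problem has far more features than samples and a sparse ground truth.
print(X_train.shape, X_test.shape)   # (25, 200) (25, 200)
print(np.sum(coef != 0))             # 10 non-zero true coefficients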
Lasso: squared loss + L1 penalty
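According to the scikit-learn documentation, the objective Lasso minimizes over the coefficients w is
1 / (2 * n_samples) * ||y - Xw||_2^2 + alpha * ||w||_1
where alpha controls the strength of the L1 penalty; a larger alpha pushes more coefficients exactly to zero.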
from sklearn.linear_model import Lasso
alpha = 0.1
lasso = Lasso(alpha=alpha)
y_pred_lasso = lasso.fit(X_train, y_train).predict(X_test)
r2_score_lasso = r2_score(y_test, y_pred_lasso)
print(lasso)
print("r^2 on test data : %f" % r2_score_lasso)
Out:
Lasso(alpha=0.1, copy_X=True, fit_intercept=True, max_iter=1000,
   normalize=False, positive=False, precompute=False, random_state=None,
   selection='cyclic', tol=0.0001, warm_start=False)
r^2 on test data : 0.384710
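Because of the L1 penalty, many of the learned coefficients are exactly zero. A short check (assuming the fitted lasso object above) counts how many survive:
n_nonzero = np.sum(lasso.coef_ != 0)   # number of non-zero Lasso coefficients
print("Lasso kept %d of %d coefficients" % (n_nonzero, lasso.coef_.size))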
ElasticNet: squared loss + a mix of L1 and L2 penalties
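The corresponding ElasticNet objective in scikit-learn is
1 / (2 * n_samples) * ||y - Xw||_2^2 + alpha * l1_ratio * ||w||_1 + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
so l1_ratio=0.7 below puts 70% of the penalty weight on the L1 term and 30% on the L2 term; l1_ratio=1 reduces to Lasso.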
from sklearn.linear_model import ElasticNet
enet = ElasticNet(alpha=alpha, l1_ratio=0.7)
y_pred_enet = enet.fit(X_train, y_train).predict(X_test)
r2_score_enet = r2_score(y_test, y_pred_enet)
print(enet)
print("r^2 on test data : %f" % r2_score_enet)
Out:
ElasticNet(alpha=0.1, copy_X=True, fit_intercept=True, l1_ratio=0.7,
   max_iter=1000, normalize=False, positive=False, precompute=False,
   random_state=None, selection='cyclic', tol=0.0001, warm_start=False)
r^2 on test data : 0.24017
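Here alpha=0.1 and l1_ratio=0.7 are fixed by hand. A hedged sketch of choosing them by cross-validation with ElasticNetCV instead (the candidate grids below are arbitrary assumptions, not tuned values):
from sklearn.linear_model import ElasticNetCV
enet_cv = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.7, 0.9, 1.0],   # candidate L1/L2 mixing ratios (assumed grid)
                       alphas=[0.01, 0.1, 1.0],              # candidate penalty strengths (assumed grid)
                       cv=5)                                 # 5-fold cross-validation
enet_cv.fit(X_train, y_train)
print(enet_cv.alpha_, enet_cv.l1_ratio_)   # parameters selected by cross-validation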
Compare the number and size of the coefficients that Lasso and ElasticNet learn from the same training set, and visualize them against the original coefficients used to construct the data:
plt.plot(lasso.coef_, color='lightgreen', linewidth=2, label='Lasso coefficients')   # Lasso coefficients: plot (i, w_i) for each of the 200 features
plt.plot(enet.coef_, color='gold', linewidth=2, label='Elastic net coefficients')    # ElasticNet coefficients, plotted the same way
plt.plot(coef, '--', color='navy', linewidth=2, label='Original coefficients')       # original (true) coefficients, plotted the same way
plt.xlabel('the index of coefficient')              # x-axis: coefficient index
plt.ylabel('the values of the index coefficient')   # y-axis: coefficient value at that index
plt.legend(loc='best')                              # legend
plt.title("Lasso R^2: %f, Elastic Net R^2: %f"
          % (r2_score_lasso, r2_score_enet))
plt.show()
Note: personal study notes.