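I had a small dataset that needed a linear regression. I expected this to be a trivial model, but the code raised an error, and at first I could not even tell what was wrong.
The original code: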
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
data = pd.read_excel('./25.2022年华为杯数学建模竞赛/1.xlsx',index_col='year')
X = data.iloc[:,2]
Y = data.iloc[:,5]
shape_y=Y.shape
estimator = LinearRegression()
estimator.fit(X,Y)
print('参数为:',estimator.coef_[0,0])
print('截距为:',estimator.intercept_[0])
plt.scatter(X,Y,c='g')
x = np.linspace(5,24,11)
plt.plot(x,estimator.coef_[0,0]*x+estimator.intercept_[0],c = 'red')
plt.show()
The error message:
Traceback (most recent call last):
File "f:/PycharmProject/25.2022年华为杯数学建模竞赛/problem_3.py", line 12, in <module>
estimator.fit(X,Y)
File "F:\anaconda\envs\PyTorch\lib\site-packages\sklearn\linear_model\_base.py", line 663, in fit
X, y, accept_sparse=accept_sparse, y_numeric=True, multi_output=True
File "F:\anaconda\envs\PyTorch\lib\site-packages\sklearn\base.py", line 581, in _validate_data
X, y = check_X_y(X, y, **check_params)
File "F:\anaconda\envs\PyTorch\lib\site-packages\sklearn\utils\validation.py", line 976, in check_X_y
estimator=estimator,
File "F:\anaconda\envs\PyTorch\lib\site-packages\sklearn\utils\validation.py", line 773, in check_array
"if it contains a single sample.".format(array)
ValueError: Expected 2D array, got 1D array instead:
array=[16.17261704 9.96007849 19.29039621 13.43621249 19.1726387 14.24710242
9.43294811 18.97874978 17.18033404 11.97558641 15.9696 ].
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
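After some reading I learned that scikit-learn expects X to be a 2D array of shape (n_samples, n_features), here (11, 1). The original code passed a 1D pandas Series of shape (11,), which is what triggered the error. Strictly speaking only X has to be 2D; y may stay 1D. Reshaping Y to (11, 1) as well is what makes the coef_[0,0] and intercept_[0] indexing in the script valid, because with a 1D y coef_ is 1D and intercept_ is a scalar.
So the arrays need to be reshaped: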
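The shape mismatch can be reproduced without the spreadsheet. The sketch below uses synthetic values (an assumption, not the competition data) and plain NumPy arrays; the names `x_1d`, `y` and `raised` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the two spreadsheet columns (hypothetical values).
rng = np.random.default_rng(0)
x_1d = rng.uniform(5, 24, size=11)   # 1D array, shape (11,)
y = 0.1 * x_1d + 0.2                 # 1D target, shape (11,)

print(x_1d.shape)                    # (11,)

raised = False
try:
    # X is 1D here, so input validation rejects it before any fitting happens.
    LinearRegression().fit(x_1d, y)
except ValueError as err:
    raised = True
    print("fit failed:", str(err).splitlines()[0])
```

The failure comes from the input check on X alone; the 1D `y` is not what scikit-learn complains about.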
X = np.array(data.iloc[:,2]).reshape(-1,1)
Y = np.array(data.iloc[:,5]).reshape(-1,1)
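With that change the fit runs and prints the result (the labels 参数为 and 截距为 in the print calls mean "coefficient" and "intercept"):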
参数为: 0.10890748785759355
截距为: 0.2382050784176286
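As a cross-check on how the shape of the target affects the fitted attributes, here is a minimal sketch on synthetic data (assumed to lie exactly on y = 0.1x + 0.2, not the article's dataset). It fits the same X once with a 1D target and once with a target reshaped to (11, 1), and reads the slope and intercept with the indexing each case needs. Plotting is left out to keep it short.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data (an assumption): points exactly on y = 0.1 * x + 0.2.
x = np.linspace(5, 24, 11)
y = 0.1 * x + 0.2

X = x.reshape(-1, 1)                 # features must be 2D: (n_samples, n_features)

# Case 1: 1D target, so coef_ is 1D and intercept_ is a scalar.
model_1d = LinearRegression().fit(X, y)
slope_1d = model_1d.coef_[0]
intercept_1d = model_1d.intercept_

# Case 2: target reshaped to (11, 1), so coef_ is 2D and intercept_ is 1D.
model_2d = LinearRegression().fit(X, y.reshape(-1, 1))
slope_2d = model_2d.coef_[0, 0]
intercept_2d = model_2d.intercept_[0]

print(model_1d.coef_.shape, model_2d.coef_.shape)   # (1,) (1, 1)
print(slope_1d, intercept_1d)
print(slope_2d, intercept_2d)
```

Either form works as long as the indexing matches the shape of the target. Selecting the column with a list, as in `data.iloc[:, [2]]`, is another way to obtain a 2D X, since it returns a one-column DataFrame instead of a Series.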