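Hung-yi Lee (李宏毅) Machine Learning 2019: Task 3
Study check-in content
Main assignment
Complete this assignment according to the requirements in Homework1_Introduction.txt
Homework 1: Predicting PM2.5
In this assignment we use gradient descent to predict PM2.5 values (a regression problem)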
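Homework1 requirements:
Python 3.5+ is required
You may only use
Implement linear regression by hand using gradient descent
Ideally, reach the Public Simple Baseline
For those who want to load a model rather than run the whole training process:
Upload your training code and name it train.py
It only needs to contain the gradient descent code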
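To make the "hand-written linear regression with gradient descent" requirement concrete, here is a minimal sketch using NumPy only. It is an illustration under stated assumptions, not the reference solution for the homework: the function name `fit_linear_gd`, the mean-squared-error loss, the default `eta` and `n_iters`, and the synthetic demo data are all hypothetical choices made for this example.

```python
import numpy as np


def fit_linear_gd(X, y, eta=0.01, n_iters=10000):
    """Fit y ~ b + X @ w by batch gradient descent on the mean squared error.

    Hypothetical helper for illustration; returns theta = [b, w_1, ..., w_d].
    """
    X_b = np.hstack([np.ones((len(X), 1)), X])  # prepend a bias column
    theta = np.zeros(X_b.shape[1])
    for _ in range(n_iters):
        residual = X_b @ theta - y               # prediction error per sample
        grad = 2.0 * X_b.T @ residual / len(y)   # gradient of the MSE
        theta -= eta * grad                      # step against the gradient
    return theta


# Synthetic, noiseless data (made up for this demo) with known parameters
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = 4.0 + X_demo @ np.array([1.5, -2.0, 0.5])
theta_demo = fit_linear_gd(X_demo, y_demo)
```

On this demo data the recovered `theta_demo` should be close to the intercept and weights used to generate `y_demo`. On real features, standardizing the inputs first usually makes a single fixed learning rate much easier to choose.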
Homework_best requirements:
Python 3.5+ is required
Any library may be used
Achieve a higher score on Kaggle with a method of your choice
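Data description: This assignment uses the observation records of the Fengyuan (豐原) station, split into a train set and a test set. The train set contains all data from the first 20 days of each month at the station; the test set is sampled from the remaining data.
train.csv: hourly meteorological data for the first 20 days of each month (18 measurements per hour), covering 12 months.
test.csv: each sample is a run of 10 consecutive hours taken from the remaining data. All observations from the first nine hours are the features, and the PM2.5 value of the tenth hour is the answer. There are 240 distinct test samples in total; predict the PM2.5 value of each of them from its features.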
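The "nine hours of features, tenth hour as answer" layout can be sketched as a sliding window. The snippet below is a simplified illustration: it assumes a single 1-D series of hourly PM2.5 readings rather than all 18 measurements, and the helper name `make_windows` and the made-up readings in `pm25_demo` are hypothetical.

```python
import numpy as np


def make_windows(series, n_hours=9):
    """Split a 1-D hourly series into (features, target) pairs.

    Each feature row holds n_hours consecutive readings; the target is the
    reading that immediately follows them. Hypothetical helper for illustration.
    """
    series = np.asarray(series, dtype=float)
    n_samples = len(series) - n_hours
    X = np.stack([series[i:i + n_hours] for i in range(n_samples)])
    y = series[n_hours:]
    return X, y


pm25_demo = np.arange(24, dtype=float)  # made-up readings for one day
X_win, y_win = make_windows(pm25_demo)
```

With 24 hourly readings this yields 15 windows: the first uses hours 0 to 8 to predict hour 9, and the last uses hours 14 to 22 to predict hour 23.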
After finishing, refer to the following materials:
Task 3 Implementation
Solution 1
'''
Predict PM2.5 with linear regression.
This approach follows the excellent homework by 黑桃大哥 -|vv|-
'''
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Load the raw CSV files; the two files are read with different encodings
path = "./Dataset/"
train = pd.read_csv(path + 'train.csv', engine='python', encoding='utf-8')
test = pd.read_csv(path + 'test.csv', engine='python', encoding='gbk')

# Keep only the PM2.5 rows (in test, the measurement-name column is labelled 'AMB_TEMP')
train = train[train['observation'] == 'PM2.5']
test = test[test['AMB_TEMP'] == 'PM2.5']

# Drop the non-numeric columns so that only the hourly values remain
train = train.drop(['Date', 'stations', 'observation'], axis=1)
test_x = test.iloc[:, 2:]
# Build training samples with a sliding window over the hourly columns:
# 9 consecutive hours are the features and the following hour is the target
train_x = []
train_y = []
for i in range(15):
    x = train.iloc[:, i:i + 9]
    x.columns = np.array(range(9))  # align column names so concat stacks rows
    y = train.iloc[:, i + 9]  # a Series, so there are no columns to rename
    train_x.append(x)
    train_y.append(y)
train_x = pd.concat(train_x)
train_y = pd.concat(train_y)

train_y = np.array(train_y, float)
test_x = np.array(test_x, float)
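# Standardize the features. The scaler is fitted on the training data only and
# the same statistics are reused for the test data, so both share one scale
ss = StandardScaler()
ss.fit(train_x)
train_x = ss.transform(train_x)
test_x = ss.transform(test_x)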

def r2_score(y_true, y_predict):
    # R^2 = 1 - MSE / Var(y_true)
    MSE = np.sum((y_true - y_predict) ** 2) / len(y_true)
    return 1 - MSE / np.var(y_true)

class LinearRegression:
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None
        self._theta = None

    def fit_normal(self, X_train, y_train):
        # Closed-form least squares via the normal equation
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"
        X_b = np.hstack([np.ones((len(X_train), 1)), X_train])  # prepend a bias column
        self._theta = np.linalg.inv(X_b.T.dot(X_b)).dot(X_b.T).dot(y_train)
        self.intercept_ = self._theta[0]
        self.coef_ = self._theta[1:]
        return self

    def fit_gd(self, X_train, y_train, eta=0.01, n_iters=1e4):
        '''
        :param X_train: training set
        :param y_train: labels
        :param eta: learning rate
        :param n_iters: number of iterations
        :return: theta, the model parameters
        '''
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"