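So that this winter break is not wasted, and to build up something to show in my graduate school admission interviews, I am entering a few competitions on Kaggle to improve my skills.
Kaggle's registration verification kept failing for me; it finally went through once I switched to a Yahoo email address.
My first project is a practice competition: Digit Recognizer. The task is digit recognition. I used scikit-learn, so the programs are very short.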
The NN code is as follows (it uses scikit-learn's NearestCentroid classifier):
# NearestCentroid is exposed directly from sklearn.neighbors
from sklearn.neighbors import NearestCentroid
import pandas as pd
import csv

DRread = pd.read_csv('train.csv')
DRtest = pd.read_csv('test.csv')
X = DRread.iloc[:, 1:]  # every column after the first is a feature
Y = DRread.iloc[:, 0]   # the first column is the label

clf = NearestCentroid()  # defaults: metric='euclidean', shrink_threshold=None
clf.fit(X, Y)
result = clf.predict(DRtest)

# newline='' lets the csv module control line endings (Python 3)
with open('DR_DNNresult.csv', 'w', newline='') as csvWrite:
    writer = csv.writer(csvWrite)
    writer.writerow(result)  # all predictions are written as a single row
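To make it clearer what the classifier above is doing, here is a minimal pure-Python sketch of the nearest-centroid idea: average the training rows of each class, then assign a new row to the class whose average is closest in Euclidean distance. This is only an illustration, not scikit-learn's implementation; the helper names `fit_centroids` and `predict_one` and the toy data are made up for this example.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Return {label: mean feature vector} computed from the training rows."""
    sums = {}
    counts = defaultdict(int)
    for row, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(row)
        # accumulate the per-feature sum for this class
        sums[label] = [s + v for s, v in zip(sums[label], row)]
        counts[label] += 1
    return {label: [s / counts[label] for s in total]
            for label, total in sums.items()}


def predict_one(centroids, row):
    """Return the label whose centroid is nearest to row (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(centroids[label], row))


# Toy data (hypothetical): two classes in a 2-D feature space
X_toy = [[0, 0], [0, 2], [10, 10], [12, 10]]
y_toy = [0, 0, 1, 1]

centroids = fit_centroids(X_toy, y_toy)
print(centroids)
print(predict_one(centroids, [1, 1]))
print(predict_one(centroids, [9, 11]))
```

Because each class is summarized by a single average vector, training and prediction are cheap, but a class whose samples vary a lot may be poorly represented by one centroid.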
The random forest code is as follows:
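import pandas as pd
import csv
from sklearn.ensemble import RandomForestClassifier

DRtrain = pd.read_csv('train.csv')
DRtest = pd.read_csv('test.csv')
X = DRtrain.iloc[:, 1:]  # every column after the first is a feature
Y = DRtrain.iloc[:, 0]   # the first column is the label

# min_samples_split must be at least 2 when given as an integer in current scikit-learn
clf = RandomForestClassifier(n_estimators=10, max_depth=None, min_samples_split=2, random_state=0)
clf.fit(X, Y)
result = clf.predict(DRtest)

# newline='' lets the csv module control line endings (Python 3)
with open('DR_RFresult.csv', 'w', newline='') as csvWrite:
    writer = csv.writer(csvWrite, dialect='excel')
    writer.writerow(result)  # all predictions are written as a single row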
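Both scripts write every prediction into one CSV row. If the competition expects one prediction per line with an id column, the output could be written along the lines of the sketch below. The two-column `ImageId,Label` layout with ids starting at 1 is an assumption here, so check it against the sample submission file on the competition page. The helper name `write_submission` and the file name are made up for this example.

```python
import csv


def write_submission(predictions, path):
    """Write a header plus one (ImageId, Label) row per prediction."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["ImageId", "Label"])  # assumed header, see note above
        for image_id, label in enumerate(predictions, start=1):
            # int() also converts NumPy integer types to plain Python ints
            writer.writerow([image_id, int(label)])


# Example with three made-up predictions; in the scripts above this would be
# write_submission(result, 'DR_RFresult.csv')
write_submission([2, 0, 9], "DR_submission_example.csv")
```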