Case study: predicting Titanic survivors with a decision tree

License: CC 4.0 BY-SA

Tags: #python #decision-tree #machine-learning

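This article implements a decision tree model in Python to predict whether Titanic passengers survived. It covers data preprocessing, model training, manual tuning of parameters (max_depth, min_impurity_split) to improve the model, and automatic parameter selection with GridSearchCV to address overfitting and find the best parameter combination.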

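The case study has three parts: (1) data preprocessing, (2) model training, and (3) selecting the optimal parameter combination (cross-validation).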
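As a rough preview of how the three parts fit together, here is a minimal sketch that splits a dataset, fits a decision tree, and searches over `max_depth` with cross-validation. The data is a synthetic stand-in generated with NumPy, not the Titanic data, and the depth range and `cv=5` are illustrative assumptions rather than the article's settings. Recent scikit-learn releases removed `min_impurity_split` in favour of `min_impurity_decrease`, so this sketch tunes only `max_depth`.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for preprocessed features (assumption: not the real Titanic data).
rng = np.random.RandomState(0)
X = rng.rand(200, 4)
y = (X[:, 0] + 0.1 * rng.randn(200) > 0.5).astype(int)

# Hold out 20% of the rows for a final check.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Try several tree depths, scoring each with 5-fold cross-validation.
param_grid = {"max_depth": list(range(1, 8))}
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("cross-validation score:", search.best_score_)
print("held-out score:", search.score(X_test, y_test))
```

Comparing the cross-validation score with the held-out score is one simple way to notice overfitting: a tree that is too deep tends to score well on the data it was fitted on and worse on unseen rows.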

1 Data preprocessing

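import pandas as pd

def read_data(path):
    """Load the CSV file and preprocess it for modelling."""
    df = pd.read_csv(path, index_col=0)
    # Drop columns that are not used as features
    df.drop(['Name', 'Cabin', 'Ticket'], axis=1, inplace=True)
    # Encode sex as 1 (male) or 0 (female)
    df['Sex'] = (df['Sex'] == 'male').astype('int')
    # Encode Embarked as integers in order of first appearance.
    # Only this column is replaced, so missing values in other
    # columns (e.g. Age) are left for fillna below.
    labels = df['Embarked'].unique().tolist()
    df['Embarked'] = df['Embarked'].replace(
        to_replace=labels, value=list(range(len(labels)))).astype('int')
    # Fill the remaining missing values with 0
    df = df.fillna(0)
    return df

train = read_data('train.csv')
train.head(3)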

	Survived	Pclass	Sex	Age	SibSp	Parch	Fare	Embarked
PassengerId
1	0	3	1	22.0	1	0	7.2500	0
2	1	1	0	38.0	1	0	71.2833	1
3	1	3	0	26.0	0	0	7.9250	0
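To see what each preprocessing step does without needing `train.csv`, here is a minimal sketch that applies similar transformations to a small hand-made DataFrame. The rows and the helper name `preprocess` are invented for illustration; only the column names follow the Titanic dataset. It also swaps in `pd.factorize` for the `replace` call, which numbers ports in order of first appearance but marks a missing port as -1 rather than giving it the next integer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows; the values are invented for illustration only.
sample = pd.DataFrame(
    {
        "Survived": [0, 1, 1, 0],
        "Pclass": [3, 1, 3, 2],
        "Name": ["A", "B", "C", "D"],
        "Sex": ["male", "female", "female", "male"],
        "Age": [22.0, 38.0, np.nan, 35.0],
        "Ticket": ["t1", "t2", "t3", "t4"],
        "Cabin": [np.nan, "C85", np.nan, np.nan],
        "Embarked": ["S", "C", "S", np.nan],
    },
    index=pd.Index([1, 2, 3, 4], name="PassengerId"),
)

def preprocess(df):
    """Hypothetical helper mirroring the preprocessing steps on an in-memory frame."""
    # Drop free-text columns that are not used as features.
    df = df.drop(columns=["Name", "Cabin", "Ticket"])
    # Encode sex as 1 (male) or 0 (female).
    df["Sex"] = (df["Sex"] == "male").astype("int")
    # factorize assigns 0, 1, 2, ... in order of first appearance; missing becomes -1.
    codes, _ = pd.factorize(df["Embarked"])
    df["Embarked"] = codes
    # Fill the remaining missing values (here: Age) with 0.
    return df.fillna(0)

result = preprocess(sample)
print(result)
```

Filling a missing age with 0 keeps the example simple, but it treats unknown ages as newborns; filling with the median age is a common alternative.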

2 Model training

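# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split
# Features are every column after Survived; the label is Survived
X = train.iloc[:, 1:]
y = train.iloc[:, 0]
# Hold out 20% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)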
print('train dataset:{0}