[Kaggle Notes] Bike Sharing Demand

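This post walks through a competition project on forecasting bike rental demand. It predicts demand with a random forest regressor and shows the full process, from reading the data through feature engineering to model training and writing out the results.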


Competition Task


Code

# -*- coding:utf-8 -*-

import pandas as pd

# Load the raw data
train_df = pd.read_csv("train.csv", header=0)
test_df = pd.read_csv("test.csv", header=0)


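# Select the feature columns
selected_features = ['datetime', 'season', 'holiday', 'workingday', 'weather', 'temp', 'atemp', 'humidity', 'windspeed']

# .copy() makes independent DataFrames, so adding columns later does not raise SettingWithCopyWarning
X_train_df = train_df[selected_features].copy()
y_train_df = train_df['count']

X_test_df = test_df[selected_features].copy()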


# Feature engineering: derive month, day of week and hour from the timestamp
X_train_df['month'] = pd.DatetimeIndex(X_train_df.datetime).month
X_train_df['day'] = pd.DatetimeIndex(X_train_df.datetime).dayofweek  # day of week: Monday=0 ... Sunday=6
X_train_df['hour'] = pd.DatetimeIndex(X_train_df.datetime).hour
X_train_df = X_train_df.drop(['datetime'], axis=1)  # axis=1 drops a column; axis=0 would drop rows

# Apply the same transformation to the test set
X_test_df['month'] = pd.DatetimeIndex(X_test_df.datetime).month
X_test_df['day'] = pd.DatetimeIndex(X_test_df.datetime).dayofweek
X_test_df['hour'] = pd.DatetimeIndex(X_test_df.datetime).hour
X_test_df = X_test_df.drop(['datetime'], axis=1)


# Vectorize the features with DictVectorizer
from sklearn.feature_extraction import DictVectorizer
dict_vec = DictVectorizer(sparse=False)

# orient='records' yields one dict per row; the abbreviation 'record' is rejected by recent pandas versions
X_train_df = dict_vec.fit_transform(X_train_df.to_dict(orient='records'))
X_test_df = dict_vec.transform(X_test_df.to_dict(orient='records'))


# Predict with RandomForestRegressor
# (the gbr_ prefix is presumably left over from the commented-out GradientBoostingRegressor)
# from sklearn.ensemble import GradientBoostingRegressor
from sklearn.ensemble import RandomForestRegressor
gbr = RandomForestRegressor()
gbr.fit(X_train_df, y_train_df)
gbr_y_predict = gbr.predict(X_test_df)


# Write the submission file
gbr_submission = pd.DataFrame({'datetime': test_df['datetime'], 'count': gbr_y_predict})
gbr_submission.to_csv('gbr_submission.csv', index=False)
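
To make the three time features above concrete, here is a minimal standard-library sketch of what the script derives from each timestamp. The helper name time_features and the timestamp format "YYYY-MM-DD HH:MM:SS" are assumptions for illustration; the script itself uses pd.DatetimeIndex, whose dayofweek follows the same Monday=0 convention as datetime.weekday().

```python
from datetime import datetime


def time_features(timestamp: str) -> dict:
    """Return the month, day-of-week and hour features for one timestamp string.

    Hypothetical helper for illustration; assumes the format "YYYY-MM-DD HH:MM:SS".
    """
    parsed = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return {
        "month": parsed.month,    # 1..12
        "day": parsed.weekday(),  # Monday=0 ... Sunday=6, as with pandas dayofweek
        "hour": parsed.hour,      # 0..23
    }


if __name__ == "__main__":
    # 2011-01-01 was a Saturday, so "day" is 5
    print(time_features("2011-01-01 05:00:00"))
```

Note that the column called 'day' holds the day of the week, not the day of the month, so it can separate weekday from weekend patterns.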