Random Forest


1---Introduction

In the previous chapter we talked about underfitting and overfitting, a trade-off that affects modeling techniques to this day.🤷🏼
But today I'm going to introduce a clever model that sidesteps much of that trade-off: the random forest.

A random forest uses many trees and makes its prediction by averaging the prediction of each component tree. Random forests are generally more accurate than a single decision tree, and they work reasonably well even with the default parameters.👍👍
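To make the averaging idea concrete, here is a minimal sketch of it done by hand: several decision trees, each trained on a bootstrap sample (random sampling with replacement), with their predictions averaged. The toy quadratic data here is invented for illustration; it is not the Melbourne dataset, and a real `RandomForestRegressor` also randomizes the features considered at each split, which this sketch omits.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data: a noisy quadratic, just to illustrate averaging
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(-3, 3, size=(200, 1)), axis=0)
y = X.ravel() ** 2 + rng.normal(scale=1.0, size=200)

# Train several trees, each on its own bootstrap sample
n_trees = 10
predictions = []
for seed in range(n_trees):
    idx = rng.randint(0, len(X), size=len(X))  # bootstrap sample indices
    tree = DecisionTreeRegressor(random_state=seed)
    tree.fit(X[idx], y[idx])
    predictions.append(tree.predict(X))

# The "forest" prediction is the average over all trees
forest_pred = np.mean(predictions, axis=0)
```

Because each tree overfits its own bootstrap sample in a different way, averaging smooths out much of the individual trees' noise.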

2---We'll Use the Following Variables👇🏻

  • train_X
  • val_X
  • train_y
  • val_y

3---Coding it

We build a random forest model similarly to how we built a decision tree in scikit-learn - this time using the `RandomForestRegressor` class instead of `DecisionTreeRegressor`.

import pandas as pd
    
# Load data
melbourne_file_path = '/Users/mac/Desktop/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path) 
# Filter rows with missing values
melbourne_data = melbourne_data.dropna(axis=0)
# Choose target and features
y = melbourne_data.Price
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 'BuildingArea', 
                        'YearBuilt', 'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]

from sklearn.model_selection import train_test_split

# split data into training and validation data, for both features and target
# The split is based on a random number generator. Supplying a numeric value to
# the random_state argument guarantees we get the same split every time we
# run this script.
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error

forest_model = RandomForestRegressor(random_state=1)
forest_model.fit(train_X, train_y)
melb_preds = forest_model.predict(val_X)
print(mean_absolute_error(val_y, melb_preds))
191669.7536453626

4---Congrats👏👏👏

This is a big improvement over the error of about $250,000 we saw in "Measure your model validation". What's more, random forests allow parameter tuning, but they work reasonably well even without it.
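If you do want to try tuning, a natural starting point is `n_estimators`, the number of trees in the forest. The sketch below shows the tuning loop on synthetic data generated with `make_regression` (a stand-in, since the Melbourne CSV may not be on your machine); the specific values 10/50/100 are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic regression data as a stand-in for the Melbourne dataset
X, y = make_regression(n_samples=500, n_features=7, noise=10.0, random_state=0)
train_X, val_X, train_y, val_y = train_test_split(X, y, random_state=0)

# Compare validation MAE across a few forest sizes
for n in [10, 50, 100]:
    model = RandomForestRegressor(n_estimators=n, random_state=1)
    model.fit(train_X, train_y)
    mae = mean_absolute_error(val_y, model.predict(val_X))
    print(f"n_estimators={n}: validation MAE = {mae:.1f}")
```

More trees usually help (at the cost of training time) up to a point of diminishing returns, which is why the default often works reasonably well.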
