Building Your Model

Article link: https://blog.youkuaiyun.com/weixin_59907082/article/details/136594340

I've been looking forward to this moment for a long time!

1---Which library should I use to build the model?🤔

I will use the scikit-learn library to create my models. In code, this library is imported as sklearn. Scikit-learn is easily the most popular library for modeling the types of data typically stored in DataFrames.
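As a quick check of the naming, here is a minimal sketch assuming scikit-learn is already installed (the package is installed under the name scikit-learn but imported as sklearn):

```python
# The package is installed as "scikit-learn" but imported as "sklearn".
import sklearn
from sklearn.tree import DecisionTreeRegressor

# Print the installed version to confirm the import works.
print(sklearn.__version__)
```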

2---What are the steps to build and use a model?😯

Define: What type of model will it be? A decision tree? Some other type of model? Other parameters of the model type are specified here too.
Fit: Capture patterns from the provided data. This is the heart of modeling.
Predict: Use the fitted model to estimate the target for given feature values.
Evaluate: Determine how accurate the model's predictions are.
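
The four steps above can be sketched end to end as follows. This is only an illustration: the data is synthetic, the names X_demo, y_demo, and demo_model are hypothetical, and mean absolute error is just one assumed choice of metric for the Evaluate step.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical, NOT the Melbourne dataset).
rng = np.random.default_rng(0)
X_demo = pd.DataFrame({
    "Rooms": rng.integers(1, 6, size=200),
    "Landsize": rng.uniform(50, 1000, size=200),
})
y_demo = (200_000 * X_demo["Rooms"]
          + 500 * X_demo["Landsize"]
          + rng.normal(0, 10_000, size=200))

# Define: choose the model type and its parameters.
demo_model = DecisionTreeRegressor(random_state=1)

# Fit: capture patterns from the data.
demo_model.fit(X_demo, y_demo)

# Predict: estimate the target for the given feature rows.
demo_predictions = demo_model.predict(X_demo)

# Evaluate: measure how far the predictions are from the known values.
demo_mae = mean_absolute_error(y_demo, demo_predictions)
print(demo_mae)
```

Note that this sketch evaluates on the same rows it was fitted on, so the error looks better than it would on unseen houses. Held-out data gives a fairer estimate.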

Here is an example of defining a decision tree model with scikit-learn and fitting it to the features X and target variable y, which are built in the full listing in section 3💁🏼‍♀️:

from sklearn.tree import DecisionTreeRegressor

# Define model. Specify a number for random_state to ensure the same results
# each run. This is considered good practice.
# You can use any number, and model quality won't depend meaningfully on
# exactly what value you choose.
melbourne_model = DecisionTreeRegressor(random_state=1)

# Fit model
melbourne_model.fit(X, y)

Out[6]: DecisionTreeRegressor(random_state=1)
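
To see what fixing random_state buys you, here is a small sketch on synthetic data (the arrays and model names are hypothetical). Two models defined with the same random_state and fitted on the same data should give identical predictions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Synthetic data for illustration only (hypothetical values).
rng = np.random.default_rng(42)
X_demo = rng.uniform(0, 1, size=(100, 3))
y_demo = X_demo.sum(axis=1) + rng.normal(0, 0.1, size=100)
X_new = rng.uniform(0, 1, size=(10, 3))

# Two separately fitted models that share the same random_state.
model_a = DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo)
model_b = DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo)

# With the same seed and the same data, the predictions match exactly.
same = np.array_equal(model_a.predict(X_new), model_b.predict(X_new))
print(same)  # prints True
```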

3---Let's get started!

Now I have a fitted model that I can use to make predictions☝️.

In practice, I am interested in predicting the price of new houses coming on the market rather than the houses I already have prices for.

But first I will make predictions for the first few rows of the training data to see how the predict function works🧐.

#To choose variables/columns,
#we’ll need to look at a list of all the columns in the dataset.
import pandas as pd
melbourne_file_path = '/Users/mac/Desktop/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path) 
melbourne_data.columns
#Selecting the prediction target: y = 'Price'
y = melbourne_data.Price
#For now, we'll build a model with only a few features
melbourne_features = ['Rooms', 'Bathroom', 'Landsize', 
                      'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]
#Review the feature data with the describe and head methods before modeling
X.describe()
X.head()
from sklearn.tree import DecisionTreeRegressor

# Define model. Specify a number for random_state to ensure same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)

# Fit model
melbourne_model.fit(X, y)
print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))

Making predictions for the following 5 houses:
   Rooms  Bathroom  Landsize  Lattitude  Longtitude
0      2       1.0     202.0   -37.7996    144.9984
1      2       1.0     156.0   -37.8079    144.9934
2      3       2.0     134.0   -37.8093    144.9944
3      3       2.0      94.0   -37.7969    144.9969
4      4       1.0     120.0   -37.8072    144.9941
The predictions are
[1480000. 1035000. 1465000.  850000. 1600000.]
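
A natural follow-up is to put the predictions next to the known prices. The sketch below does this on synthetic data (X_demo, y_demo, and the generated prices are hypothetical, not the Melbourne values). Because an unrestricted decision tree can memorize its training rows, predictions on those same rows tend to match the known targets, which is why they say little about accuracy on new houses.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical, NOT the Melbourne dataset).
rng = np.random.default_rng(1)
X_demo = pd.DataFrame({
    "Rooms": rng.integers(1, 6, size=50),
    "Landsize": rng.uniform(50, 1000, size=50),
})
y_demo = pd.Series(rng.uniform(300_000, 2_000_000, size=50), name="Price")

demo_model = DecisionTreeRegressor(random_state=1)
demo_model.fit(X_demo, y_demo)

# Side-by-side view of known prices and predictions for the first 5 rows.
comparison = pd.DataFrame({
    "actual": y_demo.head(),
    "predicted": demo_model.predict(X_demo.head()),
})
print(comparison)
```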