I've been looking forward to this moment for a long time!
1---Which library should I use to build models?🤔
I will use the scikit-learn library to create my models. In code, the library is imported as sklearn. Scikit-learn is easily the most popular library for modeling the types of data typically stored in DataFrames.
2---What are the steps to build and use a model?😯
- Define: What type of model will it be? A decision tree? Some other type of model? Other parameters of the model type are specified here too.
- Fit: Capture patterns from provided data. This is the heart of modeling.
- Predict: Just what it sounds like: use the fitted model to predict the target for new rows of data.
- Evaluate: Determine how accurate the model's predictions are.
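The four steps above can be sketched end to end on a tiny dataset. This is a minimal sketch under my own assumptions: the toy DataFrame and its values are made up, and mean absolute error (from `sklearn.metrics`) is just one common choice of evaluation metric, not something the Melbourne example prescribes.

```python
# Minimal sketch of Define -> Fit -> Predict -> Evaluate.
# The toy data below is hypothetical and only for illustration.
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error

# Hypothetical toy data: two features and a price target
toy = pd.DataFrame({
    "Rooms":    [2, 3, 3, 4, 1, 5],
    "Bathroom": [1, 1, 2, 2, 1, 3],
    "Price":    [500_000, 650_000, 720_000, 900_000, 480_000, 1_200_000],
})
X_toy = toy[["Rooms", "Bathroom"]]
y_toy = toy["Price"]

# 1. Define: choose the model type and its parameters
model = DecisionTreeRegressor(random_state=0)

# 2. Fit: learn patterns from the features and target
model.fit(X_toy, y_toy)

# 3. Predict: get predicted prices for the rows
predictions = model.predict(X_toy)

# 4. Evaluate: average absolute gap between actual and predicted prices
mae = mean_absolute_error(y_toy, predictions)
print("Predictions:", predictions)
print("MAE:", mae)
```

Because this sketch evaluates on the same rows it was fitted on, and every feature row is unique, the error comes out as zero; that number says little about how the model would do on new houses.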
Here is an example of defining a decision tree model with scikit-learn and fitting it with the features and target variable💁🏼♀️:
from sklearn.tree import DecisionTreeRegressor
# Define model. Specify a number for random_state to ensure the same results each run.
# This is considered good practice. You can use any number, and model quality
# won't depend meaningfully on exactly what value you choose.
melbourne_model = DecisionTreeRegressor(random_state=1)
# Fit model (X holds the features and y the target; both are built in section 3)
melbourne_model.fit(X, y)
Output: DecisionTreeRegressor(random_state=1)
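To check what random_state buys us, here is a small sketch that fits two trees with the same seed and compares their predictions. scikit-learn's documentation notes that the tree randomly permutes the features at each split, so fixing random_state makes the result repeatable. The random arrays `X_demo` and `y_demo` are hypothetical stand-ins for the real X and y.

```python
# Sketch: two trees with the same random_state produce identical predictions.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical random data standing in for X (5 features) and y (prices)
rng = np.random.default_rng(42)
X_demo = rng.random((100, 5))
y_demo = rng.random(100) * 1_000_000

# fit() returns the fitted estimator, so the calls can be chained
model_a = DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo)
model_b = DecisionTreeRegressor(random_state=1).fit(X_demo, y_demo)

same = np.array_equal(model_a.predict(X_demo), model_b.predict(X_demo))
print("Identical predictions:", same)
```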
3---Let's get started!
Now I have a fitted model that I can use to make predictions☝️.
In practice, I am interested in predicting the price of new houses coming on the market rather than the houses I already have prices for.
But to see how the predict function works, I'll first make predictions for the first few rows of the training data🧐.
# To choose variables/columns, we'll need to look at a list of all the columns in the dataset.
import pandas as pd
melbourne_file_path = '/Users/mac/Desktop/melb_data.csv'
melbourne_data = pd.read_csv(melbourne_file_path)
melbourne_data.columns
#Selecting the prediction target: y = 'Price'
y = melbourne_data.Price
#For now, we'll build a model with only a few features
melbourne_features = ['Rooms', 'Bathroom', 'Landsize',
'Lattitude', 'Longtitude']
X = melbourne_data[melbourne_features]
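Selecting y with dot notation and X with a list of column names can be tried without the CSV file. This sketch assumes a small, hypothetical DataFrame (`data`) with made-up values in place of melb_data.csv; only the column names come from the post.

```python
import pandas as pd

# Hypothetical stand-in for melb_data.csv; all values are made up
data = pd.DataFrame({
    "Rooms": [2, 3, 4],
    "Bathroom": [1.0, 2.0, 1.0],
    "Landsize": [300.0, 150.0, 90.0],
    "Lattitude": [-37.70, -37.75, -37.90],
    "Longtitude": [145.10, 144.95, 145.05],
    "Suburb": ["A", "B", "C"],
    "Price": [900_000.0, 1_100_000.0, 1_300_000.0],
})

# Prediction target: dot notation returns a Series (same as data["Price"])
y_sel = data.Price

# Features: a list of column names returns a DataFrame with just those columns
features = ["Rooms", "Bathroom", "Landsize", "Lattitude", "Longtitude"]
X_sel = data[features]

print(type(y_sel).__name__, y_sel.shape)  # Series (3,)
print(type(X_sel).__name__, X_sel.shape)  # DataFrame (3, 5)
```

Columns that are not in the list, such as the hypothetical `Suburb` here, are simply left out of X.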
# Review the selected features: summary statistics and the first few rows
X.describe()
X.head()
from sklearn.tree import DecisionTreeRegressor
# Define model. Specify a number for random_state to ensure same results each run
melbourne_model = DecisionTreeRegressor(random_state=1)
# Fit model
melbourne_model.fit(X, y)
print("Making predictions for the following 5 houses:")
print(X.head())
print("The predictions are")
print(melbourne_model.predict(X.head()))
Making predictions for the following 5 houses:
   Rooms  Bathroom  Landsize  Lattitude  Longtitude
0      2       1.0     202.0   -37.7996    144.9984
1      2       1.0     156.0   -37.8079    144.9934
2      3       2.0     134.0   -37.8093    144.9944
3      3       2.0      94.0   -37.7969    144.9969
4      4       1.0     120.0   -37.8072    144.9941
The predictions are
[1480000. 1035000. 1465000. 850000. 1600000.]
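Since I care about new houses rather than the ones I already have prices for, the Evaluate step should compare errors on rows the model was fitted on with errors on rows it never saw. This sketch uses synthetic, hypothetical data and swaps in scikit-learn's `train_test_split` and `mean_absolute_error` as one common way to do that; it is not part of the original walkthrough.

```python
# Sketch: in-sample error vs. error on held-out rows for a decision tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Hypothetical synthetic data: price depends on rooms and land size, plus noise
rng = np.random.default_rng(0)
rooms = rng.integers(1, 6, size=200)
land = rng.uniform(50, 800, size=200)
X_syn = np.column_stack([rooms, land])
y_syn = 200_000 * rooms + 500 * land + rng.normal(0, 100_000, size=200)

# Hold out 25% of the rows (the default split) for validation
train_X, val_X, train_y, val_y = train_test_split(X_syn, y_syn, random_state=1)
model = DecisionTreeRegressor(random_state=1).fit(train_X, train_y)

train_mae = mean_absolute_error(train_y, model.predict(train_X))
val_mae = mean_absolute_error(val_y, model.predict(val_X))
print(f"MAE on training rows: {train_mae:,.0f}")
print(f"MAE on held-out rows: {val_mae:,.0f}")
```

An unrestricted decision tree can memorize every training row, so its training error is typically zero or close to it; that may also be why the five predictions above look like exact sale prices. The held-out error is the more honest estimate for new houses.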