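Ways to save a model
Two Python packages are popular choices for saving a trained model:
- Pickle (Python's built-in object serialization library)
- Joblib (a serialization library commonly used with scikit-learn)
This article covers only pickle.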
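What can be pickled?
Can be pickled:
- All numeric types, including complex numbers.
- Booleans.
- Python strings, lists, tuples, and dictionaries (provided their contents can also be pickled).
- Built-in functions, and functions and classes defined at the top level of a module.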
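Cannot be pickled:
- Lambda functions, and functions or classes defined inside another function.
- Objects that wrap external resources, such as sockets, file handles, and database connections.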
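To make the two lists concrete, here is a minimal sketch that round-trips a few picklable values in memory and then tries to pickle an open file handle. The variable names are illustrative, and `os.devnull` is used only as a convenient file that exists on every platform.

```python
import os
import pickle

# Values drawn from the "can be pickled" list: numbers (including complex),
# booleans, strings, lists, tuples, dicts, and a built-in function.
picklable_values = [42, 3.14, 2 + 3j, True, "text", [1, 2], (3, 4), {"k": "v"}, len]

for value in picklable_values:
    restored = pickle.loads(pickle.dumps(value))
    assert restored == value  # each value survives the round trip

# An open file handle wraps an operating-system resource, so pickle refuses it.
handle_error = None
with open(os.devnull, "rb") as handle:
    try:
        pickle.dumps(handle)
    except TypeError as error:
        handle_error = error

print("File handle rejected:", handle_error is not None)
```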
A pickle example
Before pickling a machine learning model, let's look at how an ordinary Python object is pickled:
import pickle
# Pickle a list object
numbers_list = [1, 2, 3, 4, 5]
list_pickle_path = 'list_pickle.pkl'
# Open the target file in binary write mode and dump the list into it;
# the with statement closes the file automatically
with open(list_pickle_path, 'wb') as list_pickle:
    pickle.dump(numbers_list, list_pickle)
The code above pickles the list of numbers into a .pkl file. Unpickling restores the list:
# Unpickle the list object
# The pickled file must be opened in binary read mode ('rb')
list_pickle_path = 'list_pickle.pkl'
with open(list_pickle_path, 'rb') as list_unpickle:
    # Load the unpickled object into a variable
    numbers_list = pickle.load(list_unpickle)
print("Numbers List :: ", numbers_list)
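The dump and load steps can be wrapped in two small helpers. This is only a sketch: `save_object` and `load_object` are hypothetical names, not part of the pickle module, and a temporary directory is used so the example leaves no files behind.

```python
import os
import pickle
import tempfile


def save_object(obj, path):
    """Pickle obj to path; the file must be opened in binary write mode."""
    with open(path, "wb") as pickle_file:
        pickle.dump(obj, pickle_file)


def load_object(path):
    """Load a pickled object from path; binary read mode is required."""
    with open(path, "rb") as pickle_file:
        return pickle.load(pickle_file)


# Round trip through a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "list_pickle.pkl")
    save_object([1, 2, 3, 4, 5], example_path)
    restored_list = load_object(example_path)

print("Restored list:", restored_list)
```

Both helpers open the file in binary mode because pickle writes and reads bytes, not text.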
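Simple, isn't it? Pickling a machine learning model is just as simple, because everything in Python is an object.
Building a decision tree classifier
First, train a decision tree classifier: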
import pickle
import pandas as pd
# Scikit-learn function to split the dataset into train and test sets
# (sklearn.cross_validation was replaced by sklearn.model_selection)
from sklearn.model_selection import train_test_split
# Scikit-learn implementation of the decision tree classifier
from sklearn.tree import DecisionTreeClassifier
# Load the dataset
balance_scale_data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data', sep=',', header=None)
print("Dataset Length:: ", len(balance_scale_data))
print("Dataset Shape:: ", balance_scale_data.shape)
# Split the dataset into train and test sets:
# columns 1-4 are the features, column 0 is the class label
X = balance_scale_data.values[:, 1:5]
Y = balance_scale_data.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)
# Decision tree model with the Gini index criterion
decision_tree_model = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=3, min_samples_leaf=5)
decision_tree_model.fit(X_train, y_train)
print("Decision Tree classifier :: ", decision_tree_model)
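Running the code produces the model object decision_tree_model. The output below comes from the scikit-learn version in use when this was written; newer releases print a shorter representation of the estimator: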
Dataset Length:: 625
Dataset Shape:: (625, 5)
Decision Tree classifier :: DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
max_features=None, max_leaf_nodes=None, min_samples_leaf=5,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=100, splitter='best')
Pickling the model
# Dump the trained decision tree classifier with pickle
decision_tree_pkl_filename = 'decision_tree_classifier_20170212.pkl'
# Open the file in binary write mode and save the model as a .pkl file;
# the with statement closes the file automatically
with open(decision_tree_pkl_filename, 'wb') as decision_tree_model_pkl:
    pickle.dump(decision_tree_model, decision_tree_model_pkl)
Loading the pickle
# Load the saved decision tree model pickle
with open(decision_tree_pkl_filename, 'rb') as decision_tree_model_pkl:
    decision_tree_model = pickle.load(decision_tree_model_pkl)
print("Loaded Decision tree model :: ", decision_tree_model)
Output:
Loaded Decision tree model :: DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
max_features=None, max_leaf_nodes=None, min_samples_leaf=5,
min_samples_split=2, min_weight_fraction_leaf=0.0,
presort=False, random_state=100, splitter='best')
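Printing the loaded model shows that its parameters survived, but a stronger check is that it predicts exactly as it did before saving. The sketch below illustrates that check with `MajorityClassModel`, a hypothetical stand-in class written only for this example so that it runs without scikit-learn; it is not the article's classifier.

```python
import pickle


class MajorityClassModel:
    """Hypothetical stand-in for a trained model: predicts the most common label."""

    def fit(self, labels):
        # Remember the label that occurs most often in the training data.
        self.majority_label_ = max(set(labels), key=labels.count)
        return self

    def predict(self, n_samples):
        return [self.majority_label_] * n_samples


original_model = MajorityClassModel().fit(["L", "R", "L", "B", "L"])

# Serialize to bytes and restore, mirroring pickle.dump / pickle.load on a file.
restored_model = pickle.loads(pickle.dumps(original_model))

# The restored object is a separate instance with the same learned state.
same_predictions = original_model.predict(3) == restored_model.predict(3)
print("Same predictions after reload:", same_predictions)
```

With the decision tree above, the equivalent check would be to call `predict(X_test)` on the model before saving and again after loading, then compare the two results.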