Persisting (Saving) Machine Learning Models

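This article shows how to save and load machine learning models with Python's pickle library. It walks through pickling a list of numbers and then a decision tree classifier, to show how pickle handles complex objects such as machine learning models.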


Ways to save a model

A popular way to save a model is to use one of two Python packages, of which this article covers only pickle.

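What can be pickled?

Can be pickled:

  • All numeric data types, including complex numbers.
  • Booleans.
  • Python strings, lists, tuples, and dictionaries (as long as their contents are picklable).
  • Built-in functions, and functions and classes defined at the top level of a module.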

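Cannot be pickled:

  • Lambda functions, and functions or classes defined inside another function.
  • Objects tied to an operating-system resource, such as sockets, file handles, and database connections.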
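
The two lists above can be checked directly. The following is a minimal sketch using only the standard library; the helper name try_pickle and the temporary file name are made up for illustration, and the lambda is included as one more object that the standard pickle module is known to reject.

```python
import os
import pickle
import tempfile


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the exception raised."""
    try:
        pickle.dumps(obj)
        return None
    except Exception as exc:
        return exc


# Values from the "can be pickled" list survive a round trip unchanged.
picklable = [42, 3.14, 2 + 3j, True, "text", [1, 2], (3, 4), {"k": "v"}, len]
round_trip_ok = all(pickle.loads(pickle.dumps(v)) == v for v in picklable)

# An open file handle wraps an operating-system resource, so pickle rejects it.
with tempfile.TemporaryDirectory() as tmp_dir:
    with open(os.path.join(tmp_dir, "demo.txt"), "w") as handle:
        file_error = try_pickle(handle)

# A lambda has no importable name, so the standard pickle module rejects it too.
lambda_error = try_pickle(lambda x: x + 1)

print("Round trip OK:", round_trip_ok)
print("File handle error type:", type(file_error).__name__)
print("Lambda rejected:", lambda_error is not None)
```

Probing with pickle.dumps like this is a cheap way to find out whether an object can be saved before writing anything to disk.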

Pickle examples

Before pickling a machine learning model, let's first look at pickling an ordinary Python object:

import pickle

# Pickle a list object
numbers_list = [1, 2, 3, 4, 5]
list_pickle_path = 'list_pickle.pkl'

# Open the target file in binary write mode and dump the list into it;
# the with block closes the file automatically
with open(list_pickle_path, 'wb') as list_pickle:
    pickle.dump(numbers_list, list_pickle)

The code above pickles the list of numbers into a .pkl file. To get the list back, we unpickle it:

# Unpickle the list object
# The pickle file must be opened in binary read mode ('rb')
list_pickle_path = 'list_pickle.pkl'

# Load the unpickled object into a variable
with open(list_pickle_path, 'rb') as list_unpickle:
    numbers_list = pickle.load(list_unpickle)
print("Numbers List :: ", numbers_list)

Simple, isn't it? Next we look at pickling a machine learning model, which is just as simple, because in Python everything is an object.
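
To illustrate the "everything is an object" point before moving to scikit-learn, here is a small sketch that pickles an instance of a user-defined class. ThresholdModel is a hypothetical toy classifier invented for this example; it is not part of any library and only stands in for a real model.

```python
import pickle


class ThresholdModel:
    """Hypothetical toy classifier: predicts 1 for values above the training mean."""

    def __init__(self):
        self.threshold = None

    def fit(self, values):
        # "Training" just stores the mean of the training values.
        self.threshold = sum(values) / len(values)
        return self

    def predict(self, values):
        return [1 if v > self.threshold else 0 for v in values]


model = ThresholdModel().fit([1, 2, 3, 4, 5])

# Serialize the fitted object to bytes, then rebuild it from those bytes.
payload = pickle.dumps(model)
restored = pickle.loads(payload)

print("Restored threshold:", restored.threshold)
print("Restored predictions:", restored.predict([2, 4]))
```

The restored object carries the learned state (here, the threshold), which is exactly what makes pickle usable for trained models. Note that unpickling needs the class definition to be importable in the loading program.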

Building a decision tree classifier

First, train a decision tree classifier:

import pickle
import pandas as pd
# Scikit-learn function to split the dataset into train and test sets
# (sklearn.cross_validation was removed; train_test_split now lives in model_selection)
from sklearn.model_selection import train_test_split
# Scikit-learn implementation of the decision tree classifier
from sklearn.tree import DecisionTreeClassifier


# Load the dataset
balance_scale_data = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/balance-scale/balance-scale.data', sep=',', header=None)
print("Dataset Length:: ", len(balance_scale_data))
print("Dataset Shape:: ", balance_scale_data.shape)

# Split the dataset into train and test sets
# Column 0 holds the class label; columns 1-4 hold the features
X = balance_scale_data.values[:, 1:5]
Y = balance_scale_data.values[:, 0]
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.3, random_state=100)

# Decision tree model using the Gini index criterion
decision_tree_model = DecisionTreeClassifier(criterion="gini", random_state=100, max_depth=3, min_samples_leaf=5)
decision_tree_model.fit(X_train, y_train)
print("Decision Tree classifier :: ", decision_tree_model)

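Running the code gives us the model object decision_tree_model (the exact text printed for the classifier depends on the scikit-learn version; recent releases show only the non-default parameters):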

Dataset Length::  625
Dataset Shape::  (625, 5)
Decision Tree classifier ::  DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
 max_features=None, max_leaf_nodes=None, min_samples_leaf=5,
 min_samples_split=2, min_weight_fraction_leaf=0.0,
 presort=False, random_state=100, splitter='best')

Pickling the model

# Dump the trained decision tree classifier with pickle
decision_tree_pkl_filename = 'decision_tree_classifier_20170212.pkl'

# Open the .pkl file in binary write mode and dump the model into it;
# the with block closes the file automatically
with open(decision_tree_pkl_filename, 'wb') as decision_tree_model_pkl:
    pickle.dump(decision_tree_model, decision_tree_model_pkl)

Loading the pickle

# Load the saved decision tree model from the pickle file
with open(decision_tree_pkl_filename, 'rb') as decision_tree_model_pkl:
    decision_tree_model = pickle.load(decision_tree_model_pkl)
print("Loaded Decision tree model :: ", decision_tree_model)

Output:

Loaded Decision tree model ::  DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=3,
 max_features=None, max_leaf_nodes=None, min_samples_leaf=5,
 min_samples_split=2, min_weight_fraction_leaf=0.0,
 presort=False, random_state=100, splitter='best')
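
Matching printed parameters do not by themselves show that the loaded model behaves like the original. One way to check, sketched below, is to compare predictions before and after the round trip. This sketch makes two assumptions that differ from the article: it uses synthetic data from make_classification instead of the balance-scale dataset so that it runs offline, and it pickles to an in-memory bytes object with pickle.dumps rather than to a file.

```python
import pickle

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption; the article uses the balance-scale dataset).
X, y = make_classification(
    n_samples=200, n_features=4, n_informative=3, n_redundant=0, random_state=100
)

# Same hyperparameters as the classifier trained in the article.
model = DecisionTreeClassifier(
    criterion="gini", random_state=100, max_depth=3, min_samples_leaf=5
)
model.fit(X, y)

# Round trip through pickle in memory instead of through a .pkl file.
restored = pickle.loads(pickle.dumps(model))

# The restored model should reproduce the original predictions exactly.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
same_params = model.get_params() == restored.get_params()

print("Predictions identical:", same_predictions)
print("Parameters identical:", same_params)
```

The same comparison works for a file-based pickle: load the model with pickle.load and compare its predictions on held-out data such as X_test.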

 

 
