Python pickle: updating a pickled Python object

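This article explains how to use Python's pickle module to save and update a machine-learning classifier. You load an existing pickle file, continue training with a new data set, and save the updated classifier again. It includes a complete code example that sets up the feature sets, loads or creates the classifier, and saves it.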


I am working on a machine learning project, and I am using Python's pickle module for it.

Basically, I am parsing a huge data set that cannot be processed in a single run. That is why I need to save the classifier object and update it in the next run.

So my question is: when I run the program again with the new data set, will the already-created pickle object be modified (updated)? If not, how can I update the same pickle object every time I run the program?

```python
save_classifier = open("naivebayes.pickle", "wb")
pickle.dump(classifier, save_classifier)
save_classifier.close()
```

Solution

Unpickling your classifier object re-creates it in exactly the state it was in when you pickled it, so you can go on to update it with fresh data from your data set. The pickle file itself is not modified when you change the object in memory. It only changes when you pickle the object to it again. So at the end of each program run, pickle the classifier again and save it to a file. It's a good idea not to simply overwrite the same file but to keep a backup (or, even better, a series of backups) in case you mess something up. That way, you can easily go back to a known good state of your classifier.
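One way to keep a series of backups instead of a single one is to rotate numbered copies before each save. This is only an illustrative sketch. The helper name `save_with_backups`, the `.1`, `.2`, ... suffix scheme, and the default of three backups are assumptions, not part of the original answer.

```python
import os
import pickle


def save_with_backups(obj, filename, keep=3):
    """Pickle obj to filename after rotating up to `keep` older copies.

    Hypothetical helper: filename.1 is the newest backup and
    filename.<keep> the oldest; anything older is overwritten.
    """
    # Shift existing backups up by one: .2 -> .3, .1 -> .2, ...
    for i in range(keep - 1, 0, -1):
        older = "{}.{}".format(filename, i)
        if os.path.exists(older):
            os.replace(older, "{}.{}".format(filename, i + 1))
    # The current file becomes the newest backup
    if os.path.exists(filename):
        os.replace(filename, filename + ".1")
    # Write the fresh state
    with open(filename, "wb") as f:
        pickle.dump(obj, f)
```

To roll back, load `naivebayes.pickle.1` (or an older numbered copy) instead of the main file. `os.replace` overwrites the destination if it already exists, which is why the oldest backup is silently discarded.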

You should experiment with pickling, using a simple program and a simple object to pickle and unpickle, until you're completely confident about how it all works.
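A small experiment of that kind might look like the following sketch. It pickles a plain dict, changes the in-memory copy, and reloads the file. This shows that unpickling restores the saved state and that the file on disk only changes when you dump to it again. The filename `demo.pickle` and the dict contents are arbitrary choices for illustration.

```python
import os
import pickle
import tempfile

# Work in a throwaway directory so the experiment leaves nothing behind
with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "demo.pickle")

    state = {"runs": 1, "seen": ["a", "b"]}
    with open(path, "wb") as f:
        pickle.dump(state, f)

    # Change the object in memory only; the file on disk is not touched
    state["runs"] += 1
    state["seen"].append("c")

    with open(path, "rb") as f:
        reloaded = pickle.load(f)
    print(reloaded)  # → {'runs': 1, 'seen': ['a', 'b']}

    # Only an explicit dump writes the updated state back to the file
    with open(path, "wb") as f:
        pickle.dump(state, f)
    with open(path, "rb") as f:
        updated = pickle.load(f)
    print(updated)  # → {'runs': 2, 'seen': ['a', 'b', 'c']}
```

The same load, modify, dump cycle is what a program that updates a pickled classifier has to perform on every run.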

Here's a rough sketch of how to update the pickled classifier data.

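```python
import os
import pickle
from os.path import exists

import nltk
# other imports required for nltk ...

picklename = "naivebayes.pickle"

# stuff to set up featuresets ...
featuresets = [(find_features(rev), category) for (rev, category) in documents]
numtrain = int(len(documents) * 90 / 100)
training_set = featuresets[:numtrain]
testing_set = featuresets[numtrain:]

# Combine this run's training data with the data saved by earlier runs.
# NOTE: nltk.NaiveBayesClassifier.train is a classmethod. It returns a NEW
# classifier built only from the data passed to it, so calling it on a
# loaded classifier does not update that classifier. Instead, the
# training data is pickled alongside the classifier, and the classifier
# is retrained on old + new data.
if exists(picklename):
    with open(picklename, "rb") as f:
        state = pickle.load(f)
    all_training = state["training_set"] + training_set
else:
    all_training = training_set

classifier = nltk.NaiveBayesClassifier.train(all_training)

# Create backup (remove any old .bak first, since os.rename
# cannot overwrite an existing file on Windows)
if exists(picklename):
    backupname = picklename + ".bak"
    if exists(backupname):
        os.remove(backupname)
    os.rename(picklename, backupname)

# Save the classifier together with all training data seen so far
with open(picklename, "wb") as f:
    pickle.dump({"classifier": classifier, "training_set": all_training}, f)
```

The first time you run this program, it creates a new classifier, trains it with the data in training_set, and pickles it together with that training data to "naivebayes.pickle". Each subsequent run loads the saved training data, appends the new training data, retrains the classifier on the combined set, and pickles the result again. Note that the pickle file now holds a dict, so a file written by the code in the question (which contains only the classifier) can't be loaded this way. The file also grows with every run, because all training data is kept.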
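Because `nltk.NaiveBayesClassifier.train` builds a new classifier rather than updating an existing one, a pickled NLTK classifier cannot absorb new data by itself. Suppose you want a single pickled object that truly accumulates training across runs without storing every old feature set. One alternative is a classifier that keeps raw counts, since counts from separate batches can simply be added together. The `CountingNB` class and the `train_run` helper below are a hypothetical, minimal sketch written for illustration (NLTK-style `(featuredict, label)` input, add-one smoothing). They are not part of NLTK, and the API and smoothing choice are assumptions.

```python
import math
import pickle
from collections import Counter, defaultdict


class CountingNB:
    """Hypothetical naive Bayes classifier that can be trained incrementally.

    All state is plain counts, so calling update() again just adds to them.
    """

    def __init__(self):
        self.label_counts = Counter()
        # (label, feature name) -> Counter of observed feature values
        self.value_counts = defaultdict(Counter)
        # feature name -> set of values seen for it (used for smoothing)
        self.values_seen = defaultdict(set)

    def update(self, labeled_featuresets):
        for features, label in labeled_featuresets:
            self.label_counts[label] += 1
            for fname, fval in features.items():
                self.value_counts[(label, fname)][fval] += 1
                self.values_seen[fname].add(fval)

    def classify(self, features):
        total = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.label_counts.items():
            score = math.log(n_label / total)  # log prior
            for fname, fval in features.items():
                if fname not in self.values_seen:
                    continue  # ignore features never seen in training
                counts = self.value_counts.get((label, fname), Counter())
                # Add-one smoothing; the +1 reserves mass for unseen values
                n_values = len(self.values_seen[fname]) + 1
                score += math.log((counts[fval] + 1) / (n_label + n_values))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def train_run(picklename, batch):
    """Load the pickled classifier (or start fresh), add a batch, re-pickle."""
    try:
        with open(picklename, "rb") as f:
            clf = pickle.load(f)
    except FileNotFoundError:
        clf = CountingNB()
    clf.update(batch)
    with open(picklename, "wb") as f:
        pickle.dump(clf, f)
    return clf
```

The defaultdict factories are classes (`Counter`, `set`) rather than lambdas because pickle cannot serialize lambdas. Also, pickle stores only a reference to the class, so `CountingNB` must be defined in, or importable by, the program that loads the file.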

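BTW, if you are doing this in Python 2, you should use the much faster cPickle module. You can do that by replacing

```python
import pickle
```

with

```python
import cPickle as pickle
```

In Python 3 this isn't needed: cPickle no longer exists, and the standard pickle module automatically uses its C implementation when it is available.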
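If the same script has to run under both Python 2 and Python 3, a common idiom is to try the cPickle import and fall back to pickle. The sketch below also passes `pickle.HIGHEST_PROTOCOL`, which selects the newest and generally most efficient protocol the running interpreter supports. A file written with a newer protocol may not be readable by an older Python version, though. The helper names `save` and `load` are illustrative only.

```python
try:
    import cPickle as pickle  # Python 2: the C implementation
except ImportError:
    import pickle  # Python 3: uses its C accelerator automatically


def save(obj, filename):
    # Protocol passed positionally so this works on Python 2 and 3
    with open(filename, "wb") as f:
        pickle.dump(obj, f, pickle.HIGHEST_PROTOCOL)


def load(filename):
    with open(filename, "rb") as f:
        return pickle.load(f)
```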
