Principles:
Understanding the naive Bayes classification algorithm
Text classification with a naive Bayes model
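from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

# xtrain, xvalid, ytrain, yvalid and multiclass_logloss are defined elsewhere.
# Convert the text to TF-IDF features; the vectorizer is fitted on the
# training and validation texts together.
tfv = TfidfVectorizer()
tfv.fit(list(xtrain) + list(xvalid))
xtrain_tfv = tfv.transform(xtrain)
xvalid_tfv = tfv.transform(xvalid)

# Train multinomial naive Bayes and score the predicted class probabilities.
clf = MultinomialNB()
clf.fit(xtrain_tfv, ytrain)
predictions = clf.predict_proba(xvalid_tfv)
print("logloss: %0.3f" % multiclass_logloss(yvalid, predictions))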
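The snippet above calls multiclass_logloss without showing it. A minimal sketch of such a function follows. It assumes the labels are integers 0..K-1 in the same order as the probability columns; the eps parameter and the body are illustrative, not the original author's code.

```python
import numpy as np


def multiclass_logloss(actual, predicted, eps=1e-15):
    """Mean negative log-probability assigned to the true class.

    actual:    integer class labels, shape (n_samples,)
    predicted: class probabilities, shape (n_samples, n_classes)
    Assumption: label k corresponds to column k of `predicted`.
    """
    actual = np.asarray(actual)
    predicted = np.asarray(predicted, dtype=float)
    # Clip so that a probability of exactly 0 does not produce log(0).
    clipped = np.clip(predicted, eps, 1 - eps)
    # For each row, pick the probability given to its true class.
    true_class_prob = clipped[np.arange(actual.shape[0]), actual]
    return float(-np.mean(np.log(true_class_prob)))
```

If the labels are strings, they would first need to be mapped to integers (for example with sklearn.preprocessing.LabelEncoder) so that they can index the columns of predict_proba's output.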
Principles of SVM:
How SVM works
Text classification with an SVM model
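from sklearn.svm import SVC

# xtrain_svd_scl / xvalid_svd_scl are feature matrices prepared elsewhere.
# probability=True is required for predict_proba.
clf = SVC(C=1.0, probability=True)
clf.fit(xtrain_svd_scl, ytrain)
predictions = clf.predict_proba(xvalid_svd_scl)
print("logloss: %0.3f" % multiclass_logloss(yvalid, predictions))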
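The SVM snippet uses xtrain_svd_scl and xvalid_svd_scl, whose construction is not shown. Judging only from the names, they are probably TF-IDF features reduced with truncated SVD and then standardized. The sketch below shows one way that could be done. The toy corpus, n_components=2, and the intermediate names svd and scl are assumptions for illustration, and the vectorizer here is fitted on the training texts only.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler

# Toy corpus standing in for the real xtrain / xvalid (illustrative only).
xtrain = [
    "the cat sat on the mat",
    "dogs chase cats",
    "stock prices rose today",
    "markets fell on weak earnings",
]
xvalid = ["the dog sat on the mat", "earnings lifted stock prices"]

# TF-IDF features (sparse matrices).
tfv = TfidfVectorizer()
xtrain_tfv = tfv.fit_transform(xtrain)
xvalid_tfv = tfv.transform(xvalid)

# Reduce the sparse TF-IDF matrix to a few dense components.
# A real corpus would use far more components than 2.
svd = TruncatedSVD(n_components=2, random_state=42)
xtrain_svd = svd.fit_transform(xtrain_tfv)
xvalid_svd = svd.transform(xvalid_tfv)

# Standardize each component using training-set statistics only.
scl = StandardScaler()
xtrain_svd_scl = scl.fit_transform(xtrain_svd)
xvalid_svd_scl = scl.transform(xvalid_svd)
```

Reducing and scaling first keeps the input to SVC dense and low-dimensional. Its default RBF kernel is sensitive to feature scale, and fitting with probability=True is slow on large inputs.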
pLSA, conjugate prior distributions, and the principles of the LDA topic model:
A plain-language guide to the LDA topic model
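from sklearn.decomposition import LatentDirichletAllocation, TruncatedSVD

# matrixTFIDF, tfidf_v and get_topics are defined elsewhere.
# Fit a 15-topic LDA model; Z holds the document-topic distribution.
lda = LatentDirichletAllocation(n_components=15, random_state=42, max_iter=10)
Z = lda.fit_transform(matrixTFIDF)
# get_feature_names() was removed in scikit-learn 1.2; use get_feature_names_out().
get_topics(lda.components_, tfidf_v.get_feature_names_out(), n=15)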
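The get_topics helper is called above but not defined in these notes. A minimal sketch of what it might look like follows. It assumes the helper should list the n highest-weighted words of each topic; the signature is inferred from the call site and the body is hypothetical.

```python
import numpy as np


def get_topics(components, feature_names, n=15):
    """Print and return the n highest-weighted words of each topic.

    components:    array of shape (n_topics, n_features), e.g. lda.components_
    feature_names: sequence of vocabulary terms, one per feature column
    """
    topics = []
    for idx, topic in enumerate(components):
        # argsort is ascending, so take the last n indices and reverse them.
        top_idx = topic.argsort()[-n:][::-1]
        words = [feature_names[i] for i in top_idx]
        print("Topic %d: %s" % (idx, " ".join(words)))
        topics.append(words)
    return topics


# Tiny hand-made example: 2 topics over a 3-word vocabulary.
components = np.array([[0.1, 0.9, 0.5], [0.7, 0.2, 0.3]])
topics = get_topics(components, ["apple", "bank", "cat"], n=2)
```

Each row of lda.components_ holds unnormalized word weights for one topic, so sorting a row in descending order gives that topic's most characteristic words.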