Building and Optimizing Topic Models
1. Model Evaluation and Finding the Optimal Number of Topics
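To build the topic model, we first trained an LDA model through Gensim's MALLET wrapper and then evaluated it. The code below computes the UMass coherence score, records the per-token log-likelihood that MALLET prints to STDOUT (used here as the perplexity figure), and prints both alongside the Cv coherence score: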
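import gensim

# Cv coherence. Assumption: the original post computed this in an earlier
# step; it is reconstructed here so that avg_coherence_cv is defined.
cv_coherence_model_lda_mallet = gensim.models.CoherenceModel(
    model=lda_mallet,
    corpus=bow_corpus,
    texts=norm_corpus_bigrams,
    dictionary=dictionary,
    coherence='c_v'
)
avg_coherence_cv = cv_coherence_model_lda_mallet.get_coherence()

# UMass coherence for the same model
umass_coherence_model_lda_mallet = gensim.models.CoherenceModel(
    model=lda_mallet,
    corpus=bow_corpus,
    texts=norm_corpus_bigrams,
    dictionary=dictionary,
    coherence='u_mass'
)
avg_coherence_umass = umass_coherence_model_lda_mallet.get_coherence()

# from STDOUT: <500> LL/token: -8.53533
# This is MALLET's log-likelihood per token, copied by hand.
perplexity = -8.53533
print('Avg. Coherence Score (Cv):', avg_coherence_cv)
print('Avg. Coherence Score (UMass):', avg_coherence_umass)
print('Model Perplexity:', perplexity)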
The output is:
Avg. Coherence Score (Cv): 0.5008326905758488
Avg. Coherence Score (UMass): -1
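To make the UMass number easier to interpret, here is a minimal pure-Python sketch of the score for a single topic. It follows the commonly cited formulation (log of smoothed co-document frequency over document frequency, for each ordered pair of top words) and averages over pairs. The function name `umass_coherence`, the toy documents, and the +1 smoothing are assumptions made for illustration; Gensim's `CoherenceModel` uses its own segmentation and smoothing, so its values will not match this sketch exactly.

```python
import math


def umass_coherence(top_words, documents, epsilon=1.0):
    """Average UMass-style score over ordered pairs of a topic's top words.

    top_words: words of one topic, ordered from most to least probable.
    documents: iterable of tokenized documents (lists of strings).
    epsilon:   smoothing added to the co-document count (assumed to be 1).
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            w_i, w_j = top_words[i], top_words[j]
            d_j = doc_freq(w_j)
            if d_j == 0:
                continue  # skip words that never occur in the corpus
            scores.append(math.log((doc_freq(w_i, w_j) + epsilon) / d_j))
    return sum(scores) / len(scores) if scores else float('nan')


# Hypothetical toy corpus, not taken from the original post.
toy_docs = [
    ["topic", "model", "lda"],
    ["topic", "model"],
    ["lda", "mallet"],
    ["topic", "coherence"],
]
score = umass_coherence(["topic", "model", "lda"], toy_docs)
print("Toy UMass coherence:", score)
```

Under this formulation the score is at most zero whenever the top words do not always co-occur, and values closer to zero indicate that a topic's top words tend to appear in the same documents.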



