15. Text Classification: From Traditional Methods to Deep Learning and Model Interpretation

pytorchlight8

Published 2025-08-11 12:37:51

Licensed under CC 4.0 BY-SA

Column: NLP in Practice: From Theory to Application. Tags: text classification, Doc2Vec, deep learning

Article link: https://blog.youkuaiyun.com/pytorchlight8/article/details/151093498

1. Feature Extraction with Traditional Models and Its Problems

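In text classification, inferring a Doc2Vec vector involves some randomness tied to the choice of hyperparameters, so the vector inferred for the same document differs from one extraction to the next. To get a stable representation, we run inference for multiple passes (the "steps") and aggregate the resulting vectors. The code below uses a trained Doc2Vec model to infer features and then trains a logistic regression classifier on them: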

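# Infer the feature representation for the training and test data
# using the trained Doc2Vec model.
from gensim.models.doc2vec import Doc2Vec
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

model = Doc2Vec.load("d2v.model")
# Infer over multiple passes to get a stable representation.
# `epochs` is the current gensim name for what older releases called `steps`.
train_vectors = [model.infer_vector(list_of_tokens, epochs=50)
                 for list_of_tokens in train_data]
test_vectors = [model.infer_vector(list_of_tokens, epochs=50)
                for list_of_tokens in test_data]
# class_weight="balanced" because the classes are not balanced
myclass = LogisticRegression(class_weight="balanced")
myclass.fit(train_vectors, train_cats)
preds = myclass.predict(test_vectors)
# test_cats: gold labels for test_data (assumed name, mirroring train_cats)
print(classification_report(test_cats, preds))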
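
The idea of aggregating several stochastic inferences can also be made explicit. The following is a minimal sketch, not the article's own method: it calls an inference function several times and averages the results component by component. The names `averaged_inference` and `make_noisy_infer` are hypothetical, and `make_noisy_infer` is only a stand-in for a real model so that the example runs without a trained Doc2Vec file.

```python
import random
from statistics import fmean
from typing import Callable, List, Sequence


def averaged_inference(infer_fn: Callable[[Sequence[str]], Sequence[float]],
                       tokens: Sequence[str],
                       n_runs: int = 10) -> List[float]:
    """Call a stochastic infer_fn n_runs times and return the element-wise mean."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    runs = [list(infer_fn(tokens)) for _ in range(n_runs)]
    # zip(*runs) groups the i-th component of every run together.
    return [fmean(component) for component in zip(*runs)]


def make_noisy_infer(seed: int = 0, dim: int = 4, noise: float = 0.5):
    """Hypothetical stand-in for a model: a fixed vector plus Gaussian noise."""
    rng = random.Random(seed)

    def infer(tokens: Sequence[str]) -> List[float]:
        # The "true" vector depends only on the document length here.
        base = [float(len(tokens) + i) for i in range(dim)]
        return [b + rng.gauss(0.0, noise) for b in base]

    return infer


if __name__ == "__main__":
    infer = make_noisy_infer(seed=42)
    doc = ["text", "classification", "example"]
    print("single run:   ", infer(doc))
    print("mean of 200:  ", averaged_inference(infer, doc, n_runs=200))
```

With a real model, the same helper could presumably be used by passing `model.infer_vector` as `infer_fn`, since that method takes the token list as its first argument. Whether explicit averaging helps more than simply raising the number of inference passes is something to check on your own data.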
