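Below is a simple example that uses Python and the TfidfVectorizer class from the sklearn library to compute TF-IDF (Term Frequency-Inverse Document Frequency) features.
TfidfVectorizer is a transformer that converts raw text into TF-IDF feature vectors.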
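To make the weighting concrete before turning to the library, here is a minimal hand-rolled sketch of what a TF-IDF feature vector contains. It is not how TfidfVectorizer is implemented. It assumes raw term counts, the smoothed idf formula ln((1 + n) / (1 + df)) + 1 that scikit-learn documents as its default, and L2 normalization of each row. The `tfidf` function, the whitespace tokenizer, and the toy corpus are illustrative assumptions, so the numbers will not match the library wherever its tokenization differs.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocabulary, rows): one L2-normalized TF-IDF row per document."""
    # Naive lowercase whitespace tokenizer (an assumption for this sketch).
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)

    # Document frequency: the number of documents that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))

    # Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        raw = [counts[t] * idf[t] for t in vocab]
        # Scale each row to unit Euclidean length; guard against empty documents.
        norm = math.sqrt(sum(w * w for w in raw)) or 1.0
        rows.append([w / norm for w in raw])
    return vocab, rows


toy_docs = ["the cat sat", "the dog sat", "the cat ran"]
vocab, rows = tfidf(toy_docs)
for doc, row in zip(toy_docs, rows):
    print(doc, [round(w, 3) for w in row])
```

In this toy corpus "the" occurs in every document, so it receives the lowest weight in each row, while a term that appears in only one document, such as "dog", receives the highest weight in its row.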
# Import the required libraries
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.datasets import fetch_20newsgroups

# Load the dataset (the 20 Newsgroups dataset serves as the example here)
categories = ['alt.atheism', 'talk.religion.misc', 'comp.graphics', 'sci.med']
newsgroups_train = fetch_20newsgroups(subset='train', categories=categories, shuffle=True, random_state=42)