FASTopic Open-Source Project Tutorial

纪亚钧

Published 2025-05-25 09:00:50

License: CC 4.0 BY-SA

Article link: https://blog.youkuaiyun.com/gitblog_00003/article/details/148200643

FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Model. Project repository: https://gitcode.com/gh_mirrors/fa/FASTopic

1. Project Introduction

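FASTopic is a fast, adaptive, stable, and transferable topic model. It differs from conventional LDA, from VAE-based models such as ProdLDA and ETM, and from clustering-based methods such as Top2Vec and BERTopic. FASTopic uses optimal transport among the document, topic, and word embeddings from a pretrained Transformer to model both the topics and the topic distribution of each document.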
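To make the optimal transport idea concrete, here is a minimal, generic sketch of entropic-regularized optimal transport (the Sinkhorn iteration) between a few topic embeddings and word embeddings. It is not FASTopic's actual implementation: the function name `sinkhorn_plan`, the random embeddings, the uniform marginals, and the regularization value are all assumptions chosen purely for illustration.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, reg=0.5, n_iters=200):
    """Return an entropic-regularized transport plan between histograms a and b.

    Hypothetical helper for illustration; not part of the FASTopic API.
    """
    kernel = np.exp(-cost / reg)          # Gibbs kernel derived from the cost matrix
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)            # rescale columns to match marginal b
        u = a / (kernel @ v)              # rescale rows to match marginal a
    return u[:, None] * kernel * v[None, :]


rng = np.random.default_rng(0)
topic_emb = rng.normal(size=(3, 8))       # placeholder topic embeddings (assumed)
word_emb = rng.normal(size=(5, 8))        # placeholder word embeddings (assumed)

# Cost = squared Euclidean distance between every topic and every word,
# scaled to [0, 1] so the regularization strength is easy to reason about.
cost = ((topic_emb[:, None, :] - word_emb[None, :, :]) ** 2).sum(axis=-1)
cost = cost / cost.max()

a = np.full(3, 1 / 3)                     # uniform mass over topics (assumed)
b = np.full(5, 1 / 5)                     # uniform mass over words (assumed)

plan = sinkhorn_plan(cost, a, b)
print(plan.round(3))
```

Each row of `plan` shows how one topic's mass is spread over the words: words that lie closer to the topic in embedding space receive more of it. That is the intuition behind reading a transport plan as a topic-word relation.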

2. Getting Started

Installation

Install FASTopic with pip:

pip install fastopic

Or install from source:

pip install git+https://github.com/bobxwu/FASTopic.git
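
After either install command, a quick way to confirm that the package is visible to the current Python environment is to look up its installed version. The sketch below uses only the standard library; the helper name `installed_version` is hypothetical and introduced here for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string of a package, or None if it is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


fastopic_version = installed_version("fastopic")
print(fastopic_version or "fastopic is not installed in this environment")
```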

Quick Start

The following quick example uses FASTopic to discover topics in the 20newsgroups dataset:

from fastopic import FASTopic
from topmost import Preprocess
from sklearn.datasets import fetch_20newsgroups

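# Load the dataset, stripping headers, footers, and quoted replies
docs = fetch_20newsgroups(subset='all', remove=('headers', 'footers', 'quotes'))['data']

# Set up preprocessing with a vocabulary capped at 10,000 words
preprocess = Preprocess(vocab_size=10000)

# Create a model with 50 topics and fit it to the documents
model = FASTopic(50, preprocess)
top_words, doc_topic_dist = model.fit_transform(docs)

# Print the top words of each discovered topic
print(top_words)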
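
The example prints only `top_words`, but `fit_transform` also returns `doc_topic_dist`. As a rough sketch of how that second output might be used, the snippet below assigns each document its most probable topic and counts documents per topic. It assumes `doc_topic_dist` is a NumPy array of shape (number of documents, number of topics) with one probability row per document; a small hand-written array stands in for the real output so the snippet runs without training a model.

```python
import numpy as np

# Hypothetical stand-in for doc_topic_dist: 4 documents, 3 topics.
doc_topic_dist = np.array([
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.3, 0.5, 0.2],
    [0.6, 0.3, 0.1],
])

# Index of the highest-probability topic for each document.
dominant_topic = doc_topic_dist.argmax(axis=1)

# Number of documents whose dominant topic is each topic index.
topic_counts = np.bincount(dominant_topic, minlength=doc_topic_dist.shape[1])

print(dominant_topic)
print(topic_counts)
```

With the real model output, the same two lines would give a quick view of which topics dominate the corpus, provided the array layout matches the assumption above.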