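Fixing an NLTK error in Python: LookupError: from nltk.book import

This article introduces the concept of stemming and its use in information retrieval, then records how NLTK was installed with pip and how a LookupError that appeared while using the library was resolved.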


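What is stemming?
In linguistic morphology and information retrieval, stemming is the process of stripping affixes from a word to obtain its stem, the most general written form of the word. The stem does not have to be identical to the word's morphological root; it is usually enough that related words map to the same stem, even if that stem is not itself a valid root. Stemming algorithms have appeared in computer science since 1968. Many search engines treat words that share a stem as synonyms for the purpose of query expansion, a process called conflation.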
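To make the idea concrete, here is a minimal sketch of suffix stripping in plain Python. It is a toy written for illustration, not the algorithm NLTK uses (NLTK ships real stemmers such as `nltk.stem.PorterStemmer`); the suffix list, the `toy_stem` name, and the `min_stem` parameter are all assumptions made for this example.

```python
# Toy suffix-stripping stemmer: an illustration only, not the Porter algorithm.
# The suffix list below is a hypothetical, deliberately tiny sample.
SUFFIXES = sorted(
    ("ations", "ation", "ings", "ing", "ions", "ion", "ed", "es", "s"),
    key=len,
    reverse=True,  # try longer suffixes first so "ations" wins over "s"
)


def toy_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word  # no suffix matched: the word is its own stem


# Related words collapse to one stem, even when that stem is not a real word.
for w in ("computing", "computed", "computes", "computation"):
    print(w, "->", toy_stem(w))
```

All four words map to the same stem, which is what a search engine needs for conflation, even though that stem is not a valid root.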

Installing NLTK with pip

sudo pip install nltk
>>> import nltk
# If the import raises no error, the installation succeeded

While using the library, an error was raised. The traceback was:
Traceback (most recent call last):
  File "E:\python\lib\site-packages\nltk\corpus\util.py", line 80, in __load
    try: root = nltk.data.find('{}/{}'.format(self.subdir, zip_name))
  File "E:\python\lib\site-packages\nltk\data.py", line 648, in find
    raise LookupError(resource_not_found)
LookupError:
  Resource 'corpora/brown.zip/brown/' not found.  Please use the
  NLTK Downloader to obtain the resource:  >>> nltk.download()

After some searching, I found that others had run into the same problem and posted a fix:
The solution given by an overseas developer

>>> nltk.download()

The NLTK Downloader window then opens:
[Image: the NLTK Downloader window]

Download the required data from that window, and the error no longer appears.
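On a machine without a display, or in a script, the same fix can be applied without the downloader window. The sketch below checks for a resource with `nltk.data.find` and calls `nltk.download` with a package id only when the lookup fails. The helper name `ensure_resource` and its `find`/`download` parameters are assumptions introduced here, so that the logic can be exercised without network access; they are not part of NLTK.

```python
def ensure_resource(resource_path, package_id, find=None, download=None):
    """Download an NLTK data package only if its resource is missing.

    find and download default to nltk.data.find and nltk.download.
    They are parameters (a hypothetical design for this sketch) so the
    helper can be tried offline with stand-in functions.
    Returns True if a download was triggered, False otherwise.
    """
    if find is None or download is None:
        import nltk  # imported lazily; only needed for the real lookup

        find = find or nltk.data.find
        download = download or nltk.download
    try:
        find(resource_path)  # raises LookupError when the data is absent
        return False
    except LookupError:
        download(package_id)  # e.g. nltk.download("brown")
        return True


# Usage (requires nltk installed and network access):
#   ensure_resource("corpora/brown", "brown")
# The traceback above names the brown corpus. nltk.book loads several
# other corpora as well, which the "book" collection bundles:
#   import nltk; nltk.download("book")
```

Checking first keeps repeated runs from contacting the download server when the data is already in place.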
