RAGFlow source deployment walkthrough (with problems and solutions)

📝 Prerequisites

  • CPU >= 4 核
  • RAM >= 16 GB
  • Disk >= 50 GB
  • Python 3.11.7 (anything newer than 3.8 should do; this machine runs 3.11.7)
  • openKylin 1.0.2
  • Docker >= 24.0.0 & Docker Compose >= v2.26.1

    If Docker is not installed on your machine (Windows, Mac, or Linux), see the Install Docker Engine documentation.
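
The hardware limits above can be checked before going any further. What follows is a minimal sketch, not part of RAGFlow: the function names `probe_machine` and `unmet_requirements` are made up for illustration, the RAM probe relies on `os.sysconf` names that exist on Linux, and treating the 50 GB as free space on `/` is my assumption.

```python
import os
import shutil


def probe_machine(path="/"):
    """Return (cpu_cores, ram_gb, free_disk_gb) for the current Linux host."""
    cpu_cores = os.cpu_count() or 0
    # SC_PAGE_SIZE and SC_PHYS_PAGES are sysconf names available on Linux.
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    return cpu_cores, ram_bytes / 1024**3, free_bytes / 1024**3


def unmet_requirements(cpu_cores, ram_gb, disk_gb):
    """Return a list of problems; an empty list means every limit is met."""
    problems = []
    if cpu_cores < 4:
        problems.append(f"CPU: {cpu_cores} cores, need >= 4")
    if ram_gb < 16:
        problems.append(f"RAM: {ram_gb:.1f} GB, need >= 16")
    if disk_gb < 50:
        problems.append(f"Disk: {disk_gb:.1f} GB free, need >= 50")
    return problems
```

On the target host, `unmet_requirements(*probe_machine())` returns an empty list when all three limits are met. A machine sold as 16 GB usually reports slightly less usable memory, so a near miss on RAM is not necessarily a real problem.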

🔨 Launch the service from source

Source code, launch!!!

1. Install Poetry. Skip this step if it is already installed:

pipx install poetry # if pipx is missing, install it first with: pip install pipx
pipx inject poetry poetry-plugin-pypi-mirror # install the mirror plugin

# set environment variables and a PyPI mirror inside China
export POETRY_VIRTUALENVS_CREATE=true POETRY_VIRTUALENVS_IN_PROJECT=true
export POETRY_PYPI_MIRROR_URL=https://pypi.tuna.tsinghua.edu.cn/simple/

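2. Download the source code and install the Python dependencies:

Cloning with git works too. This machine has no git set up, so I downloaded the source archive and extracted it to the target directory.

cd ragflow/ # enter the project root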
~/.local/bin/poetry install --sync --no-root # install RAGFlow dependent python modules

The dependency install failed because the C++ compiler (g++) is missing:

Installing pyicu (2.14): Failed

ChefBuildError

Backend subprocess exited when trying to invoke build_wheel

<string>:42: DeprecationWarning: Use shutil.which instead of find_executable
<string>:42: DeprecationWarning: Use shutil.which instead of find_executable
/tmp/tmpn73_dt82/.venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'test_suite'
warnings.warn(msg)
/tmp/tmpn73_dt82/.venv/lib/python3.11/site-packages/setuptools/_distutils/dist.py:261: UserWarning: Unknown distribution option: 'tests_require'
warnings.warn(msg)
(running 'icu-config --version')

Building PyICU 2.14 for ICU 73.1 (max ICU major version supported: 76)

(running 'icu-config --cxxflags --cppflags')
Adding CFLAGS="-I/home/dgis/anaconda3/include" from /home/dgis/anaconda3/bin/icu-config
(running 'icu-config --ldflags')
Adding LFLAGS="-L/home/dgis/anaconda3/lib -licui18n -licuuc -licudata" from /home/dgis/anaconda3/bin/icu-config
running bdist_wheel
running build
running build_py
creating build/lib.linux-x86_64-cpython-311/icu
copying py/icu/__init__.py -> build/lib.linux-x86_64-cpython-311/icu
running build_ext
building 'icu.icu' extension
creating build/temp.linux-x86_64-cpython-311
g++ -pthread -B /home/dgis/anaconda3/compiler_compat -DNDEBUG -fwrapv -O2 -Wall -fPIC -O2 -isystem /home/dgis/anaconda3/include -fPIC -O2 -isystem /home/dgis/anaconda3/include -fPIC -I/tmp/tmpn73_dt82/.venv/include -I/home/dgis/anaconda3/include/python3.11 -c icu.cpp -o build/temp.linux-x86_64-cpython-311/icu.o -std=c++17 -I/home/dgis/anaconda3/include -DPYICU_VER="2.14" -DPYICU_ICU_MAX_VER="76"
error: command 'g++' failed: No such file or directory

at ~/.local/share/pipx/venvs/poetry/lib/python3.11/site-packages/poetry/installation/chef.py:164 in _prepare
160│
161│ error = ChefBuildError("\n\n".join(message_parts))
162│
163│ if error is not None:
→ 164│ raise error from None
165│
166│ return path
167│
168│ def _prepare_sdist(self, archive: Path, destination: Path | None = None) -> Path:

Note: This error originates from the build backend, and is likely not a problem with poetry but with pyicu (2.14) not supporting PEP 517 builds. You can verify this by running 'pip wheel --no-cache-dir --use-pep517 "pyicu (==2.14)"'.

Solution: open a terminal, run the following commands, and rerun the install once they finish:

sudo apt update
sudo apt install build-essential


~/.local/bin/poetry install --sync --no-root
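
The pyicu failure boils down to `g++` not being on PATH, which can be detected before Poetry starts building wheels. A small sketch, assuming the tools of interest are `gcc`, `g++` and `make` (the ones build-essential pulls in); `missing_tools` is a hypothetical helper name, and `shutil.which` is the same lookup the deprecation warning in the log above recommends.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if which(tool) is None]
```

Calling `missing_tools(["gcc", "g++", "make"])` on the build machine should return an empty list once build-essential is installed. The `which` parameter exists only so the lookup can be replaced in tests.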

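3. Start the dependent services (MinIO, Elasticsearch, Redis, and MySQL) with Docker Compose:

# official command
docker compose -f docker/docker-compose-base.yml up -d

# the docker-compose installed on this machine is invoked with a hyphen, so the command may differ from the one on GitHub; use whichever matches your setup
docker-compose -f docker/docker-compose-base.yml up -d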
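
Which of the two spellings works depends on whether Compose is installed as a Docker CLI plugin or as a standalone binary. The sketch below picks one automatically; `has_compose_plugin` and `compose_command` are illustrative names of my own, and the only external commands assumed are `docker compose version` and a `docker-compose` binary on PATH.

```python
import shutil
import subprocess


def has_compose_plugin():
    """True if `docker compose version` exits successfully."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "compose", "version"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def compose_command(plugin_ok=None, which=shutil.which):
    """Return the argv prefix for Compose, preferring the plugin form."""
    if plugin_ok is None:
        plugin_ok = has_compose_plugin()
    if plugin_ok:
        return ["docker", "compose"]
    if which("docker-compose") is not None:
        return ["docker-compose"]
    raise RuntimeError("neither `docker compose` nor `docker-compose` is available")
```

The returned prefix can be extended with `["-f", "docker/docker-compose-base.yml", "up", "-d"]` and passed to `subprocess.run`.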

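If Docker is not installed, install it yourself. This machine runs Kylin, and installing Docker there caused a lot of trouble, so the steps below describe a working Docker install on Kylin (other systems are similar):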

1. Check the OS version
[root@localhost opt]# cat /etc/os-release

# output

NAME="Kylin"
VERSION="银河麒麟桌面操作系统V10 (SP1)"
VERSION_US="Kylin Linux Desktop V10 (SP1)"
ID=kylin
ID_LIKE=debian
PRETTY_NAME="Kylin V10 SP1"
VERSION_ID="v10"
HOME_URL="http://www.kylinos.cn/"
SUPPORT_URL="http://www.kylinos.cn/support/technology.html"
BUG_REPORT_URL="http://www.kylinos.cn/"
PRIVACY_POLICY_URL="http://www.kylinos.cn"
VERSION_CODENAME=kylin
UBUNTU_CODENAME=kylin
PROJECT_CODENAME=V10SP1
KYLIN_RELEASE_ID="2303"

[root@localhost opt]# cat /etc/kylin-build
# output
Kylin-Desktop V10-SP1
Build 20230427
buildid: 41998

[root@localhost opt]# nkvers

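This command prints the Kylin build information: the system is Kylin Desktop, specifically Kylin-Desktop V10-SP1-hwe, built on 20 August 2021. (These figures differ from the /etc/kylin-build output above, which reports Build 20230427.)

2. Check the Linux kernel version (3.10 or later)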
[root@localhost opt]# uname -r
5.4.18-85-generic
[root@localhost opt]# uname -a
Linux it0-pc 5.4.18-85-generic #74-KYLINOS SMP Fri Mar 24 11:20:42 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
3. Check the iptables version (1.4 or later)
[root@localhost opt]# iptables --version
iptables v1.8.4 (legacy)
4. Determine the processor architecture
[root@localhost opt]# uname -p
  x86_64   # may also be aarch64

aarch64 means the processor is ARM; an x86 processor reports x86_64.
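
The kernel and iptables checks compare a version string against a minimum by eye. A short sketch of doing the same comparison in code follows; `version_tuple` and `at_least` are hypothetical helper names, and the regular expression simply takes the first dotted run of digits in the text, which is an assumption that holds for the two outputs shown above.

```python
import re


def version_tuple(text):
    """Extract the first dotted numeric version from strings such as
    '5.4.18-85-generic' or 'iptables v1.8.4 (legacy)'."""
    match = re.search(r"(\d+(?:\.\d+)+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def at_least(text, minimum):
    """True if the version found in `text` is >= the `minimum` tuple."""
    return version_tuple(text) >= minimum
```

With the outputs from this machine, `at_least("5.4.18-85-generic", (3, 10))` and `at_least("iptables v1.8.4 (legacy)", (1, 4))` both hold.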

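5. Download the Docker package for offline installation

https://download.docker.com/linux/static/stable/

Open the link and choose the version you want to install; I downloaded docker-27.3.1.tgz.

After the download finishes, upload the file to /opt on the server and extract it: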

tar -zxvf docker-27.3.1.tgz
6. Move the extracted binaries into /usr/bin
sudo mv docker/* /usr/bin/

Docker can now be tested:

[root@localhost opt]# sudo docker -v

# output
Docker version 27.3.1, build ce12230

[root@localhost opt]# sudo docker version

# output
Client:
 Version:           27.3.1
 API version:       1.41 (downgraded from 1.47)
 Go version:        go1.22.7
 Git commit:        ce12230
 Built:             Fri Sep 20 11:39:44 2024
 OS/Arch:           linux/amd64
 Context:           default

Server:
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.8
  Git commit:       20.10.7-0kylin5~20.04.2
  Built:            Tue Nov  9 01:25:31 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.5.9-0kylin1~20.04.6
  GitCommit:        
 runc:
  Version:          1.1.4-0kylin1~20.04.3
  GitCommit:


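At this point Docker has not been started yet; only its version information is visible.

Test starting Docker:

[root@localhost opt]# dockerd

Docker starts normally. It is running in the foreground of the current terminal, so press Ctrl + C to stop it.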

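7. Configure the Docker service
7.1 Edit the Docker systemd unit file
vi /usr/lib/systemd/system/docker.service

Give docker.service read and write permission. Mode 644 is enough; systemd logs a warning for world-writable unit files, so 777 is best avoided:

sudo chmod 644 /usr/lib/systemd/system/docker.service
7.2 Copy the following content into the docker.service file just created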
[Unit]
Description=Docker Application Container Engine
Documentation=https://docs.docker.com
After=network-online.target firewalld.service
Wants=network-online.target
[Service]
Type=notify
ExecStart=/usr/bin/dockerd
ExecReload=/bin/kill -s HUP $MAINPID
LimitNOFILE=infinity
LimitNPROC=infinity
TimeoutStartSec=0
Delegate=yes
KillMode=process
Restart=on-failure
StartLimitBurst=3
StartLimitInterval=60s
[Install]
WantedBy=multi-user.target

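7.3 Add execute permission to docker.service (systemd does not need unit files to be executable and logs a warning when they are, so this step can usually be skipped)
sudo chmod +x /usr/lib/systemd/system/docker.service
7.4 Edit daemon.json
vi /etc/docker/daemon.json

# and add the following content: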

{
  "registry-mirrors": ["https://registry.docker-cn.com"],
  "exec-opts": ["native.cgroupdriver=systemd"]
}
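
A stray comma or a wrong value type in daemon.json is enough to stop dockerd from starting, so it is worth validating the file before reloading. The sketch below is a rough check, not Docker's own validation: `check_daemon_config` is a made-up name, and it only looks at the two keys used here, both of which Docker expects to be arrays of strings.

```python
import json


def check_daemon_config(text):
    """Parse daemon.json content and return a list of problems found."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(config, dict):
        return ["top-level value must be a JSON object"]
    problems = []
    for key in ("registry-mirrors", "exec-opts"):
        value = config.get(key, [])
        is_string_list = isinstance(value, list) and all(
            isinstance(item, str) for item in value
        )
        if not is_string_list:
            problems.append(f"{key} must be a list of strings")
    return problems
```

Reading `/etc/docker/daemon.json` and passing its text to `check_daemon_config` should give an empty list for the content shown above.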

After saving, run:

 sudo systemctl daemon-reload
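7.5 Start Docker
[root@localhost opt]# sudo systemctl start docker
7.6 Enable start on boot
[root@localhost opt]# sudo systemctl enable docker

Docker installation reference: https://blog.youkuaiyun.com/qq_30665009/article/details/125938033

8. Install docker-compose

Download it directly from GitHub (docker/compose) and pick whichever version you like. I went straight for the newest release available at the time, v2.24.0. Note that the prerequisites above ask for Docker Compose v2.26.1 or later.

Choose the package that matches your architecture and download it.

# the file name below is the aarch64 build; on an x86_64 machine use docker-compose-linux-x86_64 instead
cp docker-compose-linux-aarch64  /usr/local/bin/docker-compose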
chmod +x /usr/local/bin/docker-compose
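
Picking the wrong architecture build is an easy mistake here. A small sketch that maps the machine type to the release file name follows; `compose_asset` and the `ASSETS` table are my own illustration, and the table covers only the two architectures mentioned in this post.

```python
import platform

# File names as published on the docker/compose release page for Linux.
ASSETS = {
    "x86_64": "docker-compose-linux-x86_64",
    "aarch64": "docker-compose-linux-aarch64",
}


def compose_asset(machine=None):
    """Return the release file name for `machine` (default: this host)."""
    machine = machine or platform.machine()
    try:
        return ASSETS[machine]
    except KeyError:
        raise ValueError(f"no asset mapped for architecture {machine!r}") from None
```

On the x86_64 machine used in this post, `compose_asset()` would select the x86_64 build rather than the aarch64 one.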
 

Check the version

[root@localhost opt]# docker-compose -v
Docker Compose version v2.24.0
 



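Author: zhanglb12
Link: https://www.jianshu.com/p/fd7210b3c8e4
Source: Jianshu
Copyright belongs to the author.

Pulling the images showed that the Elasticsearch image downloads very slowly.

Solution: pull the image manually, then rerun the command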

sudo docker pull elasticsearch:8.11.3

4. Add the following line to /etc/hosts so that every host name in conf/service_conf.yaml resolves to 127.0.0.1

127.0.0.1       es01 infinity mysql minio redis

In docker/service_conf.yaml, update the mysql port to 5455 and the es port to 1200, matching the settings in docker/.env
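
A typo in /etc/hosts only shows up later as a connection error, so a quick check of the file can save time. The following is a sketch under simple assumptions: `unresolved_hosts` is a hypothetical helper, it parses the hosts text itself instead of asking the resolver, and the default name list is the one from the line above.

```python
REQUIRED_HOSTS = ("es01", "infinity", "mysql", "minio", "redis")


def unresolved_hosts(hosts_text, required=REQUIRED_HOSTS, address="127.0.0.1"):
    """Return the required names that no line maps to `address`."""
    mapped = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        if fields[0] == address:
            mapped.update(fields[1:])
    return [name for name in required if name not in mapped]
```

Passing the content of `/etc/hosts` to `unresolved_hosts` returns the names that still need an entry; an empty list means the step is complete.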

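5. If HuggingFace is unreachable, set the HF_ENDPOINT environment variable to a mirror site:

# note: this is temporary; the variable disappears when the terminal closes and has to be set again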
export HF_ENDPOINT=https://hf-mirror.com
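
To keep the variable across terminal sessions, the export line can be appended to a shell startup file. A minimal sketch, assuming a bash-style rc file such as `~/.bashrc`; `persist_export` is a made-up helper name, and it only avoids adding the exact same line twice.

```python
from pathlib import Path


def persist_export(rc_file, name, value):
    """Append `export NAME=value` to rc_file unless that line already exists.

    Returns True if the file was modified.
    """
    rc_path = Path(rc_file).expanduser()
    line = f"export {name}={value}"
    text = rc_path.read_text() if rc_path.exists() else ""
    if any(entry.strip() == line for entry in text.splitlines()):
        return False
    # Start on a fresh line if the file does not end with a newline.
    prefix = "" if text == "" or text.endswith("\n") else "\n"
    with rc_path.open("a") as handle:
        handle.write(f"{prefix}{line}\n")
    return True
```

For example, `persist_export("~/.bashrc", "HF_ENDPOINT", "https://hf-mirror.com")` adds the line once; new terminals then pick it up automatically.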

6. Start the backend service:

source .venv/bin/activate
export PYTHONPATH=$(pwd)
bash docker/launch_backend_service.sh

Running bash docker/launch_backend_service.sh fails: the api module cannot be found on import

Starting task_executor.py for task 0 (Attempt 1)
Starting ragflow_server.py (Attempt 1)
Traceback (most recent call last):
File "rag/svr/task_executor.py", line 24, in <module>
from api.utils.log_utils import initRootLogger
ModuleNotFoundError: No module named 'api'
Traceback (most recent call last):
File "api/ragflow_server.py", line 23, in <module>
from api.utils.log_utils import initRootLogger
ModuleNotFoundError: No module named 'api'

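Solution:

1. Test the import in the Python environment

Inside your Python virtual environment, try importing the api module by hand to see whether it succeeds:

python3

Then, at the interactive Python prompt, run:

import api.ragflow_server

The underlying cause turns out to be a missing transformers module: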

(ragflow-py3.11) (base) dgis@think:~/project/ragflow$ python3
Python 3.11.7 (main, Dec 15 2023, 18:12:31) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

>>> import api.ragflow_server
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/dgis/project/ragflow/api/ragflow_server.py", line 33, in <module>
from api.apps import app
File "/home/dgis/project/ragflow/api/apps/__init__.py", line 136, in <module>
client_urls_prefix = [
^
File "/home/dgis/project/ragflow/api/apps/__init__.py", line 137, in <listcomp>
register_page(path) for dir in pages_dir for path in search_pages_path(dir)
^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/api/apps/__init__.py", line 120, in register_page
spec.loader.exec_module(page)
File "/home/dgis/project/ragflow/api/apps/file_app.py", line 24, in <module>
from api.db.services.document_service import DocumentService
File "/home/dgis/project/ragflow/api/db/services/document_service.py", line 31, in <module>
from graphrag.mind_map_extractor import MindMapExtractor
File "/home/dgis/project/ragflow/graphrag/mind_map_extractor.py", line 28, in <module>
from rag.llm.chat_model import Base as CompletionLLM
File "/home/dgis/project/ragflow/rag/llm/__init__.py", line 85, in <module>
from .cv_model import (
File "/home/dgis/project/ragflow/rag/llm/cv_model.py", line 28, in <module>
from transformers import GenerationConfig
ModuleNotFoundError: No module named 'transformers'

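Solution:

Install the dependency manually
Download the transformers wheel by hand and install it. The file is available on PyPI; install it with the following command:

pip install /path/to/downloaded/transformers-4.38.1-py3-none-any.whl -i https://pypi.tuna.tsinghua.edu.cn/simple

With that fixed, on to the next one: none of the three deep learning frameworks PyTorch, TensorFlow, or Flax was found in the environment.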

import api.ragflow_server
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.

Solution:

pip install torch torchvision torchaudio -i https://pypi.tuna.tsinghua.edu.cn/simple --extra-index-url https://download.pytorch.org/whl/cu113

At this point the backend service starts normally, but while using the front end Python could not find a module named FlagEmbedding.

": "naive", "parser_config": {"auto_keywords": 0, "auto_questions": 0, "raptor": {"use_raptor": false}, "chunk_token_num": 128, "delimiter": "\n!?;\u3002\uff1b\uff01\uff1f", "layout_recognize": true, "html4excel": false}, "name": "\u4e00\u6c7d\u5927\u4f17 ID6 CROZZ\u4f7f\u7528\u8bf4\u660e\u4e66.pdf", "type": "pdf", "location": "\u4e00\u6c7d\u5927\u4f17 ID6 CROZZ\u4f7f\u7528\u8bf4\u660e\u4e66.pdf", "size": 5129608, "tenant_id": "82013314b6ce11efb9e5f46b8c8bef44", "language": "Chinese", "embd_id": "BAAI/bge-large-zh-v1.5@BAAI", "pagerank": 0, "img2txt_id": "qwen-vl-max@Tongyi-Qianwen", "asr_id": "paraformer-realtime-8k-v1@Tongyi-Qianwen", "llm_id": "qwen-plus@Tongyi-Qianwen", "update_time": 1733882818905}
Traceback (most recent call last):
File "/home/dgis/project/ragflow/rag/svr/task_executor.py", line 464, in handle_task
do_handle_task(task)
File "/home/dgis/project/ragflow/rag/svr/task_executor.py", line 382, in do_handle_task
embedding_model = LLMBundle(task_tenant_id, LLMType.EMBEDDING, llm_name=task_embedding_id, lang=task_language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/api/db/services/llm_service.py", line 226, in __init__
self.mdl = TenantLLMService.model_instance(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/peewee.py", line 3128, in inner
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/api/db/services/llm_service.py", line 129, in model_instance
return EmbeddingModel[model_config["llm_factory"]](
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/llm/embedding_model.py", line 65, in __init__
from FlagEmbedding import FlagModel
ModuleNotFoundError: No module named 'FlagEmbedding'

Solution:

pip install flagembedding==1.2.10 -i https://pypi.tuna.tsinghua.edu.cn/simple
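
The three errors above (transformers, torch, FlagEmbedding) all surfaced one at a time at runtime. They can be found in one pass by checking which modules the active virtual environment can locate. This is a sketch with a hypothetical helper name, `missing_modules`; it uses `importlib.util.find_spec` and is meant for top-level module names only.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located."""
    return [name for name in names if importlib.util.find_spec(name) is None]
```

Run inside the activated `.venv`, `missing_modules(["transformers", "torch", "FlagEmbedding", "nltk"])` lists whatever still needs a `pip install` before the backend is started.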

Parsing a document from the front-end page then fails again:

Traceback (most recent call last):
File "/home/dgis/project/ragflow/rag/svr/task_executor.py", line 464, in handle_task
do_handle_task(task)
File "/home/dgis/project/ragflow/rag/svr/task_executor.py", line 405, in do_handle_task
chunks = build_chunks(task, progress_callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/svr/task_executor.py", line 189, in build_chunks
cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/app/naive.py", line 202, in chunk
"title_tks": rag_tokenizer.tokenize(re.sub(r"\.[a-zA-Z]+$", "", filename))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/nlp/rag_tokenizer.py", line 321, in tokenize
res = " ".join(self.english_normalize_(res))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/nlp/rag_tokenizer.py", line 251, in english_normalize_
return [self.stemmer.stem(self.lemmatizer.lemmatize(t)) if re.match(r"[a-zA-Z_-]+$", t) else t for t in tks]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/nlp/rag_tokenizer.py", line 251, in <listcomp>
return [self.stemmer.stem(self.lemmatizer.lemmatize(t)) if re.match(r"[a-zA-Z_-]+$", t) else t for t in tks]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 85, in lemmatize
lemmas = self._morphy(word, pos)
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/stem/wordnet.py", line 41, in _morphy
return wn._morphy(form, pos, check_exceptions)
^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/corpus/util.py", line 120, in __getattr__
self.__load()
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/corpus/util.py", line 86, in __load
raise e
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/corpus/util.py", line 81, in __load
root = nltk.data.find(f"{self.subdir}/{self.__name}")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:

Resource wordnet not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('wordnet')

For more information see: https://www.nltk.org/data.html

Attempted to load corpora/wordnet

Searched in:
- '/home/dgis/nltk_data'
- '/home/dgis/project/ragflow/.venv/nltk_data'
- '/home/dgis/project/ragflow/.venv/share/nltk_data'
- '/home/dgis/project/ragflow/.venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'

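Solution:

  1. Open the Python interpreter
    Type python or python3 in a terminal to enter the Python interpreter.

    python
    
  2. Download WordNet
    Enter the following code in the Python interpreter:

    import nltk
    nltk.download('wordnet')
    

    This starts the NLTK downloader and fetches the wordnet resource.

  3. That turned out to be far too slow, so download it by hand and extract it into place!!!

You can download the wordnet dataset straight from the command line with wget or curl. For example:

wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/corpora/wordnet.zip
After downloading, extract it into the NLTK data directory: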

unzip wordnet.zip -d /path/to/your/nltk_data/corpora/
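
The `/path/to/your/nltk_data` placeholder has to be one of the directories from the "Searched in" list in the error message. The sketch below checks which of those directories already holds a resource. It is a simplified stand-in for NLTK's own lookup, with a hypothetical name `find_nltk_resource`, and it only recognises unpacked directories.

```python
from pathlib import Path


def find_nltk_resource(resource, search_dirs):
    """Return the first directory in `search_dirs` that contains the
    unpacked `resource` (for example 'corpora/wordnet'), or None."""
    for base in search_dirs:
        if (Path(base).expanduser() / resource).is_dir():
            return str(base)
    return None
```

For this machine, `find_nltk_resource("corpora/wordnet", ["/home/dgis/nltk_data", "/usr/share/nltk_data"])` should return the directory the archive was extracted into, and None if the unzip target was wrong.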

With that one fixed, parsing the document again failed yet again:

File "/home/dgis/project/ragflow/rag/app/naive.py", line 229, in chunk
sections, tables = pdf_parser(filename if not binary else binary, from_page=from_page, to_page=to_page, callback=callback)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/app/naive.py", line 150, in __call__
tbls = self._extract_table_figure(True, zoomin, True, True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/deepdoc/parser/pdf_parser.py", line 810, in _extract_table_figure
self.tbl_det.construct_table(bxs, html=return_html, is_english=self.is_english)))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/deepdoc/vision/table_structure_recognizer.py", line 148, in construct_table
b["btype"] = TableStructureRecognizer.blockType(b)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/deepdoc/vision/table_structure_recognizer.py", line 120, in blockType
tks = [t for t in rag_tokenizer.tokenize(b["text"]).split() if len(t) > 1]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/rag/nlp/rag_tokenizer.py", line 258, in tokenize
return " ".join([self.stemmer.stem(self.lemmatizer.lemmatize(t)) for t in word_tokenize(line)])
^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 142, in word_tokenize
sentences = [text] if preserve_line else sent_tokenize(text, language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
tokenizer = _get_punkt_tokenizer(language)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
return PunktTokenizer(language)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
self.load_lang(lang)
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/dgis/project/ragflow/.venv/lib/python3.11/site-packages/nltk/data.py", line 579, in find
raise LookupError(resource_not_found)
LookupError:

Resource punkt_tab not found.
Please use the NLTK Downloader to obtain the resource:

import nltk
nltk.download('punkt_tab')

For more information see: https://www.nltk.org/data.html

Attempted to load tokenizers/punkt_tab/english/

Searched in:
- '/home/dgis/nltk_data'
- '/home/dgis/project/ragflow/.venv/nltk_data'
- '/home/dgis/project/ragflow/.venv/share/nltk_data'
- '/home/dgis/project/ragflow/.venv/lib/nltk_data'
- '/usr/share/nltk_data'
- '/usr/local/share/nltk_data'
- '/usr/lib/nltk_data'
- '/usr/local/lib/nltk_data'

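Same cause, so let's go through it again:

Download it by hand and extract it into place!!!

You can download the punkt_tab resource straight from the command line with wget or curl. For example:

wget https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt_tab.zip
After downloading, extract it into the NLTK data directory: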

unzip punkt_tab.zip -d /path/to/your/nltk_data/tokenizers/
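
The download-then-unzip routine has now been done twice, so it may be worth scripting. Below is a minimal sketch using only the standard library; `install_nltk_zip` is a made-up name, the archive is assumed to be downloaded already, and the category argument is the sub-directory NLTK expects (`corpora` for wordnet, `tokenizers` for punkt_tab).

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, nltk_data_dir, category):
    """Extract an NLTK package archive into <nltk_data_dir>/<category>/.

    Returns the sorted top-level entry names found in the archive.
    """
    target = Path(nltk_data_dir).expanduser() / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
        top_level = sorted({name.split("/", 1)[0] for name in archive.namelist()})
    return top_level
```

For example, `install_nltk_zip("punkt_tab.zip", "~/nltk_data", "tokenizers")` does the same job as the unzip command above, with `~/nltk_data` standing in for whichever search directory you chose.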

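That completes the backend deployment. No bugs have shown up in use so far. Done, cue the confetti!!!

---------------------------------------------Update 2025.02.20--------------------------------------------------------

While using RAG, parsing PPT documents suddenly crashed the whole system!!!

The likely cause is that the ICU library was not found correctly: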

Process terminated. Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support.
   at System.Environment.FailFast(System.String)
   at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
   at System.Globalization.GlobalizationMode..cctor()
   at System.Globalization.CultureData.CreateCultureWithInvariantData()
   at System.Globalization.CultureData.get_Invariant()
   at System.Globalization.CultureInfo..cctor()
   at System.Globalization.CultureInfo.GetCultureInfoHelper(Int32, System.String, System.String)
   at System.Globalization.CultureInfo.GetCultureInfo(System.String)
   at System.Reflection.RuntimeAssembly.GetLocale()
   at System.Reflection.RuntimeAssembly.GetName(Boolean)
   at System.Reflection.Assembly.GetName()
   at .(System.Reflection.Assembly)
   at .(Int32, Boolean)
   at .(Int32)
   at ..cctor()
   at Aspose.Slides.Presentation..cctor()
   at Aspose.Slides.Presentation..ctor(System.IO.Stream)
   at WrpNs_Aspose.WrpNs_Slides.WrpCs_Presenta_041D11ED.ctor_002_Presentation(Aspose.WrpGen.Interop.VariantArg*)
docker/launch_backend_service.sh:行 69: 2575034 已放弃               $PY api/ragflow_server.py

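Solution:

Run this command directly on Linux. Note: the setting applies only to the current shell session and is lost after the server restarts.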

export DOTNET_SYSTEM_GLOBALIZATION_INVARIANT=1

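I thought that had solved it, but troubles never come alone. Parsing failed again after the restart, although this time the system did not crash. Roll the tape: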

ERROR:root:Chunking 10005ea31e5341f48cb3e40a6d29773c.pptx/xxxxx.pptx got exception
Traceback (most recent call last):
  File "/home/dgis/project/ragflow/rag/svr/task_executor.py", line 189, in build_chunks
    cks = chunker.chunk(task["name"], binary=binary, from_page=task["from_page"],
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dgis/project/ragflow/rag/app/presentation.py", line 107, in chunk
    for pn, (txt, img) in enumerate(ppt_parser(
                                    ^^^^^^^^^^^
  File "/home/dgis/project/ragflow/rag/app/presentation.py", line 34, in __call__
    with slides.Presentation(BytesIO(fnm)) as presentation:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Proxy error(PptxReadException): The type initializer for 'Gdip' threw an exception. ---> TypeInitializationException: The type initializer for 'Gdip' threw an exception. ---> DllNotFoundException: Unable to load shared library 'libgdiplus' or one of its dependencies. In order to help diagnose loading problems, consider setting the LD_DEBUG environment variable: liblibgdiplus: cannot open shared object file: No such file or directory

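It turns out that the program cannot load the libgdiplus shared library when it tries to process a PPTX file. This usually relates to graphics or document processing, especially with Mono or libraries that depend on libgdiplus.

With the cause known, time for the big guns:

  1. Make sure libgdiplus is installed
    First, make sure libgdiplus is installed correctly. On a Debian-based system you can install it with:

    sudo apt-get update
    sudo apt-get install -y libgdiplus
    

    On other Linux distributions, use the corresponding package manager.

  2. Check the dependencies
    libgdiplus may depend on other libraries. Check them with:

    ldd /usr/lib/libgdiplus.so
    

    Make sure every listed library is installed and none shows "not found".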
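
Whether a shared library is visible to the loader can also be checked from Python before restarting the backend. This is only an approximation of what the .NET runtime does when it looks for libgdiplus: the sketch uses `ctypes.util.find_library`, and `missing_libraries` is a hypothetical helper name.

```python
import ctypes.util


def missing_libraries(short_names, finder=ctypes.util.find_library):
    """Return the names for which the finder locates no shared library.

    Names are given without the 'lib' prefix and without a suffix,
    for example 'gdiplus' for libgdiplus.so.
    """
    return [name for name in short_names if finder(name) is None]
```

After the apt-get install, `missing_libraries(["gdiplus"])` should come back empty. The `finder` parameter exists only so the lookup can be swapped out in tests.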

Source code, relaunch!!! Hmm..... problem solved. Back to importing documents!

7. Install the front-end dependencies:

cd web
npm install --force

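Yet another error: npm and node are not installed...

Here comes the solution:

Install Node.js and npm

If Node.js and npm are not installed, install them as follows:

Install with the package manager (Ubuntu/Debian)

sudo apt update
sudo apt install nodejs npm

Install with Node Version Manager (NVM)

NVM is a tool for managing Node.js versions and is the recommended route:

  1. Install NVM:

    curl -o- https://raw.githubusercontent.com/nvm-sh/nvm/v0.39.4/install.sh | bash
    

    After installation, restart the terminal or run the following commands to make NVM available:

    export NVM_DIR="$HOME/.nvm"
    [ -s "$NVM_DIR/nvm.sh" ] && \. "$NVM_DIR/nvm.sh"  # This loads nvm
    
  2. Install Node.js with NVM:

    nvm install node
    

    Or install a specific version, for example:

    nvm install 14  # install Node.js 14.x
    
  3. After installation, check the versions again:

    node -v
    npm -v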

8. Start the front-end service:

npm run dev 

That really is the end this time. Confetti!!
