GraphRAG Getting Started Guide: Building an Intelligent Retrieval System from Scratch

CarlowZJ

Published 2025-06-09 23:32:36

License: CC 4.0 BY-SA

Tags: GraphRAG

Original link: https://blog.youkuaiyun.com/csdn122345/article/details/148511437

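Abstract

This article explains how to get started with GraphRAG, covering environment setup, installation, data indexing, and querying. It works through one complete end-to-end example: indexing a public-domain book from Project Gutenberg and then querying it. Along the way it shows how to configure the system, prepare and index the data, and run queries against the resulting knowledge graph, giving developers a practical path to a knowledge-graph-based retrieval system.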

1. System Overview

1.1 System Architecture

1.2 Features


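mindmap
  root((GraphRAG))
    Core features
      Text indexing
      Knowledge graph
      Vector retrieval
    Use cases
      Document retrieval
      Question answering
      Relationship analysis
    Technical features
      LLM integration
      Graph database
      Vector storage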
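The "knowledge graph" and "relationship analysis" items work roughly like this. GraphRAG uses an LLM to extract entities, and the relationships between them, from the text. It then answers entity-centric questions by examining an entity's neighborhood in that graph.

The toy sketch below skips the LLM entirely. Its triples are hand-written, hypothetical examples based on Dickens's A Christmas Carol, the story behind the Scrooge query in section 7. `index_triples` and `relationships_of` are illustrative helpers, not GraphRAG APIs.

```python
from collections import defaultdict

# Hypothetical, hand-written (subject, relation, object) triples.
# In GraphRAG, an LLM would extract these from chunks of the input text.
TRIPLES = [
    ("Scrooge", "employs", "Bob Cratchit"),
    ("Scrooge", "uncle of", "Fred"),
    ("Scrooge", "former partner of", "Jacob Marley"),
    ("Bob Cratchit", "father of", "Tiny Tim"),
]


def index_triples(triples):
    """Map each entity to every triple it takes part in, as subject or object."""
    index = defaultdict(list)
    for triple in triples:
        subject, _, obj = triple
        index[subject].append(triple)
        index[obj].append(triple)
    return index


def relationships_of(index, entity):
    """Return a copy of the triples touching one entity (a toy 'local' lookup)."""
    return list(index.get(entity, []))


if __name__ == "__main__":
    idx = index_triples(TRIPLES)
    for s, r, o in relationships_of(idx, "Scrooge"):
        print(f"{s} --{r}--> {o}")
```

Each triple is stored once per endpoint and keeps its original direction. This way a lookup from either side ("Bob Cratchit" or "Tiny Tim") still reads the relationship the right way round.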

2. Environment Setup

2.1 System Requirements

Python 3.10-3.12
Operating system: Windows/Linux/macOS
Network: access to the OpenAI or Azure OpenAI service
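Before installing, you can quickly confirm that the interpreter falls within the supported range. The 3.10-3.12 bounds are copied from the requirements above as of this writing. Newer GraphRAG releases may support other versions, so treat the constants as assumptions and check the package's published metadata for the current range.

```python
import sys

# Supported range as stated in the requirements above (inclusive); may change over time
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def is_supported_python(version_info=sys.version_info) -> bool:
    """Return True if the interpreter's major.minor version is inside the supported range."""
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    current = ".".join(map(str, sys.version_info[:3]))
    if is_supported_python():
        print(f"Python {current}: supported")
    else:
        print(f"Python {current}: outside the 3.10-3.12 range")
```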

2.2 Installation Options

3. Installation

3.1 Basic Installation

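# Install GraphRAG
def install_graphrag():
    """
    Install GraphRAG with pip
    """
    import subprocess
    import sys

    try:
        # Use the current interpreter's pip so the package lands in the active environment
        subprocess.run([sys.executable, "-m", "pip", "install", "graphrag"], check=True)
        print("GraphRAG installed successfully!")
    except subprocess.CalledProcessError as e:
        print(f"Installation failed: {e}")
        raise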

3.2 Verifying the Installation

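# Verify the installation
def verify_installation():
    """
    Verify the GraphRAG installation
    """
    import shutil
    from importlib.metadata import PackageNotFoundError, version

    try:
        # Read the installed version from the package metadata
        print(f"GraphRAG version: {version('graphrag')}")
    except PackageNotFoundError:
        print("Verification failed: the graphrag package is not installed")
        raise

    # The later steps call the graphrag command-line tool, so it must be on PATH
    if shutil.which("graphrag") is None:
        raise RuntimeError("Verification failed: the graphrag command was not found on PATH")
    print("Installation check passed")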

4. Data Preparation

4.1 Directory Layout
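All the scripts in this guide work inside a single project root, `./ragtest`:

- Section 4.2 creates `input/book.txt`.
- Section 5 adds `settings.yaml` and `.env`.
- Indexing writes its results under the same root. The output location is assumed to default to an `output` directory, which may vary between GraphRAG versions.

Below is a small hypothetical helper (`list_tree` is not part of GraphRAG) that lists what is currently in the tree.

```python
from pathlib import Path


def list_tree(root):
    """Return every file and directory under root as sorted, POSIX-style relative paths."""
    base = Path(root)
    entries = []
    for path in sorted(base.rglob("*")):
        rel = path.relative_to(base).as_posix()
        # Mark directories with a trailing slash
        entries.append(rel + "/" if path.is_dir() else rel)
    return entries


if __name__ == "__main__" and Path("./ragtest").is_dir():
    for entry in list_tree("./ragtest"):
        print(entry)
```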

4.2 Data Preparation Script

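# Data preparation
def prepare_data():
    """
    Prepare the sample data
    """
    import os
    import requests

    def create_directories():
        """
        Create the directory structure
        """
        # Create the input directory
        os.makedirs("./ragtest/input", exist_ok=True)
        print("Directories created")

    def download_sample_data():
        """
        Download the sample data
        """
        # Download the sample text (Project Gutenberg eBook #24022, A Christmas Carol)
        url = "https://www.gutenberg.org/cache/epub/24022/pg24022.txt"
        response = requests.get(url, timeout=60)
        # Fail loudly on HTTP errors instead of saving an error page as input
        response.raise_for_status()
        # The file is UTF-8; set it explicitly so requests decodes it correctly
        response.encoding = "utf-8"

        # Save the file
        with open("./ragtest/input/book.txt", "w", encoding="utf-8") as f:
            f.write(response.text)
        print("Sample data downloaded")

    # The helpers above only take effect when called
    create_directories()
    download_sample_data()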
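Before indexing, confirm that the input directory actually contains non-empty UTF-8 text. A failed or partial download is much cheaper to catch here than halfway through an LLM-backed indexing run.

The helper below is hypothetical (`check_input_dir` is not a GraphRAG function). Its `*.txt` pattern is an assumption chosen to match the `book.txt` file written above.

```python
from pathlib import Path


def check_input_dir(input_dir="./ragtest/input", pattern="*.txt"):
    """
    Sanity-check the input directory before indexing.
    Returns {file name: character count}; raises if nothing usable is found.
    """
    files = sorted(Path(input_dir).glob(pattern))
    if not files:
        raise FileNotFoundError(f"no files matching {pattern} in {input_dir}")
    sizes = {}
    for path in files:
        # Read as UTF-8, the encoding used when the file was saved
        text = path.read_text(encoding="utf-8")
        if not text.strip():
            raise ValueError(f"{path.name} is empty")
        sizes[path.name] = len(text)
    return sizes
```

For example, `print(check_input_dir())` after `prepare_data()` should show a single entry for `book.txt` with a non-zero character count.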

5. Configuration

5.1 Configuration Structure

5.2 Configuration Example

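# System configuration
def configure_system(root="./ragtest", api_key="your_api_key_here"):
    """
    Configure GraphRAG: generate the default config files, then set the API key
    """
    import os
    import subprocess
    import yaml

    def init_workspace():
        """
        Initialize the workspace
        """
        # `graphrag init` writes a complete default settings.yaml and a .env file
        # into the project root; run it once per project
        subprocess.run(["graphrag", "init", "--root", root], check=True)

    def set_api_key():
        """
        Write the API key to .env
        """
        # settings.yaml references the key as ${GRAPHRAG_API_KEY}; the value lives here
        with open(os.path.join(root, ".env"), "w", encoding="utf-8") as f:
            f.write(f"GRAPHRAG_API_KEY={api_key}\n")

    def show_models():
        """
        Print the model configuration from the generated settings.yaml
        """
        with open(os.path.join(root, "settings.yaml"), encoding="utf-8") as f:
            config = yaml.safe_load(f)
        # In recent versions the chat and embedding models sit under "models"
        # (e.g. type openai_chat / openai_embedding); older versions use other keys
        print(yaml.safe_dump(config.get("models", {}), allow_unicode=True))

    # The helpers above only take effect when called
    init_workspace()
    set_api_key()
    show_models()
    print("Configuration complete")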
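The configuration refers to the API key only through the placeholder `${GRAPHRAG_API_KEY}`. The real value lives in `.env`, so the key stays out of `settings.yaml`.

The sketch below illustrates the idea by parsing `.env`-style text and filling the placeholder with Python's `string.Template`, which happens to use the same `${NAME}` syntax. It is a conceptual illustration under that assumption, not GraphRAG's actual config loader, and `parse_env` / `resolve_placeholders` are hypothetical names.

```python
from string import Template


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "="
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    return values


def resolve_placeholders(config_text, env):
    """Replace ${NAME} placeholders; raises KeyError if a variable is missing."""
    return Template(config_text).substitute(env)


if __name__ == "__main__":
    env = parse_env("GRAPHRAG_API_KEY=your_api_key_here\n")
    print(resolve_placeholders("api_key: ${GRAPHRAG_API_KEY}\n", env))
```

Using `substitute` rather than `safe_substitute` is deliberate. A missing key then fails immediately instead of sending the literal `${GRAPHRAG_API_KEY}` string to the API.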

6. Building the Index

6.1 Indexing Pipeline
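The indexing pipeline begins by splitting each input document into overlapping chunks ("text units"). These chunks are then passed to the LLM for entity and relationship extraction.

GraphRAG measures chunk size in tokens. The sketch below simplifies by using whitespace-separated words instead, and its `chunk_size` and `overlap` values are illustrative rather than GraphRAG's defaults. `chunk_words` is a hypothetical helper.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """
    Split text into chunks of at most chunk_size words.
    Consecutive chunks share `overlap` words, so context near a
    boundary appears in both neighboring chunks.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once this chunk reaches the end of the text
        if start + chunk_size >= len(words):
            break
    return chunks
```

For example, ten words split with `chunk_size=4, overlap=1` give three chunks, and each chunk starts with the last word of the previous one.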

6.2 Running the Indexer

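# Build the index
def build_index():
    """
    Build the data index
    """
    import subprocess

    def run_indexer():
        """
        Run the indexer
        """
        try:
            # Run the index command; it calls the LLM many times and can take a while
            subprocess.run([
                "graphrag", "index",
                "--root", "./ragtest"
            ], check=True)
            print("Index built")
        except subprocess.CalledProcessError as e:
            print(f"Index build failed: {e}")
            raise

    # The helper above only takes effect when called
    run_indexer()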
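Once indexing completes, the results are stored as tables in the project root. This sketch assumes the default configuration writes Parquet files into `./ragtest/output`. Directory and table names differ between GraphRAG versions, so treat both as assumptions.

The hypothetical check below only lists which tables exist. To inspect one, `pandas.read_parquet` can open it, provided a Parquet engine such as pyarrow is installed.

```python
from pathlib import Path


def list_index_outputs(output_dir="./ragtest/output"):
    """Return the sorted names of the Parquet tables written by the indexer."""
    out = Path(output_dir)
    if not out.is_dir():
        raise FileNotFoundError(f"{output_dir} not found; has indexing run?")
    return sorted(p.name for p in out.glob("*.parquet"))


if __name__ == "__main__" and Path("./ragtest/output").is_dir():
    for name in list_index_outputs():
        print(name)
```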

7. Querying

7.1 Query Types


7.2 Running Queries

# Querying
def query_system():
    """
    Query the indexed data
    """
    import subprocess

    def global_search():
        """
        Global search: questions about the dataset as a whole
        """
        try:
            # Run a global search
            subprocess.run([
                "graphrag", "query",
                "--root", "./ragtest",
                "--method", "global",
                "--query", "What are the top themes in this story?"
            ], check=True)
        except subprocess.CalledProcessError as e:
            print(f"Query failed: {e}")
            raise

    def local_search():
        """
        Local search: questions about specific entities and their relationships
        """
        try:
            # Run a local search
            subprocess.run([
                "graphrag", "query",
                "--root", "./ragtest",
                "--method", "local",
                "--query", "Who is Scrooge and what are his main relationships?"
            ], check=True)
        except subprocess.CalledProcessError as e:
            print(f"Query failed: {e}")
            raise

    # The helpers above only take effect when called
    global_search()
    local_search()
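The two search functions above differ only in the `--method` value and the query text, so they can be folded into one parameterized helper. In this sketch, `build_query_command` and `run_query` are hypothetical names. `SUPPORTED_METHODS` lists only the two methods used in this guide; other methods may exist depending on the GraphRAG version.

```python
import subprocess

# Only the methods used in this guide; other versions may offer more
SUPPORTED_METHODS = ("global", "local")


def build_query_command(query, method="global", root="./ragtest"):
    """Assemble the same `graphrag query` argument list used above."""
    if method not in SUPPORTED_METHODS:
        raise ValueError(f"unsupported method: {method!r}")
    return ["graphrag", "query", "--root", root, "--method", method, "--query", query]


def run_query(query, method="global", root="./ragtest"):
    """Run the query and return what the CLI printed to standard output."""
    result = subprocess.run(
        build_query_command(query, method, root),
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

For example, `run_query("Who is Scrooge and what are his main relationships?", method="local")` returns the CLI's standard output as a string. Capturing the output, instead of letting it stream to the terminal, makes it easy to log or post-process the answer. Depending on the version, that output may include console messages besides the answer itself.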