RAG and LLM Engineering in Practice: Comparing the Response Quality of a Vanilla LLM Application Against an LLM with Naive Input Augmentation

We continue with the notebook from the previous section; make sure the required packages are installed and the API key is configured. In the previous section we used LangChain's prompt components to build a prompt; this time we will do it by hand. Add the following code to construct the prompt manually:

from openai import OpenAI
import time

client = OpenAI()
gptmodel = "gpt-4o"

def generate_answer(prompt):
  print(f"generate answer prompt: {prompt}")
  try:
    start = time.time()
    response = client.chat.completions.create(
        model = gptmodel,
        messages = [
            {"role": "system", "content": "You are an expert of llm and RAG"},
            {"role": "assistant", "content": "You can read the input and answer in detail"},
            {"role": "user", "content": prompt}
        ],
        temperature = 0.1, # low temperature keeps the output focused and mostly deterministic
    )
    print(f"response time: {time.time() - start:.2f}s")
    # extracting the text is what StrOutputParser did for us in the previous section
    return response.choices[0].message.content.strip()
  except Exception as e:
    return str(e)

def call_llm_with_self_prompt(questions):
  # in the previous section this prompt came from LangChain hub
  prompt = f"Please elaborate on the following content:\n {questions}"
  print(f"prompt is : {prompt}")
  return generate_answer(prompt)

import textwrap

# pretty-printing the text is also something StrOutputParser-style post-processing handled before
def print_formatted_response(response):
  wrapper = textwrap.TextWrapper(width = 80) # 80 characters per line
  wrapped_text = wrapper.fill(text=response)
  print("Response:")
  print("----------------")
  print(wrapped_text)
  print("----------------")

After running the code above, it can be invoked as follows:

query = "tell me something about rag store"
llm_response = call_llm_with_self_prompt(query)
print_formatted_response(llm_response)

We then get the following answer:

Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach
in the field of artificial intelligence, particularly within the realm of
natural language processing (NLP). It innovatively combines the capabilities of
neural network-based language models with retrieval systems to enhance the
generation of text, making it more accurate, informative, and contextually
relevant. This methodology leverages the strengths of both generative and
retrieval architectures to tackle complex tasks that require not only linguistic
fluency but also factual correctness and depth of knowledge. At the core of
Retrieval Augmented Generation (RAG) is a generative model, typically a
transformer-based neural network, similar to those used in models like GPT
(Generative Pre-trained Transformer) or BERT (Bidirectional Encoder
Representations from Transformers). This component is responsible for producing
coherent and contextually appropriate language outputs based on a mixture of
input prompts and additional information fetched by the retrieval component.
Complementing the language model is the retrieval system, which is usually built
on a database of documents or a corpus of texts. This system uses techniques
from information retrieval to find and fetch documents that are relevant to the
input query or prompt. The mechanism of relevance determination can range from
simple keyword matching to more complex semantic search algorithms which
interpret the meaning behind the query to find the best matches. This component
merges the outputs from the language model and the retrieval system. It
effectively synthesizes the raw data fetched by the retrieval system into the
generative process of the language model. The integrator ensures that the
information from the retrieval system is seamlessly incorporated into the final
text output, enhancing the model's ability to generate responses that are not
only fluent and grammatically correct but also rich in factual details and
context-specific nuances. When a query or prompt is received, the system first
processes it to understand the requirement or the context. Based on the
processed query, the retrieval system searches through its database to find
relevant documents or information snippets. This retrieval is guided by the
similarity of content in the documents to the query, which can be determined
through various techniques like vector embeddings or semantic similarity
measures. The retrieved documents are then fed into the language model. In some
implementations, this integration happens at the token level, where the model
can access and incorporate specific pieces of information from the retrieved
texts dynamically as it generates each part of the response. The language model,
now augmented with direct access to retrieved information, generates a response.
This response is not only influenced by the training of the model but also by
the specific facts and details contained in the retrieved documents, making it
more tailored and accurate. By directly incorporating information from external
sources, Retrieval Augmented Generation (RAG) models can produce responses that
are more factual and relevant to the given query. This is particularly useful in
domains like medical advice, technical support, and other areas where precision
and up-to-date knowledge are crucial. Retrieval Augmented Generation (RAG)
systems can dynamically adapt to new information since they retrieve data in
real-time from their databases. This allows them to remain current with the
latest knowledge and trends without needing frequent retraining. With access to
a wide range of documents, Retrieval Augmented Generation (RAG) systems can
provide detailed and nuanced answers that a standalone language model might not
be capable of generating based solely on its pre-trained knowledge. While
Retrieval Augmented Generation (RAG) offers substantial benefits, it also comes
with its challenges. These include the complexity of integrating retrieval and
generation systems, the computational overhead associated with real-time data
retrieval, and the need for maintaining a large, up-to-date, and high-quality
database of retrievable texts. Furthermore, ensuring the relevance and accuracy
of the retrieved information remains a significant challenge, as does managing
the potential for introducing biases or errors from the external sources. In
summary, Retrieval Augmented Generation represents a significant advancement in
the field of artificial intelligence, merging the best of retrieval-based and
generative technologies to create systems that not only understand and generate
natural language but also deeply comprehend and utilize the vast amounts of
information available in textual form. A RAG vector store is a database or
dataset that contains vectorized data points. 

As we can see, the output is verbose, but its content is not what we asked for: it provides a great deal of information, yet almost none of it relates to our query. This is what we need to improve next. We need to supply context alongside the query, so that the context steers the LLM toward a directed response instead of letting it wander across every plausible direction and produce a low-quality answer. The key question is how to provide context for a given query.

As we saw in the previous section, context may need to be scraped from web pages, loaded from a database, extracted from PDF documents, and so on. Data preparation, or domain-knowledge construction, is a complex and engineering-heavy process that we will cover in detail in a dedicated chapter. For now, assume the domain knowledge is already prepared, for example the following RAG passages:

# WebBaseLoader => scraped content => question background or a domain-specific knowledge base
# assume the data has already been prepared;
# in practice this step would be handled by WebBaseLoader or a PDF loader
# we will score passages by counting words shared with the query, e.g. bank, river bank, central bank
query = 'tell me something about rag store'
db_records = [
    "Retrieval Augmented Generation (RAG) represents a sophisticated hybrid approach in the field of artificial intelligence, particularly within the realm of natural language processing (NLP).",
    "It innovatively combines the capabilities of neural network-based language models with retrieval systems to enhance the generation of text, making it more accurate, informative, and contextually relevant.",
    "This methodology leverages the strengths of both generative and retrieval architectures to tackle complex tasks that require not only linguistic fluency but also factual correctness and depth of knowledge.",
    "At the core of Retrieval Augmented Generation (RAG) is a generative model, typically a transformer-based neural network, similar to those used in models like GPT (Generative Pre-trained Transformer) or BERT (Bidirectional Encoder Representations from Transformers).",
    "This component is responsible for producing coherent and contextually appropriate language outputs based on a mixture of input prompts and additional information fetched by the retrieval component.",
    "Complementing the language model is the retrieval system, which is usually built on a database of documents or a corpus of texts.",
    "This system uses techniques from information retrieval to find and fetch documents that are relevant to the input query or prompt.",
    "The mechanism of relevance determination can range from simple keyword matching to more complex semantic search algorithms which interpret the meaning behind the query to find the best matches.",
    "This component merges the outputs from the language model and the retrieval system.",
    "It effectively synthesizes the raw data fetched by the retrieval system into the generative process of the language model.",
    "The integrator ensures that the information from the retrieval system is seamlessly incorporated into the final text output, enhancing the model's ability to generate responses that are not only fluent and grammatically correct but also rich in factual details and context-specific nuances."
]
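In a real pipeline, records like these would come from a loader rather than being typed in by hand. Below is a minimal sketch using LangChain's WebBaseLoader and RecursiveCharacterTextSplitter; the URL is a placeholder and the chunking parameters are illustrative assumptions, not the setup used in this chapter:

# requires: pip install langchain-community langchain-text-splitters beautifulsoup4
from langchain_community.document_loaders import WebBaseLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

# placeholder URL; substitute the page that holds your domain knowledge
loader = WebBaseLoader("https://en.wikipedia.org/wiki/Retrieval-augmented_generation")
docs = loader.load()

# split the page into sentence-sized records, similar in shape to db_records above
splitter = RecursiveCharacterTextSplitter(chunk_size=300, chunk_overlap=0)
chunks = splitter.split_documents(docs)
db_records = [chunk.page_content for chunk in chunks]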

As you can see, db_records contains many passages. Given the query "tell me something about rag store", how do we pick one or more passages to serve as its context? There are many ways to do this; the simplest is to count the words the query and a passage have in common: the more shared words, the higher the assumed relevance. This approach has obvious problems. Take "river bank" and "central bank": the two phrases share the word "bank", yet "bank" means something entirely different in each. Conversely, "central bank" is far closer in meaning to "financial institution" even though the two share no words at all. Still, let us try this method first:

query = "tell me something about rag store"
# how do we extract the right context from the documents?
# this is the retriever's job
# a simple approach: score the relatedness of two texts by counting the words they share;
# (query, text1) => 3 common words, (query, text2) => 4 common words, so text2 is more relevant than text1
# caveat: river bank vs central bank

def similarity_by_matching_common_words(text1, text2):
  '''
  Split both input texts into words, then count the words they have in common.
  '''
  text1_words = set(text1.lower().split())
  text2_words = set(text2.lower().split())
  common_words = text1_words.intersection(text2_words)
  return len(common_words)


def find_best_match_piece(query, db_records):
  best_score = 0
  best_record = None
  for record in db_records:
    current_score = similarity_by_matching_common_words(query, record)
    if current_score > best_score:
      best_score = current_score
      best_record = record

  return best_score, best_record


best_score, best_record = find_best_match_piece(query, db_records)

print(f"best score:{best_score}, best record: {best_record}")

The code above counts the common words between the query and each passage in the db_records array. Running it, we get the following result:

best score:2, best record: A RAG vector store is a database or dataset that contains vectorized data points.
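To see the weakness discussed earlier in action, we can run the same scorer on the "bank" examples (a quick hypothetical check, not part of the main flow):

print(similarity_by_matching_common_words("river bank", "central bank"))
# => 1: they share "bank", though the meanings are unrelated
print(similarity_by_matching_common_words("central bank", "financial institution"))
# => 0: no shared words, even though the meanings are close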

We can then create an augmented query, as shown below:

augmented_input = query + ":" + best_record
print_formatted_response(augmented_input)
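Given the wrapping logic above, this should print roughly:

Response:
----------------
tell me something about rag store:A RAG vector store is a database or dataset
that contains vectorized data points.
----------------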

We can then use the augmented query to have the LLM generate a response:

llm_response = call_llm_with_self_prompt(augmented_input)
print_formatted_response(llm_response)

The code above returns the following:


generate answer prompt: tell me something about rag store:A RAG vector store is a database or dataset that contains vectorized data points.
Response:
---------------
A RAG (Retrieval-Augmented Generation) vector store is a specialized database
designed to store and manage vectorized data points. These data points are
typically high-dimensional vectors that represent various forms of information,
such as text, images, or other types of data that have been transformed into a
numerical format suitable for machine learning and retrieval tasks.  Here are
some key aspects of a RAG vector store:  1. **Vector Representation**: In a RAG
vector store, data is stored in the form of vectors. These vectors are often
generated using techniques such as word embeddings (e.g., Word2Vec, GloVe),
sentence embeddings (e.g., BERT, Sentence-BERT), or other forms of feature
extraction that convert raw data into a numerical format.  2. **Efficient
Retrieval**: One of the primary purposes of a RAG vector store is to enable
efficient retrieval of relevant data points based on similarity measures. This
is typically achieved using techniques such as nearest neighbor search, which
allows for quick identification of vectors that are similar to a given query
vector.  3. **Augmented Generation**: In the context of RAG, the vector store is
used to augment the generation process. For example, in natural language
processing (NLP) tasks, a RAG model might retrieve relevant documents or
passages from the vector store to provide context or additional information that
can be used to generate more accurate and informative responses.  4.
**Scalability**: RAG vector stores are designed to handle large volumes of data
and support scalable retrieval operations. This often involves the use of
specialized data structures and indexing techniques, such as KD-trees, ball
trees, or approximate nearest neighbor (ANN) algorithms like HNSW (Hierarchical
Navigable Small World) graphs.  5. **Integration with Machine Learning Models**:
RAG vector stores are typically integrated with machine learning models that
perform the retrieval and generation tasks. For example, a RAG model might use a
transformer-based architecture for text generation, while relying on the vector
store to retrieve relevant context or knowledge.  6. **Applications**: RAG
vector stores have a wide range of applications, including question answering
systems, chatbots, recommendation systems, and any other scenario where it is
beneficial to retrieve and utilize relevant information to enhance the
generation process.  In summary, a RAG vector store is a crucial component in
systems that combine retrieval and generation tasks, enabling efficient and
scalable access to vectorized data points to support various machine learning
and information retrieval applications.
---------------

As we can see, the quality of the returned result has improved significantly.
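The common-word scorer found a usable context this time, but as the "bank" example showed, it is brittle. Below is a minimal sketch of the semantic alternative mentioned earlier, vector embeddings compared by cosine similarity; it assumes OpenAI's text-embedding-3-small model and is a preview of later chapters, not the method used in this section:

import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts):
    # one API call returns an embedding vector per input text
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [np.array(d.embedding) for d in resp.data]

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# embed the query together with every record, then pick the closest record
query_vec, *record_vecs = embed([query] + db_records)
scores = [cosine_similarity(query_vec, vec) for vec in record_vecs]
print(db_records[int(np.argmax(scores))])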
