Demystifying Hugging Face Endpoints: A Powerful Tool for Building ML Applications

dfvcbipanjr

Tags: python

Article link: https://blog.youkuaiyun.com/dfvcbipanjr/article/details/144320281

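Introduction

In today's machine learning landscape, the Hugging Face Hub is known for its wealth of resources and its community collaboration features. It hosts more than 120,000 models, 20,000 datasets, and 50,000 demo applications, all open source and publicly available. Beyond these resources, Hugging Face also provides various endpoints that make it easier for developers to build ML applications. This article shows how to connect to and use these endpoints, in particular those served by a custom-built server for text generation inference.

Main Content

Installation and Setup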

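To use Hugging Face endpoints, first install the huggingface_hub Python package. The examples in this article also import from langchain_huggingface, so the langchain-huggingface package is installed alongside it:

%pip install --upgrade --quiet huggingface_hub langchain-huggingface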

An API token is required to make API calls. You can generate one from the access tokens page in your Hugging Face account settings.

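import os
from getpass import getpass

# Prompt for the token without echoing it to the screen
HUGGINGFACEHUB_API_TOKEN = getpass()
# Make the token available to libraries that read it from the environment
os.environ["HUGGINGFACEHUB_API_TOKEN"] = HUGGINGFACEHUB_API_TOKEN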
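
When the code runs outside a notebook (for example in a script or a scheduled job), an interactive prompt may not be available. The following is a minimal sketch, not part of the original setup, that reuses a token already present in the environment and only prompts when it is missing. The helper name `load_hf_token` and its `prompt_fn` parameter are hypothetical, introduced here for illustration.

```python
import os
from getpass import getpass


def load_hf_token(env_var="HUGGINGFACEHUB_API_TOKEN", prompt_fn=getpass):
    """Return the token from the environment, prompting only when it is missing.

    `load_hf_token` and `prompt_fn` are illustrative names, not a library API.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        # Fall back to an interactive prompt that does not echo the input
        token = prompt_fn("Hugging Face API token: ").strip()
    if not token:
        raise ValueError(f"No API token provided; set {env_var} or enter one when prompted.")
    # Store the token so later code can read it from the environment
    os.environ[env_var] = token
    return token
```

Calling `HUGGINGFACEHUB_API_TOKEN = load_hf_token()` would then stand in for the `getpass()` call above.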

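Connecting to Hugging Face Endpoints

This section shows how to connect to the free serverless endpoint that Hugging Face provides, as well as to a dedicated endpoint.

Free Serverless Endpoint

Serverless endpoints let you implement and iterate on a solution quickly, but they may be rate-limited under heavy load.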

from langchain_huggingface import HuggingFaceEndpoint
from langchain_core.prompts import PromptTemplate

question = "Who won the FIFA World Cup in the year 1994?"
template = """Question: {question}\n\nAnswer: Let's think step by step."""
prompt = PromptTemplate.from_template(template)

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"
llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,  # upper bound on the number of generated tokens
    temperature=0.5,
    huggingfacehub_api_token=HUGGINGFACEHUB_API_TOKEN,
)
# Pipe the prompt into the model to form a runnable chain
llm_chain = prompt | llm
print(llm_chain.invoke({"question": question}))
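
Because serverless endpoints may be rate-limited under heavy load, it can help to retry a failed call after a growing delay. The sketch below shows one common approach, exponential backoff with a little jitter, using only the standard library. The helper `invoke_with_backoff` and its parameters are hypothetical names chosen for illustration. The broad `except Exception` is also an assumption, since the exact exception type raised on a rate limit depends on the client library and version.

```python
import random
import time


def invoke_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay * 2**attempt (plus jitter) and retry.

    Illustrative helper, not part of LangChain or huggingface_hub.
    """
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:
            # Assumption: any exception is treated as retryable. In practice,
            # narrow this to the rate-limit error your client actually raises.
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)
```

With the chain defined above, a call might look like `invoke_with_backoff(lambda: llm_chain.invoke({"question": question}))`.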

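Dedicated Endpoint

For enterprise workloads, a dedicated Inference Endpoint is recommended. Dedicated endpoints offer more flexibility and speed, along with ongoing support and uptime guarantees.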

your_endpoint_url = "{AI_URL}"  # use an API proxy service to improve access stability
llm = HuggingFaceEndpoint(
    endpoint_url=your_endpoint_url,
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
)
# invoke() is the standard Runnable entry point for a single prompt
print(llm.invoke("What did foo say about bar?"))
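
The `temperature`, `top_k`, and `top_p` settings above control how the next token is sampled. As a rough illustration of what these knobs usually mean, the sketch below applies them to a toy distribution in pure Python. It follows the common textbook definitions and is an assumption about typical behavior, not a description of how the inference server implements them. The function names are hypothetical.

```python
import math


def apply_temperature(logits, temperature):
    """Softmax over logits / temperature; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_filter(probs, top_k, top_p):
    """Keep the top_k most likely tokens, then the shortest prefix whose mass reaches top_p.

    Returns a dict mapping token index to renormalized probability.
    """
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:top_k]
    kept, cumulative = [], 0.0
    for index, p in ranked:
        kept.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {index: p / total for index, p in kept}


# Toy example: four candidate tokens with made-up probabilities
candidates = top_k_top_p_filter([0.5, 0.3, 0.15, 0.05], top_k=3, top_p=0.7)
```

Under these definitions, a temperature as low as 0.01 makes the output nearly deterministic, because almost all probability mass collapses onto the most likely token.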

Streaming Support

Hugging Face endpoints also support streaming, which makes real-time applications simpler to build.

from langchain_core.callbacks import StreamingStdOutCallbackHandler

llm = HuggingFaceEndpoint(
    endpoint_url=your_endpoint_url,  # use an API proxy service to improve access stability
    max_new_tokens=512,
    top_k=10,
    top_p=0.95,
    typical_p=0.95,
    temperature=0.01,
    repetition_penalty=1.03,
    streaming=True,
)
# The callback handler prints each token to stdout as it arrives
llm.invoke(
    "What did foo say about bar?",
    config={"callbacks": [StreamingStdOutCallbackHandler()]},
)
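
A streaming consumer typically does two things with each chunk: display it immediately and keep it so the full response is available afterwards. The sketch below shows that pattern with a stand-in generator, so it runs without network access. `consume_stream` and `fake_stream` are hypothetical names, and the chunk contents are made up for illustration.

```python
import sys


def consume_stream(chunks, write=None):
    """Write each text chunk as soon as it arrives and return the full text."""
    if write is None:
        write = sys.stdout.write
    parts = []
    for chunk in chunks:
        write(chunk)  # show the partial output right away
        parts.append(chunk)
    return "".join(parts)


def fake_stream():
    """Stand-in for a streaming endpoint; the chunks are invented sample data."""
    yield from ["Foo ", "said ", "nothing ", "about ", "bar."]


full_text = consume_stream(fake_stream())
```

Assuming the `stream` method from LangChain's Runnable interface yields string chunks for this LLM, the same helper could be fed with `consume_stream(llm.stream("What did foo say about bar?"))`.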