OpenAI Bilingual Documentation Reference: Embeddings

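OpenAI's text embedding models measure the relatedness of strings and are commonly used for search, clustering, recommendations, anomaly detection, and classification. An embedding is a vector of floating point numbers, and the distance between vectors indicates relatedness. The text-embedding-ada-002 model is recommended. This article shows how to get embeddings, points to practical applications and code samples for different scenarios, and cautions about the models' limitations and risks, such as social bias and lack of knowledge of recent events.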

Embeddings

What are embeddings?

OpenAI’s text embeddings measure the relatedness of text strings. Embeddings are commonly used for:

  • Search (where results are ranked by relevance to a query string)
  • Clustering (where text strings are grouped by similarity)
  • Recommendations (where items with related text strings are recommended)
  • Anomaly detection (where outliers with little relatedness are identified)
  • Diversity measurement (where similarity distributions are analyzed)
  • Classification (where text strings are classified by their most similar label)

An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
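To make the idea of distance concrete, here is a minimal sketch that ranks two documents by their distance to a query. It assumes cosine distance as the metric, which is one common choice and is not specified by the text above. The vectors and the names `query`, `doc_a`, and `doc_b` are made up for illustration; real embeddings are much longer.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cosine_distance(a, b):
    """1 - cosine similarity: small values suggest high relatedness."""
    return 1.0 - cosine_similarity(a, b)

# Toy 3-dimensional vectors, invented for this example.
query = [0.9, 0.1, 0.0]
documents = {
    "doc_a": [0.8, 0.2, 0.1],
    "doc_b": [0.0, 0.1, 0.9],
}

# Rank documents by increasing distance to the query (most related first).
ranked = sorted(documents, key=lambda name: cosine_distance(query, documents[name]))
print(ranked)  # ['doc_a', 'doc_b']
```

The same ranking step could serve search (rank results for a query) or classification (pick the label whose vector is closest).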

Visit our pricing page to learn about Embeddings pricing. Requests are billed based on the number of tokens in the input sent.

**To see embeddings in action, check out our code samples**

  • Classification
  • Topic clustering
  • Search
  • Recommendations


How to get embeddings

To get an embedding, send your text string to the embeddings API endpoint along with a choice of embedding model ID (e.g., text-embedding-ada-002). The response will contain an embedding, which you can extract, save, and use.

Example requests:

Example: Getting embeddings (Python)

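import openai  # legacy (pre-1.0) openai Python library interface

# Request an embedding for one input string.
response = openai.Embedding.create(
    input="Your text string goes here",
    model="text-embedding-ada-002"
)
# The vector for the first (and only) input.
embeddings = response['data'][0]['embedding']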

Example response:

{
  "data": [
    {
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ...
        -4.547132266452536e-05,
        -0.024047505110502243
      ],
      "index": 0,
      "object": "embedding"
    }
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
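As a rough illustration of extracting and using the response, the sketch below parses a JSON document shaped like the example response. The payload here is a shortened, made-up stand-in (three values per vector), and the assumption that `index` reflects the position of the corresponding input is mine, not something stated above.

```python
import json

# A shortened, invented response in the same shape as the example response.
raw = """
{
  "data": [
    {"embedding": [-0.0069, -0.0053, -0.0240], "index": 0, "object": "embedding"}
  ],
  "model": "text-embedding-ada-002",
  "object": "list",
  "usage": {"prompt_tokens": 5, "total_tokens": 5}
}
"""

response = json.loads(raw)

# Sort by "index" so vectors line up with the order of the inputs (assumed meaning).
items = sorted(response["data"], key=lambda item: item["index"])
vectors = [item["embedding"] for item in items]

# Token count reported by the API for this request.
tokens_used = response["usage"]["total_tokens"]

print(len(vectors), len(vectors[0]), tokens_used)  # 1 3 5
```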

See more Python code examples in the OpenAI Cookbook.

When using OpenAI embeddings, please keep in mind their limitations and risks.

Embedding models

OpenAI offers one second-generation embedding model (denoted by -002 in the model ID) and 16 first-generation models (denoted by -001 in the model ID).

We recommend using text-embedding-ada-002 for nearly all use cases. It’s better, cheaper, and simpler to use. Read the blog post announcement.

| Model generation | Tokenizer | Max input tokens | Knowledge cutoff |
| --- | --- | --- | --- |
| V2 | cl100k_base | 8191 | Sep 2021 |
| V1 | GPT-2/GPT-3 | 2046 | Aug 2020 |
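One practical consequence of the input limit is that longer texts have to be shortened or split before they are embedded. The sketch below splits an already-tokenized input into pieces of at most 8191 tokens. The helper `chunk_tokens` is hypothetical, and the token IDs are stand-ins; in practice they would come from a cl100k_base tokenizer, which is not exercised here.

```python
MAX_INPUT_TOKENS = 8191  # limit listed above for the V2 model

def chunk_tokens(tokens, max_tokens=MAX_INPUT_TOKENS):
    """Split a token sequence into consecutive pieces of at most max_tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    return [tokens[i:i + max_tokens] for i in range(0, len(tokens), max_tokens)]

# Stand-in token IDs, invented for this example.
fake_tokens = list(range(20000))
chunks = chunk_tokens(fake_tokens)
print([len(c) for c in chunks])  # [8191, 8191, 3618]
```

Each piece could then be embedded separately; how to combine the resulting vectors is a design choice this sketch leaves open.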

Usage is priced per input token, at a rate of $0.0004 per 1000 tokens, or about 3,000 pages per US dollar (assuming ~800 tokens per page):
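The pages-per-dollar figure follows from simple arithmetic on the quoted rate. The sketch below reproduces it; the function names are illustrative, and the 800 tokens per page is the rough assumption stated in the text.

```python
PRICE_PER_1K_TOKENS_USD = 0.0004  # rate quoted in the text
TOKENS_PER_PAGE = 800             # rough assumption quoted in the text

def embedding_cost_usd(num_tokens, price_per_1k=PRICE_PER_1K_TOKENS_USD):
    """Estimated cost in US dollars for embedding num_tokens input tokens."""
    return num_tokens / 1000 * price_per_1k

def pages_per_dollar(tokens_per_page=TOKENS_PER_PAGE,
                     price_per_1k=PRICE_PER_1K_TOKENS_USD):
    """How many pages of text one US dollar covers at the given rate."""
    tokens_per_dollar = 1000 / price_per_1k
    return tokens_per_dollar / tokens_per_page

print(round(pages_per_dollar()))  # 3125
print(embedding_cost_usd(1_000_000))
```

At this rate one dollar buys 2.5 million tokens, or roughly 3,125 pages, which the text rounds to about 3,000.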

| Model | Rough pages per dollar | Example performance on BEIR search eval |
| --- | --- | --- |
| text-embedding-ada-002 | 3000 | 53.9 |
| *-davinci-*-001 | 6 | 52.8 |
| *-curie-*-001 | 60 | 50.9 |
| *-babbage-*-001 | 240 | 50.4 |
| *-ada-*-001 | 300 | 49.0 |

Second-generation models

| Model name | Tokenizer | Max input tokens | Output dimensions |
| --- | --- | --- | --- |
| text-embedding-ada-002 | cl100k_base | 8191 | 1536 |

First-generation models (not recommended)
