Batch Image-Text Inference/Annotation with the OpenAI SDK

apd_csdn

Last modified 2025-05-15 17:35:13

Tags: artificial intelligence, language models, algorithms, python

First published 2025-05-15 17:30:15

Article link: https://blog.youkuaiyun.com/apd_csdn/article/details/147987425

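Many mainstream models (GPT, Claude, Doubao, and others) now accept multimodal input. Some scenarios, such as large-scale data annotation, require calling image-text inference in bulk. The code below issues those calls concurrently.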

Environment setup

pip install openai nest_asyncio pillow

Imports

import asyncio
import nest_asyncio
from openai import AsyncOpenAI
import time
import base64
from PIL import Image
import random
import os

Here asyncio, AsyncOpenAI, and nest_asyncio provide the asynchronous image-text inference.

Define the async client object

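Fill in your model endpoint URL and API key. nest_asyncio.apply() patches asyncio so that the object can also be used from inside an event loop that is already running (for example, in a Jupyter notebook).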

class async_api:
    def __init__(self) -> None:
        nest_asyncio.apply()  # allow use inside an already-running event loop
        self.aclient = AsyncOpenAI(
            base_url="https://xxx/api/v1",  # replace with your base_url
            api_key=""  # replace with your API key
        )
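Hard-coding the key in source makes it easy to leak, so one option is to read both values from environment variables. The following is a minimal sketch under assumptions: load_client_config is a hypothetical helper, and the variable names OPENAI_BASE_URL and OPENAI_API_KEY are chosen for this example rather than taken from the original code.

```python
import os


def load_client_config(env=None):
    """Collect base_url and api_key from environment variables.

    Hypothetical helper for illustration; the variable names are assumptions.
    """
    env = os.environ if env is None else env
    base_url = env.get("OPENAI_BASE_URL", "").strip()
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not base_url or not api_key:
        raise RuntimeError("Set OPENAI_BASE_URL and OPENAI_API_KEY before running")
    return {"base_url": base_url, "api_key": api_key}
```

Inside __init__, the returned dict could then be passed on as AsyncOpenAI(**load_client_config()).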

Async inference function

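The function must be declared with the async keyword so that it can run asynchronously. It takes a query string and a PIL.Image object. The image is saved to a temporary file and read back to obtain the base64 encoding of the PNG data. Remember to put the name of the model you use in model="xxx".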

async def async_query_openai(self, query, image: Image.Image):

    # convert the image to base64 in PNG format
    if image is None:
        return 'No Input'
    tmp_image_name = str(random.randint(100000, 999999)) + 'tmp.png'
    image.save(tmp_image_name)
    with open(tmp_image_name, "rb") as image_file:
        image_bytes = image_file.read()
    base64_image = base64.b64encode(image_bytes).decode('utf-8')
    os.remove(tmp_image_name)
    completion = await self.aclient.chat.completions.create(
        model="xxx",    # replace with your model_name
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": query
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/png;base64,{base64_image}"
                        }
                    }
                ]
            }
        ],
        temperature=0.5,
        top_p=0.9,
        max_tokens=512
    )
    return completion.choices[0].message.content
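The temporary file used above can be avoided by encoding the image in memory, which also removes the small chance of two processes picking the same random file name. A minimal sketch, assuming image is a PIL.Image.Image (whose save method accepts a file-like object and a format argument); image_to_data_url is a hypothetical helper name, not part of the original code.

```python
import base64
import io


def image_to_data_url(image, image_format="PNG"):
    """Encode an image as a base64 data URL without touching the disk.

    `image` is expected to be a PIL.Image.Image, but any object with a
    compatible save(fp, format=...) method works.
    """
    buffer = io.BytesIO()
    image.save(buffer, format=image_format)  # write the encoded bytes into memory
    encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return f"data:image/{image_format.lower()};base64,{encoded}"
```

The returned string can be used directly as the "url" value of the image_url entry in the request.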

Process a list of requests asynchronously

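The functions below process a group of requests and return all of their results. They take a list, queries, in which each request holds two elements: a prompt and an image. asyncio.gather runs all requests concurrently and collects the outputs, in input order, into the results list, which is then returned.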

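async_process_queries is an async function that processes multiple requests concurrently.
process_batch is a synchronous function that takes a batch of texts and images and calls async_process_queries to process them.
asyncio.gather runs multiple async tasks concurrently.
asyncio.get_event_loop returns the current event loop, and loop.run_until_complete runs the async task on it until the task finishes.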

# Takes a list of requests and returns the list of all their results
async def async_process_queries(self,queries):
    results = await asyncio.gather(*(self.async_query_openai(query[0],query[1]) for query in queries))
    return results


def process_batch(self, batch_title: list, batch_img: list):
    ''' Process a batch of text and image prompts. Each image should be a PIL.Image.
    '''
    # patch asyncio so that nested event loops are allowed
    nest_asyncio.apply()
    if len(batch_title) != len(batch_img):
        raise ValueError('Titles and images must match one-to-one')
    combined_data = list(zip(batch_title, batch_img))

    loop = asyncio.get_event_loop()
    results = loop.run_until_complete(self.async_process_queries(combined_data))
    return results
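asyncio.gather starts every request at once, which can run into provider rate limits on large batches, and a single failed call raises out of the whole batch. One common way to soften both is a semaphore plus return_exceptions=True. The sketch below is an illustration under assumptions: gather_limited and the max_concurrency value are hypothetical and not part of the original code.

```python
import asyncio


async def gather_limited(coroutine_factories, max_concurrency=8):
    """Run zero-argument coroutine factories with at most max_concurrency in flight.

    Results keep the input order; a failed call yields its exception object
    instead of aborting the whole batch.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def run_one(factory):
        async with semaphore:  # wait here while the limit is reached
            return await factory()

    return await asyncio.gather(
        *(run_one(factory) for factory in coroutine_factories),
        return_exceptions=True,
    )
```

Inside async_process_queries this could be called as await gather_limited([lambda q=q, img=img: self.async_query_openai(q, img) for q, img in queries]); the default arguments in the lambda bind each pair at creation time.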

Usage example

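First instantiate async_api, then pass the list data to process_batch. Texts and images are paired one-to-one; for one prompt with several images, you may need to consult the API documentation of the model you use. Results are returned only after every inference has finished.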

myapi = async_api()
image = Image.open('./images1/0001.png')
image2 = Image.open('./images1/0002.jpeg')
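out1 = myapi.process_batch([
    'Classify the image content. The available categories are xx, xxx, x. Answer with the category only.',
    'Run object detection on the vehicles in the image, at most 5, and return the result in Pascal VOC format (xml). Return the xml data only.'
],
[image, image2])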
print(out1)
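For the one-prompt, several-images case mentioned above, a message can carry more than one image_url part. The sketch below assumes the endpoint accepts several image_url entries in a single user message, as the OpenAI Chat Completions API does; other providers may differ, so check their documentation. build_user_message is a hypothetical helper, and the data URLs are assumed to be already base64-encoded.

```python
def build_user_message(query, image_data_urls):
    """Build one user message holding a text prompt and any number of images."""
    content = [{"type": "text", "text": query}]
    for url in image_data_urls:
        # one image_url part per image, in the order given
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}
```

The returned dict would take the place of the single message inside messages=[...] in async_query_openai.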

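Concurrent inference is far faster than sequential inference because the requests wait on the network at the same time, which makes building large-scale datasets feasible.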
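The speed difference can be observed without any API key by replacing the network call with a sleep. This is only a sketch: fake_request is a stand-in that waits for an assumed 0.05 s, so the measured times illustrate the overlap of waiting and say nothing about real model latency.

```python
import asyncio
import time


async def fake_request(delay=0.05):
    """Stand-in for one API call; it only waits, as a network round trip would."""
    await asyncio.sleep(delay)
    return "ok"


async def run_sequential(n):
    # each request starts only after the previous one has finished
    return [await fake_request() for _ in range(n)]


async def run_concurrent(n):
    # all requests are started together and awaited as a group
    return await asyncio.gather(*(fake_request() for _ in range(n)))


def timed(coro_func, n):
    """Run coro_func(n) to completion and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = asyncio.run(coro_func(n))
    return results, time.perf_counter() - start
```

Calling timed(run_sequential, 10) and timed(run_concurrent, 10) side by side shows the sequential time growing with the number of requests while the concurrent time stays close to a single request.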

Complete code

import asyncio
import nest_asyncio
from openai import AsyncOpenAI
import time
import base64
from PIL import Image
import random
import os
 
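class async_api:
    def __init__(self) -> None:
        nest_asyncio.apply()  # allow use inside an already-running event loop
        self.aclient = AsyncOpenAI(
            base_url="https://xxx/api/v1",  # replace with your base_url
            api_key=""  # replace with your API key
        )

    # Handles a single request and returns a single result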
    async def async_query_openai(self, query, image: Image.Image):

        # convert the image to base64 in PNG format
        if image is None:
            return 'No Input'
        tmp_image_name = str(random.randint(100000, 999999)) + 'tmp.png'
        image.save(tmp_image_name)
        with open(tmp_image_name, "rb") as image_file:
            image_bytes = image_file.read()
        base64_image = base64.b64encode(image_bytes).decode('utf-8')
        os.remove(tmp_image_name)
        completion = await self.aclient.chat.completions.create(
            model="xxx",    # replace with your model_name
            messages=[
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": query
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/png;base64,{base64_image}"
                            }
                        }
                    ]
                }
            ],
            temperature=0.5,
            top_p=0.9,
            max_tokens=512
        )
        return completion.choices[0].message.content  
    
    # Takes a list of requests and returns the list of all their results
    async def async_process_queries(self,queries):
        results = await asyncio.gather(*(self.async_query_openai(query[0],query[1]) for query in queries))
        return results
    
    
    def process_batch(self, batch_title: list, batch_img: list):
        ''' Process a batch of text and image prompts. Each image should be a PIL.Image.
        '''
        # patch asyncio so that nested event loops are allowed
        nest_asyncio.apply()
        if len(batch_title) != len(batch_img):
            raise ValueError('Titles and images must match one-to-one')
        combined_data = list(zip(batch_title, batch_img))
 
        loop = asyncio.get_event_loop()
        results = loop.run_until_complete(self.async_process_queries(combined_data))
        return results
 
 
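if __name__ == '__main__':
    myapi = async_api()
    image = Image.open('./images1/0001.png')
    image2 = Image.open('./images1/0002.png')
    out1 = myapi.process_batch(
        [
            'Classify the image content. The available categories are xx, xxx, x. Answer with the category only.',
            'Run object detection on the vehicles in the image, at most 5, and return the result in Pascal VOC format (xml). Return the xml data only.',
        ],
        [image, image2],
    )
    print(out1)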
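Because process_batch returns only after every request in the batch has finished, a large dataset is usually easier to handle in chunks, saving each chunk's output as soon as it arrives. The following is a sketch under assumptions: chunked and append_jsonl are hypothetical helpers, and the JSON Lines output format and record fields are choices made for this example.

```python
import json


def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def append_jsonl(path, records):
    """Append one JSON object per line so finished chunks survive a later crash."""
    with open(path, "a", encoding="utf-8") as f:
        for record in records:
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
```

A driver loop could iterate over chunked(image_paths, 16), open the images of each chunk, call myapi.process_batch on them, and pass the paired paths and outputs to append_jsonl; the chunk size of 16 is an arbitrary starting point to tune against the provider's rate limits.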