Pre-annotation in Label Studio with a machine learning backend

Original article, last modified 2025-11-19 15:58:15


License: CC 4.0 BY-SA

Tags: #machine learning #artificial intelligence

First published 2025-11-19 14:43:56


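Label Studio makes collaborative data annotation convenient: several people can work together, annotators do not need to install any extra annotation tool, and the original images do not have to be copied to the annotators' machines. This walkthrough uses label-studio 1.20.0 and label-studio-sdk 1.0.18.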

1 First, generate a legacy token: click Organization, click API Token Settings on the right, enable Legacy Tokens, and save.

Then open Account & Settings from the top-right corner of your personal page. The legacy token is shown at the bottom.

A legacy token is a prerequisite for using a machine learning backend. Note that it can be obtained without a paid plan.

Next, add the environment variables: vim ~/.bashrc

export LABEL_STUDIO_URL=***

export LABEL_STUDIO_API_KEY="your-legacy-token"

The API key here is the legacy token.

source ~/.bashrc  # reload the environment variables
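Before starting a backend, it can help to confirm that both variables are actually visible to Python. The following is a minimal sketch, not part of Label Studio: the helper names `load_label_studio_env` and `legacy_auth_headers` are hypothetical, and the `Token` authorization scheme is stated here as the convention for legacy tokens.

```python
import os
from typing import Dict, Mapping, Tuple


def load_label_studio_env(env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (url, api_key) read from the environment, or raise if either is missing."""
    url = env.get("LABEL_STUDIO_URL", "").strip().rstrip("/")
    api_key = env.get("LABEL_STUDIO_API_KEY", "").strip()
    missing = [
        name
        for name, value in (("LABEL_STUDIO_URL", url), ("LABEL_STUDIO_API_KEY", api_key))
        if not value
    ]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return url, api_key


def legacy_auth_headers(api_key: str) -> Dict[str, str]:
    """Build the HTTP header sent with a legacy token (hypothetical helper)."""
    return {"Authorization": f"Token {api_key}"}
```

Calling `load_label_studio_env()` with no arguments reads the real environment, so a `RuntimeError` right after `source ~/.bashrc` suggests the exports did not take effect in the current shell.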

2 Clone the machine learning backend project

git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/label_studio_ml/examples/{MODEL_NAME}
docker-compose up

MODEL_NAME is the name of the model to use. Many are available, for example segment_anything_2_image, segment_anything_2_video, timeseries_segmenter and yolo. Since the task here is object detection, yolo is chosen.

Note that this starts the default YOLO model. To use a custom-trained YOLO model, continue below.

3 Pre-annotate with a custom-trained model

(1) Create a virtual environment (conda is used here)

git clone https://github.com/HumanSignal/label-studio-ml-backend.git
cd label-studio-ml-backend/
pip install -e .

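This installs the required environment. The last package in requirements.txt is label-studio-sdk @ git+https://github.com/HumanSignal/label-studio-sdk.git. If this step is slow or fails, try label-studio-sdk @ git+https://kkgithub.com/HumanSignal/label-studio-sdk.git instead, which tends to succeed more often. After installation, the label-studio-ml version is 2.0.1.dev0.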

(2) Create the machine learning backend

label-studio-ml create my_ml_backend

The name can be changed as needed; the default name is kept here. After pressing Enter, the output is:

=========================
Welcome to Label Studio ML!
==========================
You don't specify script path: by default, "/***/label-studio-ml-backend-master/label_studio_ml/default_configs/model.py" is used
Congratulations! ML Backend has been successfully initialized in ./my_ml_backend

Here are the next steps:

1. Try it out by running:
label-studio-ml start ./my_ml_backend
You should be able to connect to it in Label Studio project Settings > Machine Learning > Add Model and provide with the following URL: http://localhost:9090

2. Go to ./my_ml_backend/model.py and modify it as you wish:
- predict() - define your prediction logic here
- fit() - define your training logic here (optional)

3. Deploy your model with docker:
cd ./my_ml_backend
docker-compose up

4. Have fun! :)

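As the output shows, the machine learning backend listens at http://localhost:9090 by default, and this URL has to be connected as a model in the project. In addition, the predict() function in ./my_ml_backend/model.py must be adapted. First, here is the default predict() function: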

class NewModel(LabelStudioMLBase):
    """Custom ML Backend model
    """
    
    def setup(self):
        """Configure any parameters of your model here
        """
        self.set("model_version", "0.0.1")

    def predict(self, tasks: List[Dict], context: Optional[Dict] = None, **kwargs) -> ModelResponse:
        """ Write your inference logic here
            :param tasks: [Label Studio tasks in JSON format](https://labelstud.io/guide/task_format.html)
            :param context: [Label Studio context in JSON format](https://labelstud.io/guide/ml_create#Implement-prediction-logic)
            :return model_response
                ModelResponse(predictions=predictions) with
                predictions: [Predictions array in JSON format](https://labelstud.io/guide/export.html#Label-Studio-JSON-format-of-annotated-tasks)
        """
        print(f'''\
        Run prediction on {tasks}
        Received context: {context}
        Project ID: {self.project_id}
        Label config: {self.label_config}
        Parsed JSON Label config: {self.parsed_label_config}
        Extra params: {self.extra_params}''')

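This defines a class, NewModel, that inherits from LabelStudioMLBase. setup prepares the model, and predict runs the model for inference. Note that the output must be converted to the JSON format that Label Studio supports. An example of that format: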

{
  "id": 1,
  "created_at": "2021-03-09T21:52:49.513742Z",
  "updated_at": "2021-03-09T22:16:08.746926Z",
  "project": 83,

  "data": {
    "image": "https://example.com/opensource/label-studio/1.jpg"
  },

  "annotations": [
    {
      "id": "1001",
      "result": [
        {
          "from_name": "tag",
          "id": "Dx_aB91ISN",
          "source": "$image",
          "to_name": "img",
          "type": "rectanglelabels",
          "value": {
            "height": 10.458911419423693,
            "rectanglelabels": [
              "Moonwalker"
            ],
            "rotation": 0,
            "width": 12.4,
            "x": 50.8,
            "y": 5.869797225186766
          }
        }
      ],
      "was_cancelled": false,
      "ground_truth": false,
      "created_at": "2021-03-09T22:16:08.728353Z",
      "updated_at": "2021-03-09T22:16:08.728378Z",
      "lead_time": 4.288,
      "result_count": 0,
      "task": 1,
      "completed_by": 10
    }
  ],

  "predictions": [
    {
      "created_ago": "3 hours",
      "model_version": "model 1",
      "result": [
        {
          "from_name": "tag",
          "id": "t5sp3TyXPo",
          "source": "$image",
          "to_name": "img",
          "type": "rectanglelabels",
          "value": {
            "height": 11.612284069097889,
            "rectanglelabels": [
              "Moonwalker"
            ],
            "rotation": 0,
            "width": 39.6,
            "x": 13.2,
            "y": 34.702495201535505
          }
        }
      ]
    },
    {
      "created_ago": "4 hours",
      "model_version": "model 2",
      "result": [
        {
          "from_name": "tag",
          "id": "t5sp3TyXPo",
          "source": "$image",
          "to_name": "img",
          "type": "rectanglelabels",
          "value": {
            "height": 33.61228406909789,
            "rectanglelabels": [
              "Moonwalker"
            ],
            "rotation": 0,
            "width": 39.6,
            "x": 13.2,
            "y": 54.702495201535505
          }
        }
      ]
    }
  ]
}

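A task carries three kinds of information: basic information, annotations and predictions. The focus here is on predictions: the result field of each prediction is a list that holds the labeled regions. Using this task information, we modify setup and predict so that predict returns the required JSON format.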
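To make the layout of the task JSON concrete, here is a small sketch that walks the predictions of one task and collects the rectangle regions. The function name `list_predicted_boxes` is hypothetical and only reads the fields shown in the example above.

```python
from typing import Any, Dict, List


def list_predicted_boxes(task: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Collect every rectangle region found under task["predictions"][*]["result"]."""
    boxes = []
    for prediction in task.get("predictions", []):
        for region in prediction.get("result", []):
            if region.get("type") != "rectanglelabels":
                continue  # ignore other region types
            value = region["value"]
            boxes.append({
                "model_version": prediction.get("model_version"),
                "labels": value["rectanglelabels"],
                # x, y, width and height are percentages of the image size
                "x": value["x"],
                "y": value["y"],
                "width": value["width"],
                "height": value["height"],
            })
    return boxes
```

Applied to the example task above, this would yield two boxes, one per entry in predictions.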

The modified NewModel is as follows:

from typing import List, Dict, Optional

from label_studio_ml.model import LabelStudioMLBase
from label_studio_ml.response import ModelResponse
from label_studio_ml.utils import get_single_tag_keys
from ultralytics import YOLO


class NewModel(LabelStudioMLBase):
    """Custom ML Backend model
    """

    def setup(self):
        """Configure any parameters of your model here
        """
        self.set("model_version", "0.0.1")
        self.detector = YOLO("/***/best.pt")  # YOLO v11n, custom-trained weights

        # Read the tag names and the label list from the project's labeling config
        self.from_name, self.to_name, self.value, self.labels_in_config = get_single_tag_keys(
            self.parsed_label_config, 'RectangleLabels', 'Image')
        self.labels_in_config = set(self.labels_in_config)  # labels configured in the frontend

    def predict(self, tasks: List[Dict], context: Optional[Dict] = None, **kwargs) -> ModelResponse:
        """ Write your inference logic here
            :param tasks: [Label Studio tasks in JSON format](https://labelstud.io/guide/task_format.html)
            :param context: [Label Studio context in JSON format](https://labelstud.io/guide/ml_create#Implement-prediction-logic)
            :return model_response
                ModelResponse(predictions=predictions) with
                predictions: [Predictions array in JSON format](https://labelstud.io/guide/export.html#Label-Studio-JSON-format-of-annotated-tasks)
        """
        print(f'''\
        Run prediction on {tasks}
        Received context: {context}
        Project ID: {self.project_id}
        Label config: {self.label_config}
        Parsed JSON Label config: {self.parsed_label_config}
        Extra params: {self.extra_params}''')
        predictions = []  # one prediction per task, in the same order as tasks
        for task in tasks:
            # Resolve the image URL of the task to a local file path
            image_path = self.get_local_path(task['data']['image'], task_id=task['id'])

            results = self.detector.predict(image_path, conf=0.5)
            regions, kept_confs = [], []
            for result in results:  # YOLO detection results
                img_height, img_width = result.orig_shape  # e.g. (1080, 1920)
                class_ids = result.boxes.cls.cpu().numpy().tolist()
                confs = result.boxes.conf.cpu().numpy().tolist()
                bboxes = result.boxes.xyxyn.cpu().numpy().tolist()  # normalized x1, y1, x2, y2
                for class_id, conf, bbox in zip(class_ids, confs, bboxes):
                    label = result.names[int(class_id)]  # class id -> label name
                    if label not in self.labels_in_config:  # not a label of this project, skip
                        continue
                    regions.append({
                        'from_name': self.from_name,
                        'to_name': self.to_name,
                        'original_width': img_width,
                        'original_height': img_height,
                        'image_rotation': 0,
                        'type': 'rectanglelabels',
                        'value': {
                            'rectanglelabels': [label],
                            # Label Studio expects percentages of the image size (floats)
                            'x': round(bbox[0] * 100, 2),
                            'y': round(bbox[1] * 100, 2),
                            'width': round((bbox[2] - bbox[0]) * 100, 2),
                            'height': round((bbox[3] - bbox[1]) * 100, 2),
                            'rotation': 0
                        }
                    })
                    kept_confs.append(conf)

            predictions.append({
                # "model_version": self.get("model_version"),
                'result': regions,
                # Mean confidence of the regions that were kept (0 if none)
                'score': sum(kept_confs) / max(len(kept_confs), 1)
            })

        return ModelResponse(predictions=predictions)

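A few notes: each box is stored as a normalized top-left x, y, width and height, scaled to percentages, with rotation set to 0. The predictions argument of the ModelResponse class is a list.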
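The coordinate conversion can be checked in isolation, without loading a model. The sketch below takes a normalized (x1, y1, x2, y2) box, which is what `xyxyn` provides, and produces the `value` dictionary for a rectangle region. The function name `xyxyn_to_ls_value` is hypothetical.

```python
from typing import Dict, Sequence


def xyxyn_to_ls_value(bbox: Sequence[float], label: str, ndigits: int = 2) -> Dict[str, object]:
    """Convert a normalized corner box to a percent-based x/y/width/height value."""
    x1, y1, x2, y2 = bbox
    return {
        "rectanglelabels": [label],
        "x": round(x1 * 100, ndigits),              # left edge, percent of image width
        "y": round(y1 * 100, ndigits),              # top edge, percent of image height
        "width": round((x2 - x1) * 100, ndigits),
        "height": round((y2 - y1) * 100, ndigits),
        "rotation": 0,
    }
```

For example, a box covering the lower-right half-width strip, `(0.25, 0.5, 0.75, 1.0)`, maps to x=25.0, y=50.0, width=50.0, height=50.0.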

(3) Start the machine learning backend

Here the backend is started directly, without Docker:

label-studio-ml start my_ml_backend

Once the service is up, check that it responds:

curl http://127.0.0.1:9090/health
{"model_class":"NewModel","status":"UP"}
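The same check can be scripted. This is a minimal sketch using only the standard library; the function names `is_backend_up` and `check_health` are hypothetical, and the only assumption about the response is the `status` field shown in the output above.

```python
import json
from urllib.request import urlopen


def is_backend_up(body: str) -> bool:
    """Return True if a /health response body reports status UP."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "UP"


def check_health(base_url: str = "http://127.0.0.1:9090", timeout: float = 5.0) -> bool:
    """Request <base_url>/health and interpret the JSON body."""
    with urlopen(f"{base_url.rstrip('/')}/health", timeout=timeout) as response:
        return is_backend_up(response.read().decode("utf-8"))
```

`check_health()` raises an exception from `urllib` if the server is not reachable, which is itself a useful signal that the backend did not start.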

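You can also test with test_api.py in the generated project directory, although that test did not pass here. From a quick look, the API mainly exposes /webhook, /setup and /predict; the last one is the path used for predictions.

(4) Configure the project to be annotated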

Add the model under Settings > Model: give it a name and enter the backend URL. If the URL is reachable, simply save.

In the Annotation settings, enable Prelabeling, select the model you just added in the model selector below it, and save.