talk2car:
用python实现逻辑,要求代码简洁清晰,有一个expr_train.json文件,每个键对应train_commands.json的"command_token"字段的值,其中一条数据是
“4175173f5f60d19ecfc3712e960a1103”: {
“obj_box”: [
529,
458,
24,
41
],
“class”: “human.pedestrian.adult”,
“img”: “train_0.jpg”,
“action”: “standing”,
“color”: “black”,
“location”: “left”,
“description”: “first adult on left”
},
还有一个train_commands.json文件,格式是:
{
“commands”: [
{
“scene_token”: “f92422ed4b4e427194a4958ccf15709a”,
“sample_token”: “c32d636e44604d77a1734386b3fe4a0d”,
“translation”: [
-13.49250542687401,
0.43033061594724364,
59.28095610405408
],
“size”: [
0.81,
0.73,
1.959
],
“rotation”: [
“-0.38666213835670615”,
“-0.38076281276237284”,
“-0.5922192111910205”,
“0.5956412318459762”
],
“command”: “turn left to pick up the pedestrian at the corner”,
“obj_name”: “human.pedestrian.adult”,
“box_token”: “0183ed8a474f411f8a3394eb78df7838”,
“command_token”: “4175173f5f60d19ecfc3712e960a1103”,
“2d_box”: [
528,
457,
26,
43
],
“t2c_img”: “img_train_0.jpg”
},
…
}
现在需要将expr_train.json文件每条数据形成一个QA对,如:
{
“id”: 0,
“image”: “obs://yw-2030-gy/data/opensource/talk2car/imgs/img_train_0.jpg”, #“t2c_img"前视图路径
“width”: XXX,
“height”: XXX,
“conver
sations”: [
{
“from”: “human”,
“value”: “\nTurn left to pick up the pedestrian at the corner,please output the bounding box coordinates of the object referred to in this command with the attributes: “the action is standing, the color is black, the location is left”.”
#取"command”、“action”、“color”、"location"值
},
{
“from”: “gpt”,
“value”: “first adult on left[[529, 458, 24, 41]]” #待归一化,取"obj_box"值,obj_box本身格式为[x,y,w,h],ref取"description"的值
}
]
}
其中image字段取obs://talk2car/imgs/img_和"t2c_img"字段拼接,需要在obs中检测下是否存在,检测用moxing的方法,
width和height取/talk2car/imgs/img_和"t2c_img"字段拼接路径图片的宽和高
human的value值,取\n拼接"command"的值再拼接, please output the bounding box coordinates of the object referred to in this command with the attributes: “the action is {action}, the color is {color}, the location is {location}”.
gpt的value值,取first adult on left[[xxx, xxx, xx, xx]],box取"obj_box"转换为x1, y1, x2, y2后经过归一化的值,归一化代码:
def normalize_coordinates(box, image_width, image_height):
x1, y1, x2, y2 = box
normalized_box = [
round((x1 / image_width) * 1000),
round((y1 / image_height) * 1000),
round((x2 / image_width) * 1000),
round((y2 / image_height) * 1000)
]
return normalized_box,把每一条结果输出到jsonl里,加上这个逻辑后给我一版新的完整代码