0723_Errors and Exceptions

Errors and Exceptions

Errors

  • Syntax errors

  • Logic errors

  • Runtime errors (errors raised during execution)
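Unlike the other two kinds, a logic error raises no exception: the program runs to completion but computes the wrong result. A minimal illustration, using a hypothetical `average` function whose divisor is wrong:

```python
def average(values):
    # Logic error: divides by a hard-coded 2 instead of the number of items
    return sum(values) / 2

def average_fixed(values):
    # Correct version: divide by the actual number of items
    return sum(values) / len(values)

print(average([1, 2, 3]))        # 3.0 (wrong, yet no exception is raised)
print(average_fixed([1, 2, 3]))  # 2.0
```

Only testing the result against an expected value reveals this kind of error; the interpreter cannot detect it.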

demo

for i in range(10)
    print(i)

Output:

    for i in range(10)
                     ^
SyntaxError: invalid syntax

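Python's parser reports the file and line number where it detected the error (here, the `for` header is missing its trailing colon), marks the error position with an upward-pointing caret (^), and finally displays the error type.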
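The location details the parser prints are also available as attributes of the SyntaxError object itself (`filename`, `lineno`, `offset`, `msg`). The sketch below compiles the faulty loop without running it; `locate_syntax_error` and the file name `demo.py` are made up for illustration, and the exact `msg` and `offset` values vary between Python versions (newer releases report `expected ':'` rather than `invalid syntax`).

```python
def locate_syntax_error(source, filename="demo.py"):
    """Compile `source` without running it; return (filename, lineno, offset, msg) or None."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as e:
        # The same details the interpreter prints in its error report
        return e.filename, e.lineno, e.offset, e.msg
    return None  # the source compiled cleanly

filename, lineno, offset, msg = locate_syntax_error("for i in range(10)\n    print(i)\n")
print(f"File {filename!r}, line {lineno}, column {offset}: SyntaxError: {msg}")
print(locate_syntax_error("for i in range(10):\n    print(i)\n"))  # None
```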

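When the program hits an error that is not handled, the interpreter cannot continue executing: it raises an exception and terminates the program.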


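More examples of errors and the messages they produce:

Dividing by zero:

ZeroDivisionError: division by zero

Using a name that has never been assigned:

b
NameError: name 'b' is not defined

A `for` header missing its colon (the same syntax error as above):

for i in range(10)
    print(i)

    for i in range(10)
                     ^
SyntaxError: invalid syntax

A statement indented when no block was opened:

   a=10

    a=10
    ^
IndentationError: unexpected indent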
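To see which exception each of these snippets produces without crashing the script, one option is to compile and execute each snippet inside a try statement. `error_name` is a hypothetical helper; note that SyntaxError and IndentationError are raised by `compile()`, before any of the code runs.

```python
def error_name(source: str) -> str:
    """Compile and run `source`; return the name of the exception raised, or 'no error'."""
    try:
        code = compile(source, "<demo>", "exec")  # SyntaxError / IndentationError surface here
        exec(code, {})                             # runtime errors surface here
    except Exception as e:
        return type(e).__name__
    return "no error"

print(error_name("1 / 0"))                             # ZeroDivisionError
print(error_name("b"))                                 # NameError
print(error_name("for i in range(10)\n    print(i)"))  # SyntaxError
print(error_name("   a=10"))                           # IndentationError
print(error_name("a = 10"))                            # no error
```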

Exceptions

Python raises a different exception type for each kind of error.

Common exceptions:

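Exception            Description
NameError            Accessing a variable that has not been defined
ZeroDivisionError    Division by zero (the divisor is 0)
SyntaxError          Invalid syntax
IndexError           Index out of the sequence's range
KeyError             Requesting a dictionary key that does not exist
FileNotFoundError    File not found (e.g. the file you want to read does not exist)
AttributeError       Accessing an attribute the object does not have
ModuleNotFoundError  Module not found
IndentationError     Incorrect indentation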
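A quick way to trigger several of the exceptions in the table, and to confirm a few of their standard base classes (useful when deciding what an except clause should catch). `raised` is a hypothetical helper, and the script assumes that no file named `no_such_file_12345.txt` and no module named `no_such_module_12345` exist.

```python
import importlib

def raised(action) -> str:
    """Call `action` and return the name of the exception it raises, or 'no error'."""
    try:
        action()
    except Exception as e:
        return type(e).__name__
    return "no error"

print(raised(lambda: [1, 2][5]))                                        # IndexError
print(raised(lambda: {"a": 1}["b"]))                                    # KeyError
print(raised(lambda: open("no_such_file_12345.txt")))                   # FileNotFoundError
print(raised(lambda: (1).no_such_attribute))                            # AttributeError
print(raised(lambda: importlib.import_module("no_such_module_12345")))  # ModuleNotFoundError

# Several of these share a common base class
print(issubclass(IndexError, LookupError), issubclass(KeyError, LookupError))  # True True
print(issubclass(FileNotFoundError, OSError))                                   # True
print(issubclass(ModuleNotFoundError, ImportError))                             # True
print(issubclass(IndentationError, SyntaxError))                                # True
```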

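Exception handling

Once an unhandled error occurs, the program cannot continue running.

To make a program robust, handle the exceptions it may raise.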

try:
    try_statements
except [exceptionType [as identifier]]:
    except_statements
[else:
    else_statements]

[finally:
     finally_statements]

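  • try clause
    • try … except must surround the part of the program that may raise an exception; try_statements is that code.
  • except clause
    • Catches the specified exception. Once it is caught, the corresponding except_statements run, i.e. the statements that handle the exception.
    • To handle different exceptions differently, use several except clauses. exceptionType is the type of exception to catch; if it is omitted, it defaults to BaseException, the class from which all exceptions inherit.
    • [as identifier] binds the caught exception to a variable, through which you can read information about the exception.
    • A bare except with no exception type catches every exception. This is not recommended, because the handler cannot tell which specific exception occurred (it also swallows KeyboardInterrupt and SystemExit).
  • else clause
    • If try_statements raise no exception, the except clauses are skipped and else_statements run.
    • This clause is optional.
  • finally clause
    • finally_statements run whenever control leaves the try … except statement, whether or not an exception occurred, so this is the place for cleanup or wrap-up code. This clause is also optional.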
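
try:
    # eval() turns the typed text into a Python value; typing an undefined name raises NameError.
    # Note: eval() executes arbitrary input, so never use it on untrusted data.
    x = eval(input('Enter the dividend:\t'))
    y = eval(input('Enter the divisor:\t'))
    z = x / y
except ZeroDivisionError:
    print('The divisor cannot be 0')
except NameError:
    print('Check that the variable has been assigned')
except Exception as e:
    # Any other exception: print the arguments it carries
    print(e.args)
else:
    # Runs only when the try block raised nothing
    print('No exception caught, x/y =', z)
finally:
    # Always runs when leaving the try statement
    print('Leaving the try... except block')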
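Because the example above reads from the keyboard, the order in which the clauses run is easier to see in a non-interactive sketch. The hypothetical `divide` function records which clauses executed:

```python
def divide(x, y):
    """Return (result, log), where log lists the clauses that ran, in order."""
    log = []
    result = None
    try:
        result = x / y
    except ZeroDivisionError:
        log.append("except ZeroDivisionError")
    except Exception as e:
        # Any other error, e.g. TypeError for a non-numeric operand
        log.append(f"except Exception: {type(e).__name__}")
    else:
        log.append("else")      # only when the try block raised nothing
    finally:
        log.append("finally")   # always, on every path
    return result, log

print(divide(6, 3))    # (2.0, ['else', 'finally'])
print(divide(6, 0))    # (None, ['except ZeroDivisionError', 'finally'])
print(divide(6, "a"))  # (None, ['except Exception: TypeError', 'finally'])
```

`finally` appears in every log, while `else` appears only when the division succeeded.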

raise (raising exceptions)

Besides the exceptions raised by the system, we can raise exceptions ourselves with the raise statement.

Syntax:

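raise [Exception[(args)]]
  • Exception: the exception type (a class such as NameError, or an instance of one)
  • args: the arguments we supply to the exception, typically an error message
  • traceback: the Python 2 form raise Exception, args, traceback accepted it as a third value; in Python 3, attach it with Exception(args).with_traceback(traceback)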

try:
    # Raise a NameError ourselves; the except clause below catches it
    raise NameError('Sorry, Error occurs')
except NameError:
    print('Exception caught')
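A more typical use of raise is rejecting invalid arguments inside a function. `check_age` and its 0 to 150 range are hypothetical:

```python
def check_age(age):
    # Raise ValueError ourselves when the argument is out of range
    if not 0 <= age <= 150:
        raise ValueError(f"age out of range: {age}")
    return age

print(check_age(30))  # 30

try:
    check_age(200)
except ValueError as e:
    caught = str(e)
    print("Caught:", caught)  # Caught: age out of range: 200
```

The caller decides how to react to the error, while the function only reports it.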

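assert (assertions)

assert condition

is logically equivalent to the following (when Python runs with the -O option, assert statements are skipped entirely):

if not condition:
    raise AssertionError()

Adding an argument (a message) to the assertion:

assert expression [, args]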
li = [1, 2]
assert len(li) >= 5, 'the list has fewer than 5 elements'

'''
Traceback (most recent call last):
  File "E:/PyCharm/Course/0723.py", line 24, in <module>
    assert len(li) >= 5, 'the list has fewer than 5 elements'
AssertionError: the list has fewer than 5 elements
'''
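The equivalence between assert and an explicit raise can be checked directly. `manual_assert` is a hypothetical helper, and the comparison assumes the script is not run with python -O, which strips assert statements.

```python
def manual_assert(condition, message=None):
    """Hand-written equivalent of `assert condition, message` (ignores the -O switch)."""
    if not condition:
        raise AssertionError() if message is None else AssertionError(message)

li = [1, 2]
builtin_msg = manual_msg = None

try:
    assert len(li) >= 5, 'the list has fewer than 5 elements'
except AssertionError as e:
    builtin_msg = str(e)

try:
    manual_assert(len(li) >= 5, 'the list has fewer than 5 elements')
except AssertionError as e:
    manual_msg = str(e)

print(builtin_msg == manual_msg)  # True (when not running with -O)
```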