How to find the relationships between ranks, streams, and tasks in a training job

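Problem description:

How can I find out which ranks a training job has, which streams each rank has, and which tasks each stream contains?

Also, how can I tell which pairs of tasks communicate with each other?

Can this information be obtained from the logs?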

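Solution:

If you are training with the MindSpore framework on Ascend hardware, some task-related information can be found in the Ascend logs. The log directory is usually under ~/ascend: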

46.png
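As a first step it can help to see which log files a run has actually produced. The Python sketch below walks the directory tree and reports the number and total size of files in each subdirectory. It assumes only that the logs live under ~/ascend, as stated above; the subdirectory layout varies between CANN versions, and summarize_log_dir is a hypothetical helper, not part of any Ascend tool.

```python
import os
from collections import defaultdict
from pathlib import Path


def summarize_log_dir(root="~/ascend"):
    """Return {relative_subdir: (file_count, total_bytes)} for every
    directory under `root` that directly contains files."""
    base = Path(root).expanduser()
    summary = defaultdict(lambda: [0, 0])
    # os.walk silently yields nothing if `base` does not exist.
    for dirpath, _dirnames, filenames in os.walk(base):
        rel = os.path.relpath(dirpath, base)
        for name in filenames:
            try:
                size = (Path(dirpath) / name).stat().st_size
            except OSError:
                continue  # file vanished or is a broken symlink
            summary[rel][0] += 1
            summary[rel][1] += size
    return {k: tuple(v) for k, v in sorted(summary.items())}


if __name__ == "__main__":
    for subdir, (count, size) in summarize_log_dir().items():
        print(f"{subdir:<40} {count:>6} files {size / 1024:>10.1f} KiB")
```

Subdirectories with many or unusually large files are usually the first place worth searching.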

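These logs are saved automatically whenever a job runs. The directory also contains many other log files, which you can search directly for relevant keywords from the command line. You can also set MindSpore's log-level environment variable to INFO; the framework then prints a large amount of framework-level information while running, such as peak device memory usage, and some task-related information may appear among it.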
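The keyword search can also be scripted. The sketch below is a rough Python equivalent of "grep -rni" over the log directory, plus a helper that prepares an environment with INFO-level logging for a training process you launch yourself. It assumes that MindSpore's log level is controlled by the GLOG_v environment variable (0 = DEBUG, 1 = INFO, 2 = WARNING, 3 = ERROR), which has to be set before the process starts. The keywords "stream" and "task" and the helper names are illustrative assumptions; the exact wording of the log lines is not documented here.

```python
import os
from pathlib import Path


def info_logging_env(base_env=None):
    """Return a copy of the environment with MindSpore logging at INFO.

    Assumption: GLOG_v=1 selects INFO level; pass the result as `env=`
    to the process that runs training.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GLOG_v"] = "1"
    return env


def search_logs(root, keywords, max_hits=50):
    """Return (path, line_no, line) for lines under `root` that contain
    any of `keywords`, case-insensitively (similar to `grep -rni`)."""
    lowered = [k.lower() for k in keywords]
    hits = []
    for path in sorted(Path(root).expanduser().rglob("*")):
        if not path.is_file():
            continue
        try:
            with path.open("r", encoding="utf-8", errors="replace") as f:
                for line_no, line in enumerate(f, start=1):
                    if any(k in line.lower() for k in lowered):
                        hits.append((str(path), line_no, line.rstrip("\n")))
                        if len(hits) >= max_hits:
                            return hits
        except OSError:
            continue  # skip unreadable files (permissions, broken links)
    return hits


if __name__ == "__main__":
    # Hypothetical launch, e.g.:
    # subprocess.run(["python", "train.py"], env=info_logging_env(), check=True)
    for path, line_no, line in search_logs("~/ascend", ["stream", "task"]):
        print(f"{path}:{line_no}: {line}")
```

Doing the search in Python rather than with a one-off shell command makes it easy to post-process the hits, for example grouping them by file.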

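MindSpore's performance profiling can also collect communication-related data, which may contain the information you need. For how to run profiling, see the documentation: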

https://www.mindspore.cn/mindinsight/docs/zh-CN/master/performance_profiling_ascend.html

47.png
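A minimal sketch of enabling communication profiling from a training script follows. It assumes a MindSpore 1.x/2.x release in which mindspore.profiler.Profiler accepts the Ascend-only profile_communication flag and provides analyse(), and a job started by a distributed launcher so that mindspore.communication.init() succeeds. The order of init() and Profiler(...) shown here is also an assumption; follow the linked guide if it differs for your version. profile_training, profiler_output_dir, and the per-rank folder layout are hypothetical choices for this example.

```python
import os


def profiler_output_dir(base_dir: str, rank_id: int) -> str:
    """Hypothetical layout: keep each rank's profiling data in its own folder."""
    return os.path.join(base_dir, f"rank_{rank_id}")


def profile_training(train_fn, base_dir: str = "./profiler_data") -> str:
    """Run `train_fn` under the MindSpore Profiler with communication
    profiling enabled and return the directory holding the results.

    Assumes the caller has already configured Ascend as the device target
    and that this process was started by a distributed launcher.
    """
    # Imported lazily so this file can be loaded without MindSpore installed.
    from mindspore.communication import get_rank, init
    from mindspore.profiler import Profiler

    init()  # set up collective communication (HCCL on Ascend)
    output_dir = profiler_output_dir(base_dir, get_rank())
    # Create the profiler before training starts so the whole run is recorded.
    profiler = Profiler(output_path=output_dir, profile_communication=True)
    train_fn()
    profiler.analyse()  # turn the raw data into analysable result files
    return output_dir
```

Call it on every rank as profile_training(run_one_training), where run_one_training is your own training function; get_rank() also tells each process which rank it is, which helps when matching profiling results to ranks.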
