Monitoring
DeepSpeed's monitor supports multiple backends: TensorBoard, WandB, Comet, and CSV files.
[Figure: TensorBoard example]
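Automatic monitoring: DeepSpeed automatically logs key training metrics, such as training loss and learning rate. To use it, enable the corresponding backend(s) in the DeepSpeed config file: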
{
  "tensorboard": {
    "enabled": true,
    "output_path": "output/ds_logs/",
    "job_name": "train_bert"
  },
  "wandb": {
    "enabled": true,
    "team": "my_team",
    "group": "my_group",
    "project": "my_project"
  },
  "comet": {
    "enabled": true,
    "project": "my_project",
    "experiment_name": "my_experiment"
  },
  "csv_monitor": {
    "enabled": true,
    "output_path": "output/ds_logs/",
    "job_name": "train_bert"
  }
}
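Because each monitor is just a top-level key of the DeepSpeed config, the monitor sections can also be generated programmatically, for example to enable only the backends installed on a given machine (`deepspeed.initialize` also accepts the config as a Python dict through its `config` argument). The following is a minimal Python sketch under that assumption: `build_ds_config` and `MONITOR_TEMPLATES` are hypothetical names, the per-backend values are the placeholders from the example config, and the code only builds the dictionary without calling DeepSpeed.

```python
import json

# Per-backend settings taken from the example config; values such as
# "train_bert" or "my_team" are placeholders to replace with your own.
MONITOR_TEMPLATES = {
    "tensorboard": {"output_path": "output/ds_logs/", "job_name": "train_bert"},
    "wandb": {"team": "my_team", "group": "my_group", "project": "my_project"},
    "comet": {"project": "my_project", "experiment_name": "my_experiment"},
    "csv_monitor": {"output_path": "output/ds_logs/", "job_name": "train_bert"},
}


def build_ds_config(base, backends):
    """Return a copy of `base` with the chosen monitor backends enabled (hypothetical helper)."""
    unknown = set(backends) - set(MONITOR_TEMPLATES)
    if unknown:
        raise ValueError(f"unknown monitor backend(s): {sorted(unknown)}")
    config = dict(base)  # shallow copy so the caller's dict is not modified
    for name in backends:
        config[name] = {"enabled": True, **MONITOR_TEMPLATES[name]}
    return config


if __name__ == "__main__":
    # "train_batch_size" stands in for the rest of your training config.
    cfg = build_ds_config({"train_batch_size": 32}, ["tensorboard", "csv_monitor"])
    print(json.dumps(cfg, indent=2))
```

Unknown backend names raise an error rather than being silently ignored, so a typo such as `"tensorbord"` is caught before training starts.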
Custom monitoring:
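# Step 1: Import monitor (and DeepSpeed config, if needed)
from deepspeed.monitor.monitor import MonitorMaster
from deepspeed.runtime.config import DeepSpeedConfig
# Step 2: Initialize the monitor from the DeepSpeed config ("ds_config.json" is a placeholder path)
ds_config = DeepSpeedConfig("ds_config.json")
monitor = MonitorMaster(ds_config.monitor_config)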
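For custom metrics, DeepSpeed's `MonitorMaster.write_events` takes a list of `(name, value, step)` tuples and forwards it to every enabled backend. The Python sketch below times one training step and logs it in that tuple format. The helper `timed_step`, the metric name `Train/Samples/step_time_s`, and the in-memory `ListMonitor` are hypothetical stand-ins so the sketch runs without DeepSpeed or a GPU; in a real run you would pass the `MonitorMaster` instance and use something like `model_engine.global_samples` as the step.

```python
import time
from typing import Callable, List, Protocol, Tuple

# Each event is (metric name, scalar value, step).
Event = Tuple[str, float, int]


class EventWriter(Protocol):
    """Anything with a write_events(list_of_events) method, e.g. MonitorMaster."""

    def write_events(self, event_list: List[Event]) -> None: ...


def timed_step(monitor: EventWriter, step_fn: Callable[[], None], global_samples: int) -> float:
    """Run one training step, then log its wall-clock time as a custom event."""
    start = time.perf_counter()
    step_fn()  # in real training: forward pass, model_engine.backward(loss), model_engine.step()
    elapsed = time.perf_counter() - start
    monitor.write_events([("Train/Samples/step_time_s", elapsed, global_samples)])
    return elapsed


class ListMonitor:
    """Hypothetical in-memory stand-in for MonitorMaster that just collects events."""

    def __init__(self) -> None:
        self.events: List[Event] = []

    def write_events(self, event_list: List[Event]) -> None:
        self.events.extend(event_list)


if __name__ == "__main__":
    monitor = ListMonitor()  # in a real run: MonitorMaster(ds_config.monitor_config)
    for step in range(3):
        # time.sleep simulates the work of one training step
        timed_step(monitor, lambda: time.sleep(0.01), global_samples=(step + 1) * 32)
    for name, value, samples in monitor.events:
        print(f"{name} = {value:.3f}s at {samples} samples")
```

Typing the monitor as a small protocol keeps the training-loop code independent of which object receives the events, which is what makes the in-memory stand-in possible.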
DeepSpeed Monitoring & Comm. Logging
First published on 2024-06-12 07:16:55