HolisticTraceAnalysis Project Tutorial - 优快云 Blog

Article link: https://blog.youkuaiyun.com/gitblog_00449/article/details/142810445


HolisticTraceAnalysis: A library to analyze PyTorch traces. Project address (GitCode mirror): https://gitcode.com/gh_mirrors/ho/HolisticTraceAnalysis

1. Project Introduction

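HolisticTraceAnalysis (HTA) is a tool for finding performance bottlenecks in PyTorch distributed training workloads. It identifies and locates those bottlenecks by analyzing the traces collected with the PyTorch Profiler (also called Kineto traces). HTA offers a range of analyses, including temporal breakdown, kernel breakdown, idle time breakdown, communication-computation overlap, frequent CUDA kernel pattern detection, CUDA kernel launch statistics, and augmented counters such as memory bandwidth and queue length.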
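To make the first of these analyses concrete, the sketch below shows one way GPU time on a single rank could be split into computation, communication, memory, and idle time from a list of kernel intervals. It is an illustration only, not HTA's implementation: the `Kernel` record, its category labels, and `temporal_breakdown` are hypothetical, and the code assumes kernels never overlap in time (for example, a single CUDA stream).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Kernel:
    """One GPU kernel interval (hypothetical record, times in microseconds)."""
    name: str
    category: str  # assumed labels: "compute", "communication", "memory"
    start: int
    end: int


def temporal_breakdown(kernels: List[Kernel]) -> Dict[str, int]:
    """Split the span from the first kernel start to the last kernel end
    into per-category busy time plus idle time.

    Assumes kernels do not overlap in time (e.g. a single CUDA stream).
    """
    totals = {"compute": 0, "communication": 0, "memory": 0}
    if not kernels:
        totals["idle"] = 0
        return totals
    for k in kernels:
        totals[k.category] += k.end - k.start
    span = max(k.end for k in kernels) - min(k.start for k in kernels)
    # Any part of the span not covered by a kernel counts as idle time.
    totals["idle"] = span - sum(totals.values())
    return totals


kernels = [
    Kernel("gemm", "compute", 0, 40),
    Kernel("ncclAllReduce", "communication", 50, 80),
    Kernel("memcpy", "memory", 80, 90),
]
print(temporal_breakdown(kernels))
# {'compute': 40, 'communication': 30, 'memory': 10, 'idle': 10}
```

Because idle time is derived as the span minus busy time, overlapping kernels (multiple streams) would need their intervals merged first; otherwise the idle figure would be understated or even negative.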

2. Quick Start

2.1 Installation

First, make sure your system meets the following requirements:

Linux or macOS
Python >= 3.8
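As a quick pre-flight check, the two requirements above can be verified with the Python standard library alone. This is a hypothetical helper, not part of HTA; it treats `platform.system()` returning `"Linux"` or `"Darwin"` (the value reported on macOS) as a supported OS.

```python
import platform
import sys


def meets_hta_requirements() -> bool:
    """Return True if this interpreter matches the listed requirements:
    Linux or macOS, and Python 3.8 or newer."""
    supported_os = platform.system() in ("Linux", "Darwin")  # "Darwin" is macOS
    supported_python = sys.version_info >= (3, 8)
    return supported_os and supported_python


if __name__ == "__main__":
    print("Requirements met:", meets_hta_requirements())
```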

2.1.1 Install from PyPI (stable release)

pip install HolisticTraceAnalysis

2.1.2 Install from source

git clone https://github.com/facebookresearch/HolisticTraceAnalysis.git
cd HolisticTraceAnalysis
git submodule update --init
pip install -r requirements.txt
pip install -e .
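After either installation method, one minimal way to confirm that the package is visible to the current interpreter is sketched below. The helper names are hypothetical; the sketch assumes the distribution name is `HolisticTraceAnalysis` (as passed to `pip` above) and the importable package is `hta` (as used in the usage example that follows).

```python
from importlib import metadata
from importlib.util import find_spec
from typing import Optional


def installed_hta_version() -> Optional[str]:
    """Return the installed HolisticTraceAnalysis version, or None if the
    distribution metadata cannot be found."""
    try:
        return metadata.version("HolisticTraceAnalysis")
    except metadata.PackageNotFoundError:
        return None


def hta_importable() -> bool:
    """Check whether the top-level `hta` package can be found without importing it."""
    return find_spec("hta") is not None


version = installed_hta_version()
print(f"HTA {version} is installed" if version else "HTA is not installed")
print("hta importable:", hta_importable())
```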

2.2 Usage Example

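The following simple example shows how to use HTA to analyze trace data. Point `trace_dir` at the folder that contains the collected trace files.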

from hta.trace_analysis import TraceAnalysis

# Create a TraceAnalysis object for the folder that holds the trace files
analyzer = TraceAnalysis(trace_dir="/path/to/folder/containing/the/traces")

# Temporal breakdown
temporal_breakdown_df = analyzer.get_temporal_breakdown()

# Kernel breakdown (returns two DataFrames: per kernel type and per kernel)
kernel_type_metrics_df, kernel_metrics_df = analyzer.get_gpu_kernel_breakdown()

# Idle time breakdown
idle_time_df = analyzer.get_idle_time_breakdown()

# Communication-computation overlap
comm_comp_overlap_df = analyzer.get_comm_comp_overlap()

# Frequent CUDA kernel patterns
frequent_patterns_df = analyzer.get_frequent_cuda_kernel_patterns(operator_name="aten::linear", output_dir="/new/trace/path")

# CUDA kernel launch statistics
cuda_launch_kernel_stats = analyzer.get_cuda_kernel_launch_stats()

# Memory bandwidth time series
memory_bw_series = analyzer.get_memory_bw_time_series()

# Memory bandwidth summary
memory_bw_summary = analyzer.get_memory_bw_summary()

# Queue length time series
ql_series = analyzer.get_queue_length_time_series()

# Queue length summary
ql_summary = analyzer.get_queue_length_summary()
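The communication-computation overlap metric can be illustrated without HTA. This sketch assumes one common definition, the percentage of communication time during which computation kernels are also running, and takes kernel intervals as plain `(start, end)` tuples. The function names are hypothetical and this is not HTA's code; a higher percentage means more communication is hidden behind computation.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end) in a shared time unit, end exclusive


def merge(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals: List[Interval]) -> int:
    """Sum of interval lengths (intervals assumed disjoint)."""
    return sum(end - start for start, end in intervals)


def intersection_length(a: List[Interval], b: List[Interval]) -> int:
    """Length of the overlap between two sorted, disjoint interval lists."""
    i = j = overlap = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            overlap += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return overlap


def comm_comp_overlap_pct(comm: List[Interval], comp: List[Interval]) -> float:
    """Percentage of communication time that overlaps with computation."""
    comm_m, comp_m = merge(comm), merge(comp)
    comm_time = total_length(comm_m)
    if comm_time == 0:
        return 0.0
    return 100.0 * intersection_length(comm_m, comp_m) / comm_time


comm_kernels = [(10, 30), (50, 70)]
comp_kernels = [(0, 20), (25, 60)]
print(comm_comp_overlap_pct(comm_kernels, comp_kernels))  # 62.5
```

Merging each category first keeps concurrent kernels on different streams from being double counted, so the result stays between 0 and 100.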