Debug: TensorBoard does not show the Profile tab

This post shows how to profile a TensorFlow model with TensorBoard by setting log_dir and profile_batch, and how to fix a missing Profile tab by installing the profile plugin with pip.


Problem:

  • Profiling TensorFlow performance
  • Using the TensorBoard callback:
TensorBoard(log_dir='../../docs/tensorboard/logs', histogram_freq=1, profile_batch=2)
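
For context, here is a minimal sketch of how a callback with these settings might be attached to a training run. The toy model, the random data, and the helper names `profiling_callback_kwargs` and `train_with_profiling` are illustrative assumptions, not part of the original setup; only the three callback arguments come from the line above.

```python
def profiling_callback_kwargs(log_dir="../../docs/tensorboard/logs"):
    """Return the callback settings used in this post (helper name is hypothetical)."""
    return {
        "log_dir": log_dir,    # where event files and profiler traces are written
        "histogram_freq": 1,   # log weight histograms every epoch
        "profile_batch": 2,    # profile the second batch (0 disables profiling)
    }


def train_with_profiling(log_dir="../../docs/tensorboard/logs"):
    """Train a toy model for one epoch with the TensorBoard callback attached."""
    # Imported here so the settings above can be inspected without TensorFlow.
    import numpy as np
    import tensorflow as tf

    callback = tf.keras.callbacks.TensorBoard(**profiling_callback_kwargs(log_dir))

    # Toy model and random data: placeholders, not from the original post.
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")
    x = np.random.rand(64, 4).astype("float32")
    y = np.random.rand(64, 1).astype("float32")

    # 64 samples with batch_size=8 gives 8 batches, so batch 2 exists to be profiled.
    model.fit(x, y, batch_size=8, epochs=1,
              validation_data=(x, y), callbacks=[callback])
    return log_dir
```

After calling `train_with_profiling()`, the logs can be opened with `tensorboard --logdir ../../docs/tensorboard/logs`.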

However, the tab bar at the top of TensorBoard has no Profile tab.

Fix: install the profile plugin on the machine that displays TensorBoard (the one running the tensorboard server):

pip install -U tensorboard_plugin_profile
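
To confirm that the plugin is visible to the same Python environment that launches TensorBoard, a small check like the following may help. It is a sketch under assumptions: the helper names are made up for illustration, the distribution is looked up as `tensorboard-plugin-profile`, and the `plugins/profile` directory layout is my understanding of where the profiler writes traces rather than something stated in this post.

```python
from importlib import metadata
from pathlib import Path


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def find_profile_dirs(log_dir):
    """List directories under log_dir that look like profiler output.

    Assumption: traces are written to a 'plugins/profile' directory
    somewhere below the log directory.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return []
    return sorted(
        p for p in root.rglob("profile")
        if p.is_dir() and p.parent.name == "plugins"
    )


def diagnose(log_dir):
    """Collect the facts needed to explain a missing Profile tab."""
    return {
        "tensorboard": installed_version("tensorboard"),
        "tensorboard-plugin-profile": installed_version("tensorboard-plugin-profile"),
        "profile_dirs": [str(p) for p in find_profile_dirs(log_dir)],
    }
```

Running `print(diagnose("../../docs/tensorboard/logs"))` in the environment that starts TensorBoard shows `None` for the plugin if it is not installed there, and an empty `profile_dirs` list if no profile data was written.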