How to profile TensorFlow

This post explains in detail how to use TensorFlow's timeline module to profile a program, covering profiling a single run, merging multiple runs into one trace, and fixes for common problems.


TensorFlow is one of the most widely used machine learning libraries today. Profiling a TensorFlow graph to understand how much time each op consumes is very useful for improving program performance. This can be done with TensorFlow's timeline module, but there is no clear tutorial for it online, so in this post I will try to cover profiling a TensorFlow program through the following topics:

A simple example

Let's first get a feel for the timeline module with a very simple example, taken from a StackOverflow answer.

# https://github.com/ikhlestov/tensorflow_profiling/blob/master/01_simple_example.py

import tensorflow as tf
from tensorflow.python.client import timeline

a = tf.random_normal([200, 500])
b = tf.random_normal([500, 100])
res = tf.matmul(a, b)

with tf.Session() as sess:
    # add additional options to trace the session execution
    options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
    run_metadata = tf.RunMetadata()
    sess.run(res, options=options, run_metadata=run_metadata)

    # Create the Timeline object, and write it to a json file
    fetched_timeline = timeline.Timeline(run_metadata.step_stats)
    chrome_trace = fetched_timeline.generate_chrome_trace_format()
    with open('timeline_01.json', 'w') as f:
        f.write(chrome_trace)

You should have noticed that we passed the options and run_metadata arguments to session.run. After execution, a timeline_01.json file is produced (in Chrome trace format). If the script does not run on your machine, try the first fix in the "Possible problems during profiling and their fixes" section below.

The timeline module stores its data in the Chrome tracing format, so the stored data can only be viewed with the Chrome browser. Open Chrome and type chrome://tracing in the address bar. In the upper left you will see a Load button; use it to load the JSON file we just generated.
[Screenshot: the loaded trace in chrome://tracing]
At the top of the view above you will see a timeline measured in milliseconds. To see more information about an op, just click on the corresponding time slice. On the right side of the page there are also a few simple tools: selection, pan, zoom, and timing.
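For reference, a Chrome trace file is plain JSON: a traceEvents list in which each timed event carries a ts (start timestamp, in microseconds) and a dur (duration) field. A minimal hand-built sketch (the field names follow the Chrome Trace Event format; the op name and timings are invented for illustration):

```python
import json

# A minimal hand-built Chrome trace with a single complete event.
trace = {
    "traceEvents": [
        {
            "name": "MatMul",   # label shown in the chrome://tracing UI
            "ph": "X",          # "X" = a complete event (has ts and dur)
            "ts": 0,            # start time in microseconds
            "dur": 150,         # duration in microseconds
            "pid": 0,           # process lane in the UI
            "tid": 0,           # thread lane in the UI
        }
    ]
}

chrome_trace = json.dumps(trace)
with open("minimal_trace.json", "w") as f:
    f.write(chrome_trace)
# minimal_trace.json loads in chrome://tracing just like timeline_01.json.
```

This is the same structure that timeline.Timeline.generate_chrome_trace_format produces, just with a single artificial event.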

A slightly more complex example

Now let's examine a slightly more complex example:

# https://github.com/ikhlestov/tensorflow_profiling/blob/master/02_example_with_placeholders_and_for_loop.py

import os
import tempfile

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.client import timeline

batch_size = 100

# placeholder
inputs = tf.placeholder(tf.float32, [batch_size, 784])
targets = tf.placeholder(tf.float32, [batch_size, 10])

# model
fc_1_out = tf.layers.dense(inputs, 500, activation=tf.nn.sigmoid)
fc_2_out = tf.layers.dense(fc_1_out, 784, activation=tf.nn.sigmoid)
logits = tf.layers.dense(fc_2_out, 10, activation=None)

# loss
loss = tf.losses.softmax_cross_entropy(onehot_labels=targets, logits=logits)
# train_op
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)


if __name__ == '__main__':
    mnist_save_dir = os.path.join(tempfile.gettempdir(), 'MNIST_data')
    mnist = input_data.read_data_sets(mnist_save_dir, one_hot=True)

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())

        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
        for i in range(3):
            batch_input, batch_target = mnist.train.next_batch(batch_size)
            feed_dict = {inputs: batch_input,
                         targets: batch_target}

            sess.run(train_op,
                     feed_dict=feed_dict,
                     options=run_options,
                     run_metadata=run_metadata)

            fetched_timeline = timeline.Timeline(run_metadata.step_stats)
            chrome_trace = fetched_timeline.generate_chrome_trace_format()
            with open('timeline_02_step_%d.json' % i, 'w') as f:  # save each run's trace to its own JSON file
                f.write(chrome_trace)

In the example above, each tf.layers.dense call places its ops under its own name scope. Thanks to this, the timeline display is much clearer.

Our code also saves the traces of the 3 runs as three separate files. If we run this program on a CPU, we get timelines like the ones below (they may differ slightly from machine to machine):
[Screenshot: CPU timelines for the three runs]
If you run the same code on a GPU, you will find that the trace of the first run differs from the later ones:
[Screenshots: GPU timelines for the first and the subsequent runs]
You may notice that the first run on the GPU takes much longer than the subsequent ones. This happens because TensorFlow performs some GPU initialization during the first run, after which execution is optimized. If you want more accurate timelines, you should store a trace after roughly 100 runs.
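The "trace only after warm-up" advice can be wrapped in a tiny helper. This is a sketch of ours, not part of the TensorFlow API; the helper and parameter names are made up:

```python
def trace_this_step(step, warmup=100, n_traced=3):
    """Trace only a few steps after the warm-up phase.

    The first GPU runs are unrepresentative because of one-time
    initialization, so skip `warmup` steps and then trace `n_traced`
    consecutive steps.  (Helper and parameter names are ours, not
    part of TensorFlow.)
    """
    return warmup <= step < warmup + n_traced

# In the training loop one would then pass the trace options conditionally:
#   opts = run_options if trace_this_step(i) else None
#   meta = run_metadata if trace_this_step(i) else None
#   sess.run(train_op, feed_dict=feed_dict, options=opts, run_metadata=meta)
traced_steps = [s for s in range(200) if trace_this_step(s)]
```

With the defaults, only steps 100, 101, and 102 of a 200-step loop would be traced; all other steps run without the FULL_TRACE overhead.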

Moreover, all incoming/outgoing flows now start with the scope name, so we know exactly where each op lives in the source code.

Storing the timelines of multiple runs in a single file

For one reason or another, we may want to store the timelines of multiple session runs in a single file. How can we do that?

Unfortunately, this can only be done by hand. The Chrome trace format contains a definition for every event along with its timing. On the first run we save all of the data, but on subsequent runs we append only the timed events, not the definitions.

# https://github.com/ikhlestov/tensorflow_profiling/blob/master/03_merged_timeline_example.py

import os
import tempfile
import json

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.client import timeline


class TimeLiner:
    _timeline_dict = None

    def update_timeline(self, chrome_trace):
        # convert the chrome trace JSON string to a python dict
        chrome_trace_dict = json.loads(chrome_trace)
        # for first run store full trace
        if self._timeline_dict is None:
            self._timeline_dict = chrome_trace_dict
        # for later runs, append only the timed events, not the definitions
        else:
            for event in chrome_trace_dict['traceEvents']:
                # timed events carry a 'ts' (timestamp) field
                if 'ts' in event:
                    self._timeline_dict['traceEvents'].append(event)

    def save(self, f_name):
        with open(f_name, 'w') as f:
            json.dump(self._timeline_dict, f)


batch_size = 100

# placeholder
inputs = tf.placeholder(tf.float32, [batch_size, 784])
targets = tf.placeholder(tf.float32, [batch_size, 10])

# model
fc_1_out = tf.layers.dense(inputs, 500, activation=tf.nn.sigmoid)
fc_2_out = tf.layers.dense(fc_1_out, 784, activation=tf.nn.sigmoid)
logits = tf.layers.dense(fc_2_out, 10, activation=None)

# loss
loss = tf.losses.softmax_cross_entropy(onehot_labels=targets, logits=logits)
# train_op
train_op = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

if __name__ == '__main__':
    mnist_save_dir = os.path.join(tempfile.gettempdir(), 'MNIST_data')
    mnist = input_data.read_data_sets(mnist_save_dir, one_hot=True)

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    with tf.Session(config=config) as sess:
        sess.run(tf.global_variables_initializer())

        options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
        many_runs_timeline = TimeLiner()
        runs = 5
        for i in range(runs):
            batch_input, batch_target = mnist.train.next_batch(batch_size)
            feed_dict = {inputs: batch_input,
                         targets: batch_target}

            sess.run(train_op,
                     feed_dict=feed_dict,
                     options=options,
                     run_metadata=run_metadata)

            fetched_timeline = timeline.Timeline(run_metadata.step_stats)
            chrome_trace = fetched_timeline.generate_chrome_trace_format()
            many_runs_timeline.update_timeline(chrome_trace)
        many_runs_timeline.save('timeline_03_merged_%d_runs.json' % runs)

We then get the merged timeline:
[Screenshot: the merged timeline]

Possible problems during profiling and their fixes

A few problems can come up during profiling. First of all, it may simply not work. If you run into the following error:

I tensorflow/stream_executor/dso_loader.cc:126] Couldn't open CUDA library libcupti.so.8.0. LD_LIBRARY_PATH:

you can fix it by installing libcupti-dev:

sudo apt-get install libcupti-dev

The second common problem is run latency. In the last picture we can see a gap between runs; for large networks the gap can be long. This cannot be fixed completely, but using a custom C++ protobuf library reduces the latency. This is described in the official TensorFlow documentation.

Why the gap between two runs appears: because we save each step's timeline serially in Python code, the gap is unavoidable. If each step's timeline were instead saved in parallel by TensorFlow's C++ engine, the gap would disappear entirely.
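A cheap mitigation (our suggestion, not from the original article) is to keep trace handling out of the hot loop: buffer the raw chrome_trace strings during training and merge and write them once after the loop finishes. A self-contained sketch of that merge, mirroring the TimeLiner logic above on hand-made traces:

```python
import json

def merge_traces(chrome_traces):
    """Merge several Chrome trace JSON strings into one dict.

    Same idea as the TimeLiner class above: keep the first trace whole,
    then append only the timed events (those with a 'ts' field) from
    later traces.
    """
    merged = None
    for raw in chrome_traces:
        trace = json.loads(raw)
        if merged is None:
            merged = trace
        else:
            merged["traceEvents"].extend(
                e for e in trace["traceEvents"] if "ts" in e)
    return merged

# During training we would only do: buffered.append(chrome_trace),
# and call merge_traces(buffered) once, after the loop.
buffered = [
    json.dumps({"traceEvents": [{"name": "a", "ph": "M"},
                                {"name": "b", "ts": 1, "dur": 2}]}),
    json.dumps({"traceEvents": [{"name": "a", "ph": "M"},
                                {"name": "b", "ts": 5, "dur": 2}]}),
]
merged = merge_traces(buffered)
```

This only moves the Python-side work out of the gap between sess.run calls; the serialization cost itself is unchanged.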

Conclusion

I hope the walkthrough above has given you a good grasp of TensorFlow profiling. All the code used in this post can be found in this repo.

Sources:
This article is a translation of https://towardsdatascience.com/howto-profile-tensorflow-1a49fb18073d
Another translation of the same article: https://walsvid.github.io/2017/03/25/profiletensorflow/
