[blog] PyTorch or TensorFlow?

This post compares the strengths and weaknesses of the PyTorch and TensorFlow deep learning frameworks, from rapid prototyping through to large-scale deployment, covering key areas such as device management, custom extensions, and data loading.

This is a guide to the main differences I’ve found between PyTorch and TensorFlow. This post is intended to be useful for anyone considering starting a new project or making the switch from one deep learning framework to another. The focus is on programmability and flexibility when setting up the components of the training and deployment deep learning stack. I won’t go into performance (speed / memory usage) trade-offs.

Summary

PyTorch is better for rapid prototyping in research, for hobbyists and for small scale projects. TensorFlow is better for large-scale deployments, especially when cross-platform and embedded deployment is a consideration.

Ramp-up Time

Winner: PyTorch

PyTorch is essentially a GPU enabled drop-in replacement for NumPy equipped with higher-level functionality for building and training deep neural networks. This makes PyTorch especially easy to learn if you are familiar with NumPy, Python and the usual deep learning abstractions (convolutional layers, recurrent layers, SGD, etc.).
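
As a rough illustration of the NumPy-like feel, here is a minimal sketch of typical tensor operations:

import torch

# Tensors behave much like NumPy arrays, but can be moved to the GPU.
x = torch.randn(3, 4)
w = torch.randn(4, 2)
y = x.mm(w)          # matrix multiply, analogous to np.dot
y = y.clamp(min=0)   # elementwise ReLU, analogous to np.maximum(y, 0)
if torch.cuda.is_available():
    y = y.cuda()     # the same operations then run on the GPU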

On the other hand, a good mental model for TensorFlow is a programming language embedded within Python. When you write TensorFlow code it gets “compiled” into a graph by Python and then run by the TensorFlow execution engine. I’ve seen newcomers to TensorFlow struggle to wrap their head around this added layer of indirection. Also because of this, TensorFlow has a few extra concepts to learn such as the session, the graph, variable scoping and placeholders. Also more boilerplate code is needed to get a basic model running. The ramp-up time to get going with TensorFlow is definitely longer than PyTorch.
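
To make that indirection concrete, here is a minimal sketch of the build-then-run pattern (TensorFlow 1.x style; the shapes are arbitrary):

import tensorflow as tf

# Building the graph: nothing is computed yet.
x = tf.placeholder(tf.float32, shape=[None, 4])
w = tf.Variable(tf.random_normal([4, 2]))
y = tf.matmul(x, w)

# Running the graph inside a session, feeding a value for the placeholder.
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[1.0, 2.0, 3.0, 4.0]]})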

Graph Creation and Debugging

Winner: PyTorch

Creating and running the computation graph is perhaps where the two frameworks differ the most. In PyTorch the graph construction is dynamic, meaning the graph is built at run-time. In TensorFlow the graph construction is static, meaning the graph is “compiled” and then run. As a simple example, in PyTorch you can write a for loop construction using standard Python syntax

for _ in range(T):
    h = torch.matmul(W, h) + b

and T can change between executions of this code. In TensorFlow this requires the use of control flow operations, such as tf.while_loop, when constructing the graph. TensorFlow does have dynamic_rnn for the more common constructs, but creating custom dynamic computations is more difficult.
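
For comparison, the same loop expressed with tf.while_loop looks roughly like this (a sketch, assuming W, h, b and T are already defined as tensors):

import tensorflow as tf

# The loop must be expressed as graph operations; T can be a tensor
# whose value is only known at run time.
i0 = tf.constant(0)
cond = lambda i, h: tf.less(i, T)
body = lambda i, h: (i + 1, tf.matmul(W, h) + b)
_, h_final = tf.while_loop(cond, body, [i0, h])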

The simple graph construction in PyTorch is easier to reason about, but perhaps even more importantly, it’s easier to debug. Debugging PyTorch code is just like debugging Python code. You can use pdb and set a break point anywhere. Debugging TensorFlow code is not so easy. The two options are to request the variables you want to inspect from the session or to learn and use the TensorFlow debugger (tfdbg).
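
For example, pausing in the middle of a training loop is just a matter of dropping in a breakpoint (model, criterion and loader are placeholders here):

import pdb

for inputs, targets in loader:
    outputs = model(inputs)
    loss = criterion(outputs, targets)
    pdb.set_trace()   # stop here and inspect outputs, loss, gradients, etc.
    loss.backward()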

Coverage

Winner: TensorFlow

As PyTorch ages, I expect the gap here will converge to zero. However, there is still some functionality which TensorFlow supports that PyTorch doesn’t. A few features that PyTorch doesn’t have (at the time of writing) are:

  • Flipping a tensor along a dimension (np.flip, np.flipud, np.fliplr)
  • Checking a tensor for NaN and infinity (np.isnan, np.isinf)
  • Fast Fourier transforms (np.fft)

These are all supported in TensorFlow. Also, the TensorFlow contrib package has many more higher-level functions and models than PyTorch.

Serialization

Winner: TensorFlow

Saving and loading models is simple in both frameworks. PyTorch has an especially simple API which can either save all the weights of a model or pickle the entire class. The TensorFlow Saver object is also easy to use and exposes a few more options for checkpointing.
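
A quick sketch of both styles (model, sess and step are placeholders):

import torch
import tensorflow as tf

# PyTorch: save just the weights, or pickle the whole module.
torch.save(model.state_dict(), "model_weights.pt")
torch.save(model, "model_full.pt")

# TensorFlow: the Saver checkpoints the variables held by a session.
saver = tf.train.Saver(max_to_keep=5)
saver.save(sess, "checkpoints/model", global_step=step)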

The main advantage TensorFlow has in serialization is that the entire graph can be saved as a protocol buffer. This includes parameters as well as operations. The graph can then be loaded in other supported languages (C++, Java). This is critical for deployment stacks where Python is not an option. Also this can, in theory, be useful when you change the model source code but want to be able to run old models.

Deployment

Winner: TensorFlow

For small scale server-side deployments both frameworks are easy to wrap in e.g. a Flask web server.
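
A minimal sketch of such a wrapper around a PyTorch model (the route, file name and input format are arbitrary):

from flask import Flask, request, jsonify
import torch

app = Flask(__name__)
model = torch.load("model_full.pt")  # or rebuild the model and load a state_dict
model.eval()

@app.route("/predict", methods=["POST"])
def predict():
    features = torch.tensor(request.json["features"])
    output = model(features)
    return jsonify(prediction=output.tolist())

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)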

For mobile and embedded deployments, TensorFlow works. This is more than can be said of most other deep learning frameworks, including PyTorch. Deploying to Android or iOS does require a non-trivial amount of work in TensorFlow, but at least you don’t have to rewrite the entire inference portion of your model in Java or C++.

For high-performance server-side deployments there is TensorFlow Serving. I don’t have experience with TensorFlow Serving, so I can’t write confidently about the pros and cons. For heavily used machine learning services, I suspect TensorFlow Serving could be a sufficient reason to stay with TensorFlow. Other than performance, one of the noticeable features of TensorFlow Serving is that models can be hot-swapped easily without bringing the service down. Check out this blog post from Zendesk for an example deployment of a QA bot with TensorFlow Serving.

Documentation

Winner: Tie

I’ve found everything I need in the docs for both frameworks. The Python APIs are well documented and there are enough examples and tutorials to learn either framework.

One edge case gripe is that the PyTorch C library is mostly undocumented. However, this really only matters when writing a custom C extension and perhaps if contributing to the software.

Data Loading

Winner: PyTorch

The APIs for data loading are well designed in PyTorch. The interfaces are specified in a dataset, a sampler, and a data loader. A data loader takes a dataset and a sampler and produces an iterator over the dataset according to the sampler’s schedule. Parallelizing data loading is as simple as passing a num_workers argument to the data loader.
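
A minimal sketch of the three pieces working together (the dataset here is synthetic):

import torch
from torch.utils.data import Dataset, DataLoader

class RandomDataset(Dataset):
    # A toy dataset of (feature, label) pairs.
    def __init__(self, n=1000):
        self.x = torch.randn(n, 10)
        self.y = torch.randn(n, 1)

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        return self.x[idx], self.y[idx]

# The loader shuffles via its default sampler and loads batches in 4 worker processes.
loader = DataLoader(RandomDataset(), batch_size=32, shuffle=True, num_workers=4)

for features, labels in loader:
    pass  # training step goes here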

I haven’t found the tools for data loading in TensorFlow (readers, queues, queue runners, etc.) especially useful. In part this is because adding all the preprocessing code you want to run in parallel into the TensorFlow graph is not always straightforward (e.g. computing a spectrogram). Also, the API itself is more verbose and harder to learn.

Device Management

Winner: TensorFlow

Device management in TensorFlow is about as seamless as it gets. Usually you don’t need to specify anything since the defaults are set well. For example, TensorFlow assumes you want to run on the GPU if one is available. In PyTorch you have to explicitly move everything onto the device even if CUDA is enabled.

The only downside with TensorFlow device management is that by default it consumes all the memory on all available GPUs even if only one is being used. The simple workaround is to specify CUDA_VISIBLE_DEVICES. Sometimes people forget this, and GPUs can appear to be busy when they are in fact idle.
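
One way to apply the workaround is to set the variable before TensorFlow is imported (or equivalently on the command line, as in CUDA_VISIBLE_DEVICES=0 python train.py):

import os

# Only GPU 0 is visible to TensorFlow, so it won't reserve memory on the others.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

import tensorflow as tf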

In PyTorch, I’ve found my code needs more frequent checks for CUDA availability and more explicit device management. This is especially the case when writing code that should be able to run on both the CPU and GPU. Also, converting, say, a PyTorch Variable on the GPU into a NumPy array is somewhat verbose:

numpy_var = variable.cpu().data.numpy()
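
A typical pattern for code that has to handle both cases looks something like this (Net and loader are placeholders):

import torch

use_cuda = torch.cuda.is_available()
model = Net()
if use_cuda:
    model = model.cuda()

for inputs, targets in loader:
    if use_cuda:
        inputs, targets = inputs.cuda(), targets.cuda()
    outputs = model(inputs)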

Custom Extensions

Winner: PyTorch

Building or binding custom extensions written in C, C++ or CUDA is doable with both frameworks. TensorFlow again requires more boilerplate code, though it is arguably cleaner for supporting multiple types and devices. In PyTorch you simply write an interface and corresponding implementation for each of the CPU and GPU versions. Compiling the extension is also straightforward with both frameworks and doesn’t require downloading any headers or source code outside of what’s included with the pip installation.

A note on TensorBoard

TensorBoard is a tool for visualizing various aspects of training machine learning models. It’s one of the most useful features to come out of the TensorFlow project. With a few code snippets in a training script you can view training curves and validation results of any model. TensorBoard runs as a web service which is especially convenient for visualizing results stored on headless nodes.
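
In TensorFlow those snippets look roughly like this (a sketch; loss, train_op, sess, feed and step are placeholders):

import tensorflow as tf

loss_summary = tf.summary.scalar("train_loss", loss)
writer = tf.summary.FileWriter("logs/run-1", sess.graph)

# Inside the training loop:
summary, _ = sess.run([loss_summary, train_op], feed_dict=feed)
writer.add_summary(summary, global_step=step)

The results are then viewed by pointing TensorBoard at the log directory, e.g. tensorboard --logdir logs.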

This was one feature that I made sure I could keep (or find an alternative to) before using PyTorch. Thankfully there are at least two open-source projects which allow for this. The first is tensorboard_logger and the second is crayon. The tensorboard_logger library is even easier to use than TensorBoard “summaries” in TensorFlow, though you need TensorBoard installed to use it. The crayon project is a complete replacement for TensorBoard but requires more setup (docker is a prerequisite).
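
Logging a scalar with tensorboard_logger is roughly (a sketch, assuming its configure/log_value interface):

from tensorboard_logger import configure, log_value

configure("runs/experiment-1")
for step in range(100):
    log_value("loss", 1.0 / (step + 1), step)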

A note on Keras

Keras is a higher-level API with a configurable back-end. At the moment TensorFlow, Theano and CNTK are supported, though perhaps in the not too distant future PyTorch will be included as well. Keras is also distributed with TensorFlow as a part of tf.contrib.

Though I didn’t discuss Keras above, the API is especially easy to use. It’s one of the fastest ways to get running with many of the more commonly used deep neural network architectures. That said, the API is not as flexible as PyTorch or core TensorFlow.
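
For reference, a small Keras model looks something like this (layer sizes and the task are arbitrary):

from keras.models import Sequential
from keras.layers import Dense

model = Sequential([
    Dense(128, activation="relu", input_shape=(20,)),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_train, y_train, epochs=5, batch_size=32)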

A note on TensorFlow Fold

Google announced TensorFlow Fold in February of 2017. The library is built on top of TensorFlow and allows for more dynamic graph construction. The main advantage of the library appears to be dynamic batching. Dynamic batching automatically batches computations on inputs of varying size (think recursive networks on parse trees). In terms of programmability, the syntax is not as straightforward as PyTorch, though in some cases the performance improvements from batching may be worth the cost.

 

Original post: https://awni.github.io/pytorch-tensorflow/
