I. apex
What it is: NVIDIA's PyTorch extension for mixed-precision training
What it does: speeds up training on GPUs
API docs: https://nvidia.github.io/apex
Requirements:
Python 3
CUDA 9 or newer
PyTorch 0.4 or newer. The CUDA and C++ extensions require pytorch 1.0 or newer.
The latest stable release is recommended; see https://pytorch.org/.
We also test against the latest master branch, obtainable from https://github.com/pytorch/pytorch.
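The requirements above can be checked from Python before attempting the install. The following is only a minimal sketch: the helper names (parse_version, meets_minimum, check_requirements) are made up for this post and are not part of Apex or PyTorch. It uses torch.__version__ and torch.version.cuda, and skips the PyTorch checks if PyTorch is not installed.

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.0.1+cu100' into a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum):
    """Return True if version_text is at least the `minimum` tuple."""
    return parse_version(version_text) >= minimum


def check_requirements():
    """Collect the facts that the requirements list asks about."""
    report = {"python3": sys.version_info[0] >= 3}
    if importlib.util.find_spec("torch") is None:
        report["torch_installed"] = False
        return report
    import torch  # imported lazily so the script also runs without PyTorch

    report["torch_installed"] = True
    report["torch>=0.4"] = meets_minimum(torch.__version__, (0, 4))
    # The CUDA and C++ extensions need PyTorch 1.0 or newer.
    report["torch>=1.0"] = meets_minimum(torch.__version__, (1, 0))
    # CUDA version PyTorch was built with; None for CPU-only builds.
    report["torch_cuda_version"] = torch.version.cuda
    return report


if __name__ == "__main__":
    for key, value in check_requirements().items():
        print(f"{key}: {value}")
```

Note that this reports the CUDA version PyTorch was built against, which is not necessarily the version of the CUDA toolkit installed on the machine.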
It is often convenient to use Apex in a Docker container. Compatible options include:
NVIDIA Pytorch containers from NGC, which come with Apex preinstalled. To use the latest Amp API, you may need to pip uninstall apex then reinstall Apex using the Quick Start commands below.
official Pytorch -devel Dockerfiles, e.g. docker pull pytorch/pytorch:nightly-devel-cuda10.0-cudnn7, in which you can install Apex using the Quick Start commands.
How to install:
Linux:
For performance and full functionality, installing Apex with the CUDA and C++ extensions is recommended:
$ git clone https://github.com/NVIDIA/apex
$ cd apex
$ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Apex also supports a Python-only build (required with Pytorch 0.4) via
$ pip install -v --no-cache-dir ./
Windows:
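Windows support is experimental; Linux is recommended.
If you were able to build Pytorch from source on your system, pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" . may work.
pip install -v --no-cache-dir . (without the CUDA/C++ extensions) is more likely to work.
If you installed Pytorch in a Conda environment, make sure to install Apex in that same environment.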
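After installing, it can help to check which kind of build ended up in the current environment. The sketch below is an assumption-laden example: it assumes the C++ extension is importable as apex_C and the CUDA extension as amp_C, which is how the Apex source tree appeared to name them at the time; adjust the names if your checkout differs. The function apex_build_report is a hypothetical helper, not an Apex API.

```python
import importlib.util

# Label -> module name. The extension module names are assumptions
# about how Apex names its compiled modules, not a documented contract.
MODULES = {
    "apex": "apex",        # the Python package itself
    "cpp_ext": "apex_C",   # assumed name of the --cpp_ext module
    "cuda_ext": "amp_C",   # assumed name of one --cuda_ext module
}


def apex_build_report(modules=MODULES):
    """Report which modules can be found, without importing them."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, found in apex_build_report().items():
        print(f"{label}: {'found' if found else 'missing'}")
```

If "apex" is found but the two extension entries are missing, the environment most likely has the Python-only build.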
Example:
import torch
from apex import amp

# Declare model and optimizer as usual, with default (FP32) precision
model = torch.nn.Linear(D_in, D_out).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
# Allow Amp to perform casts as required by the opt_level
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")
...
# loss.backward() becomes:
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
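Why scale the loss at all? In half precision, very small gradient values round to zero. The sketch below illustrates the idea using only the standard library (the struct module's "e" format is IEEE 754 half precision). It is not Amp's implementation: the scale factor 1024 and the gradient value 1e-8 are made-up numbers for illustration, and Amp chooses and adjusts its own scale.

```python
import struct


def to_fp16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", value))[0]


LOSS_SCALE = 1024.0  # hypothetical scale factor, for illustration only
true_grad = 1e-8     # hypothetical gradient, too small for half precision

# Stored directly in half precision, the gradient underflows to zero.
unscaled = to_fp16(true_grad)

# Scaling the loss scales every gradient by the same factor, so the
# value is large enough to be represented in half precision.
scaled = to_fp16(true_grad * LOSS_SCALE)

# Dividing by the scale in higher precision recovers roughly the
# original value before the optimizer step.
recovered = scaled / LOSS_SCALE

print("without scaling:", unscaled)
print("with scaling:   ", scaled)
print("after unscaling:", recovered)
```

The recovered value is only approximately equal to the original, because the scaled value is still rounded to half precision.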
II. My installation process:
1. $ git clone https://github.com/NVIDIA/apex (done)
2. $ cd apex (done)
3. $ pip install -v --no-cache-dir --global-option="--cpp_ext" --global-option="--cuda_ext" ./
Step 3 failed with the error below. Many people have asked about this problem in the project's issues.
Cleaning up...
Removing source in /tmp/pip-req-build-v0deounv
Removed build tracker '/tmp/pip-req-tracker-3n3fyj4o'
ERROR: Command errored out with exit status 1: /users4/zsun/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-v0deounv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-v0deounv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' --cpp_ext --cuda_ext install --record /tmp/pip-record-rce1cb4d/install-record.txt --single-version-externally-managed --compile Check the logs for full command output.
Exception information:
Traceback (most recent call last):
  File "/users4/zsun/anaconda3/lib/python3.6/site-packages/pip/_internal/cli/base_command.py", line 153, in _main
    status = self.run(options, args)
  File "/users4/zsun/anaconda3/lib/python3.6/site-packages/pip/_internal/commands/install.py", line 455, in run
    use_user_site=options.use_user_site,
  File "/users4/zsun/anaconda3/lib/python3.6/site-packages/pip/_internal/req/__init__.py", line 62, in install_given_reqs
    **kwargs
  File "/users4/zsun/anaconda3/lib/python3.6/site-packages/pip/_internal/req/req_install.py", line 888, in install
    cwd=self.unpacked_source_directory,
  File "/users4/zsun/anaconda3/lib/python3.6/site-packages/pip/_internal/utils/subprocess.py", line 275, in runner
    spinner=spinner,
  File "/users4/zsun/anaconda3/lib/python3.6/site-packages/pip/_internal/utils/subprocess.py", line 242, in call_subprocess
    raise InstallationError(exc_msg)
pip._internal.exceptions.InstallationError: Command errored out with exit status 1: /users4/zsun/anaconda3/bin/python -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-req-build-v0deounv/setup.py'"'"'; __file__='"'"'/tmp/pip-req-build-v0deounv/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code