Recently, while training a ResNet model, I found that PyTorch 1.0.1 can cause a memory leak: memory usage keeps increasing while training runs. I am not sure which part of the code is behind it, but downgrading PyTorch from 1.0.1 to 1.0.0 solved the issue.
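The post doesn't show how the leak was observed; as a minimal sketch, process memory can be tracked with only the standard library (the suggestion of logging it inside the training loop is my own):

```python
import resource
import sys

def peak_rss_mb():
    """Peak resident set size of this process, in MB."""
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return rss / (1024 * 1024) if sys.platform == "darwin" else rss / 1024

# Hypothetical use: print this every few hundred training steps;
# a value that keeps climbing without plateauing suggests a leak.
print(f"peak RSS so far: {peak_rss_mb():.1f} MB")
```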
Problems encountered when trying to change the version of PyTorch
At first I was using PyTorch 0.4.1, which caused another, somewhat similar problem: the dimension arguments passed to a tensor method cannot be a tuple in 0.4.1, but can in 0.5 and later.
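The post doesn't name the exact call that failed; as a hypothetical illustration with permute, unpacking the tuple with * works on both old and new versions, since it passes the dimensions as separate int arguments:

```python
import torch

t = torch.zeros(2, 3, 4)
dims = (0, 2, 1)
# Newer PyTorch accepts the tuple directly (t.permute(dims));
# on 0.4.1 the dimensions must be separate ints, so unpack the tuple:
p = t.permute(*dims)
print(tuple(p.shape))  # (2, 4, 3)
```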
So, to upgrade from 0.4.1 to 1.0, I first tried to follow the official installation guide: select the matching options and run the command it generates, and a 1.0 install should succeed. It didn't, I guess because I am in China and the connection to the official source is poor. So at this stage I used PyCharm to upgrade instead: open the project interpreter in Settings, find pytorch in the package list, tick the "Specify version" option, and upgrade. By doing this, my PyTorch version became 1.0.1.
Then the memory leak issue appeared. To downgrade from 1.0.1 to 1.0.0, I first tried installing from the official previous-versions page. But again, because I am in China, the connection was extremely unstable and ReadTimeout errors occurred frequently.
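For reference, pinning an exact version with pip looks like the following; the mirror flag is an assumption on my part (Tsinghua also runs a PyPI mirror) and is not what the official previous-versions page gives you:

```shell
# Pin the exact version (the previous-versions page lists the full
# command for your CUDA/Python combination; this is the generic form):
pip install torch==1.0.0

# With an unstable connection, pointing pip at a nearby mirror can help:
pip install torch==1.0.0 -i https://pypi.tuna.tsinghua.edu.cn/simple
```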
The solution was to use conda install with the Tsinghua source instead of pip install. After adding the channel, just run conda install pytorch=1.0.0 cudaxxx -c pytorch, replacing cudaxxx with your CUDA version, and PyTorch 1.0.0 can then be installed successfully.
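A sketch of the channel setup the step above assumes; the mirror URLs and the cudatoolkit version shown are assumptions to verify against the Tsinghua mirror's help page and your own CUDA install:

```shell
# Add the Tsinghua Anaconda mirror channels (URLs may change; check
# https://mirrors.tuna.tsinghua.edu.cn for the current layout):
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main/
conda config --add channels https://mirrors.tuna.tsinghua.edu.cn/anaconda/cloud/pytorch/
conda config --set show_channel_urls yes

# Pin the exact PyTorch version; cudatoolkit=9.0 is a placeholder
# for whatever CUDA version your machine actually has:
conda install pytorch=1.0.0 cudatoolkit=9.0 -c pytorch
```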