[torch] Problems encountered during installation

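This post records the problems I hit while installing Torch on Tsubame: OpenBLAS failing to build, CMake failing to copy files, and the disk quota being exceeded. Each was worked around in turn by creating a symlink, fixing the group and setgid bit on directories, and installing cmake in the work directory. In the end hdf5 and the rnn library were installed; a cuda runtime error (30) also came up and was resolved by rebuilding Torch on a compute node.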

Lately I have been extracting so many features that the whole Tsubame storage is about to fill up..

1

Today I installed Torch, following http://torch.ch/docs/getting-started.html

git clone https://github.com/torch/distro.git ~/torch --recursive
cd ~/torch; bash install-deps;
./install.sh

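Tsubame cannot reach external networks, so I first ran git clone on my local machine and then copied the result to Tsubame with scp.
The second line installs dependency libraries, so I skipped it.
After running the third line, I got: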

Linking C shared module libtorch.so
[100%] Built target torch
cd build && make install
[  3%] Built target luaT
[ 50%] Built target TH
[100%] Built target torch
Install the project...
-- Install configuration: "Release"
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/share/cmake/torch/TorchExports.cmake
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/share/cmake/torch/TorchExports-release.cmake
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/share/cmake/torch/TorchConfig.cmake
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/share/cmake/torch/TorchWrap.cmake
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/share/cmake/torch/TorchPathsInit.cmake
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/share/cmake/torch/TorchPackage.cmake
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lib/libtorch.so
-- Set runtime path of "/work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lib/libtorch.so" to "$ORIGIN/../lib:/work1/t2g-shinoda2011/15M54105/software/torch/install/lib:/usr/apps.sp3/isv/ansys_inc/16.2/v162/Framework/bin/Linux64"
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/init.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/File.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/Tensor.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/CmdLine.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/FFI.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/Tester.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/TestSuite.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/test.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/README.md
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc/random.md
CMake Error at cmake_install.cmake:101 (FILE):
  file INSTALL cannot copy file
  "/work1/t2g-shinoda2011/15M54105/software/torch/pkg/torch/doc/random.md" to
  "/work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc/random.md".

I removed the whole torch folder and tried again.

-- Set runtime path of "/work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lib/libtorch.so" to "$ORIGIN/../lib:/work1/t2g-shinoda2011/15M54105/software/torch/install/lib:/usr/apps.sp3/isv/ansys_inc/16.2/v162/Framework/bin/Linux64"
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/init.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/File.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/Tensor.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/CmdLine.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/FFI.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/Tester.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/TestSuite.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/test.lua
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/README.md
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc/random.md
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc/storage.md
-- Installing: /work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc/cmdline.md
CMake Error at cmake_install.cmake:101 (FILE):
  file INSTALL cannot copy file
  "/work1/t2g-shinoda2011/15M54105/software/torch/pkg/torch/doc/cmdline.md"
  to
  "/work1/t2g-shinoda2011/15M54105/software/torch/install/lib/luarocks/rocks/torch/scm-1/lua/torch/doc/cmdline.md".

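Same problem.
But random.md, the file that supposedly could not be copied a moment ago, was copied fine the second time. How?
I deleted everything and re-ran the install several more times. Same error, same folder, but each time a different file could not be copied.

In the end, out of options, I emptied the doc folder, and the install no longer failed at that point.
Then another folder could not be copied..
So I emptied three folders in a row (they only hold images and documentation, so it doesn't matter).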

rm ./extra/nn/doc/*
rm ./extra/nn/doc/image/*
rm ./pkg/torch/doc/*
rm ./pkg/image/assets/*

The installation succeeded..
In the end I still don't know what happened..
Anyway, it wasn't a permission problem..

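2. OpenBLAS

I found that share/ did not even contain nn. Deleted everything and started over.

Looking at the "install-deps" file, it installs the following dependencies: OpenBLAS, build-essential, gcc, g++, curl, cmake, libreadline-dev, git-core, libqt4-core, libqt4-gui, libqt4-dev, libjpeg-dev, libpng-dev, ncurses-dev, imagemagick, libzmq3-dev, gfortran, unzip, gnuplot, gnuplot-x11, ipython.

1) install.sh refers to OpenBLAS, so I installed OpenBLAS first:
make FC=gfortran
Tsubame reported an error:
/usr/bin/ld: cannot find -lgfortran
But according to http://tsubame.gsic.titech.ac.jp/docs/guides/tsubame2/html/programming.html#gpu, gfortran is installed. (I had also installed gcc myself; a senior labmate said gfortran should be installed with it by default, but I could not find it in my gcc folder..)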

15M54105@t2a006163:~> gfortran --v
Using built-in specs.
Target: x86_64-suse-linux
Configured with: ../configure --prefix=/usr --infodir=/usr/share/info --mandir=/usr/share/man --libdir=/usr/lib64 --libexecdir=/usr/lib64 --enable-languages=c,c++,objc,fortran,obj-c++,java,ada --enable-checking=release --with-gxx-include-dir=/usr/include/c++/4.3 --enable-ssp --disable-libssp --with-bugurl=http://bugs.opensuse.org/ --with-pkgversion='SUSE Linux' --disable-libgcj --disable-libmudflap --with-slibdir=/lib64 --with-system-zlib --enable-__cxa_atexit --enable-libstdcxx-allocator=new --disable-libstdcxx-pch --enable-version-specific-runtime-libs --program-suffix=-4.3 --enable-linux-futex --without-system-libunwind --with-cpu=generic --build=x86_64-suse-linux
Thread model: posix
gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) 

So I ran export LD_LIBRARY_PATH=/usr/lib64/:$LD_LIBRARY_PATH
Same error…
Then I ran export LIBRARY_PATH=/usr/lib64/:$LIBRARY_PATH
Still the same error.
I checked what Tsubame actually has:

15M54105@t2a006180:~> ls /usr/lib64/libgfortran.* -l
lrwxrwxrwx 1 root root      20 Aug 11  2014 /usr/lib64/libgfortran.so.3 -> libgfortran.so.3.0.0
-rwxr-xr-x 1 root root 1137136 Apr 10  2014 /usr/lib64/libgfortran.so.3.0.0

My senior labmate said the linker looks for libgfortran.so, which does not exist here, and suggested creating a symlink named libgfortran.so that points to libgfortran.so.3.0.0.

/work1/t2g-shinoda2011/15M54105/software/gcc/lib> ln -s /usr/lib64/libgfortran.so.3.0.0 libgfortran.so

OpenBLAS installed successfully. What a relief.
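The symlink workaround above can be sketched as a small helper. This is only a sketch under assumptions: `link_unversioned_lib` is a hypothetical name, not part of any tool, and it assumes the versioned library sits in a system directory while the link goes into a directory you can write to and that the linker already searches (for example via `LIBRARY_PATH`).

```shell
#!/bin/sh
# Hypothetical helper: create an unversioned "libNAME.so" symlink in a
# directory you own, pointing at the versioned file the system provides.
# usage: link_unversioned_lib <versioned-lib-path> <writable-lib-dir>
link_unversioned_lib() {
    src=$1
    dest_dir=$2
    # Strip everything from ".so" onward, then re-append ".so":
    # libgfortran.so.3.0.0 -> libgfortran.so
    base=$(basename "$src")
    unversioned="${base%%.so*}.so"
    [ -e "$src" ] || { echo "missing: $src" >&2; return 1; }
    mkdir -p "$dest_dir" || return 1
    ln -sf "$src" "$dest_dir/$unversioned" || return 1
    # Print the path of the link that was created.
    echo "$dest_dir/$unversioned"
}
```

For the case in this post, the call would look like `link_unversioned_lib /usr/lib64/libgfortran.so.3.0.0 /work1/t2g-shinoda2011/15M54105/software/gcc/lib`.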

3. cmake cannot copy file

CMake Error at cmake_install.cmake:36 (FILE):
  file INSTALL cannot make directory
  "/work1/t2g-shinoda2011/15M54105/torch/distro/install/share/cmake/torch/FindCUDA":
  Disk quota exceeded

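I had run into this before. It happens because the file's group is "users" while the group of the destination I am copying to is "t2g-shinoda2011" (the quota on the work directory appears to be accounted per group). Back when I installed Caffe, I ran find . -type d -print0 | xargs -0 chmod g+s every time after uploading files to Tsubame. Here, though, the new files generated by the install step itself end up in group "users" again, so the operation fails..
So the conclusion: don't git clone directly on Tsubame. Download on the local machine, upload to Tsubame as a compressed archive, and extract it there.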
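The pack-locally, unpack-on-the-cluster workflow can be sketched roughly as follows. `pack_tree` and `unpack_tree` are hypothetical helper names, and the upload step in between (for example scp) is left out because it depends on the site.

```shell
#!/bin/sh
# Hypothetical helpers for the "pack locally, unpack on the cluster" workflow.

# pack_tree <source-dir> <archive.tar.gz>: archive a checked-out tree.
pack_tree() {
    src=$1
    archive=$2
    # -C makes the archive contain only the top-level directory name,
    # not the full path leading to it.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# unpack_tree <archive.tar.gz> <target-dir>: extract into a target directory.
# --no-same-owner asks tar not to restore the archived owner/group, so the
# extracted files are created like any other new file in the target.
unpack_tree() {
    archive=$1
    target=$2
    mkdir -p "$target" || return 1
    tar -xzf "$archive" -C "$target" --no-same-owner
}
```

On the local machine this would be something like `pack_tree ~/torch/distro ~/distro.tar.gz`, and after uploading, `unpack_tree distro.tar.gz .` inside the work directory.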

But it still failed:

 file INSTALL cannot copy file
  "/work1/t2g-shinoda2011/15M54105/torch/distro/cmake/3.6/Modules/FindCUDA.cmake"
  to
  "/work1/t2g-shinoda2011/15M54105/torch/distro/install/share/cmake/torch/FindCUDA.cmake".
15M54105@t2a006163:/work1/t2g-shinoda2011/15M54105/torch/distro> ls -l /work1/t2g-shinoda2011/15M54105/torch/distro/cmake/3.6/Modules/FindCUDA.cmake
-rw-r--r-- 1 15M54105 t2g-shinoda2011 80618 Jul 11 17:22 /work1/t2g-shinoda2011/15M54105/torch/distro/cmake/3.6/Modules/FindCUDA.cmake
//the file's group is t2g-shinoda2011

15M54105@t2a006163:/work1/t2g-shinoda2011/15M54105/torch/distro> ls -l /work1/t2g-shinoda2011/15M54105/torch/distro/install/share/cmake/torch/
total 0
-rw-r--r-- 1 15M54105 users 0 Jul 11 17:43 FindCUDA.cmake
//the group of the FindCUDA.cmake file is users

15M54105@t2a006163:/work1/t2g-shinoda2011/15M54105/torch/distro> ls -l /work1/t2g-shinoda2011/15M54105/torch/distro/install/share/cmake/
total 4
drwxr-xr-x 2 15M54105 t2g-shinoda2011 4096 Jul 11 17:43 torch
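//The torch folder's group is t2g-shinoda2011, but its mode is "drwxr-xr-x"; it needs to become "drwxr-sr-x".
//I tried "touch test.txt" inside this folder, and the new file's group was users.
//After "chmod g+s torch/" I ran "touch a.txt", and that file's group was t2g-shinoda2011.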
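The setgid check and fix above can be sketched as two small helpers. The function names are hypothetical; the sketch assumes a Linux system where you own the directories and belong to their group, since otherwise chmod g+s may not take effect.

```shell
#!/bin/sh
# Hypothetical helpers for the setgid fix described above.

# list_dirs_without_setgid <root>: print directories that lack the setgid
# bit, i.e. directories where new files get the creator's primary group
# instead of the directory's group.
list_dirs_without_setgid() {
    find "$1" -type d ! -perm -2000 -print
}

# set_setgid_on_dirs <root>: add the setgid bit to every directory under
# root so that newly created files inherit the directory's group.
set_setgid_on_dirs() {
    find "$1" -type d -exec chmod g+s {} +
}
```

Running `list_dirs_without_setgid install/` before and after `set_setgid_on_dirs install/` shows which directories were affected. Directories created later by the build are only covered if their parent already had the bit set.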

Then I remembered that my cmake 2.8.12 was installed under home (group users).
So I installed cmake 3.6.0 again, this time under $WORK.

export CC=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/gcc
export CXX=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/g++
cmake -DCMAKE_INSTALL_PREFIX:PATH=/work1/t2g-shinoda2011/15M54105/software/cmake-3.6.0 .
make
make install

During make install the "quota exceeded" problem appeared again, so I ran newgrp t2g-shinoda2011, after which make install succeeded.
Edit .bashrc:

export PATH=$SOFT/cmake-3.6.0/bin:$SOFT/gcc/bin:$SOFT/hdf5/bin:~/local/bin:~/.gem/ruby/2.0.0/bin:$LOCAL/bin:$INTALL_RN/yasm/bin:$WORK/lisa-caffe-public/examples/LRCN_activity_recognition:$TREC/package/improved_trajectory_release/:$TREC/scripts:$TREC/package/GS_SVM/bin:$TREC/package/feat2sv-0.59:$TREC/package/colordescriptors40/x86_64-linux-gcc:$PATH

Edit install.sh:

if [[ `uname` == 'Linux' ]]; then
    export CMAKE_LIBRARY_PATH=/work1/t2g-shinoda2011/15M54105/OpenBLAS-0.2.18:$CMAKE_LIBRARY_PATH
fi

Warning: unmatched variable LUALIB
CMake Error at /work1/t2g-shinoda2011/15M54105/torch/distro/install/share/cmake/torch/FindCUDA.cmake:643 (message):
  Specify CUDA_TOOLKIT_ROOT_DIR
Call Stack (most recent call first):
  CMakeLists.txt:7 (FIND_PACKAGE)

Edit .bashrc (http://stackoverflow.com/questions/19980412/how-to-let-cmake-find-cuda):

export CUDA_BIN_PATH=/usr/apps.sp3/cuda/7.5/bin:$CUDA_BIN_PATH
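# The error above asks for CUDA_TOOLKIT_ROOT_DIR. One way to obtain a
# candidate value is to derive it from the location of nvcc. A minimal
# sketch only: cuda_root_from_nvcc is a hypothetical helper, and it assumes
# the usual <root>/bin/nvcc layout of a CUDA toolkit.

```shell
#!/bin/sh
# Hypothetical helper: derive the CUDA toolkit root from an nvcc path,
# e.g. /usr/apps.sp3/cuda/7.5/bin/nvcc -> /usr/apps.sp3/cuda/7.5
cuda_root_from_nvcc() {
    nvcc_path=$1
    [ -x "$nvcc_path" ] || { echo "not executable: $nvcc_path" >&2; return 1; }
    # Drop the file name, then drop the trailing "bin" directory.
    bin_dir=$(dirname "$nvcc_path")
    dirname "$bin_dir"
}
```

# The printed directory could then be passed to a cmake invocation as
# -DCUDA_TOOLKIT_ROOT_DIR=<dir>, which is the variable the error names.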
15M54105@t2a006176:/work1/t2g-shinoda2011/15M54105/torch/distro> ./install.sh > loginstall
No existing manifest. Attempting to rebuild...
Warning: unmatched variable LUALIB
CMakeFiles/paths.dir/paths.c.o: In function `lua_tmpname':
paths.c:(.text+0x5e7): warning: the use of `tempnam' is dangerous, better use `mkstemp'
Warning: unmatched variable LUALIB
Warning: unmatched variable LUALIB
CMake Warning:
  Manually-specified variables were not used by the project:

    LUA_INCDIR
    LUA_LIBDIR


Warning: unmatched variable LUALIB
CMake Warning:
  Manually-specified variables were not used by the project:

    LUA_INCDIR
    LUA_LIBDIR


Warning: unmatched variable LUALIB
Warning: unmatched variable LUALIB
CMake Warning:
  Manually-specified variables were not used by the project:

    LUALIB


Warning: unmatched variable LUALIB
CMake Warning:
  Manually-specified variables were not used by the project:

    LUA_INCDIR
    LUA_LIBDIR


Warning: unmatched variable LUALIB
Warning: unmatched variable LUALIB
CMake Warning:
  Manually-specified variables were not used by the project:

    LUADIR


CMake Warning:
  Manually-specified variables were not used by the project:

    CMAKE_LIBRARY_PATH


SOX_INCLUDE_DIR: SOX_INCLUDE_DIR-NOTFOUND
SOX_LIBRARIES: SOX_LIBRARIES-NOTFOUND
FFTW_INCLUDE_DIR: /usr/include
FFTW_LIBRARIES: /usr/lib64/libfftw3.so
CMake Error: The following variables are used in this project, but they are set to NOTFOUND.
Please set them or make sure they are set and tested correctly in the CMake files:
SOX_INCLUDE_DIR
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
   used as include directory in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
SOX_LIBRARIES
    linked by target "audio" in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio
    linked by target "sox" in directory /work1/t2g-shinoda2011/15M54105/torch/distro/extra/audio


Error: Build error: Failed building.
Warning: unmatched variable LUALIB

However, according to https://github.com/torch/distro/issues/93:
"that's okay, this is an optional repo. you can still use torch as-is."

Anyway, in the end I reached:

Do you want to automatically prepend the Torch install location
to PATH and LD_LIBRARY_PATH in your /home/usr9/15M54105/.bashrc? (yes/no)
[yes] >>> 

So was the installation successful?

Summary

//Change the OpenBLAS path in install.sh
export CMAKE_LIBRARY_PATH=/work1/t2g-shinoda2011/15M54105/OpenBLAS-0.2.18:$CMAKE_LIBRARY_PATH

#!/bin/bash
rm -rf distro/
tar zxvf distro.tar.gz
root=/work1/t2g-shinoda2011/15M54105/torch/distro/
#rm $root/extra/nn/doc/*
#rm $root/extra/nn/doc/image/*
#rm $root/pkg/torch/doc/*
#rm $root/pkg/image/assets/*
rm distro/install.sh
cp install.sh distro/
export CC=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/gcc
export CXX=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/g++
newgrp t2g-shinoda2011
cd distro/
./install.sh

Installing hdf5

totem has to be installed first. But Tsubame was down again, so I had to git clone totem locally, upload it to Tsubame, and then run luarocks make totem-0-0.rockspec (https://raw.githubusercontent.com/torch/rocks/master/totem-0-0.rockspec)

-- HDF5: Using hdf5 compiler wrapper to determine C configuration
CMake Error at /work1/t2g-shinoda2011/15M54105/software/cmake-3.6.0/share/cmake-3.6/Modules/FindPackageHandleStandardArgs.cmake:148 (message):
  Could NOT find HDF5: Found unsuitable version "", but required is at least
  "1.8" (found
  /usr/lib64/librt.so;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so;HDF5_hdf5_LIBRARY-NOTFOUND;HDF5_hdf5_cpp_LIBRARY-NOTFOUND;HDF5_hdf5_LIBRARY-NOTFOUND)
Call Stack (most recent call first):
  /work1/t2g-shinoda2011/15M54105/software/cmake-3.6.0/share/cmake-3.6/Modules/FindPackageHandleStandardArgs.cmake:386 (_FPHSA_FAILURE_MESSAGE)
  /work1/t2g-shinoda2011/15M54105/software/cmake-3.6.0/share/cmake-3.6/Modules/FindHDF5.cmake:707 (find_package_handle_standard_args)
  CMakeLists.txt:4 (FIND_PACKAGE)

-- Configuring incomplete, errors occurred!
See also "/work1/t2g-shinoda2011/15M54105/torch/distro/torch-hdf5/build/CMakeFiles/CMakeOutput.log".
make: *** No targets specified and no makefile found.  Stop.

https://github.com/pachterlab/kallisto/issues/65
I tried to modify "hdf5-0-0.rockspec" by adding -DCMAKE_LIBRARY_PATH=/work1/t2g-shinoda2011/15M54105/software/hdf5/lib to the "cmake" command:

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="$(LUA_BINDIR)/.." -DCMAKE_INSTALL_PREFIX="$(PREFIX)" -DCMAKE_LIBRARY_PATH=/work1/t2g-shinoda2011/15M54105/software/hdf5/lib; 

It doesn't work.

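What an idiot I am. Not knowing cmake really hurts..
CMAKE_PREFIX_PATH: Specifies a path which will be used by the FIND_XXX() commands…
And what fails here is exactly FIND_PACKAGE(HDF5 1.8 REQUIRED) in CMakeLists.txt, so the hdf5 install prefix goes into CMAKE_PREFIX_PATH: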

cmake .. -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH="/work1/t2g-shinoda2011/15M54105/software/hdf5;$(LUA_BINDIR)/.." -DCMAKE_INSTALL_PREFIX="$(PREFIX)" -DCMAKE_LIBRARY_PATH="/work1/t2g-shinoda2011/15M54105/software/hdf5/lib"; 
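Since the FindHDF5 error listed HDF5_hdf5_LIBRARY-NOTFOUND, it may help to confirm that a prefix really contains HDF5 before putting it into CMAKE_PREFIX_PATH. A minimal sketch: `check_hdf5_prefix` is a hypothetical helper, and the `include/hdf5.h` and `lib/libhdf5.*` layout is an assumption about how HDF5 was installed (some installs use lib64 instead).

```shell
#!/bin/sh
# Hypothetical helper: sanity-check that a directory looks like an HDF5
# install prefix (header plus library) before handing it to cmake.
# usage: check_hdf5_prefix <prefix>
check_hdf5_prefix() {
    prefix=$1
    [ -f "$prefix/include/hdf5.h" ] || {
        echo "no include/hdf5.h under $prefix" >&2
        return 1
    }
    # Accept either the shared or the static library.
    for lib in "$prefix/lib/libhdf5.so" "$prefix/lib/libhdf5.a"; do
        [ -e "$lib" ] && return 0
    done
    echo "no libhdf5 under $prefix/lib" >&2
    return 1
}
```

For this post the call would be `check_hdf5_prefix /work1/t2g-shinoda2011/15M54105/software/hdf5`.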

so->

export CC=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/gcc
export CXX=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/g++
//edit hdf5-0-0.rockspec
luarocks make hdf5-0-0.rockspec

just use

cd hdf5-torch
luarocks make

is fine…

quota exceeded

The fix above works, but I just learned a new command: sg

sg --help
Usage: sg [-l|-c command] [group]
sg - change the effective group id

  -l, --login    reinitialize environment as if logged in
  -c  command    Execute `command' with new group
      --help     Give this help list
  -u, --usage    Give a short usage message
  -v, --version  Print program version
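
Going by the help text above, sg can run a single command under another group and then return. That matters in scripts, because newgrp starts a new shell and the lines after it do not run inside that shell. A minimal sketch, with `run_in_group` as a hypothetical wrapper name:

```shell
#!/bin/sh
# Hypothetical wrapper: run one command with a given effective group.
# usage: run_in_group <group> <command-string>
run_in_group() {
    group=$1
    cmd=$2
    if [ "$(id -gn)" = "$group" ]; then
        # Already in the target group: no switch needed.
        sh -c "$cmd"
    else
        # sg runs the command with the group changed and then returns,
        # so the calling script simply continues afterwards.
        sg "$group" -c "$cmd"
    fi
}
```

In the install steps above this would read `run_in_group t2g-shinoda2011 'make install'`.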

new version

rnn

I need to re-install rnn (https://github.com/Element-Research/rnn) because I did something stupid… Its README says:

Most issues can be resolved by updating the various dependencies:

luarocks install torch
luarocks install nn
luarocks install dpnn
luarocks install torchx

If you are using CUDA :

luarocks install cutorch
luarocks install cunn
luarocks install cunnx

Finally,

luarocks install rnn

However, after I installed rnn successfully, require 'rnn' failed..
I thought the cause might be my old version of torch (I installed torch last June),
so I decided to re-install torch.

git clone https://github.com/torch/distro.git --recursive
#cd ~/torch; bash install-deps;
#change the OpenBLAS path in install.sh
#export CMAKE_LIBRARY_PATH=/work1/t2g-shinoda2011/15M54105/OpenBLAS-0.2.18:$CMAKE_LIBRARY_PATH
export CC=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/gcc
export CXX=/work1/t2g-shinoda2011/15M54105/software/gcc/bin/g++
newgrp t2g-shinoda2011
./install.sh

This time ./install.sh failed with a request to install moses.
http://www.achchuthan.org/2014/06/install-moses-on-ubuntu-14.04.html

git clone https://github.com/moses-smt/mosesdecoder.git
cd mosesdecoder/
./bjam --with-boost=/work1/t2g-shinoda2011/15M54105/local/ -j5

cuda runtime error (30)

I re-installed torch on tsubame3.0, after that,

th 
require 'cutorch'

then I got:

th> require 'cutorch'
THCudaCheck FAIL file=/gs/hs0/tga-shinoda/15M54105/distro/extra/cutorch/lib/THC/THCGeneral.c line=70 error=30 : unknown error
...oda/15M54105/distro/install/share/lua/5.1/trepl/init.lua:389: cuda runtime error (30) : unknown error at /gs/hs0/tga-shinoda/15M54105/distro/extra/cutorch/lib/THC/THCGeneral.c:70
stack traceback:
    [C]: in function 'error'
    ...oda/15M54105/distro/install/share/lua/5.1/trepl/init.lua:389: in function 'require'
    [string "_RESULT={require 'cutorch'}"]:1: in main chunk
    [C]: in function 'xpcall'
    ...oda/15M54105/distro/install/share/lua/5.1/trepl/init.lua:661: in function 'repl'
    ...105/distro/install/lib/luarocks/rocks/trepl/scm-1/bin/th:204: in main chunk
    [C]: at 0x00406d20

Someone said that rebooting or reinstalling the driver may help, but I can't do either.
First, I tried to update torch and cutorch:

luarocks install torch
luarocks install cutorch

Well, that didn't work.

Then I followed my tutor's advice:

The environment of login nodes and computation nodes may differ. 
You'd better to build it on a computation node by 
qrsh -g tga-shinoda -l q_node=1 -l h_rt=1:00:00.

So I did:

git clone https://github.com/torch/distro.git --recursive
cd distro/
#change install.sh
qrsh -g tga-shinoda -l q_node=1 -l h_rt=1:00:00
cd $WORK/distro/
module load cuda
./install.sh
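
Because login and compute nodes may differ, it can be worth checking that the current node actually sees a GPU before starting the build. A minimal sketch, assuming nvidia-smi is on PATH on GPU nodes; `gpu_visible` is a hypothetical helper, not something the Torch installer provides.

```shell
#!/bin/sh
# Hypothetical check: succeed only if nvidia-smi exists and lists a GPU.
gpu_visible() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # "nvidia-smi -L" prints one line per device, starting with "GPU <index>:".
    nvidia-smi -L 2>/dev/null | grep -q '^GPU [0-9]'
}
```

A build script could then start with `gpu_visible || { echo "no GPU visible on this node" >&2; exit 1; }`.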

then I tested:

source ~/.bashrc
th
require 'cutorch'

It works!
Then I installed rnn:

#install hdf5: done
git clone https://github.com/deepmind/torch-hdf5
cd torch-hdf5
luarocks make hdf5-0-0.rockspec LIBHDF5_LIBDIR="??/"

luarocks install torch #done
luarocks install nn #done
luarocks install dpnn #done
luarocks install torchx #done

luarocks install cutorch #done
luarocks install cunn #done
luarocks install cunnx #done

luarocks install rnn #done
