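Requesting CPUs
By default, users cannot log in to the cluster's compute nodes directly. For programs that need interactive processing, log in to the cluster, use the salloc command to allocate a node, and then ssh to the allocated node to do the work: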
[jessy@workstation ~]$ salloc
salloc: Granted job allocation 684
salloc: Waiting for resource configuration
salloc: Nodes cpu1 are ready for job
[jessy@workstation ~]$ ssh cpu1
Warning: Permanently added 'cpu1,192.168.0.3' (ECDSA) to the list of known hosts.
Last login: Tue Dec 6 20:02:41 2022 from 192.168.0.1
[jessy@cpu1 ~]$ exit
logout
Connection to cpu1 closed.
[jessy@workstation ~]$ exit
exit
salloc: Relinquishing job allocation 684
[jessy@workstation ~]$
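When the computation is finished, use the exit command to leave the node. Note that you need to exit twice: the first exit returns you from the compute node to the login node, and the second exit leaves the shell that salloc started, which releases the requested resources.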
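After the second exit, it can be reassuring to confirm that the allocation is really gone. The following is a minimal sketch, assuming squeue is available on the login node; allocation_released is a hypothetical helper name, and the job ID is the one salloc printed (684 in the transcript above). It treats any squeue error as "released", which is a simplification.

```shell
# Sketch: succeed (exit status 0) if job $1 is no longer listed by squeue.
# allocation_released is a hypothetical helper name, not a Slurm command.
allocation_released() (
    jobid="$1"
    # -h suppresses the header line, -j restricts the listing to this job ID.
    # Assumption: a squeue error (for example, an ID the controller has
    # already forgotten) is treated the same as an empty listing.
    listing="$(squeue -h -j "${jobid}" 2>/dev/null)"
    [ -z "${listing}" ]
)

# Usage (on the login node, after the second exit):
#   allocation_released 684 && echo "allocation 684 has been released"
```

A job may briefly remain listed while it is completing, so a check made immediately after exit can still report the allocation as present.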
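Requesting GPUs
GPUs can also be requested in two ways:
- Submitting a job with sbatch
This approach is covered in another article of mine: https://blog.youkuaiyun.com/qq_43718758/article/details/128129733
- Running a job interactively with salloc
This section mainly covers how to request a GPU with salloc: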
[jessy@workstation ~]$ salloc -N 1 -n 1 --gres=gpu:1 -t 1:00:00 -p gpu
salloc: Granted job allocation 689
salloc: Waiting for resource configuration
salloc: Nodes gpu1 are ready for job
[jessy@workstation ~]$ ssh gpu1
Warning: Permanently added 'gpu1,192.168.0.4' (ECDSA) to the list of known hosts.
[jessy@gpu1 ~]$ nvidia-smi
Wed Dec 7 17:03:44 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.91.03 Driver Version: 460.91.03 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 GeForce RTX 308... Off | 00000000:3E:00.0 Off | N/A |
| 30% 30C P8 10W / 350W | 0MiB / 12053MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
[jessy@gpu1 ~]$
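To recap: enter salloc -N 1 -n 1 --gres=gpu:1 -t 1:00:00 -p gpu to request one node (-N 1), one task (-n 1), and one GPU (--gres=gpu:1) for one hour (-t 1:00:00) on the gpu partition (-p gpu).
Then enter ssh gpu1 to log in to the GPU node, using whichever node name salloc reports.
Once there, enter nvidia-smi to inspect the GPU.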
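When only a few fields are of interest, the full nvidia-smi table can be reduced to one line per GPU. This is a minimal sketch, assuming it is run on the allocated GPU node; gpu_summary is a hypothetical helper name, while --query-gpu and --format are standard nvidia-smi options.

```shell
# Sketch: print one CSV line per GPU with its name, used memory, and total memory.
# gpu_summary is a hypothetical helper name; run it on the GPU node after ssh.
gpu_summary() {
    nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv,noheader
}

# Usage (on the GPU node):
#   gpu_summary
```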
As before, you need to enter exit twice to leave:
[jessy@gpu1 ~]$ exit
logout
Connection to gpu1 closed.
[jessy@workstation ~]$ exit
exit
salloc: Relinquishing job allocation 691
[jessy@workstation ~]$
Below are a few variants of the GPU request command:
salloc -N 1 -n 2 --gres=gpu:1 -p gpu
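The variants differ only in a few options: the number of tasks (-n), the number of GPUs (--gres=gpu:N), and whether a time limit (-t) is given. The sketch below makes that explicit by printing, not running, the corresponding command line. gpu_salloc_cmd is a hypothetical helper, and the partition name gpu is taken from the examples above, so it may differ on other clusters.

```shell
# Sketch: print (do not run) a salloc command for a single-node GPU request.
# gpu_salloc_cmd is a hypothetical helper, not part of Slurm.
# Arguments: number of tasks, number of GPUs, optional time limit (e.g. 1:00:00).
gpu_salloc_cmd() (
    tasks="${1:-1}"
    gpus="${2:-1}"
    time_limit="${3:-}"
    cmd="salloc -N 1 -n ${tasks} --gres=gpu:${gpus}"
    # Append -t only when a time limit was given; otherwise the
    # partition's default time limit applies.
    if [ -n "${time_limit}" ]; then
        cmd="${cmd} -t ${time_limit}"
    fi
    # Assumption: the GPU partition is named "gpu", as in the examples above.
    printf '%s -p gpu\n' "${cmd}"
)

gpu_salloc_cmd 2 1           # the two-task variant shown above
gpu_salloc_cmd 1 1 1:00:00   # the one-hour request used earlier
```

Printing the command first makes it easy to review the request before actually running it.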