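Background:
I have several GPU servers on which some of my kernels need to be compiled and run, but logging in over ssh and starting docker every time is tedious. Is there a way to launch everything with one command and drive the servers remotely from my local machine?
Here is the implementation:
Remote execution
The src folder provides the remote_code_exe.sh script, which controls other servers remotely from the local machine. It can build or run a single case, or build and run all cases.
The script reads the DEBUG environment variable to decide whether to print its own log, which is useful for debugging the script.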
root# ./remote_code_exe.sh -h
Usage: <DEBUG=1/0> ./remote_code_exe.sh gpu/gcu [options]
Options:
  -b  test case     build test case(all/case_name, case_name is not contain the prefix 'test_' and suffix '_cuda' or '_tops')
  -c  clean build   clean build(all/case_name, case_name is not contain the prefix 'test_' and suffix '_cuda' or '_tops')
  -d  device id     device id(default: 0)
  -r  test case     run case(all/case_name, case_name is not contain the prefix 'test_' and suffix '_cuda' or '_tops')
  -m  run mode      run mode(default: 0, when run mode is 1, CHIP_BENCH_MORE_DEBUG_PARAMS=1)
  -h                help message
exp:
1. DEBUG=1 ./remote_code_exe.sh [gpu/gcu default gpu] -b record_env
2. DEBUG=1 ./remote_code_exe.sh [gpu/gcu default gpu] -d 0 -r test_xxx_case_cuda/all
3. DEBUG=1 ./remote_code_exe.sh [gpu/gcu default gpu] -c 1/0
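The help text says a case name is given without the 'test_' prefix and the '_cuda' or '_tops' suffix. A minimal sketch of how such a short name could be expanded, assuming gpu maps to '_cuda' and gcu maps to '_tops'; full_case_name is a hypothetical helper and is not taken from the script:

```shell
#!/bin/bash
# Hypothetical helper: expand a short case name into the full test name.
# Assumption: gpu targets use the '_cuda' suffix and gcu targets use '_tops'.
full_case_name() {
    local type="$1" name="$2"
    case "$type" in
        gpu) printf 'test_%s_cuda\n' "$name" ;;
        gcu) printf 'test_%s_tops\n' "$name" ;;
        *)   printf 'unknown type: %s\n' "$type" >&2; return 1 ;;
    esac
}

full_case_name gpu record_env   # prints test_record_env_cuda
full_case_name gcu record_env   # prints test_record_env_tops
```

Under this assumption, passing record_env with gpu would select test_record_env_cuda.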
Build a single file
./remote_code_exe.sh gpu -b [case_name]
Run a single file, with the results printed to standard output
DEBUG=0 ./remote_code_exe.sh gpu -m 1 -d 0 -r [case_name]
Build all files
./remote_code_exe.sh gpu -b all
Run report.sh; the results are saved to a file named after the machine, in the form name_report.log
DEBUG=0 ./remote_code_exe.sh gpu -m 1 -d 0 -r all
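Each of the commands above replaces a manual sequence of logging in over ssh, entering the container, and typing the build or run command. A rough sketch of that wrapping, not the script's actual code: REMOTE_HOST, CONTAINER, WORK_DIR and both function names are hypothetical placeholders, the container is assumed to be running already, and applying the device id through CUDA_VISIBLE_DEVICES is an assumption. Only the link between run mode 1 and CHIP_BENCH_MORE_DEBUG_PARAMS=1 comes from the help text.

```shell
#!/bin/bash
# Placeholders for illustration only; none of these values come from the script.
REMOTE_HOST="user@gpu-server"
CONTAINER="dev_container"
WORK_DIR="/workspace/src"

# Compose the command line that should run inside the container.
# Assumption: the device id is applied through CUDA_VISIBLE_DEVICES.
# Run mode 1 adds CHIP_BENCH_MORE_DEBUG_PARAMS=1, as stated in the help text.
container_cmd() {
    local mode="$1" device="$2" binary="$3"
    local env_vars="CUDA_VISIBLE_DEVICES=$device"
    if [ "$mode" -eq 1 ]; then
        env_vars="$env_vars CHIP_BENCH_MORE_DEBUG_PARAMS=1"
    fi
    printf 'cd %s && %s ./%s\n' "$WORK_DIR" "$env_vars" "$binary"
}

# Run a command inside the container on the remote host in one step.
# Limitation: the command in $1 must not contain single quotes.
run_remote() {
    ssh "$REMOTE_HOST" "docker exec $CONTAINER bash -c '$1'"
}

# Show the composed command without contacting any server.
container_cmd 1 0 test_record_env_cuda
# To actually run it remotely:
#   run_remote "$(container_cmd 1 0 test_record_env_cuda)"
```

Keeping the command composition separate from the ssh call makes it possible to print and inspect the command before anything runs on the server.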
The code is as follows:
#!/bin/bash
histchars=
# Set DEBUG to 1 to enable debug mode, 0 to disable
# Get DEBUG from environment, default to 0 if not set
DEBUG=${DEBUG:-0}
# log for debug
DEBUG() {
    if [ "$DEBUG" -eq 1 ]; then
        # echo "DEBUG: $@"
        echo "$@" | while IFS= read -r line; do
            echo "DEBUG: $line"
        done
    fi
}
DEVICE_ID="0"
TYPE=""
BUILD_CASE=""
RUN_CASE=""
RUN_MODE="0"
LIST_INFO="0"
EXE_CMD=""
CFG_FILE="remote_cfg_template.txt"
usage="Usage: <DEBUG=1/0> $0 gpu/gcu [options]
Options:
  -b  compile case  build test case(all/case_name, case_name is not contain the prefix 'test_' and suffix '_cuda' or '_tops')
  -c  clean build   clean build dir
  -d  device id     device id(default: 0)
  -e  execute rc    execute remote command
  -f  config file   configuration file(default: remote_cfg_template.txt)
  -r  test case     run case(all/case_name, case_name is not contain the prefix 'test_' and suffix '_cuda' or '_tops')
  -m  run mode      run mode(default: 0, when run mode is 1, CHIP_BENCH_MORE_DEBUG_PARAMS=1)
  -l  list GPU      list GPU information
  -h                help message
exp:
1. <DEBUG=1/0> $0 [gpu/gcu default gpu] -b record_env
2. <DEBUG=1/0> $0 [gpu/gcu default gpu] -d 0 -r test_xxx_case_cuda/all
3. <DEBUG=1/0> $0 [gpu/gcu default gpu] -c 1/0
4. <DEBUG=1/0> $0 [gpu/gcu default gpu] -l
5. <DEBUG=1/0> $0 [gpu/gcu default gpu] -e \"nvidia-smi -h\"
"
if [ $# -lt 1 ]; then
    echo "$usage"
    exit 1
fi
# The first argument selects the target type; anything other than "gcu" falls back to "gpu".
if [ "$1" = "gcu" ]; then
    TYPE="gcu"
else
    TYPE="gpu"
fi
# Use getopt to parse options
OPTIONS=$(getopt -o b:c:d:e:f:r:m:lh --long build:,clean:,device:,exe:,file:,run:,mode:,list,help -- "$@")
if [ $? -ne 0 ]; then
    echo "$usage"
    exit 1
fi
eval set -- "$OPTIONS"
while true; do
    case "$1" in
    -b | --build)
        BUILD_CASE="$2"
        shift 2
        ;;
    -c | --clean)
        CLEAN_BUILD="$2"
        shift 2
        ;;
    -d | --device)
        DEVICE_ID="$2"
        shift 2
        ;;
    -e | --exe)
        EXE_CMD="$2"
        break
        ;;
    -f | --file)
        CFG_FILE="$2"
        shift 2
        ;;
    -r | --run)
        RUN_CASE="$2"
        shift 2
        ;;
    -m | --mode)
        RUN_MODE="$2"
        shift 2
        ;;
    -l | --list)
        LIST_INFO="1"
        break
        ;;
    -h | --help)
        echo "$usage"
        exit 0
        ;;
    --)
        shift
        break
        ;;
    *)
        echo




