Troubleshooting Notes: NVIDIA BlueField DPU bfb-build Build Errors and Fixes (Countless Pitfalls)

Background

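bfb-build generates the OS image (BFB) used by the DPU from the package repositories that NVIDIA provides. See GitHub for details: https://github.com/Mellanox/bfb-build/
Proceed with caution: there are countless pitfalls. That said, working through enough of them gives a much better understanding of bfb-build and of how the DPU OS is built.
This article only records the errors hit during the build and how each was handled. It is fragmentary, meant for my own future reference and as a pointer for others who hit the same problems. The key takeaways: 1. prefer an LTS branch; 2. use an older release for older cards; 3. try building Linux first.
Testing was done on a BlueField-2. The build host runs Anolis Linux, and the build target is the Ubuntu 20.04 BFB. I first tried building Anolis 8.6 from the main branch for the DPU and ran into all sorts of problems. In the end I gave up and switched to the DOCA 1.5 LTS branch.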

Environment details:

# OS:
Linux one.one.one.one 4.18.0-553.22.1.0.1.an8.x86_64 #1 SMP Wed Sep 25 18:10:22 CST 2024 x86_64 x86_64 x86_64 GNU/Linux
# BlueField
[PN] Part number: MBF2M355A-VESOT_SS    
# BFB branch
remotes/origin/lts-1.5.3
# Final BFB build command
IMAGE_TYPE=dev ./bfb-build ubuntu 20.04

bfb-build exits immediately

Symptom: bfb-build exits right after it starts, without a useful message.
Adding -x for a closer look:
IMAGE_TYPE=dev INCLUDE_BF3BMC=no ./bfb-build ubuntu 20.04

ERROR: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?

Cause: the Docker daemon was not running.
Fix:

sudo systemctl restart docker
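
Since the build gives little feedback when the daemon is down, a small pre-flight check before each build can save a round trip. The sketch below is only an illustration and not part of bfb-build: `docker_ready` and `preflight` are hypothetical names, and it assumes a recent Docker client, where `docker info` exits non-zero when it cannot reach the daemon.

```shell
#!/bin/sh
# Hypothetical pre-flight helpers (not part of bfb-build).

# Assumption: `docker info` exits non-zero when the daemon is unreachable.
docker_ready() {
    docker info >/dev/null 2>&1
}

# Print a status line; fail early with a hint if the daemon is not reachable.
preflight() {
    if docker_ready; then
        echo "docker daemon reachable"
    else
        echo "docker daemon not reachable; try: sudo systemctl start docker" >&2
        return 1
    fi
}
```

With the functions sourced into the shell, `preflight && IMAGE_TYPE=dev ./bfb-build ubuntu 20.04` only starts the build when the daemon answers. If the daemon should also come up after a reboot, `sudo systemctl enable --now docker` is the usual systemd way to do that.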

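docker.io is unreachable

Fix:
Add a registry mirror (image accelerator).
Using Aliyun as an example:
https://cr.console.aliyun.com/cn-hangzhou/instances/mirrors
Open the image accelerator page and follow its instructions. The CentOS steps are:

1. Install or upgrade the Docker client
Docker client 1.10.0 or later is recommended; see the docker-ce documentation.
2. Configure the registry mirror
For Docker client versions above 1.10.0,
you can enable the accelerator by editing the daemon configuration file /etc/docker/daemon.json: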
sudo mkdir -p /etc/docker
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://69mp4xz9.mirror.aliyuncs.com"]
}
EOF
sudo systemctl daemon-reload
sudo systemctl restart docker
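
Note that the `tee` step above replaces any existing /etc/docker/daemon.json. As a hedged sketch (the helper names `backup_config` and `has_mirror` are made up for illustration), the file can be copied aside first and checked afterwards:

```shell
#!/bin/sh
# Hypothetical helpers (illustration only).

# Copy an existing config aside before it is overwritten.
# $1 = path to the config file, e.g. /etc/docker/daemon.json
backup_config() {
    if [ -f "$1" ]; then
        cp -p "$1" "$1.bak" && echo "backed up to $1.bak"
    else
        echo "no existing file at $1"
    fi
}

# Succeed if the config file mentions a registry mirror at all.
has_mirror() {
    grep -q '"registry-mirrors"' "$1"
}
```

Run `backup_config /etc/docker/daemon.json` as root before the `tee` step. After the restart, `docker info` should list the configured URL under "Registry Mirrors".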

After the change, the downloads in the docker build proceed normally.

Error: ERROR: failed to solve: process

ERROR: failed to solve: process "/bin/sh -c echo 'excludepkgs=OpenIPMI' >> /etc/dnf/dnf.conf &&     dnf -y clean all &&     dnf install -y epel-release &&     dnf --exclude='mlxbf-bootimages*' -y update &&     dnf --exclude='mlxbf-bootimages*' -y install coreutils-single && 
   dnf --exclude='mlxbf-bootimages*' -y install         acpid         audit         chkconfig         chrony         containerd.io       
 cri-tools         cryptsetup         curl         dhcp-client         dosfstools         dracut         dracut-network         dracut-tools         e2fsprogs         edac-utils         efibootmgr         findutils         gawk         glibc-langpack-en         grub2        
grubby         i2c-tools         iperf3         ipmitool         iproute-tc         iputils         jq         kexec-tools         kmod  
      kubeadm         kubelet         kubernetes-cni         libguestfs-tools         libhugetlbfs-utils         libvirt         lm_sensors         lm_sensors-sensord         lsof         ltrace         lvm2         mmc-utils         mokutil         mstflint         net-tools         NetworkManager         NetworkManager-config-server         NetworkManager-ovs         network-scripts         nfs-utils        
nvme-cli         openssh-clients         openssh-server         openssl         parted         passwd         pciutils         perf      
  python3-pip         qemu-kvm         rasdaemon         rsyslog         sg3_utils         shim         sshpass         sudo         sysstat         systemd-timesyncd         system-lsb         tar         tcpdump         unzip         usbutils         util-linux         vim
        virt-install         watchdog         wget         which         xfsprogs         doca-runtime         doca-devel         strongswan-bf         ${MLNX_FW_UPDATER}     &&     dnf --exclude='mlxbf-bootimages*' -y reinstall bf-release &&     dnf -y clean all &&     rm -rf /var/cache/* &&     truncate -s0 /etc/machine-id &&     update-pciids" did not complete successfully: exit code: 1

Tracing further up the log:

959.9 [MIRROR] doca-sdk-telemetry-exporter-devel-2.8.0081-1.an8.aarch64.rpm: Curl error (18): Transferred a partial file for https://linux.mellanox.com/public/repo/doca/2.8.0/anolis8.6/arm64-dpu/doca-sdk-telemetry-exporter-devel-2.8.0081-1.an8.aarch64.rpm [transfer closed with 13164 bytes remaining to read]
959.9 [FAILED] doca-sdk-telemetry-exporter-devel-2.8.0081-1.an8.aarch64.rpm: No more mirrors to try - All mirrors were already tried without success
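
Curl error (18) means the transfer was cut off partway, which is often transient. One blunt workaround, sketched below under that assumption, is to rerun the build a few times. `retry` is a hypothetical helper, not something bfb-build provides. Each retry restarts the failed Docker build step from the beginning, so this only helps with intermittent network drops, not with a repository that is consistently unreachable.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to <max_attempts> times,
# sleeping <delay_seconds> between attempts.
# Usage: retry <max_attempts> <delay_seconds> <command> [args...]
retry() {
    max=$1
    delay=$2
    shift 2
    attempt=1
    while :; do
        # Stop as soon as the command succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

For example, `retry 3 30 env IMAGE_TYPE=dev ./bfb-build ubuntu 20.04` would make up to three attempts, 30 seconds apart.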

Kubernetes download fails

816.4 (387/586): hwdata-0.314-8.22.0.1.an8.noarch.rpm 1.5 MB/s | 1.8 MB     00:01    
816.5 [MIRROR] kubelet-1.29.9-150500.1.1.aarch64.rpm: Curl error (18): Transferred a partial file for https://prod-cdn.packages.k8s.io/repositories/isv:/kubernetes:/core:/stable:/v1.29/rpm/aarch64/kubelet-1.29.9-150500.1.1.aarch64.rpm [transfer closed with 15234388 bytes remaining to read]
816.5 [FAILED] kubelet-1.29.9-150500.1.1.aarch64.rpm: No more mirrors to try - All mirrors were already tried without success
816.6 
816.6 The downloaded packages were saved in cache until the next successful transaction.
816.6 You can remove cached packages by executing 'dnf clean packages'.
816.9 Error: Error downloading packages:
816.9   kubelet-1.29.9-150500.1.1.aarch64: Cannot download, all mirrors were already tried without success
------
Dockerfile:24

Fix: switch to the Aliyun mirror:
https://developer.aliyun.com/mirror/kubernetes?spm=a2c6h.13651102.0.0.73281b11UWPjI2

Switch the Docker repo to the Aliyun mirror at the same time:

https://developer.aliyun.com/mirror/docker-ce?spm=a2c6h.13651102.0.0.57e31b11smKgIl

Result: after the change, the repo files read as follows:

[root@localhost repos]# pwd
/root/workspace/bfb-build/anolis/8.6/repos
[root@localhost repos]# for f in `ls`; do echo ;echo \#==$f==; cat $f; done

#==AnolisOS-Experimental.repo==
[Experimental]
name=AnolisOS-$releasever - BaseOS
baseurl=http://mirrors.openanolis.cn/anolis/8.6/Experimental/$basearch/os
enabled=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ANOLIS
gpgcheck=1

#==doca.repo==
[doca] 
name=Nvidia DOCA repository 
baseurl=https://linux.mellanox.com/public/repo/doca/2.8.0/anolis8.6/arm64-dpu/ 
gpgcheck=0 
enabled=1

#==docker.repo==
[docker-ce-stable]
name=Docker CE Stable - $basearch
baseurl=https://mirrors.aliyun.com/docker-ce/linux/centos/8/$basearch/stable
enabled=1
gpgcheck=1
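
The docker.repo change shown above can also be scripted. The sketch below is an assumption-based illustration: `use_aliyun_docker_mirror` is a hypothetical name, and it assumes the original file's baseurl pointed at https://download.docker.com, which is the upstream default but is not confirmed for this repository.

```shell
#!/bin/sh
# Hypothetical helper: point a docker-ce .repo file at the Aliyun mirror.
# Assumption: the file's baseurl starts with https://download.docker.com/
# $1 = path to the .repo file; the original is kept as $1.orig
use_aliyun_docker_mirror() {
    cp -p "$1" "$1.orig" &&
    sed 's#https://download\.docker\.com/#https://mirrors.aliyun.com/docker-ce/#' \
        "$1.orig" > "$1"
}
```

Calling `use_aliyun_docker_mirror docker.repo` inside the repos directory rewrites only the host part of the URL and leaves the rest of the file untouched, so `$basearch` and the path suffix stay as they were.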