
k8s
vah101
Articles in this column
Headless Service and hostPort examples
An example for the headless case is as follows:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  labels:
    app: swm
  name: swm
spec:
  replicas: 2
  selector:
    matchLabels:
      app: swm
  serviceName: swm-headless
  template:
    metadata:
      labels:
        app: swm
Original · 2021-08-06 17:44:52 · 147 views · 0 comments
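Notes on installing Rancher on CentOS 8
CentOS 8 (Linux kernel newer than 3.13) uses nftables rather than iptables for packet forwarding, so the default iptables mode may produce errors. The cluster configuration needs to be changed so that the cluster configuration file takes the following form:
kubeproxy:
  extra_args:
    # iptables is used for packet forwarding by default
    proxy-mode: "ipvs"
    # to enable ipvs, set this value to `ipvs`
Original · 2021-07-28 14:58:53 · 834 views · 0 comments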
Calico error: bird: Netlink: File exists
The cause is as follows: A problem in BIRD-Linux kernel routing table synchronization when BIRD tries to overwrite an existing kernel route. There are two common causes: First, there are some routes in the kernel routing table added by some other tools (like ip or route co
Original · 2021-07-26 17:14:08 · 1305 views · 0 comments
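KVM operations
virsh list: show the running virtual machines
virsh list --all: show all virtual machines, including those that are shut off
virsh dumpxml xxx > ./vm.xml: export the virtual machine definition file
virsh define vm.xml: import a virtual machine definition file
virsh start xxx: start virtual machine xxx
virsh destroy xxx: shut down virtual machine xxx (a forced power-off)...
Original · 2021-07-13 11:30:37 · 140 views · 0 comments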
Configuring Harbor to start at boot
cat >> /usr/lib/systemd/system/harbor.service <<EOF
[Unit]
Description=Harbor
After=docker.service systemd-networkd.service systemd-resolved.service
Requires=docker.service
Documentation=http://github.com/vmware/harbor
[Service]
Type=simple
R.
Original · 2021-07-09 15:14:21 · 1820 views · 0 comments
Example Ingress forwarding rule configuration
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    nginx.ingress.kubernetes.io/cors-allow-methods: '*'
    nginx.ingress.kubernetes.io/cors-allow-origin: '*'
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kube.
Original · 2021-07-04 20:09:54 · 1881 views · 0 comments
Resolving Calico and keepalived errors
The Calico log intermittently prints the following:
2021-07-03 10:02:15.879 [INFO][100] iface_monitor.go 176: Netlink address update. addr="192.168.1.150" exists=false ifIndex=9
2021-07-03 10:02:15.879 [INFO][100] int_dataplane.go 622: Linux interface addrs changed. addrs=set.mapSet{"1
Original · 2021-07-04 18:32:04 · 431 views · 0 comments
Ignite testing
Connect to the Ignite database:
./sqlline.sh --verbose=true -u jdbc:ignite:thin://10.42.246.146/
Create a table:
CREATE TABLE PUBLIC.Person (id INTEGER,NAME VARCHAR,PRIMARY KEY (id));
Generate a test SQL script:
for ((i=1; i<=100000; i ++)) do echo "insert into PUBLIC.Person (id, NAME) values($...
Original · 2021-06-24 17:58:21 · 366 views · 0 comments
Deploying an Apache Ignite cluster on k8s
Edit ignite.yaml as follows:
apiVersion: v1
kind: Service
metadata:
  # Name of Ignite Service used by Kubernetes IP finder.
  # The name must be equal to TcpDiscoveryKubernetesIpFinder.serviceName.
  name: ignite
  namespace: default
spec:
  clusterIP: None # cu
Original · 2021-06-23 21:15:17 · 561 views · 0 comments
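Uploading an OS image to CDI persistent PVC storage with virtctl
1. Obtain the OS image: download a qcow2-format OS image from https://cloud.centos.org/centos/7/images/
2. Run the following to obtain the IP address of cdi-uploadproxy, for example 10.43.138.41:
kubectl -n cdi get svc -l cdi.kubevirt.io=cdi-uploadproxy
3. Write pv.yaml with the following content:
apiVersion: v1
kind: PersistentVolume
metadata:
  n.
Original · 2021-04-01 15:04:15 · 1709 views · 0 comments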
Deleting a namespace stuck in Terminating in Rancher
First install jq:
wget http://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
rpm -ivh epel-release-latest-7.noarch.rpm
yum install jq
Set the name of the namespace to delete:
NAMESPACE=test
Then run the following:
RANCHER_SERVER_URL=$( kubectl config view -o json|jq -r .clusters[0
Original · 2021-03-24 20:30:21 · 554 views · 0 comments
Error when rebuilding a cluster with Rancher
After the original cluster was deleted and a new one created, the Rancher cluster reported: Cluster health check failed: Failed to communicate with API server: Get "https://192.168.200.10:6443/api/v1/namespaces/kube-system?timeout=45s": context deadline exceeded. Check the logs of the api-server docker container that serves port 6443: docker logs -f kub
Original · 2021-03-15 11:49:40 · 10856 views · 0 comments
Building a Docker image with tensorflow1.13.1 + Anaconda3.5.1 + cuda10 + cudnn7
FROM nvidia/cuda:10.0-cudnn7-devel-centos7
# install basic dependencies
RUN yum install vim wget cmake bzip2 -y
# install Anaconda3
RUN wget https://mirrors.tuna.tsinghua.edu.cn/anaconda/archive/Anaconda3-5.3.1-Linux-x86_64.sh -O ~/Anaconda3-5.3.1-L...
Original · 2021-02-23 17:23:53 · 316 views · 0 comments
Deploying Greenplum on Kubernetes
http://greenplum-kubernetes.docs.pivotal.io/2-3/index.html
Deploying the GPCC monitoring program: https://greenplum-kubernetes.docs.pivotal.io/2-3/gpcc.html
Original · 2021-01-18 11:03:50 · 1069 views · 0 comments
Calico error in Rancher: Invalidating dataplane cache ipVersion=0x4 reason="chain update" table="filter"
Calico reports: Invalidating dataplane cache ipVersion=0x4 reason="chain update" table="filter". Fix: edit the Calico yml and change the readiness and liveness sections to:
livenessProbe:
  failureThreshold: 6
  httpGet:
    host: localhost
    pat
Original · 2021-01-08 18:38:41 · 523 views · 0 comments
Calico error: Calico requires net.ipv4.conf.all.rp_filter to be set to 0 or 1
Calico reports: int_dataplane.go 1018: Kernel's RPF check is set to 'loose'. This would allow endpoints to spoof their IP address. Calico requires net.ipv4.conf.all.rp_filter to be set to 0 or 1. If you require loose RPF and you are not concerned about spoofi
Original · 2020-12-18 18:00:42 · 735 views · 2 comments
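Configuring Calico for multiple network interfaces
On a server with several network interfaces, Calico has to be told which interface to use. Add the environment variable IP_AUTODETECTION_METHOD under containers->env in calico.yaml:
containers:
- env:
  - name: IP_AUTODETECTION_METHOD
    value: interface=bond*,eth*
The interface value accepts wildcards, and several kinds of prefix can be listed...
Original · 2020-11-27 15:52:48 · 2210 views · 0 comments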
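Fixing the Rancher UI login failure
1. Symptom: the Rancher UI cannot be logged into. docker logs -f rancher shows that the log consists mainly of etcd errors, including: etcdserver: mvcc: database space exceeded
2. This confirms that the problem is caused by etcd reaching its space quota. Log in to the node where etcd runs and execute: docker exec -ti etcd sh to enter the etcd container, then run the commands below. Note that this has to be done on every etcd node in turn:
# show the etcd alarms
etcdctl alarm list
# get the ver
Original · 2020-11-19 17:55:49 · 6307 views · 0 comments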
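Deploying virtual machines with persistent storage in KubeVirt
https://blog.youkuaiyun.com/vah101/article/details/109393495 describes deploying KubeVirt to provide k8s-based KVM virtualization, but the KVM virtual machines described there are not persistent. KubeVirt offers a persistence mechanism, the Containerized Data Importer (CDI). It is set up as follows: wget https://raw.githubusercontent.com/kubevirt/kubevirt.github.io/master/labs/manifes...
Original · 2020-11-05 13:48:45 · 892 views · 0 comments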
Deploying KubeVirt
1. Install the required images by running the following commands:
export VERSION=$(curl -s https://api.github.com/repos/kubevirt/kubevirt/releases | grep tag_name | grep -v -- '-rc' | head -1 | awk -F': ' '{print $2}' | sed 's/,//' | xargs)
echo $VERSION
kubectl create -f https://github.com/kubevirt/k
Original · 2020-10-30 21:46:37 · 1234 views · 2 comments
Fix for the virtctl console error in Rancher: Can't connect to websocket (404): websocket: bad handshake
Follow https://blog.youkuaiyun.com/vah101/article/details/108828526 and replace the config file; after that, virtctl console works normally.
Original · 2020-10-30 21:06:40 · 2287 views · 0 comments
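Fixing certificates in Rancher that are not automatically issued after approval
After a certificate is created in Rancher, kubectl describe csr xxx shows its status as pending. Running kubectl certificate approve xxx approves it, and the status then changes to Approved. However, running: kubectl get csr xxx -o jsonpath='{.status.certificate}' produces no output. Running kubectl describe csr xxx again and looking closely shows that the status is only Approved, whereas normally it should be Ap
Original · 2020-11-27 18:28:18 · 1326 views · 1 comment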
k8s error when creating a Deployment: missing required field "selector" in io.k8s.api.apps.v1.DeploymentSpec
https://blog.youkuaiyun.com/cd_yourheart/article/details/107463956
Reposted · 2020-10-26 15:34:42 · 4133 views · 0 comments
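Installing TiDB on k8s
Contents: Installing TiDB on k8s; If the cluster is managed by Rancher, modify the cluster configuration; Mount partitions; Start local-volume-provisioner; Create the CRD; Install helm; Install tidb-operator; Install tidb-cluster; Get the TiDB nodePort; Connect to TiDB; End
# tidb-operator requires at least 3 nodes in the cluster
If the cluster is managed by Rancher, modify the cluster configuration: go to Cluster -> Upgrade -> Edit YAML, find " services:", and add parameters for kubelet there, as follows: kubelet:
Original · 2020-10-17 20:33:10 · 848 views · 0 comments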
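Fixing the multi-threading error for PyTorch running in Docker
Multi-threaded training with PyTorch inside Docker fails with the following error: unexpected bus error encountered in worker. This might be caused by insufficient shared memory(shm). Fix: add the --ipc=host flag when starting the container. When running on k8s, add hostIPC: true to the spec in the YAML, along these lines:
apiVersion: v1
kind: Pod
metadata:.
Original · 2020-10-13 16:11:32 · 813 views · 0 comments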
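Fixing the error when running applications that use PVCs in Rancher
Running an application in Rancher that needs a PV and PVC fails with: Warning FailedMount 2s (x6 over 17s) kubelet, node1 MountVolume.NewMounter initialization failed for volume "local-pv-cadb07da" : path "/mnt/ssd/t1" does not exist. Go to Cluster -> Upgrade -> Edit YAML, find "services:", and add parameters for kubelet there...
Original · 2020-10-11 01:22:08 · 1876 views · 0 comments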
Commands for checking GPU information in PyTorch and TensorFlow
Check GPU information in PyTorch:
import torch
torch.cuda.is_available()  # whether CUDA is available
torch.cuda.device_count()  # number of GPUs
torch.cuda.get_device_name(0)  # GPU name; device indices start at 0
torch.cuda.current_device()
Check GPU information in TensorFlow:
import tensorflow as tf
from tensorflow.python.
Original · 2020-09-21 21:07:54 · 813 views · 0 comments
Freeing up space used by Docker
Remove Docker containers that have stopped:
docker rm `docker ps -a|grep 'Exited'|awk '{print $1}'`
Remove Docker images that have been superseded:
docker rmi `docker images|grep '<none>'|awk '{print $3}'`
Original · 2020-09-15 16:23:06 · 152 views · 0 comments
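The grep and awk filter in the first command of this entry can be sketched in Python as follows. This is only an illustration under stated assumptions, not the author's method: the function name `exited_container_ids` and the sample listing are hypothetical, and the sample merely imitates the column layout of `docker ps -a`.

```python
def exited_container_ids(ps_output):
    """Return the first column of every line that contains 'Exited'.

    Mirrors `grep 'Exited' | awk '{print $1}'` applied to `docker ps -a` output.
    """
    ids = []
    for line in ps_output.splitlines():
        if "Exited" in line:
            fields = line.split()
            if fields:
                ids.append(fields[0])
    return ids


# Hypothetical listing, shortened for illustration; not real command output.
SAMPLE = """CONTAINER ID   IMAGE     COMMAND   CREATED       STATUS                   NAMES
1a2b3c4d5e6f   nginx     "nginx"   2 hours ago   Up 2 hours               web
9f8e7d6c5b4a   busybox   "sh"      3 hours ago   Exited (0) 3 hours ago   job
"""
```

Calling `exited_container_ids(SAMPLE)` yields the IDs that would be handed to `docker rm`. Like the shell version, the match is a plain substring test, so a container whose name or command contains "Exited" would also be selected.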
Installing Docker on Huawei HiSilicon arm64 TaiShan servers
Reference: https://support.huaweicloud.com/instg-kunpengcpfs/kunpengcpfs_03_0001.html
1. Download the Docker binary executables from https://download.docker.com/linux/static/stable/aarch64/
2. Extract the archive:
tar xvpf docker-19.03.8.tgz
cp -p docker/* /usr/bin
setenforce 0
systemctl st.
Original · 2020-08-28 17:40:24 · 1254 views · 0 comments
Rancher error: Failed to watch directory "/sys/fs/cgroup/blkio/system.slice": inotify_add_watch /sys/fs/cg
The Rancher UI reports: [workerPlane] Failed to bring up Worker Plane: [Failed to verify healthcheck: Failed to check http://localhost:10248/healthz for service [kubelet] on host [192.168.0.101]: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: co
Original · 2020-08-28 13:07:57 · 1745 views · 0 comments
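[debug] error converting YAML to JSON: yaml: line 42: did not find expected ':' indicator
Running a YAML file with kubectl create -f fails with: [debug] error converting YAML to JSON: yaml: line 42: did not find expected ':' indicator. On closer inspection, the cause was the separator between documents: it should be "---" (three hyphens), but one hyphen was missing and only two had been written. After the separator was completed, the error went away...
Original · 2020-08-25 16:26:56 · 7938 views · 0 comments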
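A check for the mistake described in this entry could look like the sketch below. It is a heuristic only, and everything named here is hypothetical (`find_bad_separators`, the sample manifest): it flags lines that consist of two or more hyphens but are not exactly "---".

```python
import re

# A line made only of two or more hyphens, optionally followed by spaces or tabs.
DASH_ONLY = re.compile(r"^-{2,}[ \t]*$")


def find_bad_separators(manifest_text):
    """Return 1-based line numbers of dash-only lines that are not exactly '---'."""
    bad = []
    for number, line in enumerate(manifest_text.splitlines(), start=1):
        if DASH_ONLY.match(line) and line.rstrip() != "---":
            bad.append(number)
    return bad


# Hypothetical two-document manifest with a two-hyphen separator on line 5.
SAMPLE = """apiVersion: v1
kind: Namespace
metadata:
  name: demo
--
apiVersion: v1
kind: ConfigMap
metadata:
  name: demo-config
"""
```

Running `find_bad_separators` on the text of a manifest before passing the file to kubectl points at the offending line directly, which can be quicker than interpreting the parser's line number.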
Starting tensorflow:latest-gpu-jupyter on Kubernetes
Write the following tensflow.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tensorflow-gpu-jupyter
  labels:
    app: tensorflow-gpu-jupyter
spec:
  replicas: 1
  selector: # define how the deployment finds the pods it manages
    matchLabels:
      app
Original · 2020-08-25 16:05:03 · 542 views · 0 comments
Deploying gpushare-scheduler-extender in Rancher
gpushare-scheduler-extender is a GPU virtualization solution developed by Alibaba Cloud for the Kubernetes platform. First, follow https://blog.youkuaiyun.com/vah101/article/details/108098827 to install k8s-device-plugin, and configure /etc/docker/daemon.json as:
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
Original · 2020-11-06 11:30:42 · 1660 views · 1 comment
Building an arm64 version of rancher/coreos-etcd:v3.3.15
First, download the etcd source code:
mkdir -p $GOPATH/src/github.com/coreos
cd $GOPATH/src/github.com/coreos
git clone https://github.com/etcd-io/etcd.git
cd etcd
git checkout v3.3.15
make build
If the following output appears in an arm64 environment, the build succeeded:
2020-08-20 18:05:34.183276 E | etcdmain: etcd on unsuppo
Original · 2020-08-20 18:09:01 · 837 views · 0 comments
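Configuring GPU nodes in a k8s cluster
1. Install CUDA and the driver on the GPU server. Run: lspci | grep -i nvidia to check whether a GPU is present. If the lspci command is not found, run: yum install pciutils -y
2. Install the NVIDIA and EPEL rpm repositories. Run: wget https://developer.download.nvidia.cn/compute/cuda/repos/rhel7/x86_64/cuda-repo-rhel7-10.1.243-1.x86_6...
Original · 2020-08-25 16:32:05 · 1408 views · 0 comments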
Fixing an expired certificate in single-container Rancher
When the certificate of a Rancher 2.2 installation expires, the Docker container restarts frequently and reports:
08:51:46.160121 I | http: TLS handshake error from 127.0.0.1:33140: remote error: tls: bad certificate
E0814 08:51:46.160212 6 reflector.go:134] k8s.io/client-go/informers/factory.go:127: Failed to list *v1.Replicat
Original · 2020-08-14 17:19:09 · 4975 views · 0 comments
kubelet.go:1344] Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/devices: no space left on
The kubelet container restarts frequently, and its log shows: kubelet.go:1344] Failed to start cAdvisor inotify_add_watch /sys/fs/cgroup/devices: no space left on device. Fix: run the following on the host: sudo sysctl fs.inotify.max_user_watches=1048576
Original · 2020-08-10 16:14:37 · 1350 views · 0 comments
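Before changing the setting, the current limit can be compared with the target value from this entry. The sketch below assumes a Linux host that exposes the sysctl under /proc/sys; the helper names `read_limit` and `needs_raise` are hypothetical.

```python
from pathlib import Path

# Target taken from the sysctl command in the entry above.
TARGET_WATCHES = 1048576
# Standard procfs location of fs.inotify.max_user_watches on Linux.
DEFAULT_PATH = "/proc/sys/fs/inotify/max_user_watches"


def read_limit(path=DEFAULT_PATH):
    """Read the current inotify watch limit as an integer."""
    return int(Path(path).read_text().strip())


def needs_raise(current, target=TARGET_WATCHES):
    """Return True when the current limit is lower than the target."""
    return current < target
```

On the affected host, `needs_raise(read_limit())` returning True indicates that the limit is still below the target. The sysctl command changes only the running kernel; to keep the value across reboots it is commonly also written to /etc/sysctl.conf or a file under /etc/sysctl.d.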
Building Harbor in an arm64 environment
1. Install docker-compose, following https://www.huaweicloud.com/kunpeng/software/dockercompose.html. Finally, copy docker-compose-Linux-aarch64 to /usr/bin.
2. Add the following to /etc/profile:
export GOPATH="/home/xxx/go"
export P...
Original · 2020-08-25 16:18:58 · 1867 views · 0 comments
no matches for kind "DaemonSet" in version "extensions/v1beta1"
In v1.16, DaemonSet, Deployment, StatefulSet and ReplicaSet are no longer served from extensions/v1beta1, apps/v1beta1 or apps/v1beta2. Fix: change the API version in the yml file to apps/v1. The cause was that the Kubernetes version used previously was 1.14.x, and 1.16.x dropped support for some APIs...
Original · 2020-08-02 19:13:17 · 9641 views · 0 comments
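The edit described in this entry can be sketched as a small text rewrite. This is an assumption-laden illustration rather than a migration tool: the function name `upgrade_api_version` is hypothetical, it works on one YAML document at a time, and it only looks at unquoted top-level `kind:` and `apiVersion:` lines.

```python
import re

# Versions and kinds named in the entry above.
OLD_VERSIONS = {"extensions/v1beta1", "apps/v1beta1", "apps/v1beta2"}
WORKLOAD_KINDS = {"DaemonSet", "Deployment", "StatefulSet", "ReplicaSet"}

KIND_RE = re.compile(r"^kind:[ \t]*(\S+)[ \t]*$", re.MULTILINE)
VERSION_RE = re.compile(r"^apiVersion:[ \t]*(\S+)[ \t]*$", re.MULTILINE)


def upgrade_api_version(document):
    """Rewrite the top-level apiVersion of one YAML document to apps/v1 when it applies."""
    kind = KIND_RE.search(document)
    version = VERSION_RE.search(document)
    if not kind or not version:
        return document
    if kind.group(1) not in WORKLOAD_KINDS or version.group(1) not in OLD_VERSIONS:
        return document
    # Replace only the first top-level apiVersion line.
    return VERSION_RE.sub("apiVersion: apps/v1", document, count=1)
```

Changing the version string is not always sufficient: apps/v1 requires `spec.selector`, which is the "missing required field selector" error covered by another entry in this column, and this sketch does not add it.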
Fixing the gem install zookeeper error on arm64
On arm64, running: gem install zookeeper fails with: configure: error: cannot guess build type; you must specify one. This means that /var/lib/gems/2.3.0/gems/zookeeper-1.4.11/ext/zkc-3.4.5/c/config.sub is outdated and does not include the current arm64...
Original · 2020-04-22 15:04:39 · 1978 views · 0 comments