KubeCon China 2021 Recap (Part 2)

License: CC 4.0 BY-SA

Original link: https://mp.weixin.qq.com/s?__biz=MzU1MzY4NzQ1OA==&mid=2247501105&idx=2&sn=35b645cc1cfab29c545ba3f1b08d50fb&chksm=fbed85fccc9a0cead8ee237fd136bde1fc0544e19f19a74b0551a7ccc42371f9d333d756129f&scene=126&&sessionid=0

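This post collects a series of technical talks from the Kubernetes and cloud native space, covering CI/CD systems, file system performance improvements, chaos engineering practice, container runtime scheduling, and more, and takes a closer look at tips for using tools such as Tekton, FUSE, and Chaos Mesh.

This one is a long post too.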

Build a Large Scale Cloud Native CI/CD System Based on Tekton

Tekton's atomic capabilities are already excellent.

The system is built as an extension on top of Tekton.

cp-controller supports sharding.
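The notes do not say how cp-controller shards its work. As a minimal sketch, assuming each controller replica owns a stable hash slice of the objects it reconciles, one common approach looks like the following. The function names, the `namespace/name` key format, and the shard count are hypothetical and not taken from the talk.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Map an object key to a shard index using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string is randomized per process, and all replicas must agree.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


def owned_by(key: str, shard_index: int, shard_count: int) -> bool:
    """Return True if the replica with shard_index should reconcile key."""
    return shard_for(key, shard_count) == shard_index


# Hypothetical object keys in "namespace/name" form.
keys = [f"team-{i % 4}/pipelinerun-{i}" for i in range(12)]
assignment = {key: shard_for(key, 3) for key in keys}
```

Each replica would then skip any object for which `owned_by` returns False, so every object is handled by exactly one replica. Changing the shard count reshuffles ownership, which a real system would need to handle separately.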

Their solution to the etcd performance problem is simply to stop using etcd and switch to a database.

Improve FUSE Filesystem Performance and Reliability

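This session is also quite hardcore.

The FUSE device driver incurs extra data copies and context switches, which are the key cause of the performance overhead.

For FUSE FD passthrough, it helps to read the link from the slides alongside: https://lwn.net/Articles/843093/

Failover is implemented through fuse_conn and fuse_dev.

A requeue operation keeps files that are in the middle of an operation from hitting errors when the FUSE server panics during failover.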
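The requeue idea can be illustrated with a toy user-space model. This is only a sketch of the concept, not the kernel implementation: the class and method names are hypothetical, and it assumes that requests a crashed server had read but not yet answered are put back on the pending queue for the replacement server.

```python
from collections import deque


class ToyFuseConnection:
    """Toy model of a FUSE request queue; not kernel code."""

    def __init__(self):
        self.pending = deque()   # requests waiting for a server to read them
        self.processing = {}     # requests read by a server but not answered
        self.replies = {}        # request id -> result seen by the caller

    def submit(self, req_id, op):
        self.pending.append((req_id, op))

    def server_read(self):
        req_id, op = self.pending.popleft()
        self.processing[req_id] = op
        return req_id, op

    def server_reply(self, req_id, result):
        del self.processing[req_id]
        self.replies[req_id] = result

    def requeue_inflight(self):
        """On failover, move unanswered requests back to the front of the
        pending queue, preserving their original order."""
        for req_id, op in reversed(list(self.processing.items())):
            self.pending.appendleft((req_id, op))
        self.processing.clear()


conn = ToyFuseConnection()
for i, op in enumerate(["read", "write", "getattr"]):
    conn.submit(i, op)

conn.server_read()        # the old server picks up request 0 ...
conn.server_read()        # ... and request 1, then crashes before replying
conn.requeue_inflight()   # failover: nothing is dropped

# A replacement server drains the queue and answers every request.
while conn.pending:
    req_id, op = conn.server_read()
    conn.server_reply(req_id, f"{op}-ok")
```

Without the requeue step, requests 0 and 1 would stay stuck in `processing` forever, which is the toy equivalent of the file operation failing.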

Chaos Mesh 2.0 Make Chaos Engineering Easy

Workflow is finally here.

So here is the question: after running a chaos experiment that shuts down a host, how do you power it back on?

Run wasm applications on kubernetes edge cluster

K3s + Krustlet, a somewhat off-mainstream combination.

Deep Dive CRI-RM-based CPU and NUMA Affinity to Achieve AI Task Acceleration

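I have to say, Intel designed this one well. In Part 1 I said Kubelet should provide a plugin mechanism to solve this kind of problem, and Intel went and solved it with a different approach.

The classic "noisy neighbor" problem.

It also supports Optane.

On each node, CPUs are pooled at a fine granularity and divided into two classes: those that can be shared and those that cannot.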
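As a rough illustration of splitting a node's CPUs into shareable and non-shareable sets, here is a minimal sketch. It is not CRI-RM's implementation: the class, its methods, and the workload names are hypothetical, and NUMA topology is ignored entirely.

```python
class NodeCpuPools:
    """Toy model: a node's CPUs split into a shared pool and exclusive grants."""

    def __init__(self, cpu_ids):
        self.shared = set(cpu_ids)   # CPUs any shared workload may run on
        self.exclusive = {}          # workload name -> set of pinned CPUs

    def allocate_exclusive(self, workload, count):
        """Carve `count` CPUs out of the shared pool for one workload."""
        if count >= len(self.shared):
            raise ValueError("must leave at least one CPU in the shared pool")
        granted = set(sorted(self.shared)[:count])
        self.shared -= granted
        self.exclusive[workload] = granted
        return granted

    def release(self, workload):
        """Return a workload's exclusive CPUs to the shared pool."""
        self.shared |= self.exclusive.pop(workload)

    def cpuset_for(self, workload):
        """CPUs the workload may use: its own grant, else the shared pool."""
        return self.exclusive.get(workload, self.shared)


pools = NodeCpuPools(range(8))
pools.allocate_exclusive("training-job", 4)
```

A workload with an exclusive grant never competes with its neighbors for those CPUs, which is the point of the split; everything else runs on whatever remains in the shared pool.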

Open source repository: https://github.com/intel/cri-resource-manager

There is a PDF for reference: https://www.intel.cn/content/dam/www/central-libraries/cn/zh/documents/inspur-aistation-unlocks-compute-with-intel-cri-rm-for-cpu-affinity-scheduling.pdf

BFE Modern Layer 7 Load Balancer for Enterprise Application

BFE now supports Ingress.

The control plane has been open-sourced as well:

https://github.com/bfenetworks/dashboard
https://github.com/bfenetworks/api-server
https://github.com/bfenetworks/conf-agent

Not much to say about BFE: just use it.

Best Practice DNS Failure Observability and Diagnosis in Kubernetes

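These are all common problems, and I have run into most of them.

Best read alongside the official documentation: https://kubernetes.io/zh/docs/tasks/administer-cluster/dns-debugging-resolution/

Server-side diagnosis:

The dnstap plugin is quite useful.

Client-side diagnosis: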

https://gist.github.com/xh4n3/61d8081b834d7e21bff723614e07777c

Sure enough, pwru was mentioned too. pwru is the real GOAT.

pwru was open-sourced only recently: https://github.com/cilium/pwru

Video walkthrough of "Packet, where are you": https://www.youtube.com/watch?v=NhlR11Fp69g

More Secure and Confidential Computing on Linux with Nitro Enclaves

This one is also partly an advertisement, but Nitro Enclaves is still pretty interesting.

There is not even SSH.

Exploring Cloud Native Big Data Platform in SPDB

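A new project, Piraeus Datastore: https://github.com/piraeusdatastore/piraeus

DRBD is old technology, and I did not expect it to keep finding use in K8s scenarios.

DRBD is a software-based, shared-nothing, replicated storage solution that mirrors the content of block devices (hard disks, partitions, logical volumes, etc.) between hosts.

real time: data modifications are replicated as they happen
transparently: applications do not need to know that the data is stored on multiple hosts
synchronously or asynchronously: a write counts as complete either after all hosts have carried it out, or as soon as the local write has finished

DRBD constitutes a driver for a virtual block device, so it sits near the bottom of the system's I/O stack. As a consequence, DRBD cannot detect file system corruption.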
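The difference between synchronous and asynchronous mirroring can be sketched with a toy two-host model. This is not DRBD's actual replication protocol: the class and its methods are hypothetical, and network transfer is reduced to a dictionary update.

```python
class ToyMirror:
    """Toy two-host block mirror; not DRBD's actual protocol implementation."""

    def __init__(self, synchronous):
        self.synchronous = synchronous
        self.local = {}      # block number -> data on the primary host
        self.remote = {}     # block number -> data on the peer host
        self.backlog = []    # writes not yet applied on the peer

    def write(self, block, data):
        """Return once the write counts as complete for the application."""
        self.local[block] = data
        if self.synchronous:
            self.remote[block] = data            # wait for the peer first
        else:
            self.backlog.append((block, data))   # the peer catches up later

    def flush(self):
        """Apply queued writes on the peer, in order."""
        for block, data in self.backlog:
            self.remote[block] = data
        self.backlog.clear()

    def lost_on_primary_failure(self):
        """Blocks the peer would be missing if the primary died right now."""
        return {b for b, d in self.local.items() if self.remote.get(b) != d}


sync_mirror = ToyMirror(synchronous=True)
sync_mirror.write(0, b"a")

async_mirror = ToyMirror(synchronous=False)
async_mirror.write(0, b"a")
```

In the synchronous case the peer is never behind, at the cost of waiting for it on every write. In the asynchronous case the write returns sooner, but anything still in the backlog is lost if the primary fails before `flush` runs.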

Vivo's AI Computing Platform on Kubernetes

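A big collection of assembled building blocks. There is not much that is new, but everything is very practical: they ran into problems that anyone might run into, and there are some general-purpose solutions too.

These are all well-worn topics:

Ring Allreduce
Job scheduling order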
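For readers who have not met Ring Allreduce, the following is a single-process simulation of the standard algorithm (a reduce-scatter phase followed by an all-gather phase). It is a teaching sketch, not the platform's implementation: there is no real networking, and the function names are hypothetical.

```python
def split(buf, n):
    """Split buf into n contiguous chunks (sizes differ by at most one)."""
    size = len(buf)
    return [list(buf[size * c // n: size * (c + 1) // n]) for c in range(n)]


def ring_allreduce(buffers):
    """Sum equal-length buffers across n simulated workers arranged in a ring."""
    n = len(buffers)
    chunks = [split(buf, n) for buf in buffers]  # chunks[worker][chunk index]

    # Reduce-scatter: in each step, worker i sends one chunk to worker i + 1,
    # which adds it to its own copy. Sends are snapshotted first so that all
    # workers behave as if they sent at the same time.
    for step in range(n - 1):
        sends = [(i, (i - step) % n) for i in range(n)]
        payloads = [list(chunks[i][c]) for i, c in sends]
        for (src, c), data in zip(sends, payloads):
            dst = (src + 1) % n
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], data)]

    # Worker i now holds the fully reduced chunk (i + 1) % n.
    # All-gather: pass the finished chunks around the ring, overwriting.
    for step in range(n - 1):
        sends = [(i, (i + 1 - step) % n) for i in range(n)]
        payloads = [list(chunks[i][c]) for i, c in sends]
        for (src, c), data in zip(sends, payloads):
            chunks[(src + 1) % n][c] = data

    return [[x for chunk in worker for x in chunk] for worker in chunks]


buffers = [
    [1, 2, 3, 4, 5, 6],
    [10, 20, 30, 40, 50, 60],
    [100, 200, 300, 400, 500, 600],
]
result = ring_allreduce(buffers)
```

Each worker sends and receives only one chunk per step, so the per-worker traffic stays roughly constant as the number of workers grows, which is why the pattern is popular for gradient synchronization.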

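KEDA actually has plenty of problems too; even writing your own plugins only solves part of them.

VictoriaMetrics is the best landing place for Prometheus.

VK has plenty of problems as well.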

DGL Operator Distributed Graph Neural Network Training with DGL and K8s

The process of building an Operator from zero to one.

The project is open source: https://github.com/Qihoo360/dgl-operator

SuperEdge Promoting Kubernetes to the Edge of Technology Decryption

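lite-apiserver could be extracted and used as a general-purpose caching layer for the apiserver.

You could implement that with a for loop too.

For cloud-edge communication, fabedge is also worth a look.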

BPF Introduction, Programming Tips and Tricks

An introductory, explainer-style session.

https://nakryiko.com/posts/libbpf-bootstrap/
https://github.com/libbpf/libbpf-bootstrap
https://github.com/iovisor/bcc/tree/master/libbpf-tools
https://nakryiko.com/posts/bcc-to-libbpf-howto-guide/
https://en.pingcap.com/blog/tips-and-tricks-for-writing-linux-bpf-applications-with-libbpf

Reasons why BPF looks hot but is hard to actually put to use: