Beginner's Guide: Installing and Configuring Karpenter, Step by Step

Original article · last modified 2025-02-26 11:22:34

License: CC 4.0 BY-SA

Tags: #kubernetes

First published 2025-02-26 11:15:57

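Karpenter started out as a Kubernetes cluster node autoscaler built specifically for AWS. As the open-source community has grown, Karpenter has gained support for Alibaba Cloud and Azure as well.

Alibaba Cloud support was jointly developed and contributed by CloudPilot AI and Alibaba Cloud's Container Service and Elastic Compute teams. For details, see:

CloudPilot AI and Alibaba Cloud Release the Karpenter Alibaba Cloud Provider to Optimize ACK Cluster Autoscaling

For years, the standard solution for scaling Kubernetes nodes has been Cluster Autoscaler, which supports multiple cloud vendors. On AWS, that means scaling by adjusting the instance count of an Auto Scaling Group (ASG), which severely limits how fast the cluster can scale.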

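This approach has several problems:

1. Changes to an Auto Scaling Group take time, because the ASG has to evaluate conditions and call APIs to launch or terminate instances. When a new Pod cannot be scheduled for lack of capacity, it has to wait for the ASG change to take effect and for a new instance to start.

Typically there is a delay of 30 to 60 seconds between the update request and the EC2 instance actually launching.

2. By default, Cluster Autoscaler only considers a node "underutilized" once its resource utilization has been below 50% for more than 10 minutes. Instances can therefore sit underutilized or idle for long periods, which increases cost.

3. Cluster Autoscaler usually requests one node per Pod, so if several Pods need capacity it launches several nodes to satisfy them, which again leads to underutilization and extra spend.

4. An ASG is created with a maximum size. Once that maximum is reached, Cluster Autoscaler cannot scale any further, and the cluster runs into capacity problems.

5. Cluster Autoscaler can only evaluate scheduling against a single node shape, so every instance in an Auto Scaling Group must have the same CPU and memory; otherwise its calculations are thrown off and it stops working properly. In practice this restricts the cluster to one node type (for example, 4 vCPUs and 32 GB of memory), because not enough instance types match that exact shape.

6. Zone awareness: Cluster Autoscaler cannot launch instances evenly across multiple Availability Zones (AZs). It often ends up picking a single zone for Spot instances, which can cause capacity problems. We worked around this by creating one ASG per AZ.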
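The defaults described in point 2 correspond to two Cluster Autoscaler command-line flags. As a rough sketch, this is where they would sit in the Cluster Autoscaler container spec. The container name and the image tag placeholder are assumptions for illustration; the two flag values shown are simply the defaults.

```yaml
# Fragment of a cluster-autoscaler Deployment (container spec only).
# Container name and image tag are placeholders, not taken from this article.
containers:
  - name: cluster-autoscaler
    image: registry.k8s.io/autoscaling/cluster-autoscaler:<version>  # pick the tag matching your cluster version
    command:
      - ./cluster-autoscaler
      - --cloud-provider=aws
      # A node is a scale-down candidate when the sum of Pod requests on it,
      # divided by its allocatable resources, is below this fraction (default 0.5).
      - --scale-down-utilization-threshold=0.5
      # How long a node must stay unneeded before it is removed (default 10m).
      - --scale-down-unneeded-time=10m
```

Lowering the second value makes scale-down faster, but nodes still come and go through the ASG, so the launch delay from point 1 remains.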

Karpenter

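Karpenter is an open-source project that creates instances from a template (just as an ASG does) but bypasses the ASG entirely, which makes instance launches very fast. Nodes join the cluster almost immediately, although it usually takes about 60 seconds before a node is Ready.

01/How does Karpenter work?

Karpenter looks at the Pods that are pending and immediately places those that fit onto existing capacity. For the rest, it computes the best machine types from the constraints you provide and launches one or more machines that can hold the workload. Because each machine only has to pull an image once, Pods start faster.

When Karpenter finds a node with no Pods running on it, it scales in by terminating that EC2 instance.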

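02/Advantages of Karpenter

Karpenter can flexibly mix EC2 instances of different sizes to fit the workload, which greatly improves the efficiency and fault tolerance of Spot usage.

Karpenter's "consolidation" feature actively checks node utilization and consolidates workloads onto existing or new nodes.

Compared with Cluster Autoscaler, Karpenter can pack more Pods onto a node, which saves money. Karpenter also forces us to make sure the cluster tolerates losing nodes and can recover, which improves cluster stability.

Karpenter lets you set an expiry time (TTL) after which nodes are automatically recycled and replaced with nodes on the latest patched AMI, which reduces the blast radius of failures.

When a node is deleted, Karpenter first marks it unschedulable (cordon) so that no more Pods are scheduled onto it, and then drains it, moving its Pods to other nodes.

This makes upgrades more controlled, and when done properly (nodes are not all deleted at once, and Pod Disruption Budgets are in place) it can achieve zero downtime.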
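The Pod Disruption Budget mentioned above is a standard Kubernetes resource. A minimal sketch follows; the name `web-pdb`, the label `app: web`, and the value of `minAvailable` are hypothetical and should be adapted to your own workload.

```yaml
# Hypothetical PodDisruptionBudget for a workload labeled app=web.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-pdb
spec:
  # Voluntary evictions (such as those issued during a node drain) are refused
  # if they would leave fewer than 2 matching Pods available.
  minAvailable: 2
  selector:
    matchLabels:
      app: web
```

With a budget like this in place, a drain evicts Pods only as fast as replacements become available elsewhere, which is what makes a rolling node replacement safe.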

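Cost savings

How much Karpenter saves depends on your current workload and the EC2 instance types you use with Cluster Autoscaler.

Below is an example of the monthly savings when switching from r5.large instances to a range of instance types: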

[Figure: monthly cost comparison (image not included)]

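How to set up Karpenter

You can install Karpenter from the official Helm chart. Note that the first install may fail because the webhook does not come up in time (this may be fixed in a future release); running the install again on top of the existing Helm release works.

If you have trouble installing Karpenter, you are welcome to try CloudPilot AI (www.cloudpilot.ai), a managed cloud service for Karpenter, which can be installed in about 5 minutes.

Next, create dedicated nodes for running Karpenter.

Karpenter ships with a custom resource for defining Provisioners, which control the instance types to launch and the constraints on them.

An example Provisioner resource: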

apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Enables consolidation which attempts to reduce cluster cost by both removing un-needed nodes and down-sizing those
  # that can't be removed.  Mutually exclusive with the ttlSecondsAfterEmpty parameter.
  consolidation:
    enabled: true

  # If omitted, the feature is disabled and nodes will never expire.  If set to less time than it requires for a node
  # to become ready, the node may expire before any pods successfully start.
  ttlSecondsUntilExpired: 2592000 # 30 Days = 60 * 60 * 24 * 30 Seconds;

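  # If omitted, the feature is disabled, nodes will never scale down due to low utilization.
  # Mutually exclusive with consolidation above, so it is commented out here; enable one or the other.
  # ttlSecondsAfterEmpty: 30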

  # Priority given to the provisioner when the scheduler considers which provisioner
  # to select. Higher weights indicate higher priority when comparing provisioners.
  # Specifying no weight is equivalent to specifying a weight of 0.
  weight: 10

  # Provisioned nodes will have these taints
  # Taints may prevent pods from scheduling if they are not tolerated by the pod.
  taints:
    - key: example.com/special-taint
      effect: NoSchedule


  # Provisioned nodes will have these taints, but pods do not need to tolerate these taints to be provisioned by this
  # provisioner. These taints are expected to be temporary and some other entity (e.g. a DaemonSet) is responsible for
  # removing the taint after it has finished initializing the node.
  startupTaints:
    - key: example.com/another-taint
      effect: NoSchedule

  # Labels are arbitrary key-values that are applied to all nodes
  labels:
    billing-team: my-team

  # Requirements that constrain the parameters of provisioned nodes.
  # These requirements are combined with pod.spec.affinity.nodeAffinity rules.
  # Operators { In, NotIn } are supported to enable including or excluding values
  requirements:
    - key: "karpenter.k8s.aws/instance-category"
      operator: In
      values: ["c", "m", "r"]
    - key: "karpenter.k8s.aws/instance-cpu"
      operator: In
      values: ["4", "8", "16", "32"]
    - key: karpenter.k8s.aws/instance-hypervisor
      operator: In
      values: ["nitro"]
    - key: "topology.kubernetes.io/zone"
      operator: In
      values: ["us-west-2a", "us-west-2b"]
    - key: "kubernetes.io/arch"
      operator: In
      values: ["arm64", "amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["spot", "on-demand"]

  # Karpenter provides the ability to specify a few additional Kubelet args.
  # These are all optional and provide support for additional customization and use cases.
  kubeletConfiguration:
    clusterDNS: ["10.0.1.100"]
    containerRuntime: containerd
    systemReserved:
      cpu: 100m
      memory: 100Mi
      ephemeral-storage: 1Gi
    kubeReserved:
      cpu: 200m
      memory: 100Mi
      ephemeral-storage: 3Gi
    evictionHard:
      memory.available: 5%
      nodefs.available: 10%
      nodefs.inodesFree: 10%
    evictionSoft:
      memory.available: 500Mi
      nodefs.available: 15%
      nodefs.inodesFree: 15%
    evictionSoftGracePeriod:
      memory.available: 1m
      nodefs.available: 1m30s
      nodefs.inodesFree: 2m
    evictionMaxPodGracePeriod: 3m
    podsPerCore: 2
    maxPods: 20

  # Resource limits constrain the total size of the cluster.
  # Limits prevent Karpenter from creating new instances once the limit is exceeded.
  limits:
    resources:
      cpu: "1000"
      memory: 1000Gi

  # References cloud provider-specific custom resource, see your cloud provider specific documentation
  providerRef:
    name: default

As you can see, you can even tune the kubelet configuration.
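To show how a workload ends up on nodes from the Provisioner above, here is a sketch of a Deployment that selects the Provisioner's label and tolerates its taint. The Deployment name, the `app` label, the image, and the resource requests are hypothetical; only the `billing-team` label, the `example.com/special-taint` taint, and the `karpenter.sh/capacity-type` key come from the example above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: billing-api            # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: billing-api
  template:
    metadata:
      labels:
        app: billing-api
    spec:
      # Combined with the Provisioner's requirements: only nodes carrying
      # this label and this capacity type qualify.
      nodeSelector:
        billing-team: my-team
        karpenter.sh/capacity-type: spot
      # Needed because the Provisioner taints every node it creates.
      # The startupTaints do not need a toleration.
      tolerations:
        - key: example.com/special-taint
          operator: Exists
          effect: NoSchedule
      containers:
        - name: app
          image: nginx:1.25    # placeholder image
          resources:
            # Requests are what the node size is computed from.
            requests:
              cpu: "1"
              memory: 2Gi
```

If no existing node fits, these Pods stay pending and the Provisioner has to pick an instance type that satisfies both its own requirements and the selector here.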

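Avoid custom launch templates with Karpenter

Karpenter strongly recommends against using custom launch templates. Using them causes the following problems:

◻No multi-architecture support.

◻No automatic node upgrades.

◻No security group discovery.

Launch templates can also be confusing, because some fields duplicate what is already defined in Karpenter's Provisioners, while others, such as subnets and instance types, are ignored by Karpenter.

You can usually avoid launch templates by using custom user data or by specifying a custom AMI directly in the AWS node template.

Custom UserData:

Karpenter provides a resource called AWSNodeTemplate that lets you inject user data (UserData) at node startup: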

apiVersion: karpenter.k8s.aws/v1alpha1
kind: AWSNodeTemplate
metadata:
  name: al2-example
spec:
  amiFamily: AL2
  instanceProfile: MyInstanceProfile
  subnetSelector:
    karpenter.sh/discovery: my-cluster
  securityGroupSelector:
    karpenter.sh/discovery: my-cluster
  # userData belongs under spec; its content is a MIME multi-part document
  userData: |
    MIME-Version: 1.0
    Content-Type: multipart/mixed; boundary="BOUNDARY"

    --BOUNDARY
    Content-Type: text/x-shellscript; charset="us-ascii"

    #!/bin/bash
    mkdir -p ~ec2-user/.ssh/
    touch ~ec2-user/.ssh/authorized_keys
    cat >> ~ec2-user/.ssh/authorized_keys <<EOF
    {{ insertFile "../my-authorized_keys" | indent 4  }}
    EOF
    chmod -R go-w ~ec2-user/.ssh/authorized_keys
    chown -R ec2-user ~ec2-user/.ssh
    --BOUNDARY--

Exclude instance types that do not fit your workload

- key: node.kubernetes.io/instance-type
  operator: NotIn
  values:
    - 'm6g.16xlarge'
    - 'm6gd.16xlarge'
    - 'r6g.16xlarge'
    - 'r6gd.16xlarge'
    - 'c6g.16xlarge'

Or:

spec:
  requirements:
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - spot
  - key: node.kubernetes.io/lifecycle
    operator: In
    values:
    - spot
  - key: karpenter.k8s.aws/instance-memory
    operator: Gt
    values:
    - "4096"
  - key: karpenter.k8s.aws/instance-memory
    operator: Lt
    values:
    - "65536"
  - key: karpenter.k8s.aws/instance-cpu
    operator: Gt
    values:
    - "2"
  - key: karpenter.k8s.aws/instance-cpu
    operator: Lt
    values:
    - "32"
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - c4
    - c5
    - c6
    - r5
    - r6i
  - key: kubernetes.io/arch
    operator: In
    values:
    - amd64

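03/Karpenter caveats and recommendations

1. Karpenter launches nodes with containerd as the default container runtime, so things may break if you run Docker commands (from Jenkins, for example) or rely on DNS hostnames such as localhost, which may resolve to the IPv6 address ::1.

You can configure the Provisioner to start nodes with dockerd instead:

spec:
  kubeletConfiguration:
    containerRuntime: dockerd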

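2. Make sure Pods can be terminated with zero downtime. Check that your preStop hooks are configured correctly so that necessary cleanup completes before the Pod terminates. Avoid forking the termination process into a new subshell, since that can delay or break the termination flow and disrupt a smooth handover.

3. Karpenter launches nodes very quickly, so if you test at a scale of more than 1,000 Pods you may run into the following:

◻The Kubernetes API becomes a bottleneck or is overloaded.

◻Docker image pulls are rate limited, because the default registry pull limit is 5 pulls per second.

You can raise these limits with the following settings in the kubeletConfiguration section: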

"registryPullQPS": 500,
"registryBurst": 100,
"kubeAPIQPS": 200,
"kubeAPIBurst": 100

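4. Make sure you use zone topology spread constraints rather than hostname topology.

5. Monitor and adjust your Pods' CPU and memory requests so they match actual usage, leaving enough headroom to keep processes from being OOM-killed.

6. Make sure your Pods have a PDB (Pod Disruption Budget) so that when Karpenter drains a node it respects the service's Pod availability and does not cause downtime.

7. Enable Karpenter's metrics and set up alerts, for example one that fires when the number of ready Karpenter nodes is lower than the number of nodes in its ASG.

8. Protect your Karpenter nodes from accidental deletion by adding a finalizer.

(In this example my nodes are labeled core; change this to match the labels you actually use.)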

kubectl get nodes -l core=true --no-headers | awk '{print $1}' | xargs -I{} kubectl patch node {} --type=merge -p '{"metadata":{"finalizers":["kubernetes"]}}'

9. Follow CloudPilot AI for best-practice guides on running Karpenter on EKS.
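Recommendations 2, 4, and 5 above all end up in the Pod template, so they can be sketched in a single Deployment. Everything specific here (the name `web`, the image, the 10-second sleep, the grace period, and the request and limit values) is a hypothetical placeholder, not a value recommended by this article.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                     # hypothetical name
spec:
  replicas: 4
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      # Recommendation 4: spread across zones, not across hostnames.
      topologySpreadConstraints:
        - maxSkew: 1
          topologyKey: topology.kubernetes.io/zone
          whenUnsatisfiable: DoNotSchedule
          labelSelector:
            matchLabels:
              app: web
      # Must cover the preStop hook plus the application's own shutdown time.
      terminationGracePeriodSeconds: 45
      containers:
        - name: web
          image: nginx:1.25     # placeholder image
          lifecycle:
            preStop:
              exec:
                # Recommendation 2: runs in the foreground, with no "&" and no
                # backgrounded subshell, so the kubelet waits for it to finish
                # before sending SIGTERM to the container.
                command: ["sh", "-c", "sleep 10"]
          resources:
            # Recommendation 5: requests close to observed usage, with a
            # memory limit that leaves headroom against OOM kills.
            requests:
              cpu: 250m
              memory: 256Mi
            limits:
              memory: 512Mi
```

The sleep in the preStop hook is a common way to give load balancers time to stop routing traffic to the Pod; replace it with whatever cleanup your service actually needs.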
