K8S_pod

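This article walks through a Kubernetes Pod configuration file, covering the API version, resource kind, metadata, container names, images, and commands, as a reference for understanding and deploying Pods.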


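apiVersion: v1            # API version; core resources such as Pod use v1
kind: Pod                 # resource type
metadata:
  name: pod-demo          # Pod name, unique within its namespace
  namespace: default
  labels:                 # key/value pairs that selectors can match on
    app: myapp
    tier: frontend
spec:
  containers:
  - name: myapp           # first container: nginx with its default startup command
    image: nginx
  - name: busybox         # second container, running in the same Pod
    image: busybox
    command:              # run instead of the image's default entrypoint
    - "/bin/sh"
    - "-c"
    - "sleep 500"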
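
The manifest above can also be written as a plain data structure, which makes its shape easier to see. The following is a minimal sketch, not part of the original post: `check_pod` is a hypothetical helper that covers only a handful of basic structural rules (it does not replace the API server's validation), and the label key is given in its conventional `tier` spelling.

```python
import json

# The same Pod as the manifest above, expressed as a plain dictionary.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {
        "name": "pod-demo",
        "namespace": "default",
        # Label key in its conventional spelling, "tier".
        "labels": {"app": "myapp", "tier": "frontend"},
    },
    "spec": {
        "containers": [
            {"name": "myapp", "image": "nginx"},
            {
                "name": "busybox",
                "image": "busybox",
                "command": ["/bin/sh", "-c", "sleep 500"],
            },
        ]
    },
}


def check_pod(manifest):
    """Hypothetical helper: return a list of basic structural problems."""
    problems = []
    for key in ("apiVersion", "kind", "metadata", "spec"):
        if key not in manifest:
            problems.append(f"missing top-level field: {key}")
    if "name" not in manifest.get("metadata", {}):
        problems.append("metadata.name is required")
    containers = manifest.get("spec", {}).get("containers", [])
    if not containers:
        problems.append("spec.containers must list at least one container")
    names = [c.get("name") for c in containers]
    if len(names) != len(set(names)):
        problems.append("container names must be unique within a Pod")
    for c in containers:
        if not c.get("image"):
            problems.append(f"container {c.get('name')!r} has no image")
    return problems


if __name__ == "__main__":
    print(check_pod(pod))             # an empty list means no problems were found
    print(json.dumps(pod, indent=2))  # JSON form of the manifest
```

Since `kubectl` accepts JSON as well as YAML, the printed JSON could be saved to a file and applied with `kubectl apply -f`, assuming a cluster is available.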

The following are example rules and template notes for configuring Kubernetes alerts on VictoriaMetrics:

---

### 1. **k8s_node_status_NotReady**

Fires when the status of a Kubernetes node becomes `NotReady`.

**PromQL expression:**

```promql
kube_node_status_condition{condition="Ready", status!="true"}
```

**Alerting rule configuration:**

```yaml
alert: NodeNotReady
expr: kube_node_status_condition{condition="Ready", status!="true"} == 1
for: 5m
labels:
  severity: critical
annotations:
  summary: "Node {{ $labels.node }} is not ready"
  description: "The node '{{ $labels.node }}' has been in NotReady state for more than 5 minutes."
```

---

### 2. **k8s_node_available_memory_below_500MB**

Fires when the memory still available on a node drops below 500 MB.

**PromQL expression:**

```promql
(node_memory_MemAvailable_bytes / 1e6) < 500
```

**Alerting rule configuration:**

```yaml
alert: LowMemoryOnNode
expr: (node_memory_MemAvailable_bytes / 1e6) < 500
for: 3m
labels:
  severity: warning
annotations:
  summary: "Low memory on node {{ $labels.instance }}"
  description: "The available memory on the node '{{ $labels.instance }}' is less than 500MB."
```

---

### 3. **k8s_network_plugin_Calico_status_abnormal**

Detects whether the Calico network component is in an unhealthy state.

**PromQL expression:**

```promql
calico_felix_healthz_failures > 0
```

**Alerting rule configuration:**

```yaml
alert: CalicoHealthCheckFailed
expr: calico_felix_healthz_failures > 0
for: 2m
labels:
  severity: critical
annotations:
  summary: "Calico health check failed on host {{ $labels.host }}"
  description: "The Calico network plugin on host '{{ $labels.host }}' reported a failure during its health checks."
```

---

### 4. **k8s_Pod_restarts_more_than_5_per_hour**

Fires when the containers of a Pod restart more than 5 times within one hour. The expression uses `increase()` because it returns the total growth of the counter over the window, whereas `rate()` returns a per-second average.

**PromQL expression:**

```promql
sum(increase(kube_pod_container_status_restarts_total[1h])) by (pod) > 5
```

**Alerting rule configuration:**

```yaml
alert: HighPodRestarts
expr: sum(increase(kube_pod_container_status_restarts_total[1h])) by (pod) > 5
for: 1m
labels:
  severity: critical
annotations:
  summary: "{{ $labels.pod }} pod restarted too many times"
  description: "The pod '{{ $labels.pod }}' has restarted over 5 times within an hour."
```

---

### 5. **k8s_node_network_unavailable**

Checks whether any node is unable to reach external networks.

**PromQL expression:**

```promql
probe_success{job="blackbox-exporter", probe="network-connectivity"} != 1
```

**Alerting rule configuration:**

```yaml
alert: NetworkUnreachableFromNode
expr: probe_success{job="blackbox-exporter", probe="network-connectivity"} != 1
for: 5m
labels:
  severity: high
annotations:
  summary: "Network unreachable from node {{ $labels.instance }}"
  description: "The node '{{ $labels.instance }}' cannot access external networks via blackbox monitoring probes."
```

---

### 6. **k8s_kube-scheduler_component_down**

Detects whether kube-scheduler is running normally.

**PromQL expression:**

```promql
absent(up{job="kube-scheduler"}) or up{job="kube-scheduler"} == 0
```

**Alerting rule configuration:**

```yaml
alert: SchedulerDown
expr: absent(up{job="kube-scheduler"}) or up{job="kube-scheduler"} == 0
for: 3m
labels:
  severity: critical
annotations:
  summary: "Kube-Scheduler component down"
  description: "The Kubernetes scheduler service seems to be unavailable or unresponsive."
```

---

### 7. **k8s_kube-controller-manager_component_down**

Detects whether the Controller Manager component has failed.

**PromQL expression:**

```promql
absent(up{job="kube-controller-manager"}) or up{job="kube-controller-manager"} == 0
```

**Alerting rule configuration:**

```yaml
alert: ControllerManagerDown
expr: absent(up{job="kube-controller-manager"}) or up{job="kube-controller-manager"} == 0
for: 3m
labels:
  severity: critical
annotations:
  summary: "Kube-Controller-Manager component down"
  description: "The Kubernetes controller manager service seems to be unavailable or unresponsive."
```

---

### 8. **k8s_kubelet_component_down**

Detects the running state of the kubelet.

**PromQL expression:**

```promql
up{job="kubelet"} == 0
```

**Alerting rule configuration:**

```yaml
alert: KubeletDown
expr: up{job="kubelet"} == 0
for: 3m
labels:
  severity: critical
annotations:
  summary: "Kubelet agent down on node {{ $labels.node }}"
  description: "The Kubelet agent responsible for managing containers and pods on node '{{ $labels.node }}' appears to have stopped running."
```

---

### 9. **k8s_etcd_instance_down**

Monitors the health of etcd instances.

**PromQL expression:**

```promql
etcd_server_has_leader == 0
```

**Alerting rule configuration:**

```yaml
alert: EtcdNoLeader
expr: etcd_server_has_leader == 0
for: 2m
labels:
  severity: emergency
annotations:
  summary: "ETCD cluster lacks leader"
  description: "The ETCD distributed key-value store does not currently have an elected leader, which may impact overall cluster stability."
```
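
The restart alert in section 4 is meant to fire on more than 5 restarts per hour. In PromQL, `rate()` yields a per-second average over the range while `increase()` yields the total growth over it, so the two behave very differently against a threshold of 5. The arithmetic can be sketched in plain Python; this ignores PromQL's extrapolation and counter-reset handling, and the helper names are hypothetical.

```python
WINDOW_SECONDS = 3600  # the [1h] range used by the restart rule


def per_second_rate(restarts_in_window, window_seconds=WINDOW_SECONDS):
    """Roughly what rate() reports: average restarts per second over the window."""
    return restarts_in_window / window_seconds


def restarts_needed_for_rate(threshold_per_second, window_seconds=WINDOW_SECONDS):
    """Restarts per window required before a rate()-based threshold is exceeded."""
    return threshold_per_second * window_seconds


if __name__ == "__main__":
    restarts = 6  # a Pod that restarted 6 times in the last hour

    # rate()-style comparison: 6 / 3600 is far below 5, so no alert fires.
    print(per_second_rate(restarts) > 5)   # False

    # A per-second threshold of 5 corresponds to this many restarts per hour.
    print(restarts_needed_for_rate(5))     # 18000

    # increase()-style comparison: total restarts in the window against 5.
    print(restarts > 5)                    # True
```

Under these assumptions, a threshold expressed as a count per hour lines up with `increase(...[1h])`, while the same number applied to `rate(...[1h])` would only trigger at thousands of restarts per hour.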