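Deploying kube-prometheus on a K8S Cluster to Monitor MySQL 5.7
1. Introduction
The test environment runs a standalone MySQL instance deployed inside the K8S cluster. To expose its metrics, mysql-exporter has to be deployed, and there are two ways to do that:
①. Deploy mysql-exporter outside the K8S cluster.
②. Deploy mysql-exporter inside the K8S cluster.
This article uses the second approach.
Note: for easier management, a separate namespace was created for the exporters that kube-prometheus uses to monitor services outside the cluster.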
[root@k8s01 prometheus-mysql]# kubectl create ns prometheus-exporter
[root@k8s01 prometheus-mysql]# kubectl get namespace
NAME                   STATUS   AGE
cephfs                 Active   475d
default                Active   486d
kube-node-lease        Active   486d
kube-public            Active   486d
kube-system            Active   486d
kubernetes-dashboard   Active   485d
monitoring             Active   485d
prometheus-exporter    Active   2d16h
2. Grant mysql-exporter Access to Collect Data from MySQL
①. MariaDB version
mariadb> select version();
+---------------------------------------+
| version()                             |
+---------------------------------------+
| 10.4.18-MariaDB-1:10.4.18+maria~focal |
+---------------------------------------+
1 row in set (0.04 sec)
②. Create the monitoring user and grant privileges
mariadb> CREATE USER 'mysqlexporter'@'%' IDENTIFIED BY 'mysqlexporter';
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'mysqlexporter'@'%' IDENTIFIED BY 'mysqlexporter' WITH MAX_USER_CONNECTIONS 30;
GRANT SELECT ON performance_schema.* TO 'mysqlexporter'@'%' IDENTIFIED BY 'mysqlexporter';
FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
Query OK, 0 rows affected (0.01 sec)
3. K8S Deployment
①. Because this is a test setup, the metrics are exposed through a NodePort Service.
[root@k8s01 prometheus-mysql]# cat demployment-mysql.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  namespace: prometheus-exporter
  name: mysqld-exporter
  labels:
    app: mysqld-exporter
spec:
  selector:
    matchLabels:
      app: mysqld-exporter
  template:
    metadata:
      labels:
        app: mysqld-exporter
    spec:
      containers:
      - name: mysqld-exporter
        # No tag, so this resolves to :latest. mysqld-exporter 0.15.0 and later
        # dropped DATA_SOURCE_NAME support, so pin a tag you have tested.
        image: prom/mysqld-exporter
        env:
        - name: DATA_SOURCE_NAME
          value: "mysqlexporter:mysqlexporter@(172.16.1.11:30006)/"  # DSN: user:password@(host:port)/
        ports:
        - containerPort: 9104
          name: http
---
apiVersion: v1
kind: Service
metadata:
  namespace: prometheus-exporter
  labels:
    app: mysqld-exporter
  name: mysqld-exporter
spec:
  type: NodePort
  ports:
  - name: http
    port: 9104
    nodePort: 30043
    targetPort: http
  selector:
    app: mysqld-exporter
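The Deployment above keeps the database password in plain text in the manifest. One possible alternative is to move the DSN into a Secret and reference it from the container. This is only a sketch: the Secret name mysqld-exporter-dsn is hypothetical, and it assumes an exporter version that still reads DATA_SOURCE_NAME.

```yaml
# Hypothetical Secret holding the same DSN used in the Deployment.
apiVersion: v1
kind: Secret
metadata:
  name: mysqld-exporter-dsn        # assumed name, not from the original setup
  namespace: prometheus-exporter
type: Opaque
stringData:
  # Format: user:password@(host:port)/
  DATA_SOURCE_NAME: "mysqlexporter:mysqlexporter@(172.16.1.11:30006)/"
---
# Fragment only: the container entry of the Deployment, with the
# inline value replaced by a reference to the Secret.
containers:
- name: mysqld-exporter
  image: prom/mysqld-exporter
  env:
  - name: DATA_SOURCE_NAME
    valueFrom:
      secretKeyRef:
        name: mysqld-exporter-dsn
        key: DATA_SOURCE_NAME
  ports:
  - containerPort: 9104
    name: http
```

The Secret has to live in the same namespace as the Deployment, and the pod must be recreated for a changed value to take effect.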
②. Check the metrics exposed by mysql-exporter
First confirm that the pod was deployed successfully:
[root@k8s01 prometheus-mysql]# kubectl get pod -o wide -n prometheus-exporter
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE    NOMINATED NODE   READINESS GATES
mysqld-exporter-58477844b4-rvj7w   1/1     Running   0          15s   172.30.77.2   k8s04   <none>           <none>
[root@k8s01 prometheus-mysql]# kubectl get svc -n prometheus-exporter
NAME              TYPE       CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
mysqld-exporter   NodePort   10.254.228.5   <none>        9104:30043/TCP   34s
Detailed metrics: with the NodePort above, they are served at http://<node-ip>:30043/metrics.
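NodePort is used here only for testing. A ServiceMonitor makes Prometheus scrape the pods behind the Service from inside the cluster, so a ClusterIP Service should be enough once access from outside the cluster is no longer needed. A minimal sketch, assuming the same names and labels as the Service defined earlier:

```yaml
# Same Service as before, but without a node port.
apiVersion: v1
kind: Service
metadata:
  namespace: prometheus-exporter
  name: mysqld-exporter
  labels:
    app: mysqld-exporter      # still matched by the ServiceMonitor selector
spec:
  type: ClusterIP
  ports:
  - name: http                # port name referenced by the ServiceMonitor endpoint
    port: 9104
    targetPort: http
  selector:
    app: mysqld-exporter
```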
③. Deploy prometheus-serviceMonitormysql.yaml to monitor the service
The namespaceSelector must match the namespace name: prometheus-exporter
[root@k8s01 prometheus-mysql]# cat prometheus-serviceMonitormysql.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: mysqld-exporter
  namespace: monitoring
  labels:
    app: mysqld-exporter
spec:
  # jobLabel names a Service label whose value becomes the job name. The Service
  # has no label called mysqld-exporter, so the job falls back to the Service name.
  jobLabel: mysqld-exporter
  endpoints:
  - port: http
    interval: 15s
  selector:
    matchLabels:
      app: mysqld-exporter
  namespaceSelector:
    matchNames:
    - prometheus-exporter
[root@k8s01 prometheus-mysql]# kubectl apply -f prometheus-serviceMonitormysql.yaml
servicemonitor.monitoring.coreos.com/mysqld-exporter created
④. K8S RBAC: authorize prometheus-k8s to access pods in the prometheus-exporter namespace
[root@k8s01 prometheus-mysql]# cat prometheus-roleBindNewNameSpace.yaml
--- # Create the Role in the target namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s
  namespace: my-namespace
rules:
- apiGroups:
  - ""
  resources:
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
--- # Bind the prometheus-k8s ServiceAccount to the Role
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s
  namespace: my-namespace
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-k8s
subjects:
- kind: ServiceAccount
  name: prometheus-k8s # ServiceAccount used by the Prometheus container; kube-prometheus uses prometheus-k8s by default
  namespace: monitoring
[root@k8s01 prometheus-mysql]# kubectl apply -f prometheus-roleBindNewNameSpace.yaml
role.rbac.authorization.k8s.io/prometheus-k8s created
Note: replace my-namespace with your own namespace (prometheus-exporter in this setup).
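The Prometheus logs show many errors of the form "xxx is forbidden", which points to an RBAC permission problem. The configuration of the prometheus resource object shows that Prometheus is bound to a ServiceAccount named prometheus-k8s, and that ServiceAccount is bound to a ClusterRole named prometheus-k8s (prometheus-clusterRole.yaml):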
[root@k8s01 prometheus-mysql]# kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
level=info ts=2021-11-15T02:58:32.254Z caller=kubernetes.go:253 component="discovery manager notify" discovery=k8s msg="Using pod service account via in-cluster config"
level=error ts=2021-11-15T02:58:32.256Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:361: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"endpoints\" in API group \"\" in the namespace \"prometheus-exporter\""
level=error ts=2021-11-15T02:58:32.256Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:362: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"services\" in API group \"\" in the namespace \"prometheus-exporter\""
level=info ts=2021-11-15T02:58:32.258Z caller=main.go:827 msg="Completed loading of configuration file" filename=/etc/prometheus/config_out/prometheus.env.yaml
level=error ts=2021-11-15T02:58:32.262Z caller=klog.go:94 component=k8s_client_runtime func=ErrorDepth msg="/app/discovery/kubernetes/kubernetes.go:363: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" in the namespace \"prometheus-exporter\""
View the original ClusterRole file:
[root@k8s-master manifests]# cat prometheus-clusterRole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
The rules above clearly grant no list permission on Services, Endpoints, or Pods, which is why the errors appear. To fix this, add the missing permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
Re-apply the prometheus-clusterRole.yaml file:
[root@k8s01 manifests]# kubectl apply -f prometheus-clusterRole.yaml
Log in to the Prometheus dashboard and check the targets: the mysql target is now being scraped successfully.
4. Grafana
Import dashboard template 7362 to get the MySQL monitoring dashboard.
5. MySQL Alert Rules
- alert: Mysql_Instance_Reboot
  expr: mysql_global_status_uptime < 180
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_Instance_Reboot detected"
    description: "{{$labels.instance}}: Mysql instance rebooted within the last 3 minutes (uptime so far: {{ $value }} seconds)"
- alert: Mysql_High_QPS
  expr: rate(mysql_global_status_questions[5m]) > 500
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_High_QPS detected"
    description: "{{$labels.instance}}: Mysql operations exceed 500 per second (current value is: {{ $value }})"
- alert: Mysql_Too_Many_Connections
  expr: rate(mysql_global_status_connections[5m]) > 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql Too Many Connections detected"
    description: "{{$labels.instance}}: Mysql connections exceed 100 per second (current value is: {{ $value }})"
- alert: Mysql_High_Recv_Rate
  # bytes/s -> Mbit/s: multiply by 8, then divide by 1024 * 1024
  expr: rate(mysql_global_status_bytes_received[3m]) * 8 / 1024 / 1024 > 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_High_Recv_Rate detected"
    description: "{{$labels.instance}}: Mysql receive rate exceeds 100 Mbps (current value is: {{ $value }})"
- alert: Mysql_High_Send_Rate
  # bytes/s -> Mbit/s: multiply by 8, then divide by 1024 * 1024
  expr: rate(mysql_global_status_bytes_sent[3m]) * 8 / 1024 / 1024 > 100
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_High_Send_Rate detected"
    description: "{{$labels.instance}}: Mysql send rate exceeds 100 Mbps (current value is: {{ $value }})"
- alert: Mysql_Too_Many_Slow_Query
  expr: rate(mysql_global_status_slow_queries[30m]) > 3
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_Too_Many_Slow_Query detected"
    description: "{{$labels.instance}}: Mysql slow queries exceed 3 per second (current value is: {{ $value }})"
- alert: Mysql_Deadlock
  expr: mysql_global_status_innodb_deadlocks > 0
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_Deadlock detected"
    description: "{{$labels.instance}}: Mysql deadlock was found (current value is: {{ $value }})"
- alert: Mysql_Too_Many_sleep_threads
  expr: mysql_global_status_threads_running / mysql_global_status_threads_connected * 100 < 30
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_Too_Many_sleep_threads detected"
    description: "{{$labels.instance}}: only {{ $value }}% of connected Mysql threads are running, please clean up the sleeping threads"
- alert: Mysql_innodb_Cache_insufficient
  expr: (mysql_global_status_innodb_page_size * on (instance) mysql_global_status_buffer_pool_pages{state="data"} + on (instance) mysql_global_variables_innodb_log_buffer_size + on (instance) mysql_global_variables_innodb_additional_mem_pool_size + on (instance) mysql_global_status_innodb_mem_dictionary + on (instance) mysql_global_variables_key_buffer_size + on (instance) mysql_global_variables_query_cache_size + on (instance) mysql_global_status_innodb_mem_adaptive_hash ) / on (instance) mysql_global_variables_innodb_buffer_pool_size * 100 > 80
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "{{$labels.instance}}: Mysql_innodb_Cache_insufficient detected"
    description: "{{$labels.instance}}: Mysql innodb cache usage exceeds 80% (current value is: {{ $value }})"
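With kube-prometheus, rules like the ones above are normally loaded through a PrometheusRule object rather than a plain rules file. The following is a minimal sketch showing one rule wrapped this way. The object name and group name are hypothetical, and the two metadata labels are an assumption about the ruleSelector of the Prometheus resource; verify it with kubectl -n monitoring get prometheus k8s -o yaml before relying on them.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: mysql-alert-rules          # hypothetical name
  namespace: monitoring
  labels:
    # Assumed to match spec.ruleSelector of the Prometheus resource;
    # adjust (or remove) to fit your installation.
    prometheus: k8s
    role: alert-rules
spec:
  groups:
  - name: mysql.rules              # hypothetical group name
    rules:
    - alert: Mysql_Instance_Reboot
      expr: mysql_global_status_uptime < 180
      for: 2m
      labels:
        severity: warning
      annotations:
        summary: "{{$labels.instance}}: Mysql_Instance_Reboot detected"
    # ...append the remaining alert entries here at the same indentation
```

After kubectl apply, the group should appear on the Rules page of the Prometheus dashboard if the selector labels match.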
Reference for the RBAC authorization error: https://www.cnblogs.com/wangxu01/articles/11655443.html
Reference for MySQL monitoring: https://blog.youkuaiyun.com/qq_32502263/article/details/118794813