Optimistic Locking for Object Updates in the Kubernetes API Server
1 Background
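The project needs to provide a web UI for managing alerting rules on top of the Prometheus instance managed by OpenShift. The Prometheus built into OpenShift 3.11 manages alerting rules through a Custom Resource (CR) called PrometheusRule. Several people may manage the same CR, so concurrent, competing modifications of one CR must be prevented, otherwise the result can be wrong.
The Kubernetes API Server supports optimistic locking (optimistic concurrency control) to prevent overwrites caused by concurrent writes; see this article for details. Each resource object is assigned a version number, and on update the API Server checks metadata.resourceVersion in the object submitted by the client to decide whether the modification is based on a stale version. A PrometheusRule object carrying this field looks like this: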
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    prometheus: k8s
    role: alert-rules
  name: arule
  namespace: openshift-monitoring
  resourceVersion: "2218687"
  ...
spec:
  .....
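After a client obtains an object with a resourceVersion field and modifies it, it must send the resourceVersion back together with the modification when uploading. The API Server then rejects the update if the object has changed in the meantime, which prevents concurrent update errors.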
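The check described above can be sketched as a tiny in-memory model. This is not the API Server's implementation: `Store`, `Conflict`, and the integer counter are hypothetical names used only to illustrate the compare-then-write idea (real resourceVersion values should be treated as opaque strings).

```python
import copy


class Conflict(Exception):
    """Raised when an update carries a stale resourceVersion (HTTP 409 in the real API)."""


class Store:
    """Hypothetical in-memory store that mimics the resourceVersion check."""

    def __init__(self):
        self._objects = {}
        self._counter = 0

    def create(self, name, spec):
        self._counter += 1
        self._objects[name] = {
            "metadata": {"name": name, "resourceVersion": str(self._counter)},
            "spec": copy.deepcopy(spec),
        }
        return self.get(name)

    def get(self, name):
        # Return a copy so callers cannot mutate the stored object directly.
        return copy.deepcopy(self._objects[name])

    def update(self, obj):
        name = obj["metadata"]["name"]
        current = self._objects[name]
        # The write is accepted only if the client saw the latest version.
        if obj["metadata"]["resourceVersion"] != current["metadata"]["resourceVersion"]:
            raise Conflict(f"{name}: the object has been modified")
        self._counter += 1
        stored = copy.deepcopy(obj)
        stored["metadata"]["resourceVersion"] = str(self._counter)
        self._objects[name] = stored
        return self.get(name)


store = Store()
store.create("arule", {"for": "33m", "user": "someone"})

a = store.get("arule")   # client A reads
b = store.get("arule")   # client B reads the same version

a["spec"]["user"] = "otherone"
store.update(a)          # accepted, the version advances

b["spec"]["for"] = "53m"
try:
    store.update(b)      # rejected: B still holds the old version
    b_result = "accepted"
except Conflict:
    b_result = "conflict"
```

In this model client B's write is refused instead of silently overwriting client A's change, which is the behavior the rest of the article tests against the real API Server.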
2 Behavior of oc edit
Open two terminals, A and B, and run oc edit prometheusrule arule in both at the same time. In Terminal A make a change, for example change user under spec below to otherone, then save.
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  resourceVersion: "2322087"
spec:
  groups:
  - name: general.rules
    rules:
    - alert: TargetDown-serviceprom
      ........
      for: 33m
      labels:
        severity: warning
        user: someone ----> change to otherone
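Running oc get prometheusrule arule in Terminal A shows that the change took effect.
Then, in Terminal B, change for in the same (now stale) copy of the object and save. Reading the object back shows that the change to user made in Terminal A has been lost.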
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  resourceVersion: "2322087"
spec:
  groups:
  - name: general.rules
    rules:
    - alert: TargetDown-serviceprom
      ........
      for: 33m ----> change to another value, e.g. 53m
      labels:
        severity: warning
        user: someone
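3 Optimistic locking in the HTTP PUT API
In a container in the OpenShift cluster that has permission to access the Custom Resource, first download the object with curl and save it as rule1.json and rule2.json. Apply the Terminal A change from the previous section to rule1.json and the Terminal B change to rule2.json: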
> curl -k -X GET -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://$KUBERNETES_PORT_443_TCP_ADDR:$KUBERNETES_SERVICE_PORT_HTTPS/apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheusrules/arule > /tmp/rule1.json
> curl -k -X GET -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://$KUBERNETES_PORT_443_TCP_ADDR:$KUBERNETES_SERVICE_PORT_HTTPS/apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheusrules/arule > /tmp/rule2.json
Then modify the Custom Resource object with HTTP PUT:
1> curl -k -XPUT -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://$KUBERNETES_PORT_443_TCP_ADDR:$KUBERNETES_SERVICE_PORT_HTTPS/apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheusrules/arule -T /tmp/rule1.json
# the command above succeeds
2> curl -k -XPUT -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://$KUBERNETES_PORT_443_TCP_ADDR:$KUBERNETES_SERVICE_PORT_HTTPS/apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheusrules/arule -T /tmp/rule2.json
# fails
The second command reports the following error:
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "Operation cannot be fulfilled on prometheusrules.monitoring.coreos.com \"arule\": the object has been modified; please apply your changes to the latest version and try again",
  "reason": "Conflict",
  "details": {
    "name": "arule",
    "group": "monitoring.coreos.com",
    "kind": "prometheusrules"
  },
  "code": 409
}
This shows that the API Server applies optimistic locking to HTTP PUT.
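The 409 message asks the client to apply its changes to the latest version and try again. A minimal sketch of that retry loop follows, assuming the caller supplies `get` and `put` callables; `update_with_retry`, `fake_get`, `fake_put`, and the in-memory `state` are hypothetical stand-ins for the real GET and PUT requests, not part of any Kubernetes client library.

```python
import copy


class Conflict(Exception):
    """Stands in for an HTTP 409 response."""


def update_with_retry(get, put, mutate, attempts=3):
    """Re-read, re-apply the change, and PUT again when the server reports a conflict."""
    for _ in range(attempts):
        obj = get()      # fresh copy carrying the current resourceVersion
        mutate(obj)      # apply the intended change to the fresh copy
        try:
            return put(obj)
        except Conflict:
            continue     # someone else won the race; start over
    raise Conflict(f"gave up after {attempts} attempts")


# Hypothetical in-memory backend used only to exercise the loop.
state = {"metadata": {"resourceVersion": "1"}, "spec": {"for": "33m", "user": "someone"}}
calls = {"put": 0}


def fake_get():
    return copy.deepcopy(state)


def fake_put(obj):
    calls["put"] += 1
    if calls["put"] == 1:
        # Simulate another client updating the object just before our first PUT.
        state["spec"]["user"] = "otherone"
        state["metadata"]["resourceVersion"] = "2"
    if obj["metadata"]["resourceVersion"] != state["metadata"]["resourceVersion"]:
        raise Conflict("the object has been modified")
    state["spec"] = copy.deepcopy(obj["spec"])
    state["metadata"]["resourceVersion"] = str(int(state["metadata"]["resourceVersion"]) + 1)
    return copy.deepcopy(state)


def set_for(obj):
    obj["spec"]["for"] = "53m"


result = update_with_retry(fake_get, fake_put, set_for)
```

Because the change is re-applied to a freshly read object on each attempt, the other client's edit to user survives alongside the new value of for.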
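4 Discussion
Why did the API Server not apply optimistic locking to oc edit, so that a concurrent modification was overwritten? With the oc client's highest log level, 9 (oc --v=9 edit), the log shows that oc edit uses HTTP PATCH: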
I1108 17:52:19.592131 94141 request.go:897] Request Body: {"spec":{"groups":[{"name":"general.rules","rules":[{"alert":"TargetDown-serviceprom","annotations":{"description":"{{ $value }}% of {{ $labels.job }} targets are down.","summary":"Targets are down"},"expr":"100 * (count(up == 0) BY (job) / count(up) BY (job)) \u003e 10","for":"53m","labels":{"severity":"warning","user":"改为otherone"}}]}]}}
I1108 17:52:19.609156 94141 round_trippers.go:405] PATCH https://oc.exp.myhost.local:443/apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheusrules/arule 200 OK in 16 milliseconds
I1108 17:52:19.609232 94141 round_trippers.go:411] Response Headers:
I1108 17:52:19.609241 94141 round_trippers.go:414] Content-Length: 1463
I1108 17:52:19.609248 94141 round_trippers.go:414] Date: Fri, 08 Nov 2019 09:52:19 GMT
I1108 17:52:19.609255 94141 round_trippers.go:414] Cache-Control: no-store
I1108 17:52:19.609261 94141 round_trippers.go:414] Content-Type: application/json
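The first line shows that the request oc edit sends to the API Server after editing contains only spec and does not carry metadata.resourceVersion, so the server has nothing to perform the concurrent version check against.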
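Why Terminal A's change disappears can be illustrated with a small merge function. This sketch assumes the request is treated as a JSON merge patch (RFC 7386), which is consistent with the request body in the log: objects are merged key by key, but a list such as rules is replaced as a whole. `merge_patch` and the sample objects are illustrative, not code from oc or the API Server.

```python
import copy


def merge_patch(target, patch):
    """Apply a JSON merge patch: dicts merge recursively, any other value replaces."""
    if not isinstance(patch, dict):
        return copy.deepcopy(patch)
    result = copy.deepcopy(target) if isinstance(target, dict) else {}
    for key, value in patch.items():
        if value is None:
            result.pop(key, None)          # null removes the key
        else:
            result[key] = merge_patch(result.get(key), value)
    return result


# Object on the server after Terminal A saved its change to user.
server_obj = {
    "metadata": {"name": "arule", "resourceVersion": "2"},
    "spec": {"groups": [{"name": "general.rules", "rules": [
        {"alert": "TargetDown-serviceprom", "for": "33m",
         "labels": {"severity": "warning", "user": "otherone"}},
    ]}]},
}

# Patch built from Terminal B's stale copy: no metadata.resourceVersion,
# and the whole groups list still contains the old user value.
patch_b = {
    "spec": {"groups": [{"name": "general.rules", "rules": [
        {"alert": "TargetDown-serviceprom", "for": "53m",
         "labels": {"severity": "warning", "user": "someone"}},
    ]}]},
}

patched = merge_patch(server_obj, patch_b)
rule = patched["spec"]["groups"][0]["rules"][0]
```

After the merge, rule carries B's new for value and the old user value: the list from the stale copy replaced the server's list, and nothing in the patch gave the server a version to compare.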
Trying an HTTP PUT without resourceVersion (with the field deleted from the PUT file) makes the API Server return 422 Unprocessable Entity:
$ curl -k -XPUT -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://$KUBERNETES_PORT_443_TCP_ADDR:$KUBERNETES_SERVICE_PORT_HTTPS/apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheusrules/arule -T /tmp/rule2.json
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "prometheusrules.monitoring.coreos.com \"arule\" is invalid: metadata.resourceVersion: Invalid value: 0x0: must be specified for an update",
  "reason": "Invalid",
  "details": {
    "name": "arule",
    "group": "monitoring.coreos.com",
    "kind": "prometheusrules",
    "causes": [
      {
        "reason": "FieldValueInvalid",
        "message": "Invalid value: 0x0: must be specified for an update",
        "field": "metadata.resourceVersion"
      }
    ]
  },
  "code": 422
}
If the HTTP PATCH request carries resourceVersion, the API Server performs the same version check:
$ curl -k -XPATCH -H"Content-type:application/merge-patch+json" -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" https://$KUBERNETES_PORT_443_TCP_ADDR:$KUBERNETES_SERVICE_PORT_HTTPS/apis/monitoring.coreos.com/v1/namespaces/openshift-monitoring/prometheusrules/arule -T /tmp/rule2.json
{
  "kind": "Status",
  "apiVersion": "v1",
  "metadata": {
  },
  "status": "Failure",
  "message": "Operation cannot be fulfilled on prometheusrules.monitoring.coreos.com \"arule\": the object has been modified; please apply your changes to the latest version and try again",
  "reason": "Conflict",
  "details": {
    "name": "arule",
    "group": "monitoring.coreos.com",
    "kind": "prometheusrules"
  },
  "code": 409
}
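The test above sent the whole stale file as the patch. A client that wants a guarded PATCH could instead send only the fields it changes plus the version it read. The sketch below builds such a body; `guarded_merge_patch` is a hypothetical helper, and the result is assumed to be sent with the application/merge-patch+json content type used in the curl command above.

```python
import json


def guarded_merge_patch(observed, spec_changes):
    """Build a merge-patch body that pins the resourceVersion the client read."""
    return {
        # Including the observed version lets the server reject a stale patch.
        "metadata": {"resourceVersion": observed["metadata"]["resourceVersion"]},
        "spec": spec_changes,
    }


# Object as read by the client (values taken from the edit buffer shown earlier).
observed = {
    "metadata": {"name": "arule", "resourceVersion": "2322087"},
    "spec": {"groups": []},
}

body = guarded_merge_patch(observed, {"groups": [{"name": "general.rules", "rules": []}]})
payload = json.dumps(body)   # request body for the PATCH call
```

If another writer has updated the object since it was read, a patch built this way should produce the 409 Conflict shown above rather than a silent overwrite.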
For oc replace, the log shows that it uses HTTP PUT:
I1109 15:09:24.344543 7406 round_trippers.go:386] curl -k -v -XPUT -H "Content-Type: application/json" -H "User-Agent: oc/v1.11.0+d4cacc0 (darwin/amd64) kubernetes/d4cacc0" -H "Accept: application/json" -H "Authorization: Bearer 7PooYteCfBygkyQIYUqN0bFJvUOlxBHtQ_BsobbS
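5 Conclusion
- For the custom resource tested here, the Kubernetes API Server requires an HTTP PUT to carry metadata.resourceVersion and uses it to apply optimistic locking to the PUT, which prevents concurrent modifications from overwriting each other.
- For an HTTP PATCH request that carries metadata.resourceVersion, the API Server applies the same version check as for PUT. A PATCH request without resourceVersion is not subject to optimistic locking and can lead to concurrent modifications overwriting each other.
- The oc edit command uses HTTP PATCH, while oc replace uses HTTP PUT.
- If an update needs optimistic locking, the request must carry the correct metadata.resourceVersion, whether it uses PATCH or PUT.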
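The observations in this article can be condensed into a small decision function. It only summarizes the status codes seen in these tests against a PrometheusRule on OpenShift 3.11; `expected_status` is a hypothetical name and the function is not a specification of the API Server's behavior for every resource type.

```python
def expected_status(method, sent_version, current_version):
    """Return the HTTP status observed in this article for an update request.

    sent_version is None when the request omits metadata.resourceVersion.
    """
    if method == "PUT" and sent_version is None:
        return 422   # "must be specified for an update"
    if sent_version is not None and sent_version != current_version:
        return 409   # stale version: Conflict, for PUT and PATCH alike
    return 200       # matching version, or PATCH without a version


observations = {
    "put_current": expected_status("PUT", "2", "2"),
    "put_stale": expected_status("PUT", "1", "2"),
    "put_missing": expected_status("PUT", None, "2"),
    "patch_stale": expected_status("PATCH", "1", "2"),
    "patch_missing": expected_status("PATCH", None, "2"),
}
```

The patch_missing case is the one that allows the lost update seen with oc edit.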