Incident Notes: Data Loss When Scaling a K8s ZooKeeper Cluster to 3 Nodes

1 Background

The Kubernetes cluster was being scaled from one node to three, and the ZooKeeper running on it had to be scaled from one node to three as well to gain high availability.

Helm chart: bitnami zookeeper 13.7.4
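The chart's defaults matter for what follows (how many replicas it starts with and how the StatefulSet brings pods up), so it helps to dump them before touching anything. A quick way to do that, assuming the chart archive is available locally:

$ helm show values zookeeper-13.7.4.tgz | grep -E 'replicaCount|podManagementPolicy'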

2 Problem

Before the scale-out, the single-node ZooKeeper held data. After scaling it to three nodes as shown below, that old data was gone.

$ helm install zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage"
$ kubectl exec zk-zookeeper-0 -it -- zkCli.sh
/opt/bitnami/java/bin/java
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null zxid: -1
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] create /fured-test mydata
Created /fured-test
[zk: localhost:2181(CONNECTED) 2] get /fured-test
mydata
[zk: localhost:2181(CONNECTED) 3] get /zookeeper/config

[zk: localhost:2181(CONNECTED) 4] quit

$ helm upgrade zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage" --set replicaCount=3
$ kubectl exec zk-zookeeper-0 -it -- zkCli.sh
/opt/bitnami/java/bin/java
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null zxid: -1
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] get /fured-test   # the data is gone
Node does not exist: /fured-test
[zk: localhost:2181(CONNECTED) 2] get /fured-test
Node does not exist: /fured-test
[zk: localhost:2181(CONNECTED) 3] get /zookeeper/config
server.1=zk-zookeeper-0.zk-zookeeper-headless.zk-new-3.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
server.2=zk-zookeeper-1.zk-zookeeper-headless.zk-new-3.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
server.3=zk-zookeeper-2.zk-zookeeper-headless.zk-new-3.svc.cluster.local:2888:3888:participant;0.0.0.0:2181
version=0
[zk: localhost:2181(CONNECTED) 4] quit
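At this point it is also worth checking whether anything is left of the original data on the first pod's persistent volume. A sketch of that check; the path assumes the Bitnami image's default data directory (/bitnami/zookeeper/data), adjust if your image differs:

$ kubectl exec zk-zookeeper-0 -- ls -l /bitnami/zookeeper/data/version-2   # snapshot and transaction log files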

3 Root Cause

When the StatefulSet is scaled out, Kubernetes starts the new pods first and then restarts the existing pod whose configuration changed. Going straight from one ZooKeeper node to three therefore brings up two new, empty nodes at once; those two already form a quorum and elect a leader between themselves. When the old single node restarts it rejoins as a follower, and because data in ZooKeeper only flows from the leader to the followers, the old node never gets a chance to become leader: its data is never replicated to the new nodes, and on rejoining it is synced down to the empty leader's state.

$ helm install -n zk-test zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage"
$ helm upgrade -n zk-test zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage" --set replicaCount=3
$ kubectl -n  zk-test exec zk-zookeeper-2 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Error contacting service. It is probably not running.
command terminated with exit code 1
$ kubectl -n  zk-test exec zk-zookeeper-1 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
$ kubectl -n  zk-test exec zk-zookeeper-2 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader   # the third (new, empty) node became the leader right away
$ kubectl -n  zk-test exec zk-zookeeper-0 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Error contacting service. It is probably not running.
command terminated with exit code 1
$ kubectl -n  zk-test exec zk-zookeeper-0 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower  # after restarting, the old node rejoined as a follower
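Whether the two new pods really come up together depends on the StatefulSet's pod management policy (the Bitnami chart appears to default to Parallel, but verify rather than assume). Two commands to confirm the behavior during the scale-up, assuming the namespace and the standard Bitnami label app.kubernetes.io/name=zookeeper used above:

$ kubectl -n zk-test get statefulset zk-zookeeper -o jsonpath='{.spec.podManagementPolicy}{"\n"}'
$ kubectl -n zk-test get pods -l app.kubernetes.io/name=zookeeper -w   # watch the start/restart order live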

4 Solution

Scale out step by step: first take ZooKeeper from one node to two, then from two to three. When scaling to two, only one new node starts, and a two-member ensemble cannot settle on a leader until the old node has restarted and rejoined. Once it does, the old node has the larger ZXID (it holds data, so its transaction id is higher), wins the election, becomes the leader, and replicates the old data to the new node. When scaling to three, the third node starts first and the second and first nodes are then restarted in turn: once the second node is back up it becomes the leader (and the data is synced to the third node), and after the first node comes back the third node ends up as the leader.
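Condensed, the procedure demonstrated below is just two upgrades with an explicit wait in between. The rollout-status waits are an addition here (not part of the original run) to make sure the two-member ensemble has settled and elected the old node as leader before the third member is added:

$ helm upgrade -n zk-test zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage" --set replicaCount=2
$ kubectl -n zk-test rollout status statefulset/zk-zookeeper
$ helm upgrade -n zk-test zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage" --set replicaCount=3
$ kubectl -n zk-test rollout status statefulset/zk-zookeeper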

$ helm install -n zk-test zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage"
$ kubectl -n  zk-test exec zk-zookeeper-0 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: standalone  # single-node (standalone) mode
$ kubectl -n zk-test exec zk-zookeeper-0 -it -- zkCli.sh
/opt/bitnami/java/bin/java
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null zxid: -1
[zk: localhost:2181(CONNECTED) 0] create
create [-s] [-e] [-c] [-t ttl] path [data] [acl]
[zk: localhost:2181(CONNECTED) 1] create /fured-test mydata
Created /fured-test
[zk: localhost:2181(CONNECTED) 2] get /fured-test
mydata
$ helm upgrade -n zk-test zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage" --set replicaCount=2
$ kubectl -n  zk-test exec zk-zookeeper-1 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower  # the newly started second node is a follower
$ kubectl -n  zk-test exec zk-zookeeper-0 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader  # the old node became the leader
$ helm upgrade -n zk-test zk zookeeper-13.7.4.tgz --set persistence.storageClass="local-storage" --set replicaCount=3
$ kubectl -n  zk-test exec zk-zookeeper-1 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: leader  # the second node became the leader
$ kubectl -n  zk-test exec zk-zookeeper-2 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower  # the third node is a follower
$ kubectl -n  zk-test exec zk-zookeeper-0 -it -- zkServer.sh status
/opt/bitnami/java/bin/java
ZooKeeper JMX enabled by default
Using config: /opt/bitnami/zookeeper/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Error contacting service. It is probably not running.
command terminated with exit code 1  # the first node was still restarting when the second became the leader
$ kubectl -n zk-test get pod
NAME             READY   STATUS    RESTARTS   AGE
zk-zookeeper-0   1/1     Running   0          77s
zk-zookeeper-1   1/1     Running   0          101s
zk-zookeeper-2   1/1     Running   0          2m12s
# the pods' ages show the order in which they were (re)started
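As a final sanity check (not part of the original run), the old znode can be read back from every member; zkCli.sh accepts a single command non-interactively:

$ for i in 0 1 2; do kubectl -n zk-test exec zk-zookeeper-$i -- zkCli.sh -server localhost:2181 get /fured-test; done   # each member should now return mydata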

5 Appendix: Scheduling Two ZooKeeper Pods onto the Same Node

In Kubernetes, if you want two ZooKeeper containers to be scheduled onto the same node, you can use Pod affinity or node affinity. The concrete steps are as follows.

### 1. Using Pod affinity

Pod affinity lets you require that a Pod run on a node that is already running another specific Pod. For this use case it can be configured like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - zookeeper
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: zookeeper
        image: zookeeper:latest
```

In this configuration, `requiredDuringSchedulingIgnoredDuringExecution` ensures that a Pod labeled `app=zookeeper` is only scheduled onto a node that is already running a Pod with the same `app=zookeeper` label; `topologyKey: "kubernetes.io/hostname"` makes the match per node.

### 2. Using node affinity

Node affinity constrains scheduling based on node labels: label the target node manually, then pin the Pods to nodes carrying that label.

First, label the target node:

```bash
kubectl label nodes <node-name> zookeeper=true
```

Then use node affinity in the Deployment:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: zookeeper-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: zookeeper
  template:
    metadata:
      labels:
        app: zookeeper
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: "zookeeper"
                operator: In
                values:
                - "true"
      containers:
      - name: zookeeper
        image: zookeeper:latest
```

In this configuration, `requiredDuringSchedulingIgnoredDuringExecution` ensures the Pods are only scheduled onto nodes labeled `zookeeper=true`.

### Notes

1. **Resources**: make sure the target node has enough CPU and memory to run the extra containers.
2. **High availability**: avoid putting all ZooKeeper instances on one node, since that creates a single point of failure; spread them across multiple nodes where possible.
3. **Label management**: when using node selectors, manage node labels carefully so that other workloads are not scheduled incorrectly.