Preface
Environment: CentOS 7.9, jdk1.8.0_161, kafka_2.13-3.3.1
Official site: https://kafka.apache.org/
Server preparation
Prepare three virtual machines for a 3-node Kafka cluster. ZooKeeper must also run on the three nodes, because Kafka depends on it. You can either use the ZooKeeper bundled with Kafka or install it separately; here we use the bundled one.
VM 1 192.168.118.131
VM 2 192.168.118.132
VM 3 192.168.118.133
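If firewalld is running on CentOS 7.9, the ZooKeeper and Kafka ports also need to be reachable between the VMs. The original steps do not cover this, so treat the following as an optional sketch:
#Run on each of the three VMs (only needed if firewalld is active)
firewall-cmd --permanent --add-port=2181/tcp   #ZooKeeper client port
firewall-cmd --permanent --add-port=9092/tcp   #Kafka listener port
firewall-cmd --reload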
Installing the JDK
Kafka requires JDK 1.8 or later; install the JDK on all three VMs:
Download the JDK yourself from the official site: https://www.oracle.com/java/technologies/downloads/
tar -zxvf jdk-8u161-linux-x64.tar.gz -C /usr/local/
mv /usr/local/jdk1.8.0_161 /usr/local/java
cat >> /etc/profile <<'EOF'
export JAVA_HOME=/usr/local/java
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
EOF
source /etc/profile
java -version
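To confirm the JDK is usable on all three nodes, a quick check loop can help; this assumes root SSH access to the three VMs, which is not part of the original steps:
#Verify the JDK on every node
for host in 192.168.118.131 192.168.118.132 192.168.118.133; do
  echo "== $host =="
  ssh root@$host 'source /etc/profile && java -version'
done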
Downloading and installing Kafka
We now build the Kafka cluster on the three VMs: install Kafka on each one and start it. The package ships with ZooKeeper, and we use that bundled ZooKeeper; if you already have a ZooKeeper installation, you can use it instead of the bundled one.
In the package name kafka_2.13-3.3.1.tgz, the leading 2.13 is the Scala version (Kafka is written in Scala) and the trailing 3.3.1 is the actual Kafka version. We install the latest stable release:
#Download the Kafka package
wget https://downloads.apache.org/kafka/3.3.1/kafka_2.13-3.3.1.tgz
#Extract to /opt so the path matches the rest of this guide
tar -zxvf kafka_2.13-3.3.1.tgz -C /opt/
cd /opt/kafka_2.13-3.3.1/
[root@master kafka_2.13-3.3.1]# ll
total 64
drwxr-xr-x 3 root root 4096 Sep 30 03:06 bin #startup scripts; each component has its own startup script
drwxr-xr-x 3 root root 4096 Sep 30 03:06 config #configuration files; each component has its own config file
drwxr-xr-x 2 root root 8192 Oct 14 12:19 libs
-rw-rw-r-- 1 root root 14842 Sep 30 03:03 LICENSE
drwxr-xr-x 2 root root 284 Sep 30 03:06 licenses
-rw-rw-r-- 1 root root 28184 Sep 30 03:03 NOTICE
drwxr-xr-x 2 root root 44 Sep 30 03:06 site-docs
#Kafka ships with ZooKeeper, so edit the ZooKeeper configuration first
mkdir /opt/zookeeper #create a directory for ZooKeeper data
cd config/ #enter the Kafka config directory
vim zookeeper.properties #edit the ZooKeeper config file
dataDir=/opt/zookeeper #change the data directory to /opt/zookeeper; the default /tmp/zookeeper risks having its data wiped
clientPort=2181 #ZooKeeper client port, keep the default
maxClientCnxns=0 #keep the default
admin.enableServer=false #keep the default
# admin.serverPort=8080 #admin server port
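Note that zookeeper.properties as shown above runs each node as a standalone ZooKeeper. If you want the three instances to form a single ensemble (which is what a production cluster normally uses), each node usually also needs server entries and a myid file; a sketch, assuming the IPs above:
#Append ensemble settings to config/zookeeper.properties on every node (optional here)
cat >> zookeeper.properties <<'EOF'
initLimit=10
syncLimit=5
server.1=192.168.118.131:2888:3888
server.2=192.168.118.132:2888:3888
server.3=192.168.118.133:2888:3888
EOF
#Write each node's id into dataDir: 1 on .131, 2 on .132, 3 on .133
echo 1 > /opt/zookeeper/myid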
#Edit the Kafka broker configuration file, server.properties
cd /opt/kafka_2.13-3.3.1/ && mkdir kafka-logs #create a kafka-logs directory; Kafka will store its log data here
vim config/server.properties #edit the Kafka broker config file; the main parameters to change are listed below
broker.id=0 #the broker id, which must be unique within the cluster; keep 0 here and use 1 and 2 on the other two nodes
# The address the socket server listens on. If not configured, the host name will be equal to the value of
# java.net.InetAddress.getCanonicalHostName(), with PLAINTEXT listener name, and port 9092.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
listeners=PLAINTEXT://192.168.118.131:9092 #uncomment this line and set it to this server's IP
# Listener name, hostname and port the broker will advertise to clients.
# If not set, it uses the value for "listeners".
advertised.listeners=PLAINTEXT://192.168.118.131:9092 #uncomment this line and set it to this server's IP
log.dirs=/opt/kafka_2.13-3.3.1/kafka-logs #point Kafka's data/log directory at the kafka-logs directory created above (the default is /tmp/kafka-logs)
zookeeper.connect=192.168.118.131:2181,192.168.118.132:2181,192.168.118.133:2181 #the ZooKeeper connection string for the whole cluster
Note: make these changes on all three VMs. When editing server.properties, broker.id must never be the same on two nodes, and the IPs must be changed to each node's own address; a per-node summary follows.
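A sketch of the values that differ per node (everything else in server.properties stays the same):
#192.168.118.131
broker.id=0
listeners=PLAINTEXT://192.168.118.131:9092
advertised.listeners=PLAINTEXT://192.168.118.131:9092
#192.168.118.132
broker.id=1
listeners=PLAINTEXT://192.168.118.132:9092
advertised.listeners=PLAINTEXT://192.168.118.132:9092
#192.168.118.133
broker.id=2
listeners=PLAINTEXT://192.168.118.133:9092
advertised.listeners=PLAINTEXT://192.168.118.133:9092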
Configuring Kafka environment variables
#Append the following lines to the end of /etc/profile
cat >> /etc/profile <<'EOF'
#KAFKA_HOME
export KAFKA_HOME=/opt/kafka_2.13-3.3.1
export PATH=$PATH:$KAFKA_HOME/bin
EOF
source /etc/profile
Starting ZooKeeper and Kafka
With the configuration done, start ZooKeeper and then Kafka:
#Start ZooKeeper on all three servers
cd /opt/kafka_2.13-3.3.1/bin/
#-daemon runs the process in the background; ../config/zookeeper.properties is the config file to use
./zookeeper-server-start.sh -daemon ../config/zookeeper.properties
#ZooKeeper started successfully; the port is now listening
lsof -i:2181
#Start the Kafka brokers in the background with the given config file; start Kafka on all three nodes
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-server-start.sh -daemon ../config/server.properties
#Kafka started successfully
lsof -i:9092
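To double-check that all three brokers registered with ZooKeeper, you can query it with the zookeeper-shell.sh script that ships with Kafka:
cd /opt/kafka_2.13-3.3.1/bin/
#List the registered broker ids; a healthy 3-node cluster should show [0, 1, 2]
./zookeeper-shell.sh 192.168.118.131:2181 ls /brokers/ids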
Stopping Kafka and ZooKeeper
Always stop Kafka first, wait a moment, and only then stop ZooKeeper. Kafka writes state to ZooKeeper while shutting down, so if ZooKeeper is stopped first, Kafka may never finish shutting down.
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-server-stop.sh #stop Kafka on each of the three nodes first
./zookeeper-server-stop.sh #wait a moment, then stop ZooKeeper on each of the three nodes
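If you would rather manage the broker with systemd than call the scripts by hand, a minimal unit file sketch (using the paths and JAVA_HOME configured above; a matching unit can be written for ZooKeeper) might look like this:
cat > /etc/systemd/system/kafka.service <<'EOF'
[Unit]
Description=Apache Kafka broker
After=network.target

[Service]
Type=simple
Environment=JAVA_HOME=/usr/local/java
ExecStart=/opt/kafka_2.13-3.3.1/bin/kafka-server-start.sh /opt/kafka_2.13-3.3.1/config/server.properties
ExecStop=/opt/kafka_2.13-3.3.1/bin/kafka-server-stop.sh
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
systemctl daemon-reload
systemctl enable --now kafka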
Creating a topic
#Show the help
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-topics.sh --help
Create, delete, describe, or change a topic.
Option Description
------ -----------
--alter                                  Alter the partition count, replica assignment, or topic configuration.
--at-min-isr-partitions if set when describing topics, only
show partitions whose isr count is
equal to the configured minimum.
--bootstrap-server ip:port,ip:port...    The Kafka broker(s) to connect to; multiple host:port pairs may be listed (REQUIRED).
--command-config <String: command Property file containing configs to be
config property file> passed to Admin Client. This is used
only with --bootstrap-server option
for describing and altering broker
configs.
--config <String: name=value> A topic configuration override for the
topic being created or altered. The
following is a list of valid
configurations:
cleanup.policy
compression.type
delete.retention.ms
file.delete.delay.ms
flush.messages
flush.ms
follower.replication.throttled.
replicas
index.interval.bytes
leader.replication.throttled.replicas
local.retention.bytes
local.retention.ms
max.compaction.lag.ms
max.message.bytes
message.downconversion.enable
message.format.version
message.timestamp.difference.max.ms
message.timestamp.type
min.cleanable.dirty.ratio
min.compaction.lag.ms
min.insync.replicas
preallocate
remote.storage.enable
retention.bytes
retention.ms
segment.bytes
segment.index.bytes
segment.jitter.ms
segment.ms
unclean.leader.election.enable
See the Kafka documentation for full
details on the topic configs. It is
supported only in combination with --
create if --bootstrap-server option
is used (the kafka-configs CLI
supports altering topic configs with
a --bootstrap-server option).
--create                                 Create a new topic.
--delete                                 Delete a topic.
--delete-config <String: name> A topic configuration override to be
removed for an existing topic (see
the list of configurations under the
--config option). Not supported with
the --bootstrap-server option.
--describe                               List details for the given topics.
--disable-rack-aware Disable rack aware replica assignment
--exclude-internal exclude internal topics when running
list or describe command. The
internal topics will be listed by
default
--help                                   Print usage information.
--if-exists if set when altering or deleting or
describing topics, the action will
only execute if the topic exists.
--if-not-exists if set when creating topics, the
action will only execute if the
topic does not already exist.
--list                                   List all available topics.
--partitions <Integer: # of partitions> The number of partitions for the topic
being created or altered (WARNING:
If partitions are increased for a
topic that has a key, the partition
logic or ordering of the messages
will be affected). If not supplied
for create, defaults to the cluster
default.
--replica-assignment <String: A list of manual partition-to-broker
broker_id_for_part1_replica1 : assignments for the topic being
broker_id_for_part1_replica2 , created or altered.
broker_id_for_part2_replica1 :
broker_id_for_part2_replica2 , ...>
--replication-factor <Integer: The replication factor for each
replication factor> partition in the topic being
created. If not supplied, defaults
to the cluster default.
--topic <String: topic>                  The topic to create, alter, describe or delete.
--topic-id <String: topic-id> The topic-id to describe.This is used
only with --bootstrap-server option
for describing topics.
--topics-with-overrides if set when describing topics, only
show topics that have overridden
configs
--unavailable-partitions if set when describing topics, only
show partitions whose leader is not
available
--under-min-isr-partitions if set when describing topics, only
show partitions whose isr count is
less than the configured minimum.
--under-replicated-partitions if set when describing topics, only
show under replicated partitions
--version                                Display Kafka version.
[root@master bin]#
Key parameters:
--bootstrap-server ip:port #the Kafka broker(s) to connect to (required)
--topic <String: topic> #the topic to operate on
--create #create a topic; topic names must be unique
--delete #delete a topic
--alter #alter a topic
--list #list all topics
--describe #show a detailed description of the given topic
--partitions <Integer> #number of partitions for the topic
--replication-factor <Integer> #number of replicas per partition
--config <String: name=value> #override the topic's default configuration
#List the existing topics; there are none yet
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-topics.sh --bootstrap-server 192.168.118.131:9092 --list
#Create a topic named first with 1 partition and a replication factor of 3
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-topics.sh --bootstrap-server 192.168.118.131:9092 --topic first --create --partitions 1 --replication-factor 3
#Describe the topic: first has 1 partition and 3 replicas; partition 0's leader is on broker 0,
# and the replicas are spread across broker 0, broker 1 and broker 2
# In "Partition: 0 Leader: 0 Replicas: 0,2,1", the numbers 0, 2 and 1 are broker ids
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-topics.sh --bootstrap-server 192.168.118.131:9092 --topic first --describe
Topic: first TopicId: RzmzEy0uQdm3HiB5NQiQzQ PartitionCount: 1 ReplicationFactor: 3 Configs:
Topic: first Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
[root@master bin]#
#Increase the number of partitions of a topic with --alter (the partition count can only be increased, never decreased)
[root@master bin]# ./kafka-topics.sh --bootstrap-server 192.168.118.131:9092 --topic first --alter --partitions 2
#Describe the topic again; the partition count has increased
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-topics.sh --bootstrap-server 192.168.118.131:9092 --topic first --describe
Topic: first TopicId: RzmzEy0uQdm3HiB5NQiQzQ PartitionCount: 2 ReplicationFactor: 3 Configs:
Topic: first Partition: 0 Leader: 0 Replicas: 0,2,1 Isr: 0,2,1
Topic: first Partition: 1 Leader: 1 Replicas: 1,2,0 Isr: 1,2,0
[root@master bin]#
#The replication factor cannot be changed with kafka-topics.sh; attempts to increase or decrease it from the command line fail with an error
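The replica assignment can, however, still be changed with kafka-reassign-partitions.sh by supplying an explicit partition-to-broker mapping. A hedged sketch that would reduce the replication factor of topic first from 3 to 2 (the file name /tmp/reassign.json is illustrative):
cd /opt/kafka_2.13-3.3.1/bin/
cat > /tmp/reassign.json <<'EOF'
{
  "version": 1,
  "partitions": [
    { "topic": "first", "partition": 0, "replicas": [0, 1] },
    { "topic": "first", "partition": 1, "replicas": [1, 2] }
  ]
}
EOF
#Apply the new assignment; re-run later with --verify and the same file to check progress
./kafka-reassign-partitions.sh --bootstrap-server 192.168.118.131:9092 \
  --reassignment-json-file /tmp/reassign.json --execute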
Creating a producer
#Show the help
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-console-producer.sh --help
This tool helps to read data from standard input and publish it to Kafka.
Option Description
------ -----------
--batch-size <Integer: size> Number of messages to send in a single
batch if they are not being sent
synchronously. please note that this
option will be replaced if max-
partition-memory-bytes is also set
(default: 16384)
--bootstrap-server <ip:port,ip:port...>  The Kafka broker(s) to connect to.
--broker-list <String: broker-list> DEPRECATED, use --bootstrap-server
instead; ignored if --bootstrap-
server is specified. The broker
list string in the form HOST1:PORT1,
HOST2:PORT2.
--compression-codec [String: The compression codec: either 'none',
compression-codec] 'gzip', 'snappy', 'lz4', or 'zstd'.
If specified without value, then it
defaults to 'gzip'
--help Print usage information.
--line-reader <String: reader_class> The class name of the class to use for
reading lines from standard in. By
default each line is read as a
separate message. (default: kafka.
tools.
ConsoleProducer$LineMessageReader)
--max-block-ms <Long: max block on The max time that the producer will
send> block for during a send request.
(default: 60000)
--max-memory-bytes <Long: total memory The total memory used by the producer
in bytes> to buffer records waiting to be sent
to the server. This is the option to
control `buffer.memory` in producer
configs. (default: 33554432)
--max-partition-memory-bytes <Integer: The buffer size allocated for a
memory in bytes per partition> partition. When records are received
which are smaller than this size the
producer will attempt to
optimistically group them together
until this size is reached. This is
the option to control `batch.size`
in producer configs. (default: 16384)
--message-send-max-retries <Integer> Brokers can fail receiving the message
for multiple reasons, and being
unavailable transiently is just one
of them. This property specifies the
number of retries before the
producer give up and drop this
message. This is the option to
control `retries` in producer
configs. (default: 3)
--metadata-expiry-ms <Long: metadata The period of time in milliseconds
expiration interval> after which we force a refresh of
metadata even if we haven't seen any
leadership changes. This is the
option to control `metadata.max.age.
ms` in producer configs. (default:
300000)
--producer-property <String: A mechanism to pass user-defined
producer_prop> properties in the form key=value to
the producer.
--producer.config <String: config file> Producer config properties file. Note
that [producer-property] takes
precedence over this config.
--property <String: prop> A mechanism to pass user-defined
properties in the form key=value to
the message reader. This allows
custom configuration for a user-
defined message reader.
Default properties include:
parse.key=false
parse.headers=false
ignore.error=false
key.separator=\t
headers.delimiter=\t
headers.separator=,
headers.key.separator=:
null.marker= When set, any fields
(key, value and headers) equal to
this will be replaced by null
Default parsing pattern when:
parse.headers=true and parse.key=true:
"h1:v1,h2:v2...\tkey\tvalue"
parse.key=true:
"key\tvalue"
parse.headers=true:
"h1:v1,h2:v2...\tvalue"
--request-required-acks <String: The required `acks` of the producer
request required acks> requests (default: -1)
--request-timeout-ms <Integer: request The ack timeout of the producer
timeout ms> requests. Value must be non-negative
and non-zero. (default: 1500)
--retry-backoff-ms <Long> Before each retry, the producer
refreshes the metadata of relevant
topics. Since leader election takes
a bit of time, this property
specifies the amount of time that
the producer waits before refreshing
the metadata. This is the option to
control `retry.backoff.ms` in
producer configs. (default: 100)
--socket-buffer-size <Integer: size> The size of the tcp RECV size. This is
the option to control `send.buffer.
bytes` in producer configs.
(default: 102400)
--sync If set message send requests to the
brokers are synchronously, one at a
time as they arrive.
--timeout <Long: timeout_ms> If set and the producer is running in
asynchronous mode, this gives the
maximum amount of time a message
will queue awaiting sufficient batch
size. The value is given in ms. This
is the option to control `linger.ms`
in producer configs. (default: 1000)
--topic <String: topic>                  The topic to produce messages to.
--version                                Display Kafka version.
[root@master bin]#
#Start a console producer and send messages to the given topic
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-console-producer.sh --bootstrap-server 192.168.118.131:9092 --topic first
>holl
hjdc
dfdfd
>>>gfgf
>ghghg
>fdfd
>dd
>dfd
>sdsdsdsd
>wewuehwueh
>
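The --property options listed in the producer help above also allow sending keyed messages from the console; a small sketch (key.separator is changed from the default tab to ':' purely for readability):
cd /opt/kafka_2.13-3.3.1/bin/
#Each input line is split into key:value and sent with that key
./kafka-console-producer.sh --bootstrap-server 192.168.118.131:9092 --topic first \
  --property parse.key=true --property key.separator=:
>user1:hello
>user2:world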
Creating a consumer
#Start a consumer. It only receives messages produced after it starts; earlier messages are not consumed, because by default the console consumer starts from the latest offset
# Use --from-beginning to make the consumer read the topic from the beginning
[root@master bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.118.131:9092 --topic first
dfd
sdsdsdsd
wewuehwueh
#With --from-beginning the consumer reads the topic from the beginning
[root@master bin]# ./kafka-console-consumer.sh --bootstrap-server 192.168.118.131:9092 --topic first --from-beginning
holl #as you can see, the historical messages are consumed as well
hjdc
dfdfd
gfgf
ghghg
fdfd
dd
dfd
sdsdsdsd
wewuehwueh
ddd
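In recent Kafka versions the console consumer can also print the key and partition of each record via --property flags, which is handy for checking how keyed messages are distributed across partitions; a sketch:
cd /opt/kafka_2.13-3.3.1/bin/
./kafka-console-consumer.sh --bootstrap-server 192.168.118.131:9092 --topic first \
  --from-beginning --property print.key=true --property print.partition=true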
Creating a consumer group
# Consumer groups do not need to be created explicitly; one is created automatically when a consumer joins with a group.id
#Open two shell windows and start two consumers that both belong to a consumer group named testGroup, consuming topic test
# Within one consumer group, each partition is consumed by only one consumer, so with a single-partition topic one of the two consumers below receives no messages
# (these examples were run against the Kafka cluster deployed on Kubernetes later in this article, hence the /tmp/client.properties file and the cluster-local bootstrap address)
kafka-console-consumer.sh --consumer.config /tmp/client.properties --bootstrap-server \
kafka-cluster.kafka.svc.cluster.local:9092 --consumer-property group.id=testGroup \
--topic test
kafka-console-consumer.sh --consumer.config /tmp/client.properties --bootstrap-server \
kafka-cluster.kafka.svc.cluster.local:9092 --consumer-property group.id=testGroup \
--topic test
Viewing consumer groups
# List all consumer groups; there is only one group, testGroup
kafka-consumer-groups.sh --command-config /tmp/client.properties --bootstrap-server kafka-cluster.kafka.svc.cluster.local:9092 --list
testGroup
# Describe the consumer group
kafka-consumer-groups.sh --command-config /tmp/client.properties --bootstrap-server kafka-cluster.kafka.svc.cluster.local:9092 --describe --group testGroup
GROUP TOPIC PARTITION CURRENT-OFFSET LOG-END-OFFSET LAG CONSUMER-ID HOST CLIENT-ID
testGroup test 0 58 58 0 console-consumer-496b69d2-bb0d-4d9b-b262-b0f503b7d727 /10.244.2.162 console-consumer
GROUP: the consumer group name, testGroup
TOPIC: the topic
PARTITION: the partition (partition 0)
CURRENT-OFFSET: the group's current committed (consumed) offset
LOG-END-OFFSET: the end offset of the partition (the high watermark, HW)
LAG: the number of messages the group has not yet consumed (this shows the message backlog)
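If the group needs to re-read the topic, its committed offsets can be rewound with kafka-consumer-groups.sh; the group must have no active consumers while resetting. A sketch against the same cluster:
kafka-consumer-groups.sh --command-config /tmp/client.properties \
  --bootstrap-server kafka-cluster.kafka.svc.cluster.local:9092 \
  --group testGroup --topic test \
  --reset-offsets --to-earliest --execute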
Deploying Kafka on Kubernetes with Helm
Deploy a Kafka cluster directly into a Kubernetes cluster with Helm.
kubectl create ns kafka
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
mkdir /root/kafka && cd /root/kafka/
helm search repo kafka
helm pull bitnami/kafka --version=29.3.6
tar xf kafka-29.3.6.tgz
cd kafka/
# Adjust the storage class, resource requests/limits, PVC size, etc.
vim values.yaml
helm -n kafka template kafka-cluster ./
#Install the chart
helm -n kafka install kafka-cluster ./
Kafka service (svc) address: kafka-cluster.kafka.svc.cluster.local
Kafka pod DNS names:
kafka-cluster-controller-0.kafka-cluster-controller-headless.kafka.svc.cluster.local:9092
kafka-cluster-controller-1.kafka-cluster-controller-headless.kafka.svc.cluster.local:9092
kafka-cluster-controller-2.kafka-cluster-controller-headless.kafka.svc.cluster.local:9092
#To connect to the Kafka brokers with a Kafka client, first create a client configuration file
cat > /root/kafka/client.properties <<EOF
security.protocol=SASL_PLAINTEXT
sasl.mechanism=SCRAM-SHA-256
sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required \
username="user1" \
password="$(kubectl get secret kafka-cluster-user-passwords --namespace kafka -o jsonpath='{.data.client-passwords}' | base64 -d | cut -d , -f 1)";
EOF
# Create a Kafka client pod
kubectl run kafka-cluster-client --restart='Never' --image m.daocloud.io/docker.io/bitnami/kafka:3.7.1-debian-12-r0 --namespace kafka --command -- sleep infinity
kubectl cp --namespace kafka /root/kafka/client.properties kafka-cluster-client:/tmp/client.properties
kubectl exec --tty -i kafka-cluster-client --namespace kafka -- bash
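If the test topic does not exist yet (whether it is auto-created depends on the chart's broker settings), it can be created from inside the client pod before producing:
kafka-topics.sh --command-config /tmp/client.properties \
  --bootstrap-server kafka-cluster.kafka.svc.cluster.local:9092 \
  --create --topic test --partitions 3 --replication-factor 3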
Producer:
kafka-console-producer.sh \
--producer.config /tmp/client.properties \
--broker-list kafka-cluster-controller-0.kafka-cluster-controller-headless.kafka.svc.cluster.local:9092,kafka-cluster-controller-1.kafka-cluster-controller-headless.kafka.svc.cluster.local:9092,kafka-cluster-controller-2.kafka-cluster-controller-headless.kafka.svc.cluster.local:9092 \
--topic test
Consumer:
kafka-console-consumer.sh \
--consumer.config /tmp/client.properties \
--bootstrap-server kafka-cluster.kafka.svc.cluster.local:9092 \
--topic test \
--from-beginning