Real-Time Data Warehouse with Flink 1.11 SQL: Kafka Cluster Setup

This document walks through setting up the Kafka cluster for a real-time data warehouse built with Flink 1.11 SQL. It first states the environment requirements (JDK 1.8 and Kafka 2.4.1), then covers downloading the package, uploading it to a node, configuring server.properties and zookeeper.properties, and distributing the installation to the other nodes. Finally it gives the commands for starting the Kafka cluster, along with a batch script for bringing up all nodes at once. Note that broker.id and zookeeper.connect in the configuration file must be adjusted to match your cluster.


Contents

I. Environment Preparation

II. Installation and Configuration

1. Download the package

2. Upload the package to the m1 node and extract it to /opt/

3. Configure server.properties

4. Configure zookeeper.properties

5. Distribute to m2 and s1

III. Starting the Cluster


I. Environment Preparation

JDK 1.8

kafka_2.11-2.4.1

For the cluster plan, see: Real-Time Data Warehouse with Flink 1.11 SQL: Environment Overview

For the ZooKeeper cluster, see: Real-Time Data Warehouse with Flink 1.11 SQL: ZooKeeper Cluster Setup

Cluster nodes:

192.168.137.121 m1
192.168.137.122 m2
192.168.137.123 s1
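
If these hostnames are not already resolvable, the same mappings would typically be added to /etc/hosts on every node (a minimal sketch; adjust to your network):

# /etc/hosts on m1, m2, and s1
192.168.137.121 m1
192.168.137.122 m2
192.168.137.123 s1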

II. Installation and Configuration

1. Download the package:

https://www.apache.org/dyn/closer.cgi?path=/kafka/2.4.1/kafka_2.11-2.4.1.tgz
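
The link above goes through Apache's mirror chooser; a direct pull from the Apache release archive would look like this:

wget https://archive.apache.org/dist/kafka/2.4.1/kafka_2.11-2.4.1.tgz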

2. Upload the package to the m1 node and extract it to /opt/

tar -xzf kafka_2.11-2.4.1.tgz -C /opt/


3. Configure server.properties

Go to the config directory: /opt/kafka_2.11-2.4.1/config

[root@m1 config]# cat server.properties 
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements.  See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License.  You may obtain a copy of the License at
#
#    http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# see kafka.server.KafkaConfig for additional details and defaults

############################# Server Basics #############################

# The id of the broker. This must be set to a unique integer for each broker.

# NOTE: change this on every machine; no two brokers may share an id
broker.id=1

############################# Socket Server Settings #############################

# The address the socket server listens on. It will get the value returned from 
# java.net.InetAddress.getCanonicalHostName() if not configured.
#   FORMAT:
#     listeners = listener_name://host_name:port
#   EXAMPLE:
#     listeners = PLAINTEXT://your.host.name:9092
#listeners=PLAINTEXT://:9092
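
The rest of the default file can keep its shipped values; only a handful of settings need to change for this cluster. A minimal sketch of those values follows; the log directory is an assumption, and the listener should use each node's own hostname:

# key settings in server.properties (a sketch; adjust paths to your layout)
broker.id=1                                  # unique per broker: 1 on m1, 2 on m2, 3 on s1
listeners=PLAINTEXT://m1:9092                # m2:9092 on m2, s1:9092 on s1
log.dirs=/opt/kafka_2.11-2.4.1/logs          # assumed data directory; create it before starting
zookeeper.connect=m1:2181,m2:2181,s1:2181    # the ZooKeeper cluster set up earlier
zookeeper.connection.timeout.ms=6000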

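4. Configure zookeeper.properties

Since this cluster uses the external ZooKeeper ensemble from the companion article, config/zookeeper.properties only matters if ZooKeeper is run from the Kafka package itself. A minimal sketch for that case, with an assumed dataDir:

# /opt/kafka_2.11-2.4.1/config/zookeeper.properties (a sketch; dataDir is an assumption)
dataDir=/opt/zookeeper/data
clientPort=2181
maxClientCnxns=0
initLimit=10
syncLimit=5
server.1=m1:2888:3888
server.2=m2:2888:3888
server.3=s1:2888:3888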
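5. Distribute to m2 and s1

One way to copy the installation to the other nodes, assuming root SSH access between the hosts:

scp -r /opt/kafka_2.11-2.4.1 root@m2:/opt/
scp -r /opt/kafka_2.11-2.4.1 root@s1:/opt/

After copying, edit config/server.properties on each node so the values are local: broker.id=2 and listeners=PLAINTEXT://m2:9092 on m2; broker.id=3 and listeners=PLAINTEXT://s1:9092 on s1.

III. Starting the Cluster

With ZooKeeper already running, start the broker on each node:

/opt/kafka_2.11-2.4.1/bin/kafka-server-start.sh -daemon /opt/kafka_2.11-2.4.1/config/server.properties

To start all brokers from m1 in one go, a small script like the following can be used (the script name and passwordless SSH are assumptions):

#!/bin/bash
# kafka-start-all.sh: start the Kafka broker on every node (a sketch)
for node in m1 m2 s1; do
  echo "starting Kafka on ${node}"
  ssh root@${node} "/opt/kafka_2.11-2.4.1/bin/kafka-server-start.sh -daemon /opt/kafka_2.11-2.4.1/config/server.properties"
done

To verify, run jps on each node and look for the Kafka process, or create a test topic:

/opt/kafka_2.11-2.4.1/bin/kafka-topics.sh --create --zookeeper m1:2181,m2:2181,s1:2181 --replication-factor 3 --partitions 3 --topic test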