Consuming Kerberos-Secured Kafka Data in Real Time with PyFlink and Writing It to MySQL

1. Environment Preparation

This setup uses the Kafka that ships with CDH 6.2.0, with Flink 1.16.3 and Python 3.6.5.

# Install the Python dependency
python3 -m pip install apache-flink==1.16.3
# Configure the user environment variables
# Note: the Python path must be set here, and the Python bin directory must contain an executable named "python"
vi ~/.bash_profile
export HADOOP_CLASSPATH=`hadoop classpath`
export HADOOP_CONF_DIR=/etc/hadoop/conf
export JAVA_HOME=/path/jdk
# Configure the Flink path
export FLINK_HOME=/path/flink1.16.3
export HADOOP_COMMON_HOME=/opt/cloudera/parcels/CDH/lib/hadoop
export PATH=/path/python3.6.5/bin/:$FLINK_HOME/bin:$JAVA_HOME/bin:$PATH
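
Before writing any job code, it is worth a quick check that PyFlink is installed correctly. A minimal smoke test (file name and job name are arbitrary):

# smoke_test.py -- verifies that PyFlink can spin up a local MiniCluster
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
# Build a tiny bounded stream and print it to stdout
env.from_collection(["hello", "pyflink"]).print()
env.execute("smoke_test")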

2. Writing the Code

Prepare the krb5.conf and jaas.conf files (needed for local testing).
The content of jaas.conf is as follows:

KafkaClient {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="/path/user.keytab"
   principal="user/hostname@HADOOP.COM";
};
Client {
   com.sun.security.auth.module.Krb5LoginModule required
   useKeyTab=true
   keyTab="/path/user.keytab"
   principal="user/hostname@HADOOP.COM";
};
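
These two files are wired in via JVM system properties for local runs (see the commented-out lines in the demo below). When submitting to YARN, you can instead let Flink perform the Kerberos login itself through flink-conf.yaml; a minimal sketch, with the keytab path and principal as placeholders:

security.kerberos.login.use-ticket-cache: false
security.kerberos.login.keytab: /path/user.keytab
security.kerberos.login.principal: user/hostname@HADOOP.COM
# Expose the credentials to the ZooKeeper (Client) and Kafka (KafkaClient) JAAS contexts
security.kerberos.login.contexts: Client,KafkaClient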

A demo of PyFlink reading from Kafka and landing the data in MySQL (the logic is fairly simple):

# -*- coding: UTF-8 -*-
import codecs
import json
import logging
import sys

from pyflink.common import Types, WatermarkStrategy, Row
from pyflink.common.serialization import SimpleStringSchema
from pyflink.datastream import StreamExecutionEnvironment
from pyflink.datastream.connectors.jdbc import JdbcSink, JdbcConnectionOptions, JdbcExecutionOptions
from pyflink.datastream.connectors.kafka import KafkaSource, KafkaOffsetsInitializer
from pyflink.java_gateway import get_gateway


# Comment this block out when submitting to YARN;
# uncomment it for local testing
# System = get_gateway().jvm.System
# System.setProperty("java.security.krb5.conf", "/path/krb5.conf")
# System.setProperty("java.security.auth.login.config", "/path/jaas.conf")
# Work around Chinese encoding issues in print()
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())

if __name__ == '__main__':
    logging.basicConfig(stream=sys.stdout, level=logging.INFO, format="%(message)s")

    env = StreamExecutionEnvironment.get_execution_environment()
    # Uncomment for local testing; can be commented out when submitting to the cluster
    # env.add_jars("file:///path/flink-sql-connector-kafka-1.16.3.jar")
    # env.add_jars("file:///path/mysql-connector-java-8.0.22.jar")
    # env.add_jars("file:///path/flink-connector-jdbc-1.16.3.jar")


    print("add kafka source")
    # kafka 最好配置域名访问,不然可能会报出kerberos数据库不存在user/ip@HADOOP.COM凭据的错误
    kafka_source = KafkaSource.builder() \
        .set_bootstrap_servers("xxxx:9092,xxxx:9092,xxxx:9092") \
        .set_topics("test1") \
        .set_group_id("test_group_1") \
        .set_value_only_deserializer(SimpleStringSchema()) \
        .set_property("sasl.kerberos.service.name", "kafka") \
        .set_property("sasl.mechanism", "GSSAPI") \
        .set_property("security.protocol", "SASL_PLAINTEXT"
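The JDBC sink above assumes a MySQL table matching its INSERT statement. A possible DDL for this hypothetical table:

create table test_table (
    id   int primary key,
    name varchar(255)
);

To submit to YARN, place the Kafka/JDBC connector jars and the MySQL driver in $FLINK_HOME/lib, keep the security.kerberos.login.* settings from above in flink-conf.yaml, and run the script with flink run. A sketch, assuming the script is saved as kafka_to_mysql.py:

flink run -t yarn-per-job -py kafka_to_mysql.py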