1. The road to building the NetEase Cloud Music data warehouse:
https://mp.weixin.qq.com/s/FIKCe6oV8NproiKYzis_6w
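2. StreamSets is a company founded in 2014 by Girish Pancha, former Chief Product Officer of Informatica, and Arvind Prabhakar, former head of a development team at Cloudera; it is headquartered in San Francisco. The StreamSets product is a big data ETL tool that supports structured, semi-structured, and unstructured data sources and provides a drag-and-drop visual interface for designing data flows. StreamSets offers three products: StreamSets Data Collector (the core product, open source), a big data ETL tool; StreamSets Data Collector Edge (open source), a component installed on IoT and similar devices that uses little memory and CPU; and StreamSets Control Hub (paid), which manages pipelines built in Data Collector and provides scheduled execution, management, and pipeline topology views.
The rest of this introduction therefore focuses on StreamSets Data Collector, the core open-source product.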
https://blog.youkuaiyun.com/qq_39657909/article/details/107685907
3. Real-time data lake: streaming writes to Hudi with Flink CDC
https://mp.weixin.qq.com/s/JkCbvfJhdz9gT-Tw1pUBIA
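4. Debezium-Flink-Hudi: real-time streaming CDC
Debezium is a CDC tool that is easy to deploy and use. It extracts data changes from an RDBMS into a messaging system, where different downstream applications can consume them. Flink can connect directly to both Debezium and Hudi, which greatly simplifies real-time data ingestion in data lake scenarios.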
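To make the CDC flow above concrete, here is a minimal sketch of how a stream of change events can be turned into upserts and deletes on a keyed table. It is not Flink's or Hudi's actual implementation. `ChangeEvent` is a simplified, hypothetical stand-in for a Debezium change event that models only the `op`, `before`, and `after` fields, and `KeyedTable` is a hypothetical in-memory stand-in for an upsert-capable sink such as a Hudi table. The class names, the `row` helper, and the sample data are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CdcUpsertSketch {

    /** Simplified stand-in for a Debezium change event (hypothetical type). */
    static final class ChangeEvent {
        final String op;                   // "c" create, "u" update, "d" delete, "r" snapshot read
        final Map<String, Object> before;  // row image before the change (null for c and r)
        final Map<String, Object> after;   // row image after the change (null for d)

        ChangeEvent(String op, Map<String, Object> before, Map<String, Object> after) {
            this.op = op;
            this.before = before;
            this.after = after;
        }
    }

    /** Toy in-memory keyed table standing in for an upsert-capable sink (hypothetical type). */
    static final class KeyedTable {
        private final String keyField;
        private final Map<Object, Map<String, Object>> rows = new LinkedHashMap<>();

        KeyedTable(String keyField) {
            this.keyField = keyField;
        }

        /** Applies one change event: inserts and updates become upserts, deletes remove the key. */
        void apply(ChangeEvent e) {
            switch (e.op) {
                case "c":
                case "r":
                case "u":
                    rows.put(e.after.get(keyField), e.after);
                    break;
                case "d":
                    rows.remove(e.before.get(keyField));
                    break;
                default:
                    throw new IllegalArgumentException("unknown op: " + e.op);
            }
        }

        Map<Object, Map<String, Object>> snapshot() {
            return rows;
        }
    }

    /** Builds a sample row; the column names id and name are illustrative only. */
    static Map<String, Object> row(int id, String name) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("id", id);
        r.put("name", name);
        return r;
    }

    public static void main(String[] args) {
        KeyedTable table = new KeyedTable("id");
        table.apply(new ChangeEvent("c", null, row(1, "alice")));
        table.apply(new ChangeEvent("c", null, row(2, "bob")));
        table.apply(new ChangeEvent("u", row(1, "alice"), row(1, "alicia")));
        table.apply(new ChangeEvent("d", row(2, "bob"), null));
        System.out.println(table.snapshot());
    }
}
```

After the four sample events, the table holds only the row with id 1, whose name is "alicia". The create and the update both end up as a write keyed by the primary key, and the delete removes the key. In a real pipeline the events would typically be read from the messaging system and decoded by Flink, and the keyed writes would be carried out by the Hudi sink.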
/*
* Licensed to the Apache Software Foundation (ASF) under one
* or more contributor license agreements. See the NOTICE file
* distributed with this work for additional information
* regarding copyright ownership. The ASF licenses this file
* to you under the Apache License, Version 2.0 (the
* "License"); you may not use this file except in compliance
* with the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package org.apache.flink.streaming.connectors.kafka.table;
import org.apache.flink.api.common.restartstrategy.RestartStrategies;
import org.apache.flink.api.common.serialization.SerializationSchema;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducerBase;
imp