A Flink multi-table join example

Today I wrote a slightly more involved example that implements functionality similar to MySQL's group_concat; recording it here.
For MapToString, see the earlier blog post about the bug.
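That post isn't reproduced here, so below is a minimal sketch of what such a UDF could look like (my reconstruction, not the original): Flink's collect() aggregate builds a multiset, which a Java scalar function receives as a Map from value to occurrence count, and flattening that map yields the group_concat-style string.

```java
import java.util.Map;

import org.apache.flink.table.functions.ScalarFunction;

// Sketch of the MapToString UDF (a reconstruction, not the original code):
// collect(type) produces a multiset, which arrives here as Map<value, count>.
public class MapToString extends ScalarFunction {

    public String eval(Map<String, Integer> map) {
        if (map == null || map.isEmpty()) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Integer> entry : map.entrySet()) {
            // repeat each value as often as it occurred within the group
            for (int i = 0; i < entry.getValue(); i++) {
                if (sb.length() > 0) {
                    sb.append(",");
                }
                sb.append(entry.getKey());
            }
        }
        return sb.toString();
    }
}
```

With that in place, the full job: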

```java
import org.apache.flink.api.common.typeinfo.BasicTypeInfo;
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.io.jdbc.JDBCInputFormat;
import org.apache.flink.api.java.operators.DataSource;
import org.apache.flink.api.java.typeutils.RowTypeInfo;
import org.apache.flink.core.fs.FileSystem.WriteMode;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.java.BatchTableEnvironment;
import org.apache.flink.types.Row;

public class MultipleTableJoin {

    public static void main(String[] args) throws Exception {
        final ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // create a batch table environment (Flink 1.9+ style API)
        BatchTableEnvironment tableEnv = BatchTableEnvironment.create(env);
        tableEnv.registerFunction("mapToString", new MapToString());

        getProjectInfo(env, tableEnv);
        getProject(env, tableEnv);
        joinTableProjectWithInfo(tableEnv);

        Table query = tableEnv.sqlQuery("select id, name, type from result_agg");
        DataSet<Row> ds = tableEnv.toDataSet(query, Row.class);
        ds.print(); // print() eagerly triggers an execution of its own
        ds.writeAsText("/home/test", WriteMode.OVERWRITE);
        env.execute("multiple-table"); // runs the job again for the file sink
    }

    // Reads (project_fid, project_info_type) from MySQL, registers it as
    // project_info, then registers the aggregated project_info_agg table.
    public static void getProjectInfo(ExecutionEnvironment env, BatchTableEnvironment tableEnv) {
        TypeInformation[] fieldTypes = new TypeInformation[] {
                BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO };
        String[] fieldNames = new String[] { "id", "type" };
        RowTypeInfo rowTypeInfo = new RowTypeInfo(fieldTypes, fieldNames);
        JDBCInputFormat jdbcInputFormat = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("com.mysql.jdbc.Driver")
                .setDBUrl("jdbc:mysql://ip:3306/space?characterEncoding=utf8")
                .setUsername("user").setPassword("pwd")
                .setQuery("select project_fid, cast(project_info_type as CHAR) as type from project")
                .setRowTypeInfo(rowTypeInfo).finish();
        DataSource<Row> s = env.createInput(jdbcInputFormat);
        tableEnv.registerDataSet("project_info", s);
        aggProjectInfo(tableEnv, "project_info");
    }

    // The group_concat part: collect(type) builds a multiset per id and the
    // mapToString UDF renders it as a comma-separated string.
    public static void aggProjectInfo(BatchTableEnvironment tableEnv, String tableName) {
        Table tapiResult = tableEnv.scan(tableName);
        tapiResult.printSchema();
        Table query = tableEnv.sqlQuery(
                "select id, mapToString(collect(type)) as type from " + tableName + " group by id");
        tableEnv.registerTable(tableName + "_agg", query);
        tapiResult = tableEnv.scan(tableName + "_agg");
        tapiResult.printSchema();
    }

    public static void getProject(ExecutionEnvironment env, BatchTableEnvironment tableEnv) {
        TypeInformation[] fieldTypes = new TypeInformation[] {
                BasicTypeInfo.STRING_TYPE_INFO, BasicTypeInfo.STRING_TYPE_INFO };
        String[] fieldNames = new String[] { "pid", "name" };
        RowTypeInfo rowTypeInfo = new RowTypeInfo(fieldTypes, fieldNames);
        JDBCInputFormat jdbcInputFormat = JDBCInputFormat.buildJDBCInputFormat()
                .setDrivername("com.mysql.jdbc.Driver")
                .setDBUrl("jdbc:mysql://ip:3306/space?characterEncoding=utf8")
                .setUsername("user").setPassword("pwd")
                .setQuery("select fid, project_name from t_project")
                .setRowTypeInfo(rowTypeInfo).finish();
        DataSource<Row> s = env.createInput(jdbcInputFormat);
        tableEnv.registerDataSet("project", s);
    }

    // Joins project with the aggregated project_info_agg on the project id.
    public static void joinTableProjectWithInfo(BatchTableEnvironment tableEnv) {
        Table result = tableEnv.sqlQuery(
                "select a.pid as id, a.name, b.type from project a inner join project_info_agg b on a.pid = b.id");
        tableEnv.registerTable("result_agg", result);
        result.printSchema();
    }
}
```
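For comparison, the whole pipeline corresponds to what a single GROUP_CONCAT query would do directly in MySQL. Column names are taken from the queries above; treat the exact join-key pairing as an assumption on my part.

```sql
-- MySQL equivalent of the Flink job above (illustrative)
SELECT a.fid AS id,
       a.project_name AS name,
       GROUP_CONCAT(b.project_info_type) AS type
FROM t_project a
INNER JOIN project b ON a.fid = b.project_fid
GROUP BY a.fid, a.project_name;
```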

Reposted from: https://blog.51cto.com/12597095/2398626

### Implementing a Join in Flink CDC and Inserting the Query Result

In an Apache Flink Change Data Capture (CDC) scenario, tables can be joined in a streaming fashion and the final result written to a target storage system. The concrete steps follow.

#### Preparing the data

To support the join and the subsequent write, first define the input and output table structures. Suppose there are two MySQL tables, `orders` and `customers`, that need to be joined.

```sql
-- Define the orders table
CREATE TABLE orders (
    order_id BIGINT,
    customer_id BIGINT,
    product STRING,
    amount DOUBLE,
    PRIMARY KEY(order_id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = 'password',
    'database-name' = 'sales_db',
    'table-name' = 'orders'
);

-- Define the customers table
CREATE TABLE customers (
    id BIGINT,
    name STRING,
    email STRING,
    address STRING,
    PRIMARY KEY(id) NOT ENFORCED
) WITH (
    'connector' = 'mysql-cdc',
    'hostname' = 'localhost',
    'port' = '3306',
    'username' = 'root',
    'password' = 'password',
    'database-name' = 'sales_db',
    'table-name' = 'customers'
);
```

The SQL above creates two dynamic tables backed by the MySQL-CDC connector[^1].

---

#### Join query logic

Flink SQL's join support makes connecting the two tables straightforward. The following simple example joins orders (`orders`) with customers (`customers`) and computes each customer's total spend.

```sql
-- Total spend per customer
SELECT
    c.id AS customer_id,
    c.name AS customer_name,
    SUM(o.amount) AS total_spent
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.id, c.name;
```

This query continuously watches changes from `orders` and `customers` and keeps each customer's cumulative spend up to date[^2].

---

#### Writing the result to external storage

The final step is persisting the joined result to external storage. Taking Hudi as the example, configure the sink table and write the data into it.

```sql
-- Create the Hudi output table
CREATE TABLE customer_spend_summary (
    customer_id BIGINT,
    customer_name STRING,
    total_spent DOUBLE
) WITH (
    'connector' = 'hudi',
    'path' = 's3://your-bucket/path/to/hudi/table/',
    'table.type' = 'MERGE_ON_READ',
    'write.precombine.field' = 'total_spent',
    'hoodie.datasource.write.recordkey.field' = 'customer_id',
    'hive_sync.enable' = 'true',
    'hive_sync.database' = 'summary_db',
    'hive_sync.table' = 'customer_spend_summary'
);

-- Insert the join result into Hudi
INSERT INTO customer_spend_summary
SELECT
    c.id AS customer_id,
    c.name AS customer_name,
    SUM(o.amount) AS total_spent
FROM orders o
JOIN customers c ON o.customer_id = c.id
GROUP BY c.id, c.name;
```

The snippets above create a Hudi-format sink table and continuously sync the join result into it.

---

#### Caveats

- **Latency tolerance**: Because of the nature of distributed systems, events can arrive out of order; set a reasonable watermark strategy to bound the delay.
- **State management**: For complex window or aggregation operations, tune RocksDB (or whichever state backend is in use) appropriately for performance; see the snippet after this list for one such knob.
- **Idempotency**: For upsert-style operations, confirm that the primary-key fields are unique and that the precombine field is chosen appropriately.
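To make the state-management note concrete: a regular (non-windowed) streaming join plus aggregation like the one above keeps state indefinitely by default. One sketch of bounding it, using Flink's `table.exec.state.ttl` option from the SQL client; the config key is Flink's own, but the one-day value is only an illustrative choice.

```sql
-- Expire idle join/aggregation state after one day (illustrative value)
SET 'table.exec.state.ttl' = '1 d';
```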