Flink SQL: an example that constructs data skew and backpressure on a single task

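    // Imports used by this snippet (the enclosing class, FlinkTest, is omitted here)
    import org.apache.flink.api.java.tuple.Tuple2;
    import org.apache.flink.streaming.api.datastream.DataStream;
    import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
    import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
    import org.apache.flink.table.api.Table;
    import org.apache.flink.table.api.TableResult;
    import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;
    import org.apache.flink.types.Row;

    public static void main(String[] args) throws Exception {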


        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tableEnvironment = StreamTableEnvironment.create(env);
       // tableEnvironment.

        env.disableOperatorChaining();


        String s1="CREATE TABLE datagen (\n" +
                " f_sequence INT,\n" +
                " f_random_str STRING\n" +
                ") WITH ( " +
                " 'connector' = 'datagen',\n" +

                " 'rows-per-second'='50000',\n" +

                " 'fields.f_sequence.min'='1',\n" +
                " 'fields.f_sequence.max'='100',\n" +
                " 'fields.f_random_str.length'='4'" +
                ")";
        TableResult tableResult = tableEnvironment.executeSql(s1);


        String s2="CREATE TABLE print_table (\n" +
                " f_sequence INT,\n" +
                " f_random_str STRING\n" +
                ") WITH (\n" +
                " 'connector' = 'print' " +
                ")";

      /*  String s3=" select aa  \n" +
                ", count(*) as pv\n" +
                ", count(distinct f_random_str ) as uv \n " +
                "\nfrom \n" +
                "( \n" +
                " select case when f_sequence < 50 then 1 else f_sequence end as aa  \n" +
                " , f_random_str \n" +
                " from datagen ) tmps\n" +
                " group by aa " ;*/

        String s4=" select aa, count(*) as pv,count(distinct f_random_str ) as uv from (select case when f_sequence < 50 then 1 else f_sequence end as aa , f_random_str from datagen ) as tmps group by aa ";


      // tableEnvironment.executeSql(s4);
        Table table = tableEnvironment.sqlQuery(s4);

        //GroupedTable aa = table.groupBy("aa");
       // aa.aggregate()

        // This variant uses toRetractStream
        DataStream<Tuple2<Boolean, Row>> tuple2DataStream = tableEnvironment.toRetractStream(table, Row.class);
        tuple2DataStream.addSink(new RichSinkFunction<Tuple2<Boolean, Row>>() {
            @Override
            public void invoke(Tuple2<Boolean, Row> value, Context context) throws Exception {
                // Sleep 10 ms per record so the sink is slow and pushes backpressure upstream
                Thread.sleep(10);
                System.out.println(value);
            }
        }).setParallelism(3).name("zcSink");




        /*
            This variant overloads the CPU of one task so that a single task becomes backpressured; it only uses toAppendStream
        DataStream<Row> rowDataStream = tableEnvironment.toAppendStream(table, Row.class);

        SingleOutputStreamOperator<String> map = rowDataStream.map(new RichMapFunction<Row, String>() {

            public boolean process;

            @Override
            public void open(Configuration parameters) throws Exception {
                int indexOfThisSubtask = getRuntimeContext().getIndexOfThisSubtask();
                if(indexOfThisSubtask==0){
                    process=true;
                }else {
                    process=false;
                }
            }

            @Override
            public String map(Row row) throws Exception {

                if(process){

                for (int i = 0; i < 100000; i++) {
                    //MessageDigest.getInstance("nd").digest(row.toString().getBytes());
                    String s="1";
                     s=s+1;
                }}
                return "222";
            }
        }).setParallelism(3);

        map.print().setParallelism(3);
*/


        System.out.println(FlinkTest.class.getSimpleName());
        System.out.println(FlinkTest.class.getName());


        env.execute(FlinkTest.class.getSimpleName());
       // tableEnvironment.execute(FlinkTest.class.getSimpleName());




    }
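
The skew in the query above comes from the `CASE` expression: every `f_sequence` below 50 collapses into the key `1`, so one group receives far more rows than the others. The following is a minimal plain-Java sketch of that key distribution, not part of the Flink job. It assumes `datagen` draws `f_sequence` uniformly from 1 to 100; the class `SkewDemo` and its methods are hypothetical names introduced only for illustration.

```java
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

class SkewDemo {

    // Mirrors the CASE expression in the query: values below 50 collapse to key 1.
    static int keyFor(int fSequence) {
        return fSequence < 50 ? 1 : fSequence;
    }

    // Draws n values uniformly from [1, 100] (an assumption about datagen)
    // and counts how many rows land on each grouping key.
    static Map<Integer, Long> countByKey(int n, long seed) {
        Random random = new Random(seed);
        Map<Integer, Long> counts = new TreeMap<>();
        for (int i = 0; i < n; i++) {
            int fSequence = 1 + random.nextInt(100);
            counts.merge(keyFor(fSequence), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        int n = 100_000;
        Map<Integer, Long> counts = countByKey(n, 42L);
        long hot = counts.get(1);
        // Key 1 absorbs values 1..49; keys 50..100 each receive about 1% of the rows.
        System.out.printf("distinct keys: %d%n", counts.size());
        System.out.printf("share of key 1: %.2f%n", (double) hot / n);
    }
}
```

Under that assumption, roughly 49% of the rows share the key `1`, so the subtask that owns this key does about half of the aggregation work while the remaining keys are spread over the other subtasks.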
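### Optimizing Flink SQL to reduce the number of tasks or handle a large number of tasks

#### Reducing the number of SQL tasks

To manage and reduce the number of Flink SQL tasks, merge similar tasks. When several queries operate on the same dataset, consider combining them into a single, more complex query instead of running many simple ones.

For the definition in the `FlinkInsertTask` class[^1], if several subclasses each insert data into a different table, evaluate whether those insert operations can be consolidated. For example:

```scala
class CombinedInsertTasks(tableName: String, sqlParts: Seq[String]) extends FlinkInsertTask {
  override def run(): Unit = {
    val combinedSql = sqlParts.mkString(", ")
    flink.addInsertSql(s"INSERT INTO $tableName ($combinedSql)")
  }

  override def sql: String = ???
}
```

This way several inserts are completed in a single call, which reduces the overall number of tasks.

#### Best practices for handling a large number of tasks

For performance problems in batch mode[^2], enabling compression can significantly improve disk I/O efficiency. The setting is:

```properties
taskmanager.network.blocking-shuffle.compression.enabled=true
```

It is also important to pre-process the environment with a dedicated method before the job starts[^3]. This may involve cleaning up old log files or releasing other unneeded resources to make room for the new large-scale computation.

Finally, understanding the internal execution flow helps in designing the application structure[^4]. For example, knowing that `executeSql()` eventually calls `executeInternal()` helps developers arrange SQL statements and their dependencies more sensibly, which can improve the throughput and response time of the whole system.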