Dinky and DolphinScheduler Integration: A Practical Guide - 优快云 Blog

Article link: https://blog.youkuaiyun.com/gitblog_00048/article/details/148578301


Dinky is an out-of-the-box, one-stop, real-time computing platform dedicated to the construction and practice of Unified Streaming & Batch and Unified Data Lake & Data Warehouse. Based on Apache Flink, Dinky provides the ability to connect many big data frameworks including OLAP and Data Lake. Project repository: https://gitcode.com/gh_mirrors/di/dinky

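Preface

In modern big data processing, a job scheduling system is essential infrastructure. This article walks through integrating Dinky with DolphinScheduler so that Flink jobs developed in Dinky can be scheduled automatically. The integration lets users develop Flink jobs in Dinky and then orchestrate and schedule them with DolphinScheduler, building complete data processing pipelines.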

Environment Preparation

System Requirements

DolphinScheduler: 3.2.1 or later
Dinky: 1.0.0 or later
Docker: 19.03 or later (used to deploy DolphinScheduler quickly)
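Before deploying, you can compare the installed versions against these minimums from the shell. This is a minimal sketch. It assumes a POSIX shell and a `sort` that supports `-V` (version sort, as in GNU coreutils). `version_ge` and `check_min` are hypothetical helper names and are not part of Dinky or DolphinScheduler.

```shell
# Hypothetical helpers (not part of Dinky or DolphinScheduler).
# version_ge A B: exit status 0 when version A >= version B.
# Assumes `sort -V` (version-aware sort) is available.
version_ge() {
  # A >= B exactly when B sorts first (or they are equal) under version sort.
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_min NAME INSTALLED REQUIRED: print a verdict; return 1 if too old.
check_min() {
  if version_ge "$2" "$3"; then
    echo "$1 $2 OK (>= $3)"
  else
    echo "$1 $2 is older than required $3"
    return 1
  fi
}
```

For example, `check_min Docker "$(docker version --format '{{.Server.Version}}')" 19.03` checks the Docker daemon version. The helper uses version sort because plain string comparison would rank 3.10.0 below 3.2.1.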

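Deploying DolphinScheduler

For a quick trial environment, we recommend deploying DolphinScheduler with Docker. The standalone image runs all DolphinScheduler services in a single container and is intended for evaluation, not production: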

export DOLPHINSCHEDULER_VERSION=3.2.1
docker run --name dolphinscheduler-standalone-server \
           -p 12345:12345 \
           -p 25333:25333 \
           -d apache/dolphinscheduler-standalone-server:"${DOLPHINSCHEDULER_VERSION}"

After deployment, open http://<server IP>:12345/dolphinscheduler/ui/login in a browser and log in with the default account admin/dolphinscheduler123.
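The standalone container takes some time to start, so the login page may not respond immediately after `docker run`. The sketch below polls a URL until it answers. It assumes `curl` is installed, and `wait_for_http` is a hypothetical helper name. Any status below 400 counts as ready, because curl's `-f` flag only fails on HTTP 400 and above. A redirect therefore also counts as ready.

```shell
# Hypothetical helper: poll URL until curl gets a status below 400.
# Usage: wait_for_http URL [MAX_TRIES] [SLEEP_SECONDS]
wait_for_http() {
  local url=$1 tries=${2:-30} pause=${3:-5} i=1
  while :; do
    # -s: silent, -f: fail on HTTP >= 400, -o /dev/null: discard the body
    if curl -sf -o /dev/null "$url"; then
      echo "ready after $i attempt(s): $url"
      return 0
    fi
    if [ "$i" -ge "$tries" ]; then
      echo "not reachable after $tries attempt(s): $url" >&2
      return 1
    fi
    i=$((i + 1))
    sleep "$pause"
  done
}
```

For example: `wait_for_http "http://<server IP>:12345/dolphinscheduler/ui/login" 60 5`. If it gives up, run `docker logs dolphinscheduler-standalone-server` to see the container's startup output.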

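Basic Configuration

Create a tenant: under "Security Center" → "Tenant Management", create a new tenant
Manage users: make sure the user that Dinky will operate as belongs to that tenant
Generate a token: under "Token Management", create an API access token and give it a sensible expiration time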
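You can test the token outside the UI before entering it in Dinky. This is a minimal sketch built on several assumptions: the DolphinScheduler REST API accepts the token in a `token` request header, it exposes the paged project list at `/projects`, and a successful response carries `"code":0`. Check these against the API documentation for your DolphinScheduler version. The function names are hypothetical.

```shell
# Hypothetical helpers. Assumptions: "token" header auth, GET /projects
# paging endpoint, and "code":0 meaning success in the JSON response.
# BASE is e.g. http://<DS server IP>:12345/dolphinscheduler
ds_list_projects() {
  local base=$1 token=$2
  # ${base%/} drops a trailing slash so the path joins cleanly
  curl -s -H "token: ${token}" -H "Accept: application/json" \
    "${base%/}/projects?pageNo=1&pageSize=10"
}

# Exit status 0 when the response looks successful ("code":0).
ds_token_ok() {
  ds_list_projects "$1" "$2" | grep -Eq '"code":[[:space:]]*0[,}]'
}
```

Usage: `ds_token_ok "http://<DS server IP>:12345/dolphinscheduler" "$DS_TOKEN" && echo "token accepted"`. An expired or mistyped token should fail this check.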

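Dinky Configuration

Starting the Service

Assuming Dinky is installed in /opt/dinky-1.0.0, start it with auto.sh and pass the Flink version (1.16 here) as the argument: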

cd /opt/dinky-1.0.0
./auto.sh start 1.16

Open http://<server IP>:8888 and log in with the default account admin/admin.

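Scheduler Integration Settings

Go to "Configuration Center" → "Global Settings"
Select the "DolphinScheduler Configuration" tab
Fill in the following key settings:
- Service address: http://<DS server IP>:12345/dolphinscheduler
- API Token: the token generated earlier
- Project name: defaults to Dinky; change it as needed
- Enable: set to "Yes"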
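After saving, you can check from the shell that the project named in the settings (Dinky by default) exists in DolphinScheduler and is visible to the token's user. This minimal sketch assumes four things: the REST API takes the token in a `token` header, the paged `/projects` endpoint accepts a `searchVal` filter, it returns compact JSON in which each project appears as `"name":"<project>"`, and `ds_project_exists` is a hypothetical helper name.

```shell
# Hypothetical helper. Assumptions: "token" header auth, GET /projects
# with searchVal/pageNo/pageSize parameters, compact JSON response.
# Usage: ds_project_exists BASE_URL TOKEN [PROJECT_NAME]
ds_project_exists() {
  local base=$1 token=$2 project=${3:-Dinky}
  # -G turns the --data-urlencode pairs into an encoded query string;
  # the exact "name":"<project>" match avoids hits on e.g. "DinkyProd".
  curl -s -G -H "token: ${token}" \
    --data-urlencode "searchVal=${project}" \
    --data-urlencode "pageNo=1" --data-urlencode "pageSize=100" \
    "${base%/}/projects" | grep -qF "\"name\":\"${project}\""
}
```

Usage: `ds_project_exists "http://<DS server IP>:12345/dolphinscheduler" "$DS_TOKEN" || echo "project not found"`.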

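Task Development and Scheduling in Practice

FlinkSQL Task Example

The following complete FlinkSQL task uses the datagen connector to generate test orders. For each product, it computes the sum of amount over the preceding minute with an OVER window and writes the results to the print connector: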

-- Checkpoint, state backend and job execution settings
SET execution.checkpointing.checkpoints-after-tasks-finish.enabled=true;
SET pipeline.operator-chaining=false;
SET state.backend.type=rocksdb;
SET execution.checkpointing.interval=8000;
SET state.checkpoints.num-retained=10;
SET cluster.evenly-spread-out-slots=true;

-- Create the source table
DROP TABLE IF EXISTS source_table3;
CREATE TABLE IF NOT EXISTS
  source_table3 (
    `order_id` BIGINT,
    `product` BIGINT,
    `amount` BIGINT,
    `order_time` AS CAST(CURRENT_TIMESTAMP AS TIMESTAMP(3)),
    WATERMARK FOR order_time AS order_time - INTERVAL '2' SECOND
  )
WITH
  (
    'connector' = 'datagen',
    'rows-per-second' = '1',
    'fields.order_id.min' = '1',
    'fields.order_id.max' = '2',
    'fields.amount.min' = '1',
    'fields.amount.max' = '10',
    'fields.product.min' = '1',
    'fields.product.max' = '2'
  );

-- Create the result (sink) table
DROP TABLE IF EXISTS sink_table5;
CREATE TABLE IF NOT EXISTS
  sink_table5 (
    `product` BIGINT,
    `amount` BIGINT,
    `order_time` TIMESTAMP(3),
    `one_minute_sum` BIGINT
  )
WITH
  ('connector' = 'print');

-- OVER window: per-product sum of amount over the preceding 1 minute
INSERT INTO
  sink_table5
SELECT
  product,
  amount,
  order_time,
  SUM(amount) OVER (
    PARTITION BY
      product
    ORDER BY
      order_time
      RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING
      AND CURRENT ROW
  ) AS one_minute_sum
FROM
  source_table3;
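To make the OVER window concrete, the awk sketch below recomputes `one_minute_sum` on a few hand-made rows. It only illustrates the semantics of `RANGE BETWEEN INTERVAL '1' MINUTE PRECEDING AND CURRENT ROW`: for each row at time t, it sums all rows of the same product whose `order_time` lies in [t - 60 s, t], including rows with the same timestamp. It does not show how Flink evaluates the query. The input format `<seconds> <product> <amount>` and the name `over_window_sum` are hypothetical. Each output line mirrors the sink columns: product, amount, order_time, one_minute_sum.

```shell
# Hypothetical illustration of the OVER window semantics, not Flink code.
# stdin: one row per line as "<epoch_seconds> <product> <amount>"
# stdout: "<product> <amount> <order_time> <one_minute_sum>" per input row
over_window_sum() {
  awk '
    { ts[NR] = $1; prod[NR] = $2; amt[NR] = $3 }
    END {
      for (i = 1; i <= NR; i++) {
        sum = 0
        # same product, order_time within the preceding 60 s up to the current row
        for (j = 1; j <= NR; j++)
          if (prod[j] == prod[i] && ts[j] >= ts[i] - 60 && ts[j] <= ts[i])
            sum += amt[j]
        print prod[i], amt[i], ts[i], sum
      }
    }'
}
```

Feeding the rows `0 1 5`, `30 2 3`, `50 1 4`, `70 1 2` and `125 1 1` (one per line) yields sums of 5, 3, 9, 6 and 3. The row at 70 s no longer includes the row at 0 s, which is more than a minute older.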