Row-lock failures on Kettle log tables caused by concurrent job execution

When running Kettle 8.3 and Pentaho-Server 8.3 with logging enabled and several jobs executing concurrently, jobs can fail because of row locks: the error is a SQLTransactionRollbackException, caused by concurrent jobs colliding while inserting their BATCH_ID. Two fixes are available: create an extra ID-generation table named LOG_COUNTER whose ID is incremented atomically with an UPDATE statement, or, for databases that support sequences (such as Oracle), create a sequence and specify it in the database connection options. Either approach guarantees that concurrent jobs never share a BATCH_ID, avoiding the row-lock failures.


Official documentation:

https://wiki.pentaho.com/display/EAI/Configuring+log+tables+for+concurrent+access?showComments=false

This article is compiled from the official documentation, so please consult the original where possible; however, access to the Pentaho site is unstable from mainland China, hence this write-up.

Environment:

Kettle 8.3 and Pentaho-Server 8.3

Background:

While running scheduled jobs on Pentaho-Server (the server-side counterpart to Kettle), we found that when several jobs run concurrently with Kettle's built-in logging enabled, jobs fail because of row locks on the log tables.

Symptoms:

Data anomalies showed up on the front end. Checking the Pentaho-Server logs revealed the following error:

2021-04-01 00:10:03.729 ERROR <Thread-134959> [/home/LGETL/DMS/job/JOB_DMS_MAC_TMP_NEW.kjb]  Unable to write log record to log table [log_job1]
2021-04-01 00:10:03.729 ERROR <Thread-134959> [/home/LGETL/DMS/job/JOB_DMS_MAC_TMP_NEW.kjb]  Unable to write log record to log table [log_job1]
2021-04-01 00:10:03.730 ERROR <Thread-134959> [/home/LGETL/DMS/job/JOB_DMS_MAC_TMP_NEW.kjb]  org.pentaho.di.core.exception.KettleDatabaseException: 
Error inserting/updating row
(conn=7837) Deadlock found when trying to get lock; try restarting transaction

        at org.pentaho.di.core.database.Database.insertRow(Database.java:1324)
        at org.pentaho.di.core.database.Database.insertRow(Database.java:1248)
        at org.pentaho.di.core.database.Database.insertRow(Database.java:1236)
        at org.pentaho.di.core.database.Database.insertRow(Database.java:1224)
        at org.pentaho.di.core.database.Database.insertRow(Database.java:1187)
        at org.pentaho.di.core.database.Database.writeLogRecord(Database.java:3598)
        at org.pentaho.di.job.Job.beginProcessing(Job.java:1009)
        at org.pentaho.di.job.Job.execute(Job.java:472)
        at org.pentaho.di.job.Job.run(Job.java:384)
Caused by: java.sql.SQLTransactionRollbackException: (conn=7837) Deadlock found when trying to get lock; try restarting transaction
        at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.get(ExceptionMapper.java:179)
        at org.mariadb.jdbc.internal.util.exceptions.ExceptionMapper.getException(ExceptionMapper.java:110)
        at org.mariadb.jdbc.MariaDbStatement.executeExceptionEpilogue(MariaDbStatement.java:228)
        at org.mariadb.jdbc.MariaDbPreparedStatementClient.executeInternal(MariaDbPreparedStatementClient.java:216)
        at org.mariadb.jdbc.MariaDbPreparedStatementClient.execute(MariaDbPreparedStatementClient.java:150)
        at org.mariadb.jdbc.MariaDbPreparedStatementClient.executeUpdate(MariaDbPreparedStatementClient.java:183)
        at org.pentaho.di.core.database.Database.insertRow(Database.java:1291)
        ... 8 more
Caused by: java.sql.SQLException: Deadlock found when trying to get lock; try restarting transaction
Query is: INSERT INTO log_job1 (ID_JOB, CHANNEL_ID, JOBNAME, STATUS, LINES_READ, LINES_WRITTEN, LINES_UPDATED, LINES_INPUT, LINES_OUTPUT, LINES_REJECTED, ERRORS, STARTDATE, ENDDATE, LOGDATE, DEPDATE, REPLAYDATE, LOG_FIELD) VALUES ( ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?,  ?), parameters [4394,'e3fd0d67-5aca-44f0-a6e0-7714bd889213','JOB_DMS_MAC_TMP_NEW','start',<null>,<null>,<null>,<null>,<null>,<null>,<null>,'2021-03-31 00:10:02.0','2021-04-01 00:10:03.682','2021-04-01 00:10:03.682','2021-04-01 00:10:03.682','2021-04-01 00:10:03.682','2021/03/31 20:20:00 - RepositoriesMeta - Reading repositories XML file: FromInputStream

Root cause:

Kettle's design does not handle concurrent jobs safely when allocating batch IDs. By default, the next BATCH_ID is obtained with a naive select max(ID_JOB)+1 against the log table, so when several jobs start at the same moment they can all compute the same new BATCH_ID; the resulting concurrent writes collide and trigger the row-lock deadlock shown above.
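A minimal sketch of the race, using this article's log_job1 table (illustrative SQL only, not PDI's actual source):

-- Jobs A and B start at the same instant; each computes its new batch ID:
SELECT MAX(ID_JOB)+1 FROM log_job1;  -- both sessions read 4393 and compute 4394

-- Both then write their 'start' log record with the same ID_JOB = 4394;
-- the concurrent writes collide on the same rows and MariaDB aborts one
-- transaction with "Deadlock found when trying to get lock":
INSERT INTO log_job1 (ID_JOB, JOBNAME, STATUS) VALUES (4394, 'JOB_A', 'start');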

Solutions:

Kettle does not prescribe which kind of database holds its log tables, so the fix depends on the database: use an extra BATCH_ID generation table, or use a sequence.

Option 1: use an extra BATCH_ID generation table

This option suits databases without a sequence type. We use MariaDB; the same approach also applies to MySQL.

In the database that holds the log tables, create the ID generation table LOG_COUNTER:

create table LOG_COUNTER(ID INTEGER NOT NULL AUTO_INCREMENT PRIMARY KEY, NAME VARCHAR(100)) engine=InnoDB;

Insert one row of data; note that the table must contain exactly one row:

insert into LOG_COUNTER(ID,NAME) values(100000,'LOG_TABLES');
commit;

--Note: I used 100000 here because my existing job log table already held about 25,000 rows and I wanted new batch IDs to count from 100,000. Whatever value you pick, make sure this ID is larger than the largest BATCH_ID in every existing log table.
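A quick way to choose a safe seed, assuming log_job1 is your job log table (repeat for each log table you use, e.g. any transformation log table):

SELECT MAX(ID_JOB) FROM log_job1;  -- seed LOG_COUNTER above this value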

Then edit the Kettle database connection used for the log tables, and add the following parameter in the connection's Options panel:

Parameter name: AUTOINCREMENT_SQL_FOR_BATCH_ID

Parameter value: UPDATE LOG_COUNTER SET ID=LAST_INSERT_ID(ID+1)
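The parameter value relies on a standard MySQL/MariaDB idiom: LAST_INSERT_ID(expr) stores expr as the session's next LAST_INSERT_ID() result, and the UPDATE takes a row lock that serializes concurrent callers. Roughly, each job then does the equivalent of:

UPDATE LOG_COUNTER SET ID=LAST_INSERT_ID(ID+1);  -- atomic bump; the row lock queues concurrent jobs

SELECT LAST_INSERT_ID();  -- this session's unique new batch ID, e.g. 100001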

The original documentation puts it like this:

This will explain to PDI to use a value from the LOG_COUNTER table every time a new batch ID needs to be generated for a transformation or a job table.

Then run a job both locally and from the server: you will see that after every run the ID value in LOG_COUNTER has been incremented, and the new maximum batch ID in the log table equals the current ID in LOG_COUNTER.
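A quick check, using the names from this article:

SELECT ID FROM LOG_COUNTER;        -- e.g. 100002 after two runs
SELECT MAX(ID_JOB) FROM log_job1;  -- should show the same value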

Option 2: use a sequence

--Not tested here; the original text is reproduced below, so please test it yourself alongside the source documentation. It applies to log databases that support sequences, such as Oracle.

You can use a database sequence if your database supports this.  To enable support for this, create a sequence like this (Oracle example):

CREATE SEQUENCE LOGGINGSEQ START WITH 10000;

In the logging database connection in Pentaho Data Integration (Spoon), add the following line in the Options panel:

Parameter: SEQUENCE_FOR_BATCH_ID

Value: LOGGINGSEQ

This will explain to PDI to use a value from the LOGGINGSEQ sequence every time a new batch ID needs to be generated for a transformation or a job table.
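For reference, once SEQUENCE_FOR_BATCH_ID is set, PDI presumably draws each new batch ID with the standard Oracle idiom below; this is an illustration only, untested here just like the rest of this option:

SELECT LOGGINGSEQ.NEXTVAL FROM DUAL;  -- one unique batch ID per call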
