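I. Conclusions from the source code analysis:
1. taskGroupCommunicationMap is a static map, so when several jobs run in parallel in standalone mode their data either accumulates in it or gets overwritten (whenever the keys, i.e. the taskGroupIds, are identical).
2. Fix: use the jobId as the key of taskGroupCommunicationMap, and pass the jobId when registering, updating and reading a Communication.
3. The methods that chiefly need changing are updateTaskGroupCommunication and getJobCommunication.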
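The collision described in point 1 can be reproduced with a few lines of plain Java. This is only a minimal sketch, not DataX code: `KeyCollisionDemo`, its `simulate` method and the string values are hypothetical stand-ins, and it assumes each job numbers its task groups starting from 0, which is what makes the keys clash.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical demo: one shared map, keyed first by taskGroupId, then by jobId. */
final class KeyCollisionDemo {

    /** Returns {entries when keyed by taskGroupId, entries when keyed by jobId}. */
    static int[] simulate() {
        // Stand-ins for the single map shared by every job in the JVM.
        Map<Integer, String> byTaskGroupId = new ConcurrentHashMap<>();
        Map<Integer, String> byJobId = new ConcurrentHashMap<>();

        // Assumption: both jobs call their first task group "0".
        byTaskGroupId.put(0, "statistics of job 1");
        byTaskGroupId.put(0, "statistics of job 2"); // silently replaces job 1's entry

        // Keyed by jobId, the two jobs no longer share a key.
        byJobId.put(1, "statistics of job 1");
        byJobId.put(2, "statistics of job 2");

        return new int[] {byTaskGroupId.size(), byJobId.size()};
    }

    public static void main(String[] args) {
        int[] sizes = simulate();
        System.out.println("keyed by taskGroupId: " + sizes[0] + " entry");
        System.out.println("keyed by jobId: " + sizes[1] + " entries");
    }
}
```

With the taskGroupId as key, only the second job's entry survives; with the jobId as key, both entries are kept.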
II. The modification:
package com.alibaba.datax.core.statistics.communication;

import com.alibaba.datax.dataxservice.face.domain.enums.State;
import org.apache.commons.lang3.Validate;

import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * 1. taskGroupCommunicationMap is a static map, so when several jobs run in parallel in
 *    standalone mode their data either accumulates in it or gets overwritten (whenever
 *    the keys, i.e. the taskGroupIds, are identical).
 * 2. Fix: use the jobId as the key of taskGroupCommunicationMap, and pass the jobId when
 *    registering, updating and reading a Communication.
 * 3. The methods that chiefly need changing are updateTaskGroupCommunication and
 *    getJobCommunication.
 */
public final class LocalTGCommunicationManager {
    // Shared by every job in the JVM; the key is now the jobId, not the taskGroupId.
    private static Map<Integer, Communication> taskGroupCommunicationMap =
            new ConcurrentHashMap<Integer, Communication>();

    public static void registerTaskGroupCommunication(
            int jobId, Communication communication) {
        taskGroupCommunicationMap.put(jobId, communication);
    }

    public static Communication getJobCommunication(Integer jobId) {
        Communication communication = new Communication();
        communication.setState(State.SUCCEEDED);

        // Previous behaviour, kept for reference: merging every entry of the map
        // mixes the statistics of all jobs that share it.
        /*for (Communication taskGroupCommunication :
                taskGroupCommunicationMap.values()) {
            communication.mergeFrom(taskGroupCommunication);
        }*/
        // Merge only the entry registered for this job.
        communication.mergeFrom(taskGroupCommunicationMap.get(jobId));

        return communication;
    }

    /**
     * Callers fetch the key set first and then look up the Communication for each key.
     * This avoids modifying the map while iterating over it, and also keeps callers
     * from modifying its key-value pairs.
     *
     * @return the keys currently registered in the map (jobIds after this change)
     */
    public static Set<Integer> getTaskGroupIdSet() {
        return taskGroupCommunicationMap.keySet();
    }

    public static Communication getTaskGroupCommunication(int jobId) {
        Validate.isTrue(jobId >= 0, "jobId must not be less than 0");

        return taskGroupCommunicationMap.get(jobId);
    }

    public static void updateTaskGroupCommunication(final int jobId,
                                                    final Communication communication) {
        Validate.isTrue(taskGroupCommunicationMap.containsKey(
                jobId), String.format("No Communication is registered in taskGroupCommunicationMap for jobId[%d], " +
                "so its information cannot be updated", jobId));
        taskGroupCommunicationMap.put(jobId, communication);
    }

    public static void clear() {
        taskGroupCommunicationMap.clear();
    }

    public static Map<Integer, Communication> getTaskGroupCommunicationMap() {
        return taskGroupCommunicationMap;
    }
}
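Reminder: the key change is in the datax.core.statistics.communication.LocalTGCommunicationManager class. The jobId argument that each of its methods now takes has to be supplied at the corresponding call sites in the calling classes; where possible, obtain it with super.getJobId(), which keeps the change simple.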
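What such a call site might look like is sketched below. None of these classes are DataX classes: `JobKeyedManager`, `BaseCommunicator` and `DemoCommunicator` are hypothetical stand-ins, the statistics are plain strings, and the sketch assumes the caller's base class exposes an `int getJobId()`. In the real code base the return type of getJobId() may differ and need converting to match the `int jobId` parameters shown above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the jobId-keyed manager; values are plain strings. */
final class JobKeyedManager {
    private static final Map<Integer, String> MAP = new ConcurrentHashMap<>();

    static void register(int jobId, String stats) {
        MAP.put(jobId, stats);
    }

    static void update(int jobId, String stats) {
        // Mirrors the containsKey check: updating an unregistered job is an error.
        if (!MAP.containsKey(jobId)) {
            throw new IllegalArgumentException("nothing registered for jobId " + jobId);
        }
        MAP.put(jobId, stats);
    }

    static String get(int jobId) {
        return MAP.get(jobId);
    }
}

/** Hypothetical base class that knows which job it belongs to. */
abstract class BaseCommunicator {
    private final int jobId;

    BaseCommunicator(int jobId) {
        this.jobId = jobId;
    }

    int getJobId() {
        return jobId;
    }
}

/** Hypothetical caller: every call to the manager passes super.getJobId(). */
final class DemoCommunicator extends BaseCommunicator {
    DemoCommunicator(int jobId) {
        super(jobId);
    }

    void register(String stats) {
        JobKeyedManager.register(super.getJobId(), stats);
    }

    void report(String stats) {
        JobKeyedManager.update(super.getJobId(), stats);
    }

    String collect() {
        return JobKeyedManager.get(super.getJobId());
    }
}
```

Because each caller reads the jobId from its own base class, two communicators created for different jobs never touch each other's entry, and no extra parameter has to be threaded through the call chain.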