Preface
This article describes what JobFailMonitorHelper does.
1. Role of JobFailMonitorHelper:
JobFailMonitorHelper is a helper class in xxl-job-admin that monitors failed job executions and handles them. Its main responsibilities are:
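- Monitoring failures: a background thread periodically scans the execution logs for runs that failed, for example because of a timeout or an exception during execution.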
- Handling failures: once a failed run is found, JobFailMonitorHelper triggers a retry if the log still has retries left and sends an alarm notification.
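- Recording alarm results: after each failed log is processed, JobFailMonitorHelper writes the outcome to the alarm_status column of xxl_job_log (no alarm needed, alarm succeeded, or alarm failed), so administrators can review how each failure was handled.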
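- Improving robustness: by detecting and handling failures promptly, JobFailMonitorHelper makes the system more robust and reliable and helps ensure that jobs run on time and correctly.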
In short, JobFailMonitorHelper is the part of xxl-job-admin that watches for failed executions and reacts to them, helping administrators find and handle abnormal executions promptly.
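What counts as a "failed" execution is decided by the WHERE clause of the findFailJobLogIds query quoted in section 2.1.1. The following is a minimal sketch of that condition as a Java predicate. The class and method names (FailLogCriterion, isFailed, needsMonitoring) are made up for illustration and are not part of xxl-job; only the codes 0 and 200 come from the quoted SQL.

```java
// Sketch only: mirrors the WHERE clause of the findFailJobLogIds query.
// FailLogCriterion is a hypothetical name, not an xxl-job class.
final class FailLogCriterion {
    // 200 is the success code used in the quoted SQL.
    static final int SUCCESS_CODE = 200;

    // A log is NOT failed in exactly two cases:
    //  1) trigger_code is 0 or 200 and handle_code is still 0 (no result reported yet);
    //  2) handle_code is 200 (the handler reported success).
    static boolean isFailed(int triggerCode, int handleCode) {
        boolean noResultYet = (triggerCode == 0 || triggerCode == SUCCESS_CODE) && handleCode == 0;
        boolean handledOk = handleCode == SUCCESS_CODE;
        return !(noResultYet || handledOk);
    }

    // The monitor only picks up failed logs whose alarm_status is still 0 (default).
    static boolean needsMonitoring(int triggerCode, int handleCode, int alarmStatus) {
        return isFailed(triggerCode, handleCode) && alarmStatus == 0;
    }
}
```

For example, a log with trigger_code 200 and handle_code 500 is treated as failed, while a log with trigger_code 200 and handle_code 0 is treated as still in progress and is skipped.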
2. JobFailMonitorHelper source code:
2.1 start(): initialization
2.1.1 Retrying failed jobs:
// Logger
private static Logger logger = LoggerFactory.getLogger(JobFailMonitorHelper.class);
// Singleton instance of JobFailMonitorHelper
private static JobFailMonitorHelper instance = new JobFailMonitorHelper();
public static JobFailMonitorHelper getInstance(){
    return instance;
}
// ---------------------- monitor ----------------------
// Monitor thread
private Thread monitorThread;
// Stop flag checked by the while loop
private volatile boolean toStop = false;
public void start(){
    monitorThread = new Thread(new Runnable() {
        @Override
        public void run() {
            // monitor
            while (!toStop) {
                try {
                    // Fetch up to 1000 ids of failed logs that have not been processed yet
                    // (alarm_status = 0), lowest id first
                    /**
                     * SELECT id FROM `xxl_job_log`
                     * WHERE !(
                     *     (trigger_code in (0, 200) and handle_code = 0)
                     *     OR
                     *     (handle_code = 200)
                     * )
                     * AND `alarm_status` = 0
                     * ORDER BY id ASC
                     * LIMIT #{pagesize}
                     **/
                    List<Long> failLogIds = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().findFailJobLogIds(1000);
                    if (failLogIds!=null && !failLogIds.isEmpty()) {
                        for (long failLogId: failLogIds) {
                            // Iterate over the failed log ids
                            // lock log: claim the record with an optimistic lock (alarm_status 0 -> -1)
                            /**
                             * UPDATE xxl_job_log
                             * SET
                             *     `alarm_status` = #{newAlarmStatus}
                             * WHERE `id`= #{logId} AND `alarm_status` = #{oldAlarmStatus}
                             **/
                            int lockRet = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, 0, -1);
                            if (lockRet < 1) {
                                // Lock not acquired; continue with the next log id
                                continue;
                            }
                            // Load the execution log record
                            /**
                             * SELECT <include refid="Base_Column_List" />
                             * FROM xxl_job_log AS t
                             * WHERE t.id = #{id}
                             **/
                            XxlJobLog log = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().load(failLogId);
                            // Load the job definition
                            /**
                             * SELECT <include refid="Base_Column_List" />
                             * FROM xxl_job_info AS t
                             * WHERE t.id = #{id}
                             **/
                            XxlJobInfo info = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(log.getJobId());
                            // 1. fail retry monitor: check the remaining retry count of this log
                            if (log.getExecutorFailRetryCount() > 0) {
                                // Retries remain: trigger the job again and pass on the remaining count minus 1
                                /**
                                 * For details of how a job is triggered, see the article:
                                 * Principles -- xxl-job scheduler (admin) startup -- JobTriggerPoolHelper initialization (3)
                                 * Link: https://blog.youkuaiyun.com/l123lgx/article/details/136349951
                                 **/
                                JobTriggerPoolHelper.trigger(log.getJobId(), TriggerTypeEnum.RETRY, (log.getExecutorFailRetryCount()-1), log.getExecutorShardingParam(), log.getExecutorParam(), null);
                                String retryMsg = "<br><br><span style=\"color:#F39C12;\" > >>>>>>>>>>>"+ I18nUtil.getString("jobconf_trigger_type_retry") +"<<<<<<<<<<< </span><br>";
                                // Append a retry marker to the trigger message
                                log.setTriggerMsg(log.getTriggerMsg() + retryMsg);
                                // Persist the updated log record
                                XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(log);
                            }
                            // 2. fail alarm monitor: send the failure alarm
                            int newAlarmStatus = 0; // alarm status: 0 = default, -1 = locked, 1 = no alarm needed, 2 = alarm succeeded, 3 = alarm failed
                            if (info != null) {
                                // Run the alarm logic
                                boolean alarmResult = XxlJobAdminConfig.getAdminConfig().getJobAlarmer().alarm(info, log);
                                // Record the alarm result
                                newAlarmStatus = alarmResult?2:3;
                            } else {
                                // The job no longer exists: no alarm needed
                                newAlarmStatus = 1;
                            }
                            // Write the alarm status back to the log record (releases the lock: -1 -> newAlarmStatus)
                            /**
                             * UPDATE xxl_job_log
                             * SET
                             *     `alarm_status` = #{newAlarmStatus}
                             * WHERE `id`= #{logId} AND `alarm_status` = #{oldAlarmStatus}
                             **/
                            XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, -1, newAlarmStatus);
                        }
                    }
                } catch (Exception e) {
                    if (!toStop) {
                        logger.error(">>>>>>>>>>> xxl-job, job fail monitor thread error:{}", e);
                    }
                }
                try {
                    TimeUnit.SECONDS.sleep(10);
                } catch (Exception e) {
                    if (!toStop) {
                        logger.error(e.getMessage(), e);
                    }
                }
            }
            logger.info(">>>>>>>>>>> xxl-job, job fail monitor thread stop");
        }
    });
    // Mark monitorThread as a daemon thread, name it, and start it
    monitorThread.setDaemon(true);
    monitorThread.setName("xxl-job, admin JobFailMonitorHelper");
    monitorThread.start();
}
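The updateAlarmStatus(failLogId, 0, -1) call above acts as an optimistic lock: the UPDATE only affects a row whose alarm_status still has the expected old value, so when several admin nodes scan the same log only one of them gets an affected-row count of 1. The sketch below imitates that compare-and-set behaviour with an in-memory map instead of the xxl_job_log table. AlarmStatusLockSketch and its methods are hypothetical names for illustration, not xxl-job code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: an in-memory stand-in for the alarm_status column of xxl_job_log.
final class AlarmStatusLockSketch {
    // logId -> alarm_status
    private final Map<Long, Integer> alarmStatusByLogId = new ConcurrentHashMap<>();

    // New logs start with alarm_status = 0 (default).
    void insertLog(long logId) {
        alarmStatusByLogId.put(logId, 0);
    }

    // Same idea as:
    //   UPDATE xxl_job_log SET alarm_status = newStatus
    //   WHERE id = logId AND alarm_status = oldStatus
    // Returns the number of "affected rows": 1 if the swap happened, 0 otherwise.
    int updateAlarmStatus(long logId, int oldStatus, int newStatus) {
        return alarmStatusByLogId.replace(logId, oldStatus, newStatus) ? 1 : 0;
    }

    // Current alarm_status, or null if the log id is unknown.
    Integer statusOf(long logId) {
        return alarmStatusByLogId.get(logId);
    }
}
```

With a single log in status 0, the first updateAlarmStatus(id, 0, -1) returns 1 and every later attempt with the same arguments returns 0, which is why the loop can simply `continue` when lockRet < 1. The final updateAlarmStatus(id, -1, newAlarmStatus) then moves the record from the locked state to its final status.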
2.1.2 Sending failure alarms:
XxlJobAdminConfig.getAdminConfig().getJobAlarmer().alarm(info, log) iterates over every bean that implements the JobAlarm interface and calls its doAlarm method.
2.1.2.1 The JobAlarmer class:
1) Initialization of JobAlarmer:
// @Component lets Spring detect this class, build the JobAlarmer bean, and keep it in the singleton pool
@Component
public class JobAlarmer implements ApplicationContextAware, InitializingBean {
    private static Logger logger = LoggerFactory.getLogger(JobAlarmer.class);
    private ApplicationContext applicationContext;
    private List<JobAlarm> jobAlarmList;
    // ApplicationContextAware: Spring calls setApplicationContext to inject the application context into this bean
    @Override
    public void setApplicationContext(ApplicationContext applicationContext) throws BeansException {
        this.applicationContext = applicationContext;
    }
    // InitializingBean: after the properties of the JobAlarmer bean are set, afterPropertiesSet completes the initialization
    @Override
    public void afterPropertiesSet() throws Exception {
        // Look up all beans of type JobAlarm in the application context
        Map<String, JobAlarm> serviceBeanMap = applicationContext.getBeansOfType(JobAlarm.class);
        if (serviceBeanMap != null && serviceBeanMap.size() > 0) {
            // Collect those beans into jobAlarmList
            jobAlarmList = new ArrayList<JobAlarm>(serviceBeanMap.values());
        }
    }
}
2) The JobAlarm interface:
public interface JobAlarm {
    /**
     * job alarm
     *
     * @param info
     * @param jobLog
     * @return
     */
    public boolean doAlarm(XxlJobInfo info, XxlJobLog jobLog);
}
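3) EmailJobAlarm, an implementation of JobAlarm:
At present EmailJobAlarm, which sends alarms by email, is the only JobAlarm implementation on the server side. To add another channel, follow the same pattern: implement JobAlarm, override doAlarm, and register the class as a Spring bean (for example with @Component) so that JobAlarmer picks it up through getBeansOfType.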
/**
 * job alarm by email
 *
 * @author xuxueli 2020-01-19
 */
@Component
public class EmailJobAlarm implements JobAlarm {
    private static Logger logger = LoggerFactory.getLogger(EmailJobAlarm.class);
    /**
     * fail alarm: runs the alarm logic
     *
     * @param jobLog
     */
    @Override
    public boolean doAlarm(XxlJobInfo info, XxlJobLog jobLog){
        boolean alarmResult = true;
        // send monitor email
        if (info!=null && info.getAlarmEmail()!=null && info.getAlarmEmail().trim().length()>0) {
            // The alarm email address is not empty
            // alarmContent
            String alarmContent = "Alarm Job LogId=" + jobLog.getId();
            if (jobLog.getTriggerCode() != ReturnT.SUCCESS_CODE) {
                alarmContent += "<br>TriggerMsg=<br>" + jobLog.getTriggerMsg();
            }
            if (jobLog.getHandleCode()>0 && jobLog.getHandleCode() != ReturnT.SUCCESS_CODE) {
                alarmContent += "<br>HandleCode=" + jobLog.getHandleMsg();
            }
            // email info
            XxlJobGroup group = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().load(Integer.valueOf(info.getJobGroup()));
            String personal = I18nUtil.getString("admin_name_full");
            String title = I18nUtil.getString("jobconf_monitor");
            String content = MessageFormat.format(loadEmailJobAlarmTemplate(),
                    group!=null?group.getTitle():"null",
                    info.getId(),
                    info.getJobDesc(),
                    alarmContent);
            Set<String> emailSet = new HashSet<String>(Arrays.asList(info.getAlarmEmail().split(",")));
            // Send the message to each email address
            for (String email: emailSet) {
                // make mail
                try {
                    MimeMessage mimeMessage = XxlJobAdminConfig.getAdminConfig().getMailSender().createMimeMessage();
                    MimeMessageHelper helper = new MimeMessageHelper(mimeMessage, true);
                    helper.setFrom(XxlJobAdminConfig.getAdminConfig().getEmailFrom(), personal);
                    helper.setTo(email);
                    helper.setSubject(title);
                    helper.setText(content, true);
                    XxlJobAdminConfig.getAdminConfig().getMailSender().send(mimeMessage);
                } catch (Exception e) {
                    logger.error(">>>>>>>>>>> xxl-job, job fail alarm email send error, JobLogId:{}", jobLog.getId(), e);
                    alarmResult = false;
                }
            }
        }
        return alarmResult;
    }
    /**
     * load email job alarm template
     *
     * @return
     */
    private static final String loadEmailJobAlarmTemplate(){
        String mailBodyTemplate = "<h5>" + I18nUtil.getString("jobconf_monitor_detail") + ":</span>" +
                "<table border=\"1\" cellpadding=\"3\" style=\"border-collapse:collapse; width:80%;\" >\n" +
                " <thead style=\"font-weight: bold;color: #ffffff;background-color: #ff8c00;\" >" +
                " <tr>\n" +
                " <td width=\"20%\" >"+ I18nUtil.getString("jobinfo_field_jobgroup") +"</td>\n" +
                " <td width=\"10%\" >"+ I18nUtil.getString("jobinfo_field_id") +"</td>\n" +
                " <td width=\"20%\" >"+ I18nUtil.getString("jobinfo_field_jobdesc") +"</td>\n" +
                " <td width=\"10%\" >"+ I18nUtil.getString("jobconf_monitor_alarm_title") +"</td>\n" +
                " <td width=\"40%\" >"+ I18nUtil.getString("jobconf_monitor_alarm_content") +"</td>\n" +
                " </tr>\n" +
                " </thead>\n" +
                " <tbody>\n" +
                " <tr>\n" +
                " <td>{0}</td>\n" +
                " <td>{1}</td>\n" +
                " <td>{2}</td>\n" +
                " <td>"+ I18nUtil.getString("jobconf_monitor_alarm_type") +"</td>\n" +
                " <td>{3}</td>\n" +
                " </tr>\n" +
                " </tbody>\n" +
                "</table>";
        return mailBodyTemplate;
    }
}
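To illustrate the extension point mentioned at the start of this subsection, here is a minimal, self-contained sketch of an additional alarm channel. It does not compile against xxl-job: SimpleJobAlarm is a simplified stand-in for the JobAlarm interface that takes plain values instead of XxlJobInfo and XxlJobLog, and InMemoryJobAlarm is a hypothetical channel that only collects messages in a list. In a real xxl-job-admin project the class would implement JobAlarm itself and be annotated with @Component.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for the JobAlarm interface (assumption: plain parameters
// replace XxlJobInfo / XxlJobLog so the sketch compiles on its own).
interface SimpleJobAlarm {
    boolean doAlarm(int jobId, String jobDesc, long logId);
}

// Hypothetical extra alarm channel that stores messages in memory.
// A real channel would call a chat webhook, an SMS gateway, and so on.
final class InMemoryJobAlarm implements SimpleJobAlarm {
    final List<String> sent = new ArrayList<>();

    @Override
    public boolean doAlarm(int jobId, String jobDesc, long logId) {
        if (jobDesc == null || jobDesc.trim().isEmpty()) {
            // Nothing meaningful to report: signal failure to the caller
            return false;
        }
        sent.add("Alarm Job LogId=" + logId + ", jobId=" + jobId + ", desc=" + jobDesc);
        // true tells the caller that this channel delivered the alarm
        return true;
    }
}
```

The return value matters: as with EmailJobAlarm, returning false from any channel makes the overall alarm result false, and the log is then marked with alarm_status 3 (alarm failed).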
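2.1.2.2 alarm(): sending the alarm:
alarm() iterates over every bean that implements the JobAlarm interface and calls its doAlarm method. It returns true only if every doAlarm call returns true.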
public boolean alarm(XxlJobInfo info, XxlJobLog jobLog) {
    // jobAlarmList holds all beans that implement the JobAlarm interface
    boolean result = false;
    if (jobAlarmList!=null && jobAlarmList.size()>0) {
        result = true; // success means all-success
        for (JobAlarm alarm: jobAlarmList) {
            boolean resultItem = false;
            try {
                resultItem = alarm.doAlarm(info, jobLog);
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
            }
            if (!resultItem) {
                result = false;
            }
        }
    }
    // Return the alarm result
    return result;
}
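The aggregation rule in alarm() can be isolated into a small sketch: every channel is called even if an earlier one fails or throws, and the overall result is true only when all channels return true. The code below uses BooleanSupplier as a stand-in for JobAlarm.doAlarm; AlarmAggregationSketch and alarmAll are hypothetical names used for illustration.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

// Sketch only: the "success means all-success" rule with per-channel exception isolation.
final class AlarmAggregationSketch {
    static boolean alarmAll(List<BooleanSupplier> channels) {
        // No channels registered: the result stays false, as in alarm()
        if (channels == null || channels.isEmpty()) {
            return false;
        }
        boolean result = true;
        for (BooleanSupplier channel : channels) {
            boolean ok = false;
            try {
                ok = channel.getAsBoolean();
            } catch (Exception e) {
                // An exception in one channel counts as a failure of that channel only
            }
            if (!ok) {
                result = false; // keep looping so the remaining channels still run
            }
        }
        return result;
    }
}
```

One consequence of this rule is that an empty channel list yields false, so in the monitor loop such a log ends up with alarm_status 3 rather than 2.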
2.2 toStop(): stopping the thread and releasing resources:
public void toStop(){
    toStop = true;
    // interrupt and wait
    monitorThread.interrupt();
    try {
        monitorThread.join();
    } catch (InterruptedException e) {
        logger.error(e.getMessage(), e);
    }
}
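The shutdown sequence above combines three steps: set a volatile flag, interrupt the thread so that a pending sleep ends immediately, and join to wait for the loop to exit. The following self-contained sketch reproduces the pattern outside xxl-job. StoppableWorkerSketch is a hypothetical class, and its empty loop body stands in for the log scan.

```java
import java.util.concurrent.TimeUnit;

// Sketch only: the flag + interrupt + join shutdown pattern.
final class StoppableWorkerSketch {
    private volatile boolean toStop = false;
    private volatile boolean loopExited = false;
    private Thread worker;

    void start() {
        worker = new Thread(() -> {
            while (!toStop) {
                try {
                    // Stands in for the 10-second pause between two scans
                    TimeUnit.SECONDS.sleep(10);
                } catch (InterruptedException e) {
                    // interrupt() from toStop() ends the sleep early;
                    // the loop condition is then re-checked and the loop exits
                }
            }
            loopExited = true;
        });
        worker.setDaemon(true);
        worker.start();
    }

    void toStop() {
        toStop = true;        // 1) ask the loop to finish
        worker.interrupt();   // 2) wake the thread if it is sleeping
        try {
            worker.join();    // 3) wait until the thread has really ended
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean hasExited() {
        return loopExited && !worker.isAlive();
    }
}
```

Without the interrupt, the caller of toStop() could block in join() for up to the full sleep interval. Setting the flag before interrupting ensures the loop does not start another iteration after it wakes up.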
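Summary
This article walked through JobFailMonitorHelper: the monitor thread started by start(), the retry of failed jobs, alarm dispatch through JobAlarmer and EmailJobAlarm, and shutdown via toStop().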