Background
Our data-processing jobs are scheduled with DolphinScheduler. The flow: we submit a shell task to DolphinScheduler; the shell runs spark-submit to launch a Java program; we then pull the task log through the DolphinScheduler API (the log is also visible in the admin console, which we have reskinned and customized).

The problem: in cluster mode, what we get back is Spark's own submission log (fetching the jar, ACCEPTED, RUNNING, and so on). What we actually need is the log.info() output from inside the jar, which records which stage of the pipeline the task has reached. Those lines are visible in the Hadoop/YARN logs, but not in DolphinScheduler. We did not want to go to the trouble of deploying Logstash and a full ELK stack, or wiring up Flume, so the idea was to write the logs straight to Elasticsearch, where they can be browsed and queried. The existing business code logs everything through plain log.info() calls and we did not want to rewrite it; a bit of research showed that log4j2 supports custom Appenders, so that is what we will build today.
The shell script submitted to DolphinScheduler
This is the script our customized front end submits to DolphinScheduler:
/home/software/spark-3.1.2/bin/spark-submit --class cnki.bdms.servicespark.BdcServiceSparkApplication \
--conf spark.yarn.jars="hdfs://cluster1:8020/bdclib/*" \
--driver-java-options "-Dspark.yarn.dist.files=/home/software/hadoop-3.3.0/etc/hadoop/yarn-site.xml" \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 3g \
--executor-cores 3 \
--num-executors 3 \
--conf spark.yarn.maxAppAttempts=5 \
--conf spark.yarn.preserve.staging.files=true \
/data/jar/bdcServiceSpark-1.0.0.jar 3627
Custom EsAppender
First the pom dependency; what we submit is a Spring Boot jar. (Remember to exclude the default spring-boot-starter-logging, otherwise Logback and log4j2 will clash.)
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-log4j2</artifactId>
</dependency>
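The dependency alone is not enough: log4j2 also has to be told where to find the custom appender, and the RestHighLevelClient imports in the listing below imply an elasticsearch-rest-high-level-client dependency as well. Here is a minimal log4j2.xml sketch — the packages attribute points at the package of the EsAppender class below, and the element name matches the @Plugin name; the address attribute and its value are assumptions, since the article does not show the full factory signature:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only. packages= must include the package containing the
     @Plugin-annotated appender so log4j2 can discover it. The element name
     elasticSearchAppender matches the @Plugin name (case-insensitive). -->
<Configuration packages="cnki.bdms.servicespark.utils">
  <Appenders>
    <Console name="console" target="SYSTEM_OUT">
      <PatternLayout pattern="%d{HH:mm:ss.SSS} %-5level %logger{36} - %msg%n"/>
    </Console>
    <!-- address value is a placeholder, not from the article -->
    <elasticSearchAppender name="es" address="http://es-host:9200">
      <PatternLayout pattern="%msg"/>
    </elasticSearchAppender>
  </Appenders>
  <Loggers>
    <Root level="info">
      <AppenderRef ref="console"/>
      <AppenderRef ref="es"/>
    </Root>
  </Loggers>
</Configuration>
```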
The EsAppender implementation
package cnki.bdms.servicespark.utils;
import com.fasterxml.jackson.core.JsonProcessingException;
import org.apache.http.HttpHost;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.UsernamePasswordCredentials;
import org.apache.http.client.CredentialsProvider;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.nio.reactor.IOReactorConfig;
import org.apache.logging.log4j.core.Filter;
import org.apache.logging.log4j.core.Layout;
import org.apache.logging.log4j.core.LogEvent;
import org.apache.logging.log4j.core.appender.AbstractAppender;
import org.apache.logging.log4j.core.config.plugins.Plugin;
import org.apache.logging.log4j.core.config.plugins.PluginAttribute;
import org.apache.logging.log4j.core.config.plugins.PluginElement;
import org.apache.logging.log4j.core.config.plugins.PluginFactory;
import org.apache.logging.log4j.core.layout.PatternLayout;
import org.apache.logging.log4j.util.ReadOnlyStringMap;
import org.elasticsearch.action.bulk.BulkRequest;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestClientBuilder;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.MDC;
import org.springframework.util.CollectionUtils;
import java.io.IOException;
import java.io.Serializable;
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.UnknownHostException;
import java.util.*;
@Plugin(name = "elasticSearchAppender", category = "Core", elementType = "appender", printObject = true)
public class EsAppender extends AbstractAppender {
private String address;
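The listing breaks off here after the address field. For reference, a minimal sketch of what the rest of such an appender usually looks like — the constructor, the @PluginFactory that log4j2 calls when parsing the config, and append(), which turns each LogEvent into a JSON document and indexes it. Everything below (field names, the spark-task-log index name, the single address attribute) is an assumption, not the article's original code; a production version would also buffer events and flush them with BulkRequest (imported above) instead of indexing one at a time.

```java
// Continuation sketch of EsAppender (imports as in the listing above).
// An assumption-labeled reconstruction, NOT the article's original code.

    private RestHighLevelClient client;

    protected EsAppender(String name, Filter filter,
                         Layout<? extends Serializable> layout,
                         boolean ignoreExceptions, String address) {
        super(name, filter, layout, ignoreExceptions);
        this.address = address;
    }

    // log4j2 invokes this factory when it meets <elasticSearchAppender ...>
    // in log4j2.xml; the attribute names here must match the config file.
    @PluginFactory
    public static EsAppender createAppender(
            @PluginAttribute("name") String name,
            @PluginAttribute("address") String address,
            @PluginElement("Filter") Filter filter,
            @PluginElement("Layout") Layout<? extends Serializable> layout) {
        if (layout == null) {
            layout = PatternLayout.createDefaultLayout();
        }
        return new EsAppender(name, filter, layout, true, address);
    }

    @Override
    public void start() {
        // One client per appender; an "http://host:9200" address is assumed.
        client = new RestHighLevelClient(
                RestClient.builder(HttpHost.create(address)));
        super.start();
    }

    @Override
    public void append(LogEvent event) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("@timestamp", event.getTimeMillis());
        doc.put("level", event.getLevel().toString());
        doc.put("logger", event.getLoggerName());
        doc.put("message", event.getMessage().getFormattedMessage());
        // MDC values (e.g. a task id put there by the job) travel with every
        // document, so logs can be filtered per task in ES.
        ReadOnlyStringMap ctx = event.getContextData();
        if (ctx != null) {
            doc.putAll(ctx.toMap());
        }
        try {
            IndexRequest request = new IndexRequest("spark-task-log").source(doc);
            client.index(request, RequestOptions.DEFAULT);
        } catch (IOException e) {
            // Never let a logging failure kill the Spark job.
            error("failed to index log event", event, e);
        }
    }

    @Override
    public void stop() {
        try {
            client.close();
        } catch (IOException ignored) {
        }
        super.stop();
    }
}
```

Note the closing brace: with it, the class above is complete enough to be picked up by log4j2's plugin scan once the packages attribute in log4j2.xml names its package.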
