A custom Flume file sink that writes to directories based on local time and event data

This post describes a way to collect log data with Apache Flume and automatically sort it into storage by content and time. By writing a custom Flume sink component, different types of logs can be archived into directories organized by date and hour.

Reference: https://www.cnblogs.com/sunyaxue/p/6645753.html

When collecting to files, Flume's built-in file sink cannot route incoming events into files by local time and by data content, so this post implements that behavior following the article above.

Code:

package flume;

import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import java.io.*;
import java.text.SimpleDateFormat;
import java.util.Date;

public class MySinks extends AbstractSink implements Configurable {
    private static final Logger logger = LoggerFactory.getLogger(MySinks.class);
    private static final String PROP_KEY_ROOTPATH = "sink.directory";
    private String fileName;
    private String filePath;
    private File path;
//    private static final SimpleDateFormat timeFormater = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    private static final SimpleDateFormat timeFormater = new SimpleDateFormat("yyyy-MM-dd");
    private static final SimpleDateFormat timeFormater1 = new SimpleDateFormat("HH");

    @Override
    public void configure(Context context) {
        filePath = context.getString(PROP_KEY_ROOTPATH);
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel ch = getChannel();
        //get the transaction
        Transaction txn = ch.getTransaction();
        Event event = null;
        //begin the transaction
        txn.begin();
        event = ch.take();
        if (event == null) {
            // Channel is empty: commit the empty transaction, close it, and
            // back off instead of busy-waiting on take().
            txn.commit();
            txn.close();
            return Status.BACKOFF;
        }
        try {

            logger.debug("Get event.");

            // Decode the event body explicitly as UTF-8 rather than relying
            // on the platform default charset.
            String body = new String(event.getBody(), "UTF-8");
            String res = body + "\r\n";
            String logType = body.substring(body.lastIndexOf("|") + 1);
            String dayTime = timeFormater.format(new Date());
            String hourTime = timeFormater1.format(new Date());
            path = new File(filePath+"/" + dayTime + "/" + hourTime);
            if (!path.exists()) {
                path.mkdirs();
            }
            fileName = path +"/"+ logType;
            File file = new File(fileName);
            if (!file.exists()) {
                file.createNewFile();
            }
            // Append the line with an explicit UTF-8 writer; closing the
            // BufferedWriter also flushes and closes the underlying stream.
            // (This replaces the original iso-8859-1 round-trip hack, which is
            // unnecessary once the body is decoded as UTF-8 above.)
            FileOutputStream fos = new FileOutputStream(file, true);
            BufferedWriter pw = new BufferedWriter(new OutputStreamWriter(fos, "UTF-8"));
            try {
                pw.write(res);
            } finally {
                pw.close();
            }
            txn.commit();
            return Status.READY;
        } catch (Throwable th) {
            txn.rollback();

            if (th instanceof Error) {
                throw (Error) th;
            } else {
                throw new EventDeliveryException(th);
            }
        } finally {
            txn.close();
        }
    }


}
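To make the directory layout concrete, here is a small standalone sketch of how the sink above derives the target file path from an event body. The class and method names are illustrative only, not part of the sink itself:

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// Sketch of MySinks' path logic: events are appended to
// <sink.directory>/<yyyy-MM-dd>/<HH>/<logType>, where logType is the text
// after the last '|' in the event body.
public class PathDemo {
    static String logType(String body) {
        // lastIndexOf + 1: if no '|' exists, the whole body is returned.
        return body.substring(body.lastIndexOf('|') + 1);
    }

    static String targetFile(String root, String body, Date now) {
        String day = new SimpleDateFormat("yyyy-MM-dd").format(now);
        String hour = new SimpleDateFormat("HH").format(now);
        return root + "/" + day + "/" + hour + "/" + logType(body);
    }

    public static void main(String[] args) {
        System.out.println(targetFile("/data/flume_data", "some message|log_app", new Date()));
    }
}
```

So an event body ending in `|log_app` lands in a file named `log_app` under the current date and hour directories.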





Configuration:

#flume file
agent.sources=httpSrc
agent.channels=c1
agent.sinks=k1




agent.sources.httpSrc.type=http
agent.sources.httpSrc.bind=172.16.90.62
agent.sources.httpSrc.port=55555
agent.sources.httpSrc.channels=c1


agent.sources.httpSrc.interceptors = i2
agent.sources.httpSrc.interceptors.i2.type = flume.LogAnalysis$Builder
agent.sources.httpSrc.interceptors.i2.regex = ([^+]*)log_
agent.sources.httpSrc.interceptors.i2.serializers = s1 s2 
agent.sources.httpSrc.interceptors.i2.serializers.s1.name = data
agent.sources.httpSrc.interceptors.i2.serializers.s2.name = type
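The `flume.LogAnalysis` interceptor class is not shown in this post, so the exact semantics of the `i2` settings are unknown. As a rough illustration only, here is what the configured regex `([^+]*)log_` captures on a made-up event body:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of the i2 interceptor's regex on a made-up body: group 1
// captures the text before the literal "log_" marker ('+' cannot appear
// inside the captured text).
public class RegexDemo {
    static final Pattern EVENT_PATTERN = Pattern.compile("([^+]*)log_");

    // Returns the text captured before "log_", or null if there is no match.
    static String extractPrefix(String body) {
        Matcher m = EVENT_PATTERN.matcher(body);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(extractPrefix("2024-01-01 message log_app"));
        // prints "2024-01-01 message "
    }
}
```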



agent.channels.c1.type=memory
agent.channels.c1.capacity=100000
agent.channels.c1.transactionCapacity=100000




agent.sinks.k1.type = flume.MySinks
agent.sinks.k1.sink.directory=/data/flume_data/
agent.sinks.k1.channel=c1
agent.sinks.k1.sink.rollInterval=300
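With the agent running, an event can be pushed through the HTTP source for a quick check. Flume's default JSONHandler for the HTTP source accepts a JSON array of events; the body below is a made-up sample whose trailing `|log_app` becomes the file name under the date/hour directory:

```shell
# Made-up sample event; the text after the last '|' becomes the file name.
PAYLOAD='[{"headers":{},"body":"2024-01-01 12:00:00 some message|log_app"}]'
echo "$PAYLOAD"
# Send it to the HTTP source configured above (host/port from the conf):
# curl -X POST -H 'Content-Type: application/json' -d "$PAYLOAD" http://172.16.90.62:55555
```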

Implementing a custom Doris sink for Flume follows the same pattern: write a Java class extending `AbstractSink` and implement the logic that writes data to Doris. The concrete development and configuration steps are below.

### 1. Developing the custom Doris sink

Create a Java class that extends `org.apache.flume.sink.AbstractSink` and override its `process()` method to deliver data to Doris.

```java
import org.apache.flume.*;
import org.apache.flume.conf.Configurable;
import org.apache.flume.sink.AbstractSink;
import org.apache.http.HttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;

import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class DorisSink extends AbstractSink implements Configurable {
    private String dorisUrl;
    private String username;
    private String password;
    private String format;
    private boolean readJsonByLine;
    private boolean loadToSingleTablet;

    @Override
    public void configure(Context context) {
        dorisUrl = context.getString("doris.url");
        username = context.getString("doris.username");
        password = context.getString("doris.password");
        // getString/getBoolean with a default fall back when the property is absent.
        format = context.getString("format", "json");
        readJsonByLine = context.getBoolean("read_json_by_line", true);
        loadToSingleTablet = context.getBoolean("load_to_single_tablet", true);
    }

    @Override
    public Status process() throws EventDeliveryException {
        Channel channel = getChannel();
        // take() must run inside a channel transaction.
        Transaction txn = channel.getTransaction();
        txn.begin();
        try {
            Event event = channel.take();
            if (event == null) {
                txn.commit();
                return Status.BACKOFF;
            }
            String jsonLine = new String(event.getBody(), StandardCharsets.UTF_8);
            try (CloseableHttpClient client = HttpClients.createDefault()) {
                HttpPost post = new HttpPost(dorisUrl);
                // Basic auth header
                String auth = username + ":" + password;
                String encodedAuth = Base64.getEncoder()
                        .encodeToString(auth.getBytes(StandardCharsets.UTF_8));
                post.setHeader("Authorization", "Basic " + encodedAuth);
                // Stream-load format headers
                post.setHeader("format", format);
                post.setHeader("read_json_by_line", String.valueOf(readJsonByLine));
                post.setHeader("load_to_single_tablet", String.valueOf(loadToSingleTablet));
                post.setEntity(new StringEntity(jsonLine, StandardCharsets.UTF_8));
                HttpResponse response = client.execute(post);
                if (response.getStatusLine().getStatusCode() != 200) {
                    txn.rollback();
                    return Status.BACKOFF;
                }
            }
            txn.commit();
            return Status.READY;
        } catch (Throwable th) {
            txn.rollback();
            throw new EventDeliveryException("Failed to send data to Doris", th);
        } finally {
            txn.close();
        }
    }
}
```

### 2. Build and deploy

- Compile and package the code above into a JAR file.
- Place the JAR in Flume's `lib` directory so Flume can load the custom sink.

### 3. Configuring the Flume agent

Add the following to the Flume configuration file:

```properties
# agent name, sources, channels, sinks
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# source (e.g. reading from Kafka)
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.kafka.topics = test_topic
a1.sources.r1.kafka.bootstrap.servers = kafka-broker1:9092
a1.sources.r1.kafka.consumer.group.id = flume-consumer-group
a1.sources.r1.kafka.consumer.auto.offset.reset = latest

# channel
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 1000

# sink: the custom Doris sink
a1.sinks.k1.type = com.example.DorisSink
a1.sinks.k1.doris.url = http://fe_host:fe_http_port/api/log_db/log_table/_stream_load
a1.sinks.k1.doris.username = your_username
a1.sinks.k1.doris.password = your_password
a1.sinks.k1.format = json
a1.sinks.k1.read_json_by_line = true
a1.sinks.k1.load_to_single_tablet = true

# wiring
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
```

### 4. Starting the Flume agent

Start the Flume agent with a command such as:

```bash
/export/servers/flume-1.8.0/bin/flume-ng agent \
  --conf /export/servers/flume-1.8.0/conf \
  --conf-file /path/to/your/flume-config.properties \
  --name a1 -Dflume.root.logger=INFO,console
```

With these steps the custom Doris sink for Flume is developed and configured.