Using Flume: Key Points

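Flume is a reliable, scalable system for transporting big data that balances the flow of data between producers and consumers. It handles the hard parts of getting data into storage, such as concurrent writes, load on the storage system and network latency, and keeps the data flow steady. This article covers Flume's components, characteristics and use cases.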

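What is Flume

  • A reliable, scalable system for moving big data. It sits between data producers and the data's final destination as a buffer, balancing producers against consumers and keeping the flow steady.

  • The main destinations are HDFS and HBase.

  • Similar systems include Apache Kafka and Facebook's Scribe.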

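Why use Flume

  • Storing data in HDFS or HBase is not as simple as calling an API. You have to deal with many complex situations, such as the volume of concurrent writes, the load this puts on HDFS and HBase, network latency, and so on.

  • Flume is designed as a flexible distributed system. It provides customizable pipelines, hands events between components inside transactions to avoid losing data, and offers durable channels (such as File Channel).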
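
To make the "no data loss" point concrete, here is a toy Python model of a transactional channel. It is not Flume's API: in Flume's Java code, sources and sinks obtain a `Transaction` from the channel with `getTransaction()` and call `begin()`, `commit()` or `rollback()` on it. The names `ToyTransactionalChannel` and `drain` are hypothetical. The sketch shows that a take only becomes final when the sink commits. If delivery to the destination fails, the rollback returns the event to the channel instead of dropping it.

```python
from collections import deque


class ToyTransactionalChannel:
    """Hypothetical channel whose takes only become final on commit."""

    def __init__(self):
        self._queue = deque()
        self._in_flight = []  # events taken in the currently open transaction

    def put(self, event):
        self._queue.append(event)

    def take(self):
        """Remove the next event but remember it until the transaction ends."""
        if not self._queue:
            return None
        event = self._queue.popleft()
        self._in_flight.append(event)
        return event

    def commit(self):
        # Destination accepted the events: forget them for good.
        self._in_flight.clear()

    def rollback(self):
        # Delivery failed: put the events back at the front, in original order.
        for event in reversed(self._in_flight):
            self._queue.appendleft(event)
        self._in_flight.clear()

    def __len__(self):
        return len(self._queue)


def drain(channel, deliver):
    """Take one event and pass it to `deliver`; roll back if delivery raises."""
    event = channel.take()
    if event is None:
        return False
    try:
        deliver(event)
    except Exception:
        channel.rollback()
        return False
    channel.commit()
    return True
```

Because failed deliveries are rolled back rather than discarded, a slow or unavailable HDFS just leaves events waiting in the channel. Surviving a crash of the agent process is a separate property that depends on the channel type: File Channel persists events to disk, Memory Channel does not.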

Flume's components

The agent is the basic unit. Each agent consists of three parts: a source, a channel and a sink.
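
The source, channel and sink structure of an agent can be sketched as a tiny Python class. This is a hypothetical model, not Flume code. The `Agent` class, its `capacity` parameter and the plain list used as a destination are assumptions for illustration. The sketch shows how the channel decouples the rate at which the source produces from the rate at which the sink drains. When the channel is full, the source stops pulling until the sink catches up.

```python
from collections import deque


class Agent:
    """Hypothetical toy agent: source -> channel -> sink."""

    def __init__(self, records, destination, capacity=100):
        self.source = iter(records)      # stands in for a Flume source
        self.channel = deque()           # buffer between source and sink
        self.capacity = capacity         # assumed channel capacity
        self.destination = destination   # list standing in for HDFS/HBase

    def run_source_once(self):
        """Move one record from the source into the channel, if there is room."""
        if len(self.channel) >= self.capacity:
            return False                 # channel full: the source has to wait
        record = next(self.source, None)
        if record is None:
            return False                 # source exhausted (None ends the toy stream)
        self.channel.append(record)
        return True

    def run_sink_once(self):
        """Move one record from the channel to the destination."""
        if not self.channel:
            return False                 # nothing buffered yet
        self.destination.append(self.channel.popleft())
        return True
```

With `capacity=2` and three input records, the source fills the channel with two records and then stalls. The sink drains those two, after which the source can deliver the third.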

source: captures data into the agent

  • Source interceptors: modify or drop events

Avro Source
Exec Source
Spooling Directory Source
NetCat Source
Sequence Generator Source
Syslog Sources
Syslog TCP Source
Multiport Syslog TCP Source
Syslog UDP Source
HTTP Source
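
As an example of what a source does, the following Python generator loosely imitates the Spooling Directory Source. It reads each file placed in a directory and turns every line into an event. Once a file has been fully consumed, it renames the file so it is not ingested twice; `.COMPLETED` is the suffix Flume uses by default. The function name and the `file` header it sets are assumptions for this sketch. The real source also handles deserializers, tracks its read position and detects files modified after being dropped in, all of which are omitted here.

```python
import os


def spool_directory(spool_dir, suffix=".COMPLETED"):
    """Toy imitation of a spooling-directory source (hypothetical helper).

    Yields one event per line of every not-yet-ingested file in spool_dir,
    then renames the file with `suffix` so a later scan skips it.
    """
    for name in sorted(os.listdir(spool_dir)):
        path = os.path.join(spool_dir, name)
        if name.endswith(suffix) or not os.path.isfile(path):
            continue  # already ingested, or not a regular file
        with open(path, "rb") as f:
            for line in f:
                # Body stays raw bytes; the header records which file it came from.
                yield {"headers": {"file": name}, "body": line.rstrip(b"\r\n")}
        # Only reached once the caller has consumed every line of the file.
        os.rename(path, path + suffix)
```

Because the rename happens only after the last line has been handed out, a consumer that stops early leaves the file in place to be read again on the next scan.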

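channel: a buffer that holds the data the source has received until it has been successfully written to the sink

  • Channel filter/selector (applies conditions to each event and decides which of the channels attached to the source it should be written to)

  • Built-in channels

Memory Channel
File Channel
JDBC Channel

  • Channel processor (handles writing events into the channels)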

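The channel processor and channel selector work together. Interceptors run first, then the selector picks the target channels, and the event is written to each of them. Flume ships a replicating selector and a multiplexing selector. The replicating selector is the default and copies every event to all of the source's channels. The multiplexing selector routes on the value of a configured header. The Python sketch below models that flow with plain dicts and lists. The function names, the `type` header and the channel names are hypothetical.

```python
def replicating_selector(event, channels):
    """Copy every event to every channel (Flume's default selector type)."""
    return list(channels.values())


def make_multiplexing_selector(header, mapping, default):
    """Route on one header value; `mapping` maps value -> list of channel names."""
    def select(event, channels):
        names = mapping.get(event["headers"].get(header), default)
        return [channels[name] for name in names]
    return select


def process(event, interceptors, selector, channels):
    """Toy channel processor: run interceptors, then write to selected channels."""
    for intercept in interceptors:
        event = intercept(event)
        if event is None:
            return []  # an interceptor dropped the event
    targets = selector(event, channels)
    for channel in targets:
        channel.append(event)
    return targets
```

With a multiplexing selector keyed on `type`, click events can go only to an HBase-bound channel while everything else falls back to the default HDFS-bound channel. An interceptor that returns `None` drops the event before any channel sees it.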

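sink: moves data out of the channel to the destination or to the next agent

  • Sink runner (drives event processing and dispatch)

  • Sink group (contains multiple sinks)

  • Sink processor (decides which sink in the group takes data from the channel and writes it to the destination, for example to provide failover or load balancing)

  • Built-in sinks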

HDFS Sink
Logger Sink
Avro Sink
IRC Sink
File Roll Sink
Null Sink
HBaseSinks
ElasticSearchSink
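
The sink group and sink processor are what make failover possible. The sketch below models a failover sink processor in Python. Sinks are tried in priority order; as in Flume's failover processor, a larger number means higher priority. A sink that raises an error is set aside while the next one takes over. The class name and the tuple layout are assumptions. Flume puts failed sinks into a penalty period with back-off and later retries them; this toy never brings a failed sink back.

```python
class FailoverSinkProcessor:
    """Toy failover sink processor (hypothetical, not Flume's class)."""

    def __init__(self, sinks):
        # sinks: list of (priority, name, deliver) tuples; highest priority first.
        self.sinks = sorted(sinks, key=lambda s: s[0], reverse=True)
        self.failed = set()  # names of sinks that have raised an error

    def process(self, event):
        """Deliver event via the best healthy sink and return that sink's name."""
        for _priority, name, deliver in self.sinks:
            if name in self.failed:
                continue
            try:
                deliver(event)
                return name
            except Exception:
                # Set the sink aside (permanently, in this toy) and try the next one.
                self.failed.add(name)
        raise RuntimeError("every sink in the group has failed")
```

A load-balancing processor would differ only in how it orders the healthy sinks, for example round-robin instead of by fixed priority.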

Events

Flume represents data as events. An event consists of a body, which is a byte array, and headers, which are a map carrying routing information.
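
The event structure described above maps naturally onto a small data class. In Flume's Java API an event exposes `getHeaders()` (a `Map<String, String>`) and `getBody()` (a `byte[]`). The Python `Event` class and the `route_key` helper below are hypothetical stand-ins. They show why routing needs only the headers and never has to parse the body.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Event:
    """Hypothetical stand-in for a Flume event."""
    body: bytes                                            # opaque payload
    headers: Dict[str, str] = field(default_factory=dict)  # routing metadata


def route_key(event: Event, header: str, default: str) -> str:
    """Pick a routing key from the headers; the body is never inspected."""
    return event.headers.get(header, default)
```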

Interceptors

  • Built-in interceptors

Timestamp Interceptor
Host Interceptor
Static Interceptor
UUID Interceptor
Morphline Interceptor
Regex Filtering Interceptor
Regex Extractor Interceptor
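
A few of these interceptors are easy to imitate, which helps show how an interceptor chain works. Each interceptor receives an event, may add or change headers, and may drop the event entirely. The Python functions below roughly mirror three of the listed interceptors:

  • Timestamp Interceptor: epoch milliseconds in a `timestamp` header.
  • Static Interceptor: a fixed header that by default keeps any existing value.
  • Regex Filtering Interceptor: keeps matching events, or drops them when excluding.

Function names and parameters are assumptions, not Flume configuration keys.

```python
import re
import time


def timestamp_interceptor(event):
    """Stamp the event with the current time in epoch milliseconds."""
    event["headers"]["timestamp"] = str(int(time.time() * 1000))
    return event


def static_interceptor(key, value):
    """Add a fixed header, keeping any value that is already present."""
    def intercept(event):
        event["headers"].setdefault(key, value)
        return event
    return intercept


def regex_filter_interceptor(pattern, exclude=False):
    """Keep events whose body matches, or drop them when exclude=True."""
    regex = re.compile(pattern)
    def intercept(event):
        matched = regex.search(event["body"].decode("utf-8", "replace")) is not None
        return None if matched == exclude else event
    return intercept


def run_chain(events, chain):
    """Apply interceptors in order; an interceptor returning None drops the event."""
    kept = []
    for event in events:
        for intercept in chain:
            event = intercept(event)
            if event is None:
                break
        else:
            kept.append(event)
    return kept
```

Putting the filter first in the chain means dropped events never pay for the later header work.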

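When to use it

  • The data can be represented as many independent records

  • The data arrives as a continuous, high-volume stream pushed in real time (if you only get a few GB every few hours, that does not hurt HDFS and there is no need to deploy Flume)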
