4(2). Problems encountered with Flume

While using Flume 1.8 as bundled with CDH 6.0.1 to extract data from MySQL into HDFS, I hit several problems, including a permission error when creating the source's status file and a class-conflict error. The fixes included changing file permissions and trying a different Flume version, such as 1.7, to avoid a conflict caused by an inconsistent return type. How the configuration file is written also affects whether Flume runs; a correct configuration avoids the errors shown below.

Environment: Flume 1.8 from CDH 6.0.1, extracting data from MySQL into HDFS. The agent is started with:
flume-ng agent --conf conf --conf-file conf/flume-conf04.properties --name a1 -Dflume.root.logger=INFO,console

a1.channels.ch1.type = memory
a1.sources.sql-source.channels = ch1
a1.channels = ch1
a1.sinks = HDFS
a1.sources = sql-source

a1.sources.sql-source.type = org.keedio.flume.source.SQLSource

a1.sources.sql-source.connection.url = jdbc:mysql://<mysql-host>:3306/<database>
a1.sources.sql-source.user = <username>
a1.sources.sql-source.password = <password>
a1.sources.sql-source.table = <table>
a1.sources.sql-source.columns.to.select = *

a1.sources.sql-source.incremental.column.name = id
a1.sources.sql-source.incremental.value = 0

a1.sources.sql-source.run.query.delay=5000

a1.sources.sql-source.status.file.path = /var/lib/flume-ng/flume
a1.sources.sql-source.status.file.name = sql-source.status

a1.sinks.HDFS.channel = ch1
a1.sinks.HDFS.type = hdfs
a1.sinks.HDFS.hdfs.path = hdfs://node01/user/hive/warehouse/test.db/dim_period_d
a1.sinks.HDFS.hdfs.fileType = DataStream
a1.sinks.HDFS.hdfs.writeFormat = Text
a1.sinks.HDFS.hdfs.rollSize = 268435456
a1.sinks.HDFS.hdfs.rollInterval = 0
a1.sinks.HDFS.hdfs.rollCount = 0
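The summary above mentions a permission error when the source tries to create its status file. The `status.file.path` configured here (`/var/lib/flume-ng/flume`) must exist and be writable by the user running the agent. A minimal sketch of the fix follows; on the real CDH host you would run it as root against that path and chown it to the agent's user (the `flume` service user is an assumption from a standard CDH install), while a throwaway directory stands in below so the commands run without root:

```shell
# Sketch of the status-file permission fix. Real target on the CDH host:
# /var/lib/flume-ng/flume, chown'd to the user running the agent.
STATUS_DIR="$(mktemp -d)/flume"        # stand-in for /var/lib/flume-ng/flume
mkdir -p "$STATUS_DIR"                 # create the directory if missing
chmod 755 "$STATUS_DIR"                # owner (the agent user) can write
touch "$STATUS_DIR/sql-source.status"  # the source creates this file itself
ls -l "$STATUS_DIR"
```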

The agent reports the following error:

2019-03-19 10:10:46,206 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,206 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,206 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,206 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:930)] Added sinks: HDFS Agent: a1
2019-03-19 10:10:46,206 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,206 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,207 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,207 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,207 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration$AgentConfiguration.addProperty(FlumeConfiguration.java:1016)] Processing:HDFS
2019-03-19 10:10:46,224 (conf-file-poller-0) [INFO - org.apache.flume.conf.FlumeConfiguration.validateConfiguration(FlumeConfiguration.java:140)] Post-validation flume configuration contains configuration for agents: [a1]
2019-03-19 10:10:46,224 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:147)] Creating channels
2019-03-19 10:10:46,230 (conf-file-poller-0) [INFO - org.apache.flume.channel.DefaultChannelFactory.create(DefaultChannelFactory.java:42)] Creating instance of channel ch1 type memory
2019-03-19 10:10:46,234 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.loadChannels(AbstractConfigurationProvider.java:201)] Created channel ch1
2019-03-19 10:10:46,235 (conf-file-poller-0) [INFO - org.apache.flume.source.DefaultSourceFactory.create(DefaultSourceFactory.java:41)] Creating instance of source sql-source, type org.keedio.flume.source.SQLSource
2019-03-19 10:10:46,242 (conf-file-poller-0) [INFO - org.keedio.flume.source.SQLSource.configure(SQLSource.java:63)] Reading and processing configuration values for source sql-source
2019-03-19 10:10:46,249 (conf-file-poller-0) [ERROR - org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:361)] Source sql-source has been removed due to an error during configuration
org.apache.flume.conf.ConfigurationException: hibernate.connection.url property not set
	at org.keedio.flume.source.SQLSourceHelper.checkMandatoryProperties(SQLSourceHelper.java:294)
	at org.keedio.flume.source.SQLSourceHelper.<init>(SQLSourceHelper.java:100)
	at org.keedio.flume.source.SQLSource.configure(SQLSource.java:66)
	at org.apache.flume.conf.Configurables.configure(Configurables.java:41)
	at org.apache.flume.node.AbstractConfigurationProvider.loadSources(AbstractConfigurationProvider.java:326)
	at org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:101)
	at org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run(PollingPropertiesFileConfigurationProvider.java:145)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
2019-03-19 10:10:46,252 (conf-file-poller-0) [INFO - org.apache.flume.sink.DefaultSinkFactory.create(DefaultSinkFactory.java:42)] Creating instance of sink: HDFS, type: hdfs
2019-03-19 10:10:46,265 (conf-file-poller-0) [INFO - org.apache.flume.node.AbstractConfigurationProvider.getConfiguration(AbstractConfigurationProvider.java:116)] Channel ch1 connected to [HDFS]
2019-03-19 10:10:46,267 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:161)] Starting new configuration:{ sourceRunners:{} sinkRunners:{HDFS=SinkRunner: { policy:org.apache.flume.sink.DefaultSinkProcessor@4f5a783e counterGroup:{ name:null counters:{} } }} channels:{ch1=org.apache.flume.channel.MemoryChannel{name: ch1}} }
2019-03-19 10:10:46,268 (conf-file-poller-0) [INFO - org.apache.flume.node.Application.startAllComponents(Application.java:168)] Starting Channel ch1
2019-03-19 10:10:46,310 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.register(MonitoredCounterGroup.java:119)] Monitored counter group for type: CHANNEL, name: ch1: Successfully registered new MBean.
2019-03-19 10:10:46,310 (lifecycleSupervisor-1-0) [INFO - org.apache.flume.instrumentation.MonitoredCounterGroup.start(
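The stack trace points at the actual problem: `SQLSourceHelper.checkMandatoryProperties` complains that `hibernate.connection.url` is not set. Newer releases of keedio's flume-ng-sql-source (1.5.x) renamed the plain `connection.url` / `user` / `password` keys used above to Hibernate-style property names, so a config written for an older jar fails validation. A sketch of the renamed keys (placeholders, not real values; the driver-class line assumes the MySQL Connector/J jar is on Flume's classpath):

```properties
# flume-ng-sql-source 1.5.x expects Hibernate-style connection properties
a1.sources.sql-source.hibernate.connection.url = jdbc:mysql://<mysql-host>:3306/<database>
a1.sources.sql-source.hibernate.connection.user = <username>
a1.sources.sql-source.hibernate.connection.password = <password>
a1.sources.sql-source.hibernate.connection.driver_class = com.mysql.jdbc.Driver
```

Alternatively, if you stay on an older source jar (matching Flume 1.7, as the summary suggests), the original `connection.url`-style keys still apply.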