Small Data Collection Platform

1. Package the program written in IntelliJ IDEA into a jar, export it, and copy it to the Linux machine.
Run the jar, for example:
[root@JHB0 module]$ java -classpath log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar com.root.appclient.AppMain  >/opt/module/test.log

2. Check the newly generated log files under /opt/module/data
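A minimal sketch for running the generator in the background and checking the output; the jar path and main class follow the example above, and the nohup/ls combination is just one way to verify:

nohup java -classpath /opt/module/log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar com.root.appclient.AppMain >/opt/module/test.log 2>&1 &

# list the freshly generated log files (data directory from step 2)
ls -lt /opt/module/data | head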

=================================lg.sh===================================
3. Log-generation cluster start script ------------------------------ lg.sh
 (1) Have each remote login source the environment variables right away
[root@JHB0 ~]$ echo source /etc/profile >> ~/.bashrc
[root@JHB1 ~]$ echo source /etc/profile >> ~/.bashrc
[root@JHB2 ~]$ echo source /etc/profile >> ~/.bashrc

(2) Create the script lg.sh under /home/root/bin
    [root@JHB0 bin]$ vim lg.sh
Contents:
#! /bin/bash

# Generate logs on each node by running the jar over ssh in the background.
for i in JHB0 JHB1
do
        ssh $i "java -classpath /opt/module/log-collector-1.0-SNAPSHOT-jar-with-dependencies.jar com.root.appclient.AppMain >/opt/module/test.log &"
done

Make the script executable
[root@JHB0 bin]$ chmod 777 lg.sh

Run the script
[root@JHB0 module]$ lg.sh

Check the generated data in the /opt/module/data directory on JHB0 and JHB1
[root@JHB0 data]$ ls
app-2019-02-10.log
[root@JHB1 data]$ ls
app-2019-02-10.log
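To confirm that each node produced data without logging in to both, a small sketch using the host names and /opt/module/data path from above:

for i in JHB0 JHB1; do ssh $i "ls -l /opt/module/data"; done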
========================================================================================

=============================Cluster time-change script dt.sh======================
Create the script dt.sh under /home/root/bin
[root@JHB0 bin]$ vim dt.sh

Contents:
#!/bin/bash

# Combine the date ($1) and time ($2) arguments into one timestamp string.
log_date="$1 $2"

echo "$log_date"

# Set the system clock on every node.
for i in JHB0 JHB1 JHB2
do
        ssh -t $i "sudo date -s '$log_date'"
done

Make the script executable
[root@JHB0 bin]$ chmod 777 dt.sh

Run the script, for example:
[root@JHB0 bin]$ dt.sh 2021-6-5
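dt.sh also accepts a time as the second argument ($2 in the script above); a usage sketch that sets both date and time and then checks the clock on every node:

[root@JHB0 bin]$ dt.sh 2021-06-05 11:11:11
[root@JHB0 bin]$ for i in JHB0 JHB1 JHB2; do ssh $i date; done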
=====================Cluster process viewer script  xcall.sh=======================
Create the script xcall.sh under /home/root/bin
[root@JHB0 bin]$ vim xcall.sh

Contents:
#! /bin/bash

# Run the given command on every node and label each node's output.
for i in JHB0 JHB1 JHB2
do
        echo --------- $i ----------
        ssh $i "$*"
done

Make the script executable
[root@JHB0 bin]$ chmod 777 xcall.sh

Run the script
[root@JHB0 bin]$ xcall.sh jps
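Any command can be passed as the argument, for example a cluster-wide disk-space check (just a usage sketch):

[root@JHB0 bin]$ xcall.sh df -h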
=============================================================

==================Log-collection Flume configuration file-flume-kafka.conf====================
(1) Create the file file-flume-kafka.conf under /opt/module/flume/conf
[root@JHB0 conf]$ vim file-flume-kafka.conf
Contents:
a1.sources=r1
a1.channels=c1 c2

# configure source
a1.sources.r1.type = TAILDIR
a1.sources.r1.positionFile = /opt/module/flume/test/log_position.json
a1.sources.r1.filegroups = f1
a1.sources.r1.filegroups.f1 = /tmp/logs/app.+
a1.sources.r1.fileHeader = true
a1.sources.r1.channels = c1 c2

#interceptor
a1.sources.r1.interceptors =  i1 i2
a1.sources.r1.interceptors.i1.type = com.root.flume.interceptor.ETLInterceptor$Builder
a1.sources.r1.interceptors.i2.type = com.root.flume.interceptor.TypeInterceptor$Builder

a1.sources.r1.selector.type = multiplexing
a1.sources.r1.selector.header = topic
a1.sources.r1.selector.mapping.topic_start = c1
a1.sources.r1.selector.mapping.topic_event = c2

# configure channel
a1.channels.c1.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c1.kafka.bootstrap.servers = JHB0:9092,JHB1:9092,JHB2:9092
a1.channels.c1.kafka.topic = topic_start
a1.channels.c1.parseAsFlumeEvent = false
a1.channels.c1.kafka.consumer.group.id = flume-consumer

a1.channels.c2.type = org.apache.flume.channel.kafka.KafkaChannel
a1.channels.c2.kafka.bootstrap.servers = JHB0:9092,JHB1:9092,JHB2:9092
a1.channels.c2.kafka.topic = topic_event
a1.channels.c2.parseAsFlumeEvent = false
a1.channels.c2.kafka.consumer.group.id = flume-consumer

(2) Build the custom Flume interceptors in IDEA, export the jar, and copy it to Linux,
placing it in the /opt/module/flume/lib folder on JHB0
[root@JHB0 lib]$ ls | grep interceptor
flume-interceptor-1.0-SNAPSHOT.jar

Distribute Flume to JHB1 and JHB2
[root@JHB0 module]$ xsync flume/

[root@JHB0 flume]$ bin/flume-ng agent --name a1 --conf-file conf/file-flume-kafka.conf &
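If the agent misbehaves, a common way to debug is to run it in the foreground with console logging (standard flume-ng options; the --conf directory argument is an assumption about the layout):

[root@JHB0 flume]$ bin/flume-ng agent --conf conf/ --conf-file conf/file-flume-kafka.conf --name a1 -Dflume.root.logger=INFO,console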
==================Log-collection Flume start/stop script f1.sh=================
Create the script f1.sh under /home/root/bin
[root@JHB0 bin]$ vim f1.sh
Contents:
#! /bin/bash

case $1 in
"start"){
        for i in JHB0 JHB1
        do
                echo " --------启动 $i 采集flume-------"
                ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/file-flume-kafka.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/dev/null 2>&1 &"
        done
};;    
"stop"){
        for i in JHB0 JHB1
        do
                echo " --------停止 $i 采集flume-------"
                ssh $i "ps -ef | grep file-flume-kafka | grep -v grep |awk '{print \$2}' | xargs kill"
        done

};;
esac

Make the script executable
[root@JHB0 bin]$ chmod 777 f1.sh

Start the collection Flume agents across the cluster
[root@JHB0 module]$ f1.sh start
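To confirm the agents came up, the xcall.sh script from above can be reused; a Flume agent's JVM appears as Application in jps output:

[root@JHB0 module]$ xcall.sh jps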
================Kafka cluster start/stop script kf.sh======================
Create the script kf.sh under /home/root/bin
[root@JHB0 bin]$ vim kf.sh

Contents:
#! /bin/bash

case $1 in
"start"){
        for i in JHB0 JHB1 JHB2
        do
                echo " --------启动 $i Kafka-------"
                # 用于KafkaManager监控
                ssh $i "export JMX_PORT=9988 && /opt/module/kafka/bin/kafka-server-start.sh -daemon /opt/module/kafka/config/server.properties "
        done
};;
"stop"){
        for i in JHB0 JHB1 JHB2
        do
                echo " --------停止 $i Kafka-------"
                ssh $i "/opt/module/kafka/bin/kafka-server-stop.sh stop"
        done
};;
esac

Make the script executable
[root@JHB0 bin]$ chmod 777 kf.sh

Start the Kafka cluster
[root@JHB0 module]$ kf.sh start
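A minimal check that every broker is listening on its port (assumes the default Kafka port 9092 and that netstat is installed):

[root@JHB0 module]$ xcall.sh "netstat -tlnp | grep 9092"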
====================Kafka: producing and consuming messages========================
(1) List all Kafka topics
[root@JHB0 kafka]$ bin/kafka-topics.sh --zookeeper JHB0:2181 --list

(2) Create the Kafka topics
From the /opt/module/kafka/ directory, create each of the following:
Startup-log topic
[root@JHB0 kafka]$ bin/kafka-topics.sh --zookeeper JHB0:2181,JHB1:2181,JHB2:2181  --create --replication-factor 1 --partitions 1 --topic topic_start

Event-log topic
[root@JHB0 kafka]$ bin/kafka-topics.sh --zookeeper JHB0:2181,JHB1:2181,JHB2:2181  --create --replication-factor 1 --partitions 1 --topic topic_event

(3) Delete the Kafka topics
Delete the startup-log topic
[root@JHB0 kafka]$ bin/kafka-topics.sh --delete --zookeeper JHB0:2181,JHB1:2181,JHB2:2181 --topic topic_start

Delete the event-log topic
[root@JHB0 kafka]$ bin/kafka-topics.sh --delete --zookeeper JHB0:2181,JHB1:2181,JHB2:2181 --topic topic_event

(4) Produce messages
[root@JHB0 kafka]$ bin/kafka-console-producer.sh \
--broker-list JHB0:9092 --topic topic_start
>hello world
>root  root

(5) Consume messages
[root@JHB0 kafka]$ bin/kafka-console-consumer.sh \
--zookeeper JHB0:2181 --from-beginning --topic topic_start
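The commands above use the older ZooKeeper-based CLI. On newer Kafka releases (roughly 2.2 and later) the --zookeeper option is replaced by --bootstrap-server; equivalent sketches, assuming the same broker addresses:

[root@JHB0 kafka]$ bin/kafka-topics.sh --bootstrap-server JHB0:9092 --list
[root@JHB0 kafka]$ bin/kafka-console-consumer.sh --bootstrap-server JHB0:9092 --from-beginning --topic topic_start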
===================Kafka Manager====================
    Installation
(1) Copy kafka-manager-1.3.3.15.zip to the /opt/module directory on JHB0
[root@JHB0 module]$ unzip kafka-manager-1.3.3.15.zip

(2) Edit the configuration file
Go to the /opt/module/kafka-manager-1.3.3.15/conf directory
[root@JHB0 conf]$ vim application.conf
Change it to:
kafka-manager.zkhosts="JHB0:2181,JHB1:2181,JHB2:2181"

(3) Start Kafka Manager
[root@JHB0 kafka-manager-1.3.3.15]$ 
nohup bin/kafka-manager -Dhttp.port=7456 >/opt/module/kafka-manager-1.3.3.15/start.log 2>&1 &
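Kafka Manager should then be reachable in a browser at http://JHB0:7456 (the port set with -Dhttp.port above).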

(4) Kafka Manager start/stop script
Create the script km.sh under /home/root/bin
[root@JHB0 bin]$ vim km.sh
Contents:
#! /bin/bash

case $1 in
"start"){
        echo " -------- 启动 KafkaManager -------"
        nohup /opt/module/kafka-manager-1.3.3.15/bin/kafka-manager   -Dhttp.port=7456 >start.log 2>&1 &
};;
"stop"){
        echo " -------- 停止 KafkaManager -------"
        ps -ef | grep ProdServerStart | grep -v grep |awk '{print $2}' | xargs kill
};;
esac

Make the script executable
[root@JHB0 bin]$ chmod 777 km.sh

Start Kafka Manager with the script
[root@JHB0 module]$ km.sh start
==============The kafka-flume-hdfs.conf file====================
Create the file kafka-flume-hdfs.conf under /opt/module/flume/conf on JHB2
[root@JHB2 conf]$ vim kafka-flume-hdfs.conf
Contents:
## Components
a1.sources=r1 r2
a1.channels=c1 c2
a1.sinks=k1 k2

## source1
a1.sources.r1.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r1.batchSize = 5000
a1.sources.r1.batchDurationMillis = 2000
a1.sources.r1.kafka.bootstrap.servers = JHB0:9092,JHB1:9092,JHB2:9092
a1.sources.r1.kafka.topics=topic_start

## source2
a1.sources.r2.type = org.apache.flume.source.kafka.KafkaSource
a1.sources.r2.batchSize = 5000
a1.sources.r2.batchDurationMillis = 2000
a1.sources.r2.kafka.bootstrap.servers = JHB0:9092,JHB1:9092,JHB2:9092
a1.sources.r2.kafka.topics=topic_event

## channel1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /opt/module/flume/checkpoint/behavior1
a1.channels.c1.dataDirs = /opt/module/flume/data/behavior1/
a1.channels.c1.maxFileSize = 2146435071
a1.channels.c1.capacity = 1000000
a1.channels.c1.keep-alive = 6

## channel2
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /opt/module/flume/checkpoint/behavior2
a1.channels.c2.dataDirs = /opt/module/flume/data/behavior2/
a1.channels.c2.maxFileSize = 2146435071
a1.channels.c2.capacity = 1000000
a1.channels.c2.keep-alive = 6

## sink1
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = /origin_data/gmall/log/topic_start/%Y-%m-%d
a1.sinks.k1.hdfs.filePrefix = logstart-
a1.sinks.k1.hdfs.round = true
a1.sinks.k1.hdfs.roundValue = 10
a1.sinks.k1.hdfs.roundUnit = second

##sink2
a1.sinks.k2.type = hdfs
a1.sinks.k2.hdfs.path = /origin_data/gmall/log/topic_event/%Y-%m-%d
a1.sinks.k2.hdfs.filePrefix = logevent-
a1.sinks.k2.hdfs.round = true
a1.sinks.k2.hdfs.roundValue = 10
a1.sinks.k2.hdfs.roundUnit = second

## Avoid producing lots of small files
a1.sinks.k1.hdfs.rollInterval = 10
a1.sinks.k1.hdfs.rollSize = 134217728
a1.sinks.k1.hdfs.rollCount = 0

a1.sinks.k2.hdfs.rollInterval = 10
a1.sinks.k2.hdfs.rollSize = 134217728
a1.sinks.k2.hdfs.rollCount = 0

## Write output files as compressed streams (LZO)
a1.sinks.k1.hdfs.fileType = CompressedStream 
a1.sinks.k2.hdfs.fileType = CompressedStream 

a1.sinks.k1.hdfs.codeC = lzop
a1.sinks.k2.hdfs.codeC = lzop

## Wire sources, channels and sinks together
a1.sources.r1.channels = c1
a1.sinks.k1.channel= c1

a1.sources.r2.channels = c2
a1.sinks.k2.channel= c2
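
Once this agent is running, data should start landing under the HDFS paths configured for the sinks; a quick sanity check (the date format follows the %Y-%m-%d escape in the sink paths):

[root@JHB2 flume]$ hadoop fs -ls /origin_data/gmall/log/topic_start/$(date +%F)
[root@JHB2 flume]$ hadoop fs -ls /origin_data/gmall/log/topic_event/$(date +%F)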

==========================Flume troubleshooting==============================
Problem: the consuming Flume agent may throw the following exception on startup:
ERROR hdfs.HDFSEventSink: process failed
java.lang.OutOfMemoryError: GC overhead limit exceeded

Fix:
(1) Add the following to the /opt/module/flume/conf/flume-env.sh file on JHB0
export JAVA_OPTS="-Xms100m -Xmx2000m -Dcom.sun.management.jmxremote"
(2) Sync the configuration to JHB1 and JHB2
[root@JHB0 conf]$ xsync flume-env.sh

=============Log-consuming Flume start/stop script f2.sh====================
(1) Create the script f2.sh under /home/root/bin
[root@JHB0 bin]$ vim f2.sh
Contents:
#! /bin/bash

case $1 in
"start"){
        for i in JHB2
        do
                echo " --------启动 $i 消费flume-------"
                ssh $i "nohup /opt/module/flume/bin/flume-ng agent --conf-file /opt/module/flume/conf/kafka-flume-hdfs.conf --name a1 -Dflume.root.logger=INFO,LOGFILE >/opt/module/flume/log.txt   2>&1 &"
        done
};;
"stop"){
        for i in JHB2
        do
                echo " --------停止 $i 消费flume-------"
                ssh $i "ps -ef | grep kafka-flume-hdfs | grep -v grep |awk '{print \$2}' | xargs kill"
        done

};;
esac

(2) Make the script executable
[root@JHB0 bin]$ chmod 777 f2.sh

(3) Start the consuming Flume agent
[root@JHB0 module]$ f2.sh start

============================Testing=============================
1. With all nodes up, create the target directories in HDFS.
From /opt/module/hadoop-2.7.2, run:
hadoop fs -mkdir -p /origin_data/gmall/log/topic_start
hadoop fs -mkdir -p /origin_data/gmall/log/topic_event

2. Start the consuming Flume, then the collection Flume, then generate logs (a verification sketch follows below):
    f2.sh start
    f1.sh start
    lg.sh
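
To watch the pipeline end to end, one option (a sketch; the log path comes from f2.sh above) is to tail the consumer agent's log on JHB2 and then list the HDFS output:

[root@JHB0 module]$ ssh JHB2 "tail -n 50 /opt/module/flume/log.txt"
[root@JHB0 module]$ hadoop fs -ls -R /origin_data/gmall/log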
