A Simple Storm SQL Test

This article shows how to collect logs with Kafka and analyze them in real time with Storm, including filtering error logs and slow logs.


Preparation:

  1. Install Kafka, start it, and create the required topics

1. Start Kafka
	bin/kafka-server-start.sh  config/server.properties > /dev/null 2>&1 &

2. List all topics
	bin/kafka-topics.sh --list --zookeeper zk-datanode-01:2181,zk-datanode-02:2181,zk-datanode-03:2181


3. Create the topics
	bin/kafka-topics.sh --create --topic apache-logs --zookeeper zk-datanode-01:2181,zk-datanode-02:2181,zk-datanode-03:2181 --replication-factor 1 --partitions 5
	bin/kafka-topics.sh --create --topic apache-error-logs --zookeeper zk-datanode-01:2181,zk-datanode-02:2181,zk-datanode-03:2181 --replication-factor 1 --partitions 5
	bin/kafka-topics.sh --create --topic apache-slow-logs --zookeeper zk-datanode-01:2181,zk-datanode-02:2181,zk-datanode-03:2181 --replication-factor 1 --partitions 5

  2. Install Python and pip, then install apache-log-parser

pip install apache-log-parser

 

3. Clone and modify Fake-Apache-Log-Generator

Fake-Apache-Log-Generator is not published as a package, and its script also needs to be modified, so clone the repository:

$ git clone https://github.com/kiritbasu/Fake-Apache-Log-Generator.git
$ cd Fake-Apache-Log-Generator

Open apache-fake-log-gen.py and replace the body of the while (flag): loop with the following:

        # elapsed time in microseconds, between 1 ms and 1 sec
        elapsed_us = random.randint(1 * 1000, 1000 * 1000)
        seconds = random.randint(30, 300)
        increment = datetime.timedelta(seconds=seconds)
        otime += increment

        ip = faker.ipv4()
        dt = otime.strftime('%d/%b/%Y:%H:%M:%S')
        tz = datetime.datetime.now(pytz.timezone('US/Pacific')).strftime('%z')
        vrb = numpy.random.choice(verb, p=[0.6, 0.1, 0.1, 0.2])

        uri = random.choice(resources)
        if uri.find("apps") > 0:
                # str() instead of backticks, which are Python 2-only syntax
                uri += str(random.randint(1000, 10000))

        resp = numpy.random.choice(response, p=[0.9, 0.04, 0.02, 0.04])
        byt = int(random.gauss(5000, 50))
        referer = faker.uri()
        useragent = numpy.random.choice(ualist, p=[0.5, 0.3, 0.1, 0.05, 0.05])()
        # elapsed_us is written right after the timestamp so it can be parsed with %D
        f.write('%s - - [%s %s] %s "%s %s HTTP/1.0" %s %s "%s" "%s"\n' % (ip, dt, tz, elapsed_us, vrb, uri, resp, byt, referer, useragent))

        log_lines = log_lines - 1
        flag = log_lines != 0
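To make the modified line layout concrete, here is a minimal Python 3 sketch that builds one line with the same format string and parses it back. The regular expression, the helper names, and the sample values are illustrative assumptions and are not part of the generator or of apache-log-parser; the real pipeline relies on apache-log-parser for parsing.

```python
import re

# Same format string that the modified generator loop passes to f.write().
LINE_FORMAT = '%s - - [%s %s] %s "%s %s HTTP/1.0" %s %s "%s" "%s"\n'

# Hypothetical regex, for illustration only.
LINE_PATTERN = re.compile(
    r'(?P<ip>\S+) - - \[(?P<time>[^\]]+)\] (?P<elapsed_us>\d+) '
    r'"(?P<verb>\S+) (?P<uri>\S+) HTTP/1\.0" (?P<status>\d{3}) (?P<bytes>-?\d+) '
    r'"(?P<referer>[^"]*)" "(?P<useragent>[^"]*)"'
)


def make_line(ip, dt, tz, elapsed_us, verb, uri, status, size, referer, useragent):
    """Format one log line the way the modified loop does."""
    return LINE_FORMAT % (ip, dt, tz, elapsed_us, verb, uri, status, size, referer, useragent)


def parse_line(line):
    """Return the fields of a line as a dict, or None if it does not match."""
    match = LINE_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ('elapsed_us', 'status', 'bytes'):
        fields[key] = int(fields[key])
    return fields


# Made-up sample values, not real generator output.
sample = make_line('10.0.0.1', '01/Jan/2018:00:00:30', '-0800', 250000,
                   'GET', '/apps/cart.jsp?appID=1234', 200, 5012,
                   'http://example.com/', 'Mozilla/5.0')
parsed = parse_line(sample)
```

The extra numeric field between the closing bracket and the quoted request is the elapsed time in microseconds, which is what the slow-log filtering later depends on.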

 

4. Prepare the parse-fake-log-gen-to-json-with-incrementing-id.py script

import sys
import apache_log_parser
import json

auto_incr_id = 1
parser_format = '%a - - %t %D "%r" %s %b "%{Referer}i" "%{User-Agent}i"'
line_parser = apache_log_parser.make_parser(parser_format)
while True:
  # we'll use pipe
  line = sys.stdin.readline()
  if not line:
    break
  parsed_dict = line_parser(line)
  parsed_dict['id'] = auto_incr_id
  auto_incr_id += 1

  # upper-case the keys and drop the datetime objects, which json cannot serialize
  # (.items() and print() work on both Python 2 and Python 3)
  parsed_dict = {k.upper(): v for k, v in parsed_dict.items() if not k.endswith('datetimeobj')}
  print(json.dumps(parsed_dict))
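The key normalization in this script can be tried without Kafka or apache-log-parser. The following Python 3 sketch applies the same steps (add an incrementing id, upper-case the keys, drop the datetime objects) to a hand-written dictionary. The helper name to_json_record and the sample dictionary are assumptions standing in for the parser's real output.

```python
import datetime
import json


def to_json_record(parsed, record_id):
    """Add an id, upper-case the keys, and drop non-serializable datetime objects."""
    record = dict(parsed)
    record['id'] = record_id
    cleaned = {k.upper(): v for k, v in record.items()
               if not k.endswith('datetimeobj')}
    return json.dumps(cleaned, sort_keys=True)


# Hand-written stand-in for one parsed line; the values are made up.
sample_parsed = {
    'remote_ip': '10.0.0.1',
    'status': '200',
    'time_us': '250000',
    'time_received_datetimeobj': datetime.datetime(2018, 1, 1, 0, 0, 30),
}

record = json.loads(to_json_record(sample_parsed, 1))
```

Dropping the keys that end in datetimeobj matters because json.dumps raises a TypeError on datetime values.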

 

7. Parse the generated Apache log into JSON and write it to Kafka
  

python apache-fake-log-gen.py -n 0 | python parse-fake-log-gen-to-json-with-incrementing-id.py | ../kafka/bin/kafka-console-producer.sh --broker-list 192.168.46.160:9092 --topic apache-logs

 

8. Check the messages sent to Kafka
  

bin/kafka-console-consumer.sh --zookeeper zk-datanode-01:2181,zk-datanode-02:2181,zk-datanode-03:2181 --topic apache-logs

 

9. Run storm-sql-kafka
  Start the Storm cluster (nimbus, ui, supervisor), then submit the two SQL topologies:

${storm_home}/bin/storm sql apache_log_error_filtering.sql apache_log_error_filtering --artifacts "org.apache.storm:storm-sql-kafka:1.1.1,org.apache.storm:storm-kafka:1.1.1,org.apache.kafka:kafka_2.10:0.8.2.2^org.slf4j:slf4j-log4j12,org.apache.kafka:kafka-clients:0.8.2.2"

${storm_home}/bin/storm sql apache_log_slow_filtering.sql apache_log_slow_filtering --artifacts "org.apache.storm:storm-sql-kafka:1.1.1,org.apache.storm:storm-kafka:1.1.1,org.apache.kafka:kafka_2.10:0.8.2.2^org.slf4j:slf4j-log4j12,org.apache.kafka:kafka-clients:0.8.2.2" --jars "UDFTest-0.0.1-SNAPSHOT.jar"
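The two .sql files are not reproduced in this article, so the exact filter conditions are not shown here. As a rough Python sketch of what such filters typically do, the code below treats a record as an error when its status is 400 or higher and as slow when its elapsed time is at least 100 ms. Both thresholds, the function names, and the routing helper are assumptions for illustration, not the contents of the actual SQL files.

```python
import json

ERROR_STATUS_MIN = 400     # assumed: 4xx and 5xx responses count as errors
SLOW_THRESHOLD_MS = 100.0  # assumed: requests taking 100 ms or more count as slow


def is_error(record):
    """True when the HTTP status is in the assumed error range."""
    return int(record['STATUS']) >= ERROR_STATUS_MIN


def is_slow(record):
    """True when TIME_US (microseconds) reaches the assumed threshold in ms."""
    return float(record['TIME_US']) / 1000.0 >= SLOW_THRESHOLD_MS


def route(json_line):
    """Return the output topics a JSON record from apache-logs would go to."""
    record = json.loads(json_line)
    topics = []
    if is_error(record):
        topics.append('apache-error-logs')
    if is_slow(record):
        topics.append('apache-slow-logs')
    return topics


if __name__ == '__main__':
    print(route('{"ID": 1, "STATUS": "404", "TIME_US": "50000"}'))
```

A record can match both filters, in which case it would appear in both output topics, since the two topologies run independently.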

 

10. Check the results written back to Kafka
  bin/kafka-console-consumer.sh --zookeeper zk-datanode-01:2181,zk-datanode-02:2181,zk-datanode-03:2181 --topic apache-error-logs

  bin/kafka-console-consumer.sh --zookeeper zk-datanode-01:2181,zk-datanode-02:2181,zk-datanode-03:2181 --topic apache-slow-logs

Errors encountered
ImportError: No module named pytz
ImportError: No module named numpy
ImportError: No module named faker

Fix: install the missing modules
pip install pytz
pip install numpy
pip install faker
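To catch these ImportError failures before starting the pipeline, a small check like the following can be run with the same interpreter that will run the scripts. It is only a sketch: the helper name missing_modules and the module list are assumptions based on the imports mentioned in this article.

```python
# Modules the generator and parser scripts import, per the errors above.
REQUIRED_MODULES = ['pytz', 'numpy', 'faker', 'apache_log_parser']


def missing_modules(names):
    """Return the names from the list that cannot be imported."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == '__main__':
    for name in missing_modules(REQUIRED_MODULES):
        print('missing module: %s' % name)
```

Note that the pip package apache-log-parser is imported as apache_log_parser, so the module name and the package name differ.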

 

Everything is now ready for testing. The Storm version used is 1.1.1; see the official documentation: http://storm.apache.org/releases/1.1.1/storm-sql-example.html

 

 

 

 

 

Reposted from: https://www.cnblogs.com/atomicbomb/p/8145371.html
