The previous post covered installing and smoke-testing flume-ng + kafka on a single machine. This post builds on that by adding a single-node Storm installation and a KafkaSpout example, completing an end-to-end real-time pipeline on one machine.
1. Installation and startup
(1) Download version 0.10.0 from the official site and unpack it:
wget http://mirror.bit.edu.cn/apache/storm/apache-storm-0.10.0/apache-storm-0.10.0.tar.gz
tar -xvzf apache-storm-0.10.0.tar.gz
(2) Configuration file and environment variables
The configuration file is apache-storm-0.10.0/conf/storm.yaml; for a single-machine test it can be left as-is.
Add the following lines to ~/.bash_profile:
export STORM_HOME="/home/XX/apache-storm-0.10.0"
PATH=$PATH:${STORM_HOME}/bin
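After editing, reload the profile and confirm the storm command is on the PATH (the exact profile filename depends on your shell setup):
source ~/.bash_profile
storm version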
(3) Start the cluster
storm nimbus &
storm supervisor &
storm ui &
The UI is available at localhost:8080/index.html.
Run jps; if the nimbus and supervisor processes are listed, the cluster started successfully.
2. Code
The topology:
package cn.realtime;

import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.StormTopology;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.*;

import java.util.Map;

/**
 * Created by maixiaohai on 16/4/21.
 */
public class KafkaTopology {
    public static int NUM_WORKERS = 1;
    public static int NUM_ACKERS = 1;
    public static int MSG_TIMEOUT = 180;
    public static int SPOUT_PARALLELISM_HINT = 1;
    public static int PARSE_BOLT_PARALLELISM_HINT = 1;

    public StormTopology buildTopology(Map map) {
        String zkServer = map.get("zookeeper").toString();
        System.out.println("zkServer: " + zkServer);
        final BrokerHosts zkHosts = new ZkHosts(zkServer);
        // Args: ZooKeeper hosts, Kafka topic, ZK root for offset storage, consumer id
        SpoutConfig kafkaConfig = new SpoutConfig(zkHosts, "YOUR_KAFKA_TOPIC", "/test", "single-point-test");
        // Decode the raw Kafka bytes into plain strings
        kafkaConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafkaSpout", new KafkaSpout(kafkaConfig), SPOUT_PARALLELISM_HINT);
        builder.setBolt("parseBolt", new ParseBolt(), PARSE_BOLT_PARALLELISM_HINT).shuffleGrouping("kafkaSpout");
        return builder.createTopology();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("===========start===========");
        Map map = XmlHelper.Dom2Map("realtime.xml");
        KafkaTopology kafkaTopology = new KafkaTopology();
        StormTopology stormTopology = kafkaTopology.buildTopology(map);

        Config config = new Config();
        config.setNumWorkers(NUM_WORKERS);
        config.setNumAckers(NUM_ACKERS);
        config.setMessageTimeoutSecs(MSG_TIMEOUT);
        config.setMaxSpoutPending(5000);

        // LocalCluster cluster = new LocalCluster();
        // cluster.submitTopology("single-point-test", config, stormTopology);
        StormSubmitter.submitTopology("single-point-test", config, stormTopology);
    }
}
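For quick iteration without a running cluster, the commented-out LocalCluster lines in main() can be used instead of StormSubmitter. A minimal sketch of that local-mode ending (the run duration is arbitrary):

// Replaces the StormSubmitter call at the end of main(): run in-process, then shut down.
LocalCluster cluster = new LocalCluster();
cluster.submitTopology("single-point-test", config, stormTopology);
Thread.sleep(60 * 1000);  // let the topology consume from Kafka for a minute
cluster.killTopology("single-point-test");
cluster.shutdown();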
The realtime.xml configuration:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <nimbus>
        <host>localhost</host>
        <thriftPort>6627</thriftPort>
    </nimbus>
    <zookeeper>localhost:2181</zookeeper>
    <kafka>localhost:9092</kafka>
</configuration>
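The XmlHelper used in main() is not shown in the original post. A minimal hypothetical sketch, assuming Dom2Map simply maps each top-level element name to its text content (enough for the zookeeper and kafka entries above; the nested nimbus element would need extra handling):

package cn.realtime;

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilderFactory;
import java.util.HashMap;
import java.util.Map;

// Hypothetical reconstruction of the helper referenced by KafkaTopology.main().
public class XmlHelper {
    public static Map<String, String> Dom2Map(String path) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder().parse(path);
        Map<String, String> map = new HashMap<String, String>();
        NodeList children = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node node = children.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                // e.g. "zookeeper" -> "localhost:2181"
                map.put(node.getNodeName(), node.getTextContent().trim());
            }
        }
        return map;
    }
}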
ParseBolt prints the value of each incoming tuple:
package cn.realtime;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

/**
 * Created by maixiaohai on 16/4/21.
 */
public class ParseBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector basicOutputCollector) {
        // StringScheme emits a single field, so the message body is at index 0
        String word = tuple.getString(0);
        System.out.println(word);
    }

    public void declareOutputFields(OutputFieldsDeclarer outputFieldsDeclarer) {
        // Nothing is emitted downstream, so no output fields are declared
    }
}
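If a downstream bolt were added later, ParseBolt would also need to emit tuples and declare their fields. A minimal sketch of that variant (the class and field names are hypothetical, not from the original post):

package cn.realtime;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Hypothetical variant of ParseBolt that forwards each message downstream.
public class ForwardingParseBolt extends BaseBasicBolt {
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        collector.emit(new Values(word));  // pass the message along
    }

    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));  // must match what execute() emits
    }
}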
Maven dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-core</artifactId>
        <version>0.10.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>log4j-over-slf4j</artifactId>
            </exclusion>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.storm</groupId>
        <artifactId>storm-kafka</artifactId>
        <version>0.10.0</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.kafka</groupId>
        <artifactId>kafka_2.10</artifactId>
        <version>0.8.2.1</version>
        <exclusions>
            <exclusion>
                <groupId>org.slf4j</groupId>
                <artifactId>slf4j-log4j12</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <!--
    <dependency>
        <groupId>com.googlecode.json-simple</groupId>
        <artifactId>json-simple</artifactId>
        <version>1.1.1</version>
    </dependency>
    -->
</dependencies>
After packaging, run: storm jar kafkaTopology.jar cn.realtime.KafkaTopology
The newly submitted topology shows up in the UI, and the corresponding logs appear under apache-storm-0.10.0/logs/.
Combined with the flume file-monitoring setup from the previous post, i.e. with
agent.sources.source1.type = spooldir
adding a new file to the monitored directory and saving it should make the corresponding lines appear in the Storm logs, which means the whole pipeline works end to end (a quick test is sketched below).
3. Notes
(1) Some errors encountered along the way
Error 1:
java.lang.NoSuchMethodError: org.apache.zookeeper.ZooKeeper.<init>(Ljava/lang/String;ILorg/apache/zookeeper/Watcher;Z)V at org.apache.curator.utils.DefaultZookeeperFactory.newZooKeeper(DefaultZookeep
Error 2:
java.lang.NoClassDefFoundError: org/json/simple/JSONValue at storm.kafka.DynamicBrokersReader
Errors like these generally stem from pulling in unsuitable Maven dependency versions, i.e. the wrong versions of the corresponding jars.
For example, error 1 indicates a ZooKeeper version mismatch; tracing which dependency pulled in zookeeper showed that the kafka_2.10 version was too old.
After bumping it to the current version, error 1 went away.
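To trace which dependency drags in a conflicting jar (e.g. an old zookeeper), Maven's dependency tree is handy:
mvn dependency:tree -Dincludes=org.apache.zookeeper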
(2) When packaging, storm-core should be left out of the jar, otherwise it conflicts with the storm-core already on the cluster (one common approach is sketched below).
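One common way to do this (an assumption; the original post does not say how the author excluded it) is to mark storm-core as provided in the pom, so it is compiled against but left out of the packaged jar:

<dependency>
    <groupId>org.apache.storm</groupId>
    <artifactId>storm-core</artifactId>
    <version>0.10.0</version>
    <!-- provided: available at compile time, supplied by the cluster at runtime -->
    <scope>provided</scope>
</dependency>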
(3) The kafkaSpout and Storm configuration parameters will be covered in a follow-up post.