Trident WordCount Example Explained


Quan.S


License: CC 4.0 BY-SA

Column: streaming; Tags: storm, trident

Article link: https://blog.youkuaiyun.com/xianzhen376/article/details/50956153



Official documentation: http://storm.apache.org/documentation/Trident-tutorial.html

Introduction

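The tutorial opens with the WordCount example. Is WordCount really supposed to be that self-explanatory? The API docs explain next to nothing. End of rant. The example breaks down into two steps: 1. build the data source; 2. build the topology.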

1. Creating the source data

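// Storm 1.x+ package names (before 1.0: storm.trident.* and backtype.storm.*)
import org.apache.storm.trident.testing.FixedBatchSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Values;
// Test spout: each tuple has one field, "sentence"; each batch holds at most 3 tuples
FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
        new Values("the cow jumped over the moon"),
        new Values("the man went to the store and bought some candy"),
        new Values("four score and seven years ago"),
        new Values("how many apples can you eat"));
// Keep looping over the four sentences forever
spout.setCycle(true);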

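Effect: this creates a source stream whose tuples have a single field named "sentence". The second constructor argument, 3, is the maximum batch size: each batch contains at most 3 tuples. Because of setCycle(true), the spout loops over the four sentences above endlessly, producing a continuous stream.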
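To make the batching concrete, here is a hypothetical plain-Java model (no Storm dependency) of a cycling FixedBatchSpout. It assumes, based on my reading of the FixedBatchSpout.emitBatch source, that each batch takes the next up-to-3 sentences and does not wrap around the end of the list; the class name FixedBatchModel is made up, and batch replay is not modeled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a cycling FixedBatchSpout (approximation, not Storm code).
// Batch replay by batch id is deliberately left out.
public class FixedBatchModel {
    private final List<String> outputs;
    private final int maxBatchSize;
    private final boolean cycle;
    private int index = 0; // position of the next sentence to emit

    public FixedBatchModel(List<String> outputs, int maxBatchSize, boolean cycle) {
        this.outputs = outputs;
        this.maxBatchSize = maxBatchSize;
        this.cycle = cycle;
    }

    // Next batch: at most maxBatchSize sentences, never wrapping past the end of the list
    public List<String> nextBatch() {
        if (index >= outputs.size() && cycle) {
            index = 0; // start over from the first sentence
        }
        List<String> batch = new ArrayList<>();
        for (int i = 0; index < outputs.size() && i < maxBatchSize; index++, i++) {
            batch.add(outputs.get(index));
        }
        return batch;
    }

    public static void main(String[] args) {
        FixedBatchModel spout = new FixedBatchModel(Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store and bought some candy",
                "four score and seven years ago",
                "how many apples can you eat"), 3, true);
        // prints sizes 3, 1, 3, 1
        for (int b = 1; b <= 4; b++) {
            System.out.println("batch " + b + ": " + spout.nextBatch().size() + " sentence(s)");
        }
    }
}
```

Under that assumption the four sentences arrive in batches of 3, 1, 3, 1, ..., which is why 3 is an upper bound on the batch size rather than a fixed size.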

2. Building the topology

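// Storm 1.x+ package names; Split is the BaseFunction defined in the tutorial
import org.apache.storm.trident.TridentState;
import org.apache.storm.trident.TridentTopology;
import org.apache.storm.trident.operation.builtin.Count;
import org.apache.storm.trident.testing.MemoryMapState;
import org.apache.storm.tuple.Fields;

TridentTopology topology = new TridentTopology();
TridentState wordCounts =
        topology.newStream("spout1", spout)
                .each(new Fields("sentence"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
                .parallelismHint(6);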

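What each line does, in order:
1. Creates a stream with spout as its source.
2. Iterates over the tuples in the stream, passes each tuple's "sentence" field to the Split function, and emits one tuple per word with a field named "word".
3. Groups the tuples by the "word" field (groupBy).
4. Runs Count on each group. Count is a built-in aggregator; its result goes into a field named "count" and is stored in memory by MemoryMapState. The counts computed for each batch are then combined (added) into the stored totals, so they accumulate across batches.
5. Sets the parallelism to 6, i.e., 6 executors (each executor is a thread) for this step.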
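The data flow in the list above can be traced with a hypothetical single-threaded Java model (WordCountModel is a made-up name; nothing here is Trident API). It assumes Split splits on single spaces, as the Split function in the tutorial does, mirrors Count's combiner behaviour (1 per tuple, partial counts summed), and uses a HashMap as a stand-in for MemoryMapState. Parallelism, partitioning and Trident's state-update semantics are not modeled.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical single-threaded model of
//   each(Split) -> groupBy("word") -> persistentAggregate(MemoryMapState, Count)
// Plain Java only; it illustrates the data flow, not Trident's implementation.
public class WordCountModel {
    // Stand-in for MemoryMapState: word -> running count, held in memory
    private final Map<String, Long> state = new HashMap<>();

    // Mirrors Count as a combiner: each tuple contributes 1, partial counts are summed
    static long init(String word) { return 1L; }
    static long combine(long a, long b) { return a + b; }
    static long zero() { return 0L; }

    // Processes one batch of sentences and folds its counts into the stored state
    public void processBatch(List<String> sentences) {
        Map<String, Long> batchCounts = new HashMap<>();
        for (String sentence : sentences) {
            // each(... Split ...): one "word" tuple per space-separated token
            for (String word : sentence.split(" ")) {
                // groupBy("word") + Count: combine per-word counts within this batch
                batchCounts.merge(word, init(word), WordCountModel::combine);
            }
        }
        // persistentAggregate: combine this batch's result with the stored total
        for (Map.Entry<String, Long> e : batchCounts.entrySet()) {
            state.merge(e.getKey(), e.getValue(), WordCountModel::combine);
        }
    }

    public long count(String word) {
        return state.getOrDefault(word, zero());
    }

    public static void main(String[] args) {
        WordCountModel model = new WordCountModel();
        model.processBatch(Arrays.asList(
                "the cow jumped over the moon",
                "the man went to the store and bought some candy",
                "four score and seven years ago"));
        System.out.println(model.count("the")); // 4
        model.processBatch(Arrays.asList("how many apples can you eat"));
        model.processBatch(Arrays.asList("the cow jumped over the moon"));
        System.out.println(model.count("the")); // 6
    }
}
```

The second println shows the "accumulate across batches" behaviour from item 4: the batch without "the" leaves the total unchanged, and the repeated first sentence adds 2 more.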