Trident WordCount Code Example

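This post walks through a WordCount example built with Apache Storm Trident. It builds a spout and a topology, counts words, and queries the counts through DRPC. The example shows how to set up a spout that cycles through a fixed set of sentences, how to split the input into words and count them, and how to create a DRPC stream that queries the accumulated count of each word.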

Complete code

package com.test;

import backtype.storm.Config;
import backtype.storm.LocalDRPC;
import backtype.storm.StormSubmitter;
import backtype.storm.generated.AlreadyAliveException;
import backtype.storm.generated.DRPCExecutionException;
import backtype.storm.generated.InvalidTopologyException;
import backtype.storm.generated.StormTopology;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;
import backtype.storm.utils.DRPCClient;
import org.apache.thrift7.TException;
import storm.trident.TridentState;
import storm.trident.TridentTopology;
import storm.trident.operation.builtin.Count;
import storm.trident.operation.builtin.FilterNull;
import storm.trident.operation.builtin.MapGet;
import storm.trident.operation.builtin.Sum;
import storm.trident.testing.FixedBatchSpout;
import storm.trident.testing.MemoryMapState;
import storm.trident.testing.Split;

public class WordCount {
    private static StormTopology buildTopology(LocalDRPC drpc) {
        /* Create the spout: batches of at most 3 sentences, cycling over the fixed list */
        FixedBatchSpout spout = new FixedBatchSpout(new Fields("sentence"), 3,
                new Values("the cow jumped over the moon"),
                new Values("the man went to the store and bought some candy"),
                new Values("four score and seven years ago"),
                new Values("how many apples can you eat"));
        spout.setCycle(true);

        /* Create the topology */
        TridentTopology topology = new TridentTopology();

        /* Create the stream "spout1": split sentences into words and count each word */
        TridentState wordCounts =
                topology.newStream("spout1", spout)
                        .each(new Fields("sentence"), new Split(), new Fields("word"))
                        .groupBy(new Fields("word"))
                        .persistentAggregate(new MemoryMapState.Factory(), new Count(), new Fields("count"))
                        .parallelismHint(6);

        /* Create the DRPC stream for the function "words": split the argument into words,
           look up the count of each word, then sum the counts */
        topology.newDRPCStream("words", drpc)
                .each(new Fields("args"), new Split(), new Fields("word"))
                .groupBy(new Fields("word"))
                .stateQuery(wordCounts, new Fields("word"), new MapGet(), new Fields("count"))
                .each(new Fields("count"), new FilterNull())
                .aggregate(new Fields("count"), new Sum(), new Fields("sum"));

        return topology.build();
    }

    public static void main(String[] args) {
        Config conf = new Config();
        conf.setMaxSpoutPending(20);

        try {
            StormSubmitter.submitTopology("WordCount", conf, buildTopology(null));
            // Replace host and port with those of your DRPC server (drpc.port defaults to 3772)
            DRPCClient client = new DRPCClient("wonderwoman", 1234);
            for (int i = 0; i < 100; i++) {
                try {
                    System.out.println("DRPC Result: " +  client.execute("words", "cat the dog jumped"));
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    System.out.println(e.getMessage());
                }
            }
        } catch (AlreadyAliveException e) {
            e.printStackTrace();
        } catch (InvalidTopologyException e) {
            e.printStackTrace();
        } catch (TException e) {
            e.printStackTrace();
        } catch (DRPCExecutionException e) {
            e.printStackTrace();
        }
    }
}

POM file

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>wordCount</groupId>
    <artifactId>wordCount</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <dependencies>
        <dependency>
            <groupId>storm</groupId>
            <artifactId>storm</artifactId>
            <version>0.8.1</version>
            <scope>provided</scope>
        </dependency>
    </dependencies>

</project>

Build and package

mvn clean install
mvn package

Run and check

./bin/storm jar wordCount-1.0-SNAPSHOT.jar com.test.WordCount
./bin/storm list

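Code walkthrough

  1. Create the spout, which cycles through a fixed set of sentences and emits them in batches;
  2. Create the topology;
  3. Create the stream spout1, which takes the spout as input, splits each sentence into words and counts them. The result is kept in memory as a map, which serves as the Trident state.
  4. Create the stream words, which takes calls to the DRPC function words as input and splits the argument into words. It queries the Trident state for the count of each word and then sums the counts.
  5. The main method submits the topology, calls the DRPC function words, and prints the result.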
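
To make the two streams easier to follow, here is a minimal plain-Java sketch of the same logic without Storm. It is only a model under simplifying assumptions: the class WordCountSketch and its methods ingest and query are hypothetical names invented for illustration, a HashMap stands in for MemoryMapState, and each sentence is ingested exactly once, whereas the real spout keeps cycling so the real counts keep growing.

```java
import java.util.HashMap;
import java.util.Map;

public class WordCountSketch {
    // Stand-in for the in-memory Trident state: word -> running count.
    private final Map<String, Long> counts = new HashMap<>();

    // Models the "spout1" stream: split the sentence on spaces, then count each word.
    public void ingest(String sentence) {
        for (String word : sentence.split(" ")) {
            counts.merge(word, 1L, Long::sum);
        }
    }

    // Models the "words" DRPC stream: split the argument, look up each word,
    // skip words that were never seen (the FilterNull step), and sum the rest.
    public long query(String args) {
        long sum = 0;
        for (String word : args.split(" ")) {
            Long count = counts.get(word);
            if (count != null) {
                sum += count;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        WordCountSketch sketch = new WordCountSketch();
        String[] sentences = {
            "the cow jumped over the moon",
            "the man went to the store and bought some candy",
            "four score and seven years ago",
            "how many apples can you eat"
        };
        // One pass over the sentences, i.e. one full cycle of the spout.
        for (String sentence : sentences) {
            sketch.ingest(sentence);
        }
        // "the" occurs 4 times and "jumped" once; "cat" and "dog" never occur.
        System.out.println("Simulated result: " + sketch.query("cat the dog jumped"));
    }
}
```

After one pass over the four sentences, the query "cat the dog jumped" sums to 5 in this model. On a running topology the value returned by DRPC depends on how many batches have been processed by the time of the call.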

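Afterword

As a beginner, I spent a long time reading around online before I finally got this working. The material out there is messy and scattered, which is a sad paradox.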
