Getting started with Scala: a WordCount example

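This article presents two ways to implement WordCount in Scala. The first turns each (sentence, count) pair into a string in which the sentence is repeated count times, flattens those strings into a list of words, and then groups and counts the words. The second starts directly from the list of tuples and uses grouping and mapping operations to obtain each word together with its total number of occurrences.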
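Both approaches below extend the basic WordCount, in which every line of input counts once. For reference, here is a minimal sketch of that basic version. The object name `BasicWordCount` and the sample lines are hypothetical; the counting chain (`flatMap`, `groupBy`, `map`) is the same one Approach 1 applies after its expansion step.

```scala
// Basic WordCount sketch; the object name and input lines are illustrative only.
object BasicWordCount {

  // Split every line into words, group identical words, and count each group.
  def countWords(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))         // one element per word
      .groupBy(word => word)         // word -> every occurrence of that word
      .map(t => (t._1, t._2.size))   // word -> number of occurrences

  def main(args: Array[String]): Unit = {
    val lines = List("Hello Scala", "Hello Spark")
    println(countWords(lines))
  }
}
```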


A more complex WordCount example

1) Approach 1

object TestWordCount {

    def main(args: Array[String]): Unit = {

        // Approach 1 (not general-purpose): repeat each sentence count times, then count the words
        val tupleList = List(("Hello Scala Spark World", 4), ("Hello Scala Spark", 3), ("Hello Scala", 2), ("Hello", 1))

        // Repeat each sentence t._2 times, e.g. ("Hello Scala", 2) => "Hello Scala Hello Scala "
        val stringList: List[String] = tupleList.map(t => (t._1 + " ") * t._2)

        //val words: List[String] = stringList.flatMap(s=>s.split(" "))
        val words: List[String] = stringList.flatMap(_.split(" "))

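        // When the function simply returns its argument, write it out as word => word (or identity);
        // do not abbreviate it with _
        val groupMap: Map[String, List[String]] = words.groupBy(word => word)
        // Wrong: words.groupBy(_) expands to x => words.groupBy(x), not to the identity function
        //val groupMap: Map[String, List[String]] = words.groupBy(_)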

        // (word, list) => (word, count)
        val wordToCount: Map[String, Int] = groupMap.map(t=>(t._1, t._2.size))

        val wordCountList: List[(String, Int)] = wordToCount.toList.sortWith {
            (left, right) => {
                left._2 > right._2
            }
        }.take(3)

        // The same pipeline (before sorting) written as one chained expression:
        //tupleList.map(t => (t._1 + " ") * t._2).flatMap(_.split(" ")).groupBy(word => word).map(t => (t._1, t._2.size))
        println(wordCountList)
    }
}
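Approach 1 turns a weight into repetition: `(t._1 + " ") * t._2` uses the `*` operator that `StringOps` provides on strings to build a string containing the sentence `t._2` times. The sketch below (hypothetical names `ExpandDemo`, `expand`, `tokens`) isolates that step. It also shows a pitfall of `split(" ")`: two adjacent spaces produce an empty token, which happens whenever a sentence already ends with a space. The repeated string also grows linearly with the count, which is presumably why this approach is labeled not general-purpose.

```scala
// Sketch of Approach 1's expansion step; all names here are illustrative only.
object ExpandDemo {

  // ("Hello Scala", 2) becomes "Hello Scala Hello Scala " (note the trailing space).
  def expand(t: (String, Int)): String = (t._1 + " ") * t._2

  // split(" ") keeps the empty strings between two adjacent spaces
  // (only trailing empty strings are dropped), so a sentence that already
  // ends with a space yields "" tokens. Trimming, splitting on runs of
  // whitespace, and filtering out empties avoids that.
  def tokens(s: String): List[String] =
    s.trim.split("\\s+").filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    println(expand(("Hello Scala", 2)))
    println(expand(("Hello ", 2)).split(" ").toList) // contains an empty string
    println(tokens(expand(("Hello ", 2))))           // empty strings removed
  }
}
```

Splitting on `"\\s+"` is an assumption about what separates words; adjust the regex if punctuation should also act as a separator.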

2) Approach 2

object TestWordCount {

    def main(args: Array[String]): Unit = {

        val tuples = List(("Hello Scala Spark World", 4), ("Hello Scala Spark", 3), ("Hello Scala", 2), ("Hello", 1))

        // (Hello,4),(Scala,4),(Spark,4),(World,4)
        // (Hello,3),(Scala,3),(Spark,3)
        // (Hello,2),(Scala,2)
        // (Hello,1)
        val wordToCountList: List[(String, Int)] = tuples.flatMap {
            t => {
                val strings: Array[String] = t._1.split(" ")
                strings.map(word => (word, t._2))
            }
        }

        // Hello, List((Hello,4), (Hello,3), (Hello,2), (Hello,1))
        // Scala, List((Scala,4), (Scala,3), (Scala,2))
        // Spark, List((Spark,4), (Spark,3))
        // World, List((World,4))
        val wordToTupleMap: Map[String, List[(String, Int)]] = wordToCountList.groupBy(t=>t._1)

        // Keep only the counts: Hello -> List(4, 3, 2, 1), ...
        // (in Scala 2.13, mapValues returns a lazy MapView rather than a Map, so map is used instead)
        val stringToInts: Map[String, List[Int]] = wordToTupleMap.map {
            t => (t._1, t._2.map(t1 => t1._2))
        }

        // Add up the counts of each word: Hello -> 10, Scala -> 9, Spark -> 7, World -> 4
        val wordToTotalCountMap: Map[String, Int] = stringToInts.map(t => (t._1, t._2.sum))
        println(wordToTotalCountMap)
    }
}
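Approach 2 builds three intermediate collections (word/count pairs, groups, count lists) before summing. As an alternative that is not part of the original article, the same per-word totals can be accumulated in a single `foldLeft` over an immutable `Map`. The object name `WeightedWordCount` and the method `count` are hypothetical.

```scala
// Alternative to Approach 2 (not from the original): accumulate totals in one pass.
object WeightedWordCount {

  // Add each sentence's count to every word the sentence contains.
  def count(tuples: List[(String, Int)]): Map[String, Int] =
    tuples.foldLeft(Map.empty[String, Int]) { (acc, t) =>
      t._1.split(" ").foldLeft(acc) { (m, word) =>
        m.updated(word, m.getOrElse(word, 0) + t._2)
      }
    }

  def main(args: Array[String]): Unit = {
    val tuples = List(("Hello Scala Spark World", 4), ("Hello Scala Spark", 3), ("Hello Scala", 2), ("Hello", 1))
    // Sort by total count, highest first
    println(count(tuples).toList.sortBy(-_._2)) // List((Hello,10), (Scala,9), (Spark,7), (World,4))
  }
}
```

On Scala 2.13 and later, `wordToCountList.groupMapReduce(_._1)(_._2)(_ + _)` expresses the group, map, and sum steps of Approach 2 in one call; the `foldLeft` version above also compiles on Scala 2.12.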
