WordCount Example

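This post shows how to implement WordCount in Scala and find the three most frequent words in a text, for two kinds of input. Section 1 works on a plain list of strings: split into words, group, count, and sort. Section 2 works on a list of (text, count) tuples and shows three methods: Method 1 and Method 2 expand every word according to its count and then count the occurrences, while Method 3 groups the tuples by word and sums the counts directly.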

1. Top three most frequent words (plain list)
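object Main extends App {
    val list1 = List("hello scala","hello cindy","hello alice","scala cindy","scala alice","hello")
    // Split every line on spaces and flatten into one list of words
    val list2 = list1.flatMap(_.split(" "))
    println(list2)
    // Group identical words: word -> list of its occurrences
    val list3 = list2.groupBy(i => i)
    println(list3)
    // Replace each group with its size: word -> count
    val list4 = list3.map(i => (i._1,i._2.length))
    println(list4)
    // Sort ascending by count, so the most frequent words come last
    val list5 = list4.toList.sortBy(_._2)
    println(list5)
    // The last three entries are the three most frequent words
    println(list5.takeRight(3))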
}

Output:
List(hello, scala, hello, cindy, hello, alice, scala, cindy, scala, alice, hello)
Map(alice -> List(alice, alice), scala -> List(scala, scala, scala), cindy -> List(cindy, cindy), hello -> List(hello, hello, hello, hello))
Map(alice -> 2, scala -> 3, cindy -> 2, hello -> 4)
List((alice,2), (cindy,2), (scala,3), (hello,4))
List((cindy,2), (scala,3), (hello,4))
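
The pipeline above can also be folded into one function. The following is only a minimal sketch: the object name TopWordsSketch and the helper topN are hypothetical and not part of the original code. It sorts by descending count and breaks ties alphabetically, so the result no longer depends on the iteration order of the intermediate Map (which is what decides between alice and cindy in the output above).

```scala
object TopWordsSketch extends App {
  // Hypothetical helper: returns the n most frequent words, most frequent first.
  def topN(lines: List[String], n: Int): List[(String, Int)] =
    lines
      .flatMap(_.split(" "))                              // lines -> words
      .groupBy(identity)                                  // word -> its occurrences
      .map { case (word, hits) => (word, hits.length) }   // word -> count
      .toList
      .sortBy { case (word, count) => (-count, word) }    // count descending, then word ascending
      .take(n)

  val lines = List("hello scala", "hello cindy", "hello alice", "scala cindy", "scala alice", "hello")
  println(topN(lines, 3)) // prints List((hello,4), (scala,3), (alice,2))
}
```

Compared with sortBy followed by takeRight(3), this returns the most frequent word first instead of last.
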
2. Top three most frequent words (list of tuples)

Method 1: repeat each text by its count and split it into a plain list of words, then group and count the words, then sort

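object Main extends App {
    val list1 = List(("hello scala",1),("hello cindy",2),("hello alice",1),("scala cindy",2),("scala alice",3),("hello",1))
    println(list1)
    // Repeat each text as many times as its count; the trailing space keeps the copies separated
    val list2 = list1.map(kv => (kv._1.trim+" ")*kv._2)
    println(list2)
    // Split every repeated string on spaces and flatten into one list of words
    val list3 = list2.flatMap(_.split(" "))
    println(list3)
    // Group identical words: word -> list of its occurrences
    val list4 = list3.groupBy(i => i)
    println(list4)
    // Replace each group with its size: word -> count
    val list5 = list4.toList.map(kv => (kv._1,kv._2.length))
    println(list5)
    // Sort descending by count
    val list6 = list5.sortBy(-_._2)
    println(list6)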
}

Output:
List((hello scala,1), (hello cindy,2), (hello alice,1), (scala cindy,2), (scala alice,3), (hello,1))
List(hello scala , hello cindy hello cindy , hello alice , scala cindy scala cindy , scala alice scala alice scala alice , hello )
List(hello, scala, hello, cindy, hello, cindy, hello, alice, scala, cindy, scala, cindy, scala, alice, scala, alice, scala, alice, hello)
Map(alice -> List(alice, alice, alice, alice), scala -> List(scala, scala, scala, scala, scala, scala), cindy -> List(cindy, cindy, cindy, cindy), hello -> List(hello, hello, hello, hello, hello))
List((alice,4), (scala,6), (cindy,4), (hello,5))
List((scala,6), (hello,5), (alice,4), (cindy,4))
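
Method 1 relies on string repetition: a String multiplied by an Int n yields n concatenated copies, and split(" ") drops the trailing empty string left by the final space. As a rough alternative sketch, the same word list can be produced without building the longer strings by repeating the split words with List.fill. The names ExpandSketch and expand are hypothetical and used only for illustration.

```scala
object ExpandSketch extends App {
  // Hypothetical helper: emits each line's words n times, without string repetition.
  def expand(weighted: List[(String, Int)]): List[String] =
    weighted.flatMap { case (line, n) =>
      val words = line.trim.split(" ").toList
      List.fill(n)(words).flatten
    }

  val sample = List(("hello scala", 1), ("hello cindy", 2), ("hello", 1))

  // String repetition, as used in Method 1
  println(sample.map { case (line, n) => (line.trim + " ") * n })
  // The same words, produced without the intermediate strings
  println(expand(sample)) // prints List(hello, scala, hello, cindy, hello, cindy, hello)
}
```
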

Method 2: flatMap the input into a List of (word, count) tuples, one per word, then expand, group, and count the words, and sort the result

object Main extends App {
    val list1 = List(("hello scala",1),("hello cindy",2),("hello alice",1),("scala cindy",2),("scala alice",3),("hello",1))
    println(list1)
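    // Process each tuple of list1 and flatten the results
    val list2:List[(String,Int)] = list1.flatMap(
        tuple => {
            // Split the first element of the tuple on spaces, giving an array of words
            val strings = tuple._1.split(" ")
            println(strings.toList)
            // Pair every word with the tuple's count, giving an array of (word, count) tuples
            strings.map(i => (i,tuple._2))
        }
    )
    println(list2)
    // Repeat each word as many times as its count
    val list3 = list2.map(kv => (kv._1+" ")*kv._2)
    println(list3)
    // Split the repeated words again and flatten into one list of words
    val list4 = list3.flatMap(_.split(" "))
    println(list4)
    // Group identical words: word -> list of its occurrences
    val list5 = list4.groupBy(i => i)
    println(list5)
    // Replace each group with its size: word -> count
    val list6 = list5.map(kv => (kv._1,kv._2.length))
    println(list6)
    // Sort descending by count
    val list7 = list6.toList.sortWith((a,b) => a._2>b._2)
    println(list7)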
}

Output:
List((hello scala,1), (hello cindy,2), (hello alice,1), (scala cindy,2), (scala alice,3), (hello,1))
List(hello, scala)
List(hello, cindy)
List(hello, alice)
List(scala, cindy)
List(scala, alice)
List(hello)
List((hello,1), (scala,1), (hello,2), (cindy,2), (hello,1), (alice,1), (scala,2), (cindy,2), (scala,3), (alice,3), (hello,1))
List(hello , scala , hello hello , cindy cindy , hello , alice , scala scala , cindy cindy , scala scala scala , alice alice alice , hello )
List(hello, scala, hello, hello, cindy, cindy, hello, alice, scala, scala, cindy, cindy, scala, scala, scala, alice, alice, alice, hello)
Map(alice -> List(alice, alice, alice, alice), scala -> List(scala, scala, scala, scala, scala, scala), cindy -> List(cindy, cindy, cindy, cindy), hello -> List(hello, hello, hello, hello, hello))
Map(alice -> 4, scala -> 6, cindy -> 4, hello -> 5)
List((scala,6), (hello,5), (alice,4), (cindy,4))
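
Method 1 sorts with sortBy(-_._2) and Method 2 with sortWith. The sketch below, whose object and method names are hypothetical, puts these next to a third spelling that passes a reversed Ordering. All three are stable sorts, so entries with equal counts keep their original relative order and the three results should agree.

```scala
object SortSketch extends App {
  type Count = (String, Int)

  // Descending by count, written three ways
  def byNegation(xs: List[Count]): List[Count] = xs.sortBy(-_._2)
  def byPredicate(xs: List[Count]): List[Count] = xs.sortWith((a, b) => a._2 > b._2)
  def byReverseOrdering(xs: List[Count]): List[Count] = xs.sortBy(_._2)(Ordering[Int].reverse)

  val counts = List(("alice", 4), ("scala", 6), ("cindy", 4), ("hello", 5))
  println(byNegation(counts)) // prints List((scala,6), (hello,5), (alice,4), (cindy,4))
  println(byNegation(counts) == byPredicate(counts) && byPredicate(counts) == byReverseOrdering(counts)) // prints true
}
```
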

Method 3: flatMap the input into a List of (word, count) tuples, one per word, then group by word, sum the counts of each word, and sort the result

object Main extends App {
    val list1 = List(("hello scala",1),("hello cindy",2),("hello alice",1),("scala cindy",2),("scala alice",3),("hello",1))
    println(list1)
    // Process each tuple of list1 and flatten the results
    val list2:List[(String,Int)] = list1.flatMap(
        tuple => {
            // Split the first element of the tuple on spaces, giving an array of words
            val strings = tuple._1.split(" ")
            println(strings.toList)
            // Pair every word with the tuple's count, giving an array of (word, count) tuples
            strings.map(i => (i,tuple._2))
        }
    )
    println(list2)
    // Group list2 by word
    val list3 = list2.groupBy(kv => kv._1)
    println(list3)
    // Keep each word as the key and sum the counts of all tuples in its group
    // (on Scala 2.13+, mapValues on Map is deprecated in favor of .view.mapValues(...).toMap)
    val list4 = list3.mapValues(list => list.map(_._2).sum)
    println(list4.toList.sortWith((a,b) => a._2 > b._2))
}

Output:
List((hello scala,1), (hello cindy,2), (hello alice,1), (scala cindy,2), (scala alice,3), (hello,1))
List(hello, scala)
List(hello, cindy)
List(hello, alice)
List(scala, cindy)
List(scala, alice)
List(hello)
List((hello,1), (scala,1), (hello,2), (cindy,2), (hello,1), (alice,1), (scala,2), (cindy,2), (scala,3), (alice,3), (hello,1))
Map(alice -> List((alice,1), (alice,3)), scala -> List((scala,1), (scala,2), (scala,3)), cindy -> List((cindy,2), (cindy,2)), hello -> List((hello,1), (hello,2), (hello,1), (hello,1)))
List((scala,6), (hello,5), (alice,4), (cindy,4))
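
The group-then-sum step of Method 3 can also be written as a single pass that accumulates totals in a Map, which avoids mapValues altogether. This is a sketch under assumptions, not the author's method: the names WeightedCountSketch and weightedTop are hypothetical, and take(n) is added so that only the top entries are returned, with ties broken alphabetically.

```scala
object WeightedCountSketch extends App {
  // Hypothetical helper: sums the weights per word, then keeps the n largest totals.
  def weightedTop(pairs: List[(String, Int)], n: Int): List[(String, Int)] =
    pairs
      .foldLeft(Map.empty[String, Int]) { case (acc, (word, weight)) =>
        acc.updated(word, acc.getOrElse(word, 0) + weight)  // running total per word
      }
      .toList
      .sortBy { case (word, total) => (-total, word) }      // total descending, then word ascending
      .take(n)

  // Same (word, count) tuples as list2 above
  val pairs = List(("hello", 1), ("scala", 1), ("hello", 2), ("cindy", 2), ("hello", 1), ("alice", 1),
    ("scala", 2), ("cindy", 2), ("scala", 3), ("alice", 3), ("hello", 1))
  println(weightedTop(pairs, 3)) // prints List((scala,6), (hello,5), (alice,4))
}
```
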