Scala Collections


License: CC 4.0 BY-SA

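Scala collections fall into three main families: sequences (Seq), sets (Set), and maps (Map). All of them extend the Iterable trait.

Each family comes in a mutable and an immutable variant. An immutable collection cannot be changed once it has been created. Note that this is different from a variable declared with val: val fixes the reference, whereas immutability fixes the contents of the collection.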
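The difference between val and an immutable collection can be sketched as follows. This is a minimal illustration; the object and value names (ValVsImmutableDemo, fixedRef, rebindable) are made up for the example.

```scala
import scala.collection.mutable.ArrayBuffer

object ValVsImmutableDemo {
  // A val holding a mutable collection: the reference is fixed, the contents are not.
  val fixedRef: ArrayBuffer[Int] = ArrayBuffer(1, 2)
  fixedRef += 3 // allowed: the buffer is modified in place

  // A var holding an immutable collection: the contents never change,
  // but the variable can be pointed at a new collection.
  var rebindable: List[Int] = List(1, 2)
  val original: List[Int] = rebindable
  rebindable = rebindable :+ 3 // builds a new List and rebinds the var

  def main(args: Array[String]): Unit = {
    println(fixedRef)   // the buffer now holds 1, 2, 3
    println(original)   // still List(1, 2)
    println(rebindable) // List(1, 2, 3)
  }
}
```

Writing fixedRef = ArrayBuffer(9) would not compile, because a val cannot be reassigned; likewise a List has no method that changes it in place.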

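Fixed-length and variable-length arrays

import scala.collection.mutable.ArrayBuffer

object ArrayTest {
  def main(args: Array[String]): Unit = {
    // Create a fixed-length array of length 8; every element is initialised to 0
    val arr1 = new Array[Int](8)
    // Printing an array directly shows its type and hash code, not its contents
    println(arr1)
    // toBuffer converts the array to an array buffer, whose toString shows the elements
    println(arr1.toBuffer)
    // Note: without new, Array(...) calls the apply method and assigns the elements directly
    // This creates a fixed-length array of length 1 whose only element is 10
    val arr2 = Array[Int](10)
    println(arr2.toBuffer)
    // Define a fixed-length array of length 3
    val arr3 = Array("hadoop", "storm", "spark")
    // Use () to access an element by index
    println(arr3(2))

    //////////////////////////////////////////////////
    // Variable-length array (array buffer)
    // To use an array buffer, import scala.collection.mutable.ArrayBuffer
    val ab = ArrayBuffer[Int]()
    // += appends one element at the end
    ab += 1
    // Append several elements (the ab += (2, 3, 4, 5) form is deprecated since Scala 2.13)
    ab ++= Seq(2, 3, 4, 5)
    // ++= appends an array
    ab ++= Array(6, 7)
    // ++= also appends another array buffer
    ab ++= ArrayBuffer(8, 9)
    // Insert elements at index 0 (Scala 2.12 also accepted ab.insert(0, -1, 0))
    ab.insertAll(0, Seq(-1, 0))
    // remove(index, count): remove 2 elements starting at index 8
    ab.remove(8, 2)
    // Print the array buffer ab
    println(ab)
  }
}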

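Seq (sequences)

Immutable sequences: import scala.collection.immutable._

In Scala a list is either empty (Nil denotes the empty list) or a head element followed by a tail list. The first element is the head; everything after it is the tail.

9 :: List(5, 2)   The :: operator builds a new list from the given head and tail.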

object ImmutListTest {
  def main(args: Array[String]): Unit = {
    // Create an immutable list
    val lst1 = List(1, 2, 3)
    // Prepend 0 to lst1, producing a new List (the four forms are equivalent)
    val lst2 = 0 :: lst1
    val lst3 = lst1.::(0)
    val lst4 = 0 +: lst1
    val lst5 = lst1.+:(0)
    // Append an element to the end of lst1, producing a new list
    val lst6 = lst1 :+ 3
    val lst0 = List(4, 5, 6)
    // Concatenate two lists into a new List
    val lst7 = lst1 ++ lst0
    // ++: is right-associative, so this prepends lst1 to lst0: List(1, 2, 3, 4, 5, 6)
    val lst8 = lst1 ++: lst0
    // Prepend lst0 to lst1, producing a new list: List(4, 5, 6, 1, 2, 3)
    val lst9 = lst1.:::(lst0)
    println(lst9)
  }
}

Note: the :: operator is right-associative, so 9 :: 5 :: 2 :: Nil is equivalent to 9 :: (5 :: (2 :: Nil)).
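Because a list is either Nil or a head followed by a tail, it can be taken apart with the same :: pattern that builds it. The following is a small sketch; ConsDemo and its sum helper are hypothetical names used only for illustration.

```scala
object ConsDemo {
  // Built right to left: 9 :: (5 :: (2 :: Nil))
  val built: List[Int] = 9 :: 5 :: 2 :: Nil

  // Sum a list by repeatedly splitting it into head and tail.
  def sum(xs: List[Int]): Int = xs match {
    case Nil          => 0
    case head :: tail => head + sum(tail)
  }

  def main(args: Array[String]): Unit = {
    println(built.head) // 9
    println(built.tail) // List(5, 2)
    println(sum(built)) // 16
  }
}
```

For real code the built-in sum method is preferable; the recursion is only meant to show the head/tail structure.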

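Mutable sequences: import scala.collection.mutable._

import scala.collection.mutable.ListBuffer

object MutListTest extends App {
  // Build a mutable list that initially holds the 3 elements 1, 2, 3
  val lst0 = ListBuffer[Int](1, 2, 3)
  // Create an empty mutable list
  val lst1 = new ListBuffer[Int]
  // Append elements to lst1; note that no new collection is created
  lst1 += 4
  lst1.append(5)
  // Append the elements of lst1 to lst0; again no new collection is created
  lst0 ++= lst1
  // Combine lst0 and lst1 into a new ListBuffer; note that this does create a new collection
  val lst2 = lst0 ++ lst1
  // Append an element to lst0, producing a new collection
  val lst3 = lst0 :+ 5
}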

Set

Like a Set in Java, a Scala Set is unordered and contains no duplicate elements.
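A short sketch of what "unordered and without duplicates" means in practice. The names below (SetBasicsDemo and its members) are illustrative only.

```scala
object SetBasicsDemo {
  // Duplicates collapse: only three distinct values remain.
  val s: Set[Int] = Set(3, 1, 3, 2, 1)

  // Membership test
  val hasTwo: Boolean = s.contains(2)

  // Equality ignores the order in which elements were written.
  val sameAsReordered: Boolean = s == Set(1, 2, 3)

  def main(args: Array[String]): Unit = {
    println(s.size)          // 3
    println(hasTwo)          // true
    println(sameAsReordered) // true
  }
}
```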

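Immutable Set

import scala.collection.immutable.HashSet

object ImmutSetTest extends App {
  // Create an empty immutable HashSet (HashSet.empty works on Scala 2.12 and 2.13 alike)
  val set1 = HashSet.empty[Int]
  // Adding an element produces a new set; the original set is unchanged
  val set2 = set1 + 4
  // A set never holds duplicates; ++ merges two sets into a new one
  val set3 = set1 ++ Set(5, 6, 7)
  val set0 = Set(1, 3, 4) ++ set1
  println(set0.getClass)
}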

Mutable Set

import scala.collection.mutable

object MutSetTest extends App {
  // Create a mutable HashSet
  val set1 = new mutable.HashSet[Int]()
  // Add an element to the HashSet
  set1 += 2
  // add is equivalent to +=
  set1.add(4)
  // ++= adds all elements of another set to this one
  set1 ++= Set(1, 3, 5)
  println(set1)
  // Remove an element
  set1 -= 5
  set1.remove(2)
  println(set1)
}

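Map

import scala.collection.mutable

object MutMapTest extends App {
  // A mutable Map
  val map1 = new mutable.HashMap[String, Int]()
  // Add entries to the map
  map1("spark") = 1
  map1 += (("hadoop", 2))
  map1.put("storm", 3)
  println(map1)
  // Remove entries from the map
  map1 -= "spark"
  map1.remove("hadoop")
  println(map1)
  // Look up values with get (returns an Option) or getOrElse (returns a default)
  println(map1.get("storm"))
  // "spark" was removed above, so map1.get("spark").get would throw NoSuchElementException
  println(map1.getOrElse("spark", 0))
}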

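val map = Map[String, Int]("a" -> 1) // an immutable.Map; it cannot be modified after creation

map.getOrElse("a", 0) // returns the value for the key, or the given default when the key is absent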

scala> val map = Map("spark" -> 1)
map: scala.collection.immutable.Map[String,Int] = Map(spark -> 1)

scala> map.getOrElse("spark",0)
res77: Int = 1

scala> map.getOrElse("spark1",0)
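The lookup styles above can be compared side by side. This is a minimal sketch; MapLookupDemo, scores, and describe are hypothetical names chosen for the example.

```scala
object MapLookupDemo {
  val scores: Map[String, Int] = Map("spark" -> 1)

  // get returns an Option, so a missing key is a value (None) rather than an exception.
  val found: Option[Int] = scores.get("spark")
  val missing: Option[Int] = scores.get("storm")

  // getOrElse supplies a default for a missing key.
  val withDefault: Int = scores.getOrElse("storm", 0)

  // Pattern matching on the Option handles both cases explicitly.
  def describe(key: String): String = scores.get(key) match {
    case Some(v) => s"$key -> $v"
    case None    => s"$key not found"
  }

  def main(args: Array[String]): Unit = {
    println(found)             // Some(1)
    println(missing)           // None
    println(withDefault)       // 0
    println(describe("spark")) // spark -> 1
    println(describe("storm")) // storm not found
  }
}
```

Calling .get on the Option, as in map.get(key).get, throws when the key is absent, which is why getOrElse or a match is usually the safer choice.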

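5.5 Tuples

A Scala tuple groups a fixed number of items so that they can be passed around as a single value. Unlike an array or a list, a tuple can hold objects of different types. Tuples are also immutable.

// Define a tuple
val t = (1, "hello", true)

// Or
val tuple3 = new Tuple3(1, "hello", true)

// Access an element of the tuple
println(t._2) // the second element of the tuple

// Iterate over the tuple
t.productIterator.foreach(println)

// A pair (two-element tuple)
val tuple2 = (1, 3)

// Swap the elements; tuple2 itself is unchanged and a new tuple is returned
val swap = tuple2.swap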

Tuples have the types Tuple1, Tuple2, Tuple3, and so on. In Scala 2 a tuple can hold at most 22 elements; if you need more, use a collection instead of a tuple.
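Besides the _1, _2 accessors, a tuple can be unpacked with a pattern, and it is a convenient way to return several values from a function. The sketch below uses made-up names (TupleDemo, minMax) for illustration.

```scala
object TupleDemo {
  val t: (Int, String, Boolean) = (1, "hello", true)

  // Destructure the tuple into three named values.
  val (id, greeting, flag) = t

  // key -> value is another way to write a pair.
  val pair: (String, Int) = "spark" -> 1
  val swapped: (Int, String) = pair.swap

  // Return two results at once as a pair.
  def minMax(xs: List[Int]): (Int, Int) = (xs.min, xs.max)

  def main(args: Array[String]): Unit = {
    println(greeting)               // hello
    println(swapped)                // (1,spark)
    println(minMax(List(3, 5, 1)))  // (1,5)
  }
}
```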

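5.6 Common collection methods

map, flatten, flatMap, filter, sorted, sortBy, sortWith, grouped,

fold, foldLeft, foldRight, reduce, reduceLeft, aggregate, union,

intersect (intersection), diff (difference), head, tail, zip, mkString, foreach, length, slice, sum

Using filter and count:

count: counts the elements that satisfy a predicate, for example how many numbers are greater than a given value

filter: returns the collection of elements that satisfy the predicate

sorted: sorts the elements (ascending by default)

sortBy: list.sortBy(x => -x) sorts in descending order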

val wds = List(("a", 1), ("c", 3))

wds.sortBy(x => x._2)
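The methods described so far can be tried together on a small list. This is an illustrative sketch; FilterSortDemo and its value names are not from the original text.

```scala
object FilterSortDemo {
  val nums: List[Int] = List(3, 5, 1, 8)

  val bigger: List[Int] = nums.filter(_ > 2)    // elements greater than 2
  val howMany: Int = nums.count(_ > 2)          // how many elements are greater than 2
  val ascending: List[Int] = nums.sorted        // default ordering is ascending
  val descending: List[Int] = nums.sortBy(x => -x)

  // Sorting pairs by their second component
  val byCount: List[(String, Int)] =
    List(("a", 1), ("c", 3), ("b", 2)).sortBy(_._2)

  def main(args: Array[String]): Unit = {
    println(bigger)     // List(3, 5, 8)
    println(howMany)    // 3
    println(ascending)  // List(1, 3, 5, 8)
    println(descending) // List(8, 5, 3, 1)
    println(byCount)    // List((a,1), (b,2), (c,3))
  }
}
```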

sortWith: takes a comparison function that defines the ordering

scala> wds.sortWith
def sortWith(lt: ((String, Int), (String, Int)) => Boolean): List[(String, Int)]

wds.sortWith((x,y) => x._2 > y._2)

scala> wds.sortWith((x,y)=>x._2>y._2)
res44: List[(String, Int)] = List((c,3), (a,1))

grouped: splits the collection into groups of a fixed size

scala> val list = List(3,5,1)
list: List[Int] = List(3, 5, 1)

scala> list.grouped(1)
res46: Iterator[List[Int]] = non-empty iterator

scala> list.grouped(1).toList
res47: List[List[Int]] = List(List(3), List(5), List(1))

fold:

fold(initial value)(combining function)

scala> list
res57: List[Int] = List(3, 5, 1)

scala> list.fold
def fold[A1 >: Int](z: A1)(op: (A1, A1) => A1): A1

scala> list.fold(0)((x,y)=>x+y)
res52: Int = 9

scala> list.fold(0)(_ + _)
res53: Int = 9

scala> list.fold(1)(_ + _)
res54: Int = 10

scala> list.fold(0)(_ - _)
res56: Int = -9

list.foldLeft folds starting from the left:

((0 - 3) - 5) - 1

list.foldRight folds starting from the right:

3 - (5 - (1 - 0))
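The two bracketings can be checked directly. The sketch below evaluates both folds with subtraction and also builds the bracketed expression as a string; FoldDirectionDemo and the trace values are names invented for this illustration.

```scala
object FoldDirectionDemo {
  val list: List[Int] = List(3, 5, 1)

  // ((0 - 3) - 5) - 1
  val left: Int = list.foldLeft(0)(_ - _)

  // 3 - (5 - (1 - 0))
  val right: Int = list.foldRight(0)(_ - _)

  // Build the expression text to make the nesting visible.
  // foldLeft passes (accumulator, element); foldRight passes (element, accumulator).
  val leftTrace: String = list.foldLeft("0")((acc, x) => s"($acc - $x)")
  val rightTrace: String = list.foldRight("0")((x, acc) => s"($x - $acc)")

  def main(args: Array[String]): Unit = {
    println(s"$leftTrace = $left")   // (((0 - 3) - 5) - 1) = -9
    println(s"$rightTrace = $right") // (3 - (5 - (1 - 0))) = -1
  }
}
```

With an associative and commutative operation such as addition both directions give the same result; subtraction is used here precisely because it makes the difference visible.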

reduce: combines all elements with a binary function, without an initial value

scala> list.reduce((x, y) => x + y)
res58: Int = 9

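aggregate: aggregation in two stages. On a sequential collection it simply calls foldLeft with seqop. On a parallel collection the elements are split into partitions, each partition is folded with seqop, and the partial results are merged with combop. (aggregate is deprecated since Scala 2.13.)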

scala> list.aggregate
def aggregate[B](z: => B)(seqop: (B, Int) => B,combop: (B, B) => B): B

scala> list.aggregate(0)(_ + _, _ + _)
res59: Int = 9
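The two-stage idea behind aggregate can be imitated on an ordinary list by splitting it into chunks. This is only a sketch of the principle, not how the library implements it; aggregateInChunks and its chunkSize parameter are hypothetical.

```scala
object AggregateSketch {
  // Fold every chunk with seqop starting from z, then merge the partial results with combop.
  // chunkSize must be positive. The result matches foldLeft only when z is neutral for combop.
  def aggregateInChunks[A, B](xs: List[A], chunkSize: Int)(z: B)(
      seqop: (B, A) => B, combop: (B, B) => B): B =
    xs.grouped(chunkSize)
      .map(chunk => chunk.foldLeft(z)(seqop))
      .foldLeft(z)(combop)

  val list: List[Int] = List(3, 5, 1)

  // Plain sum: both stages add.
  val total: Int = aggregateInChunks(list, 2)(0)(_ + _, _ + _)

  // Sum and count in one pass: the accumulator type (Int, Int) differs from the element type Int.
  val sumAndCount: (Int, Int) = aggregateInChunks(list, 2)((0, 0))(
    (acc, x) => (acc._1 + x, acc._2 + 1),
    (l, r) => (l._1 + r._1, l._2 + r._2))

  def main(args: Array[String]): Unit = {
    println(total)       // 9
    println(sumAndCount) // (9,3)
  }
}
```

The second call shows why aggregate takes two functions: seqop combines an accumulator with an element, while combop combines two accumulators.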

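union: combines two collections. On a List this is plain concatenation, so duplicates are kept; only on a Set is it a true set union.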

scala> val list2 = List(0,8,6)
list2: List[Int] = List(0, 8, 6)

scala> list.union

override def union[B >: Int, That](that: scala.collection.GenSeq[B])(implicit bf: scala.collection.generic.CanBuildFrom[List[Int],B,That]): That

scala> list.union(list2)
res60: List[Int] = List(3, 5, 1, 0, 8, 6)

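intersect: intersection

diff: difference (the elements of the receiver that are not in the argument, so the result depends on which collection comes first)

head: the first element

tail: everything except the first element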
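A brief sketch of these four methods on two small lists; ListOpsDemo and the value names are illustrative.

```scala
object ListOpsDemo {
  val a: List[Int] = List(3, 5, 1)
  val b: List[Int] = List(5, 1, 9)

  val common: List[Int] = a.intersect(b) // elements of a that also occur in b
  val onlyInA: List[Int] = a.diff(b)     // elements of a not in b
  val onlyInB: List[Int] = b.diff(a)     // swapping the operands changes the result
  val first: Int = a.head
  val rest: List[Int] = a.tail

  def main(args: Array[String]): Unit = {
    println(common)  // List(5, 1)
    println(onlyInA) // List(3)
    println(onlyInB) // List(9)
    println(first)   // 3
    println(rest)    // List(5, 1)
  }
}
```

Note that head throws on an empty list; headOption is the safe alternative.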

zip: pairs up the elements of two collections that sit at the same index

scala> list
res61: List[Int] = List(3, 5, 1)

scala> list2
res62: List[Int] = List(0, 8, 6)

scala> list.zip(list2)
res63: List[(Int, Int)] = List((3,0), (5,8), (1,6))

Add the two values of each pair (here r is the result of list.zip(list2) shown above):

scala> r.map(x => (x._1 + x._2))
res65: List[Int] = List(3, 13, 7)

mkString

scala> list2
res67: List[Int] = List(0, 8, 6)

scala> list2.toString()
res68: String = List(0, 8, 6)

scala> list2.mkString("|")
res69: String = 0|8|6

length:

scala> list2.length
res70: Int = 3

scala> list2.size
res71: Int = 3

slice: takes the elements from index from up to, but not including, index until

scala> list2.slice
override def slice(from: Int,until: Int): List[Int]

scala> list2.slice(1,3)
res72: List[Int] = List(8, 6)

scala> list2.slice(1, list2.length).map(_*10)
res73: List[Int] = List(80, 60)

sum: summation. The second components of a collection of pairs can be summed with foldLeft:

scala> val tps = Array(("a", 3),("b", 2),("a", 2))
tps: Array[(String, Int)] = Array((a,3), (b,2), (a,2))

scala> tps.foldLeft
override def foldLeft[B](z: B)(op: (B, (String, Int)) => B): B

scala> tps.foldLeft(0)(_ + _._2)
res79: Int = 7

scala> var arr = List("hello scala","hello spark")
arr: List[String] = List(hello scala, hello spark)

scala> arr.flatMap(_.split(" "))
res80: List[String] = List(hello, scala, hello, spark)

scala> arr.flatMap(_.split(" ")).map(x => (x, 1))
res82: List[(String, Int)] = List((hello,1), (scala,1), (hello,1), (spark,1))

scala> arr.flatMap(_.split(" ")).map(x => (x, 1)).groupBy(x => x._1)
res84: scala.collection.immutable.Map[String,List[(String, Int)]] = Map(spark -> List((spark,1)), scala -> List((scala,1)), hello -> List((hello,1), (hello,1)))

scala> arr.flatMap(_.split(" ")).map(x => (x, 1)).groupBy(x => x._1).mapValues(t => t.foldLeft(0)(_ + _._2))
res85: scala.collection.immutable.Map[String,Int] = Map(spark -> 1, scala -> 1, hello -> 2)
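The REPL steps above amount to a word count. They can be wrapped in one function as sketched below; WordCountSketch and wordCount are hypothetical names, and map over the grouped entries is used instead of mapValues, which returns a view and is deprecated as of Scala 2.13.

```scala
object WordCountSketch {
  // Split each line into words, group identical words, then count each group.
  def wordCount(lines: List[String]): Map[String, Int] =
    lines
      .flatMap(_.split(" "))
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    val counts = wordCount(List("hello scala", "hello spark"))
    println(counts("hello")) // 2
    println(counts("scala")) // 1
    println(counts("spark")) // 1
  }
}
```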

5.7 Parallel collections: par

Returns a parallel implementation of this collection.

For most collection types, this method creates a new parallel collection by copying all the elements. For these collections, par takes linear time. Mutable collections in this category do not produce a mutable parallel collection that has the same underlying dataset, so changes in one collection will not be reflected in the other one.

Specific collections (e.g. ParArray or mutable.ParHashMap) override this default behaviour by creating a parallel collection which shares the same underlying dataset. For these collections, par takes constant or sublinear time.

All parallel collections return a reference to themselves.


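// Create a List
val lst0 = List(1, 7, 9, 8, 0, 3, 5, 4, 6, 2)

// Fold with an initial value (no guaranteed order of evaluation)
// On Scala 2.13 and later, .par needs the separate scala-parallel-collections module
// and import scala.collection.parallel.CollectionConverters._
val lst11 = lst0.par.fold(100)((x, y) => x + y)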

Folds the elements of this list using the specified associative binary operator. The order in which operations are performed on elements is unspecified and may be nondeterministic.

A1: a type parameter for the binary operator, a supertype of A.
z: a neutral element for the fold operation; may be added to the result an arbitrary number of times, and must not change the result (e.g., Nil for list concatenation, 0 for addition, or 1 for multiplication).
op: a binary operator that must be associative.
returns: the result of applying fold operator op between all the elements and z.


// Fold with an initial value (fixed left-to-right order)
val lst12 = lst0.foldLeft(100)((x, y) => x + y)
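Why the initial value of a parallel fold must be neutral can be seen without the parallel library. The sketch below only imitates partitioning by cutting the list into chunks; chunkedFold and chunkSize are made-up names, and the real scheduler decides partitions on its own.

```scala
object NeutralElementSketch {
  val lst0: List[Int] = List(1, 7, 9, 8, 0, 3, 5, 4, 6, 2)

  // Imitate a parallel fold: every chunk starts from z, then the partial sums are added up.
  // Assumes xs is non-empty and chunkSize is positive.
  def chunkedFold(xs: List[Int], chunkSize: Int, z: Int): Int =
    xs.grouped(chunkSize).map(_.fold(z)(_ + _)).reduce(_ + _)

  val sequential: Int = lst0.foldLeft(100)(_ + _) // 100 is added exactly once
  val twoChunks: Int = chunkedFold(lst0, 5, 100)  // 100 is added twice
  val fiveChunks: Int = chunkedFold(lst0, 2, 100) // 100 is added five times
  val neutral: Int = chunkedFold(lst0, 2, 0)      // 0 is neutral, so chunking does not matter

  def main(args: Array[String]): Unit = {
    println(sequential) // 145
    println(twoChunks)  // 245
    println(fiveChunks) // 545
    println(neutral)    // 45
  }
}
```

This is why lst0.par.fold(100)(...) may return different values from run to run, while foldLeft(100)(...) always yields the same result.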

// Aggregation

val arr = List(List(1, 2, 3), List(3, 4, 5), List(2), List(0))
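The text stops after defining arr, so the intended aggregation is not shown. One plausible reading is summing all numbers in the nested list; the sketch below shows a few sequential ways to do that. NestedSumSketch and its value names are assumptions made for this illustration.

```scala
object NestedSumSketch {
  val arr: List[List[Int]] = List(List(1, 2, 3), List(3, 4, 5), List(2), List(0))

  // Flatten the nested list first, then sum.
  val viaFlatten: Int = arr.flatten.sum

  // Fold over the outer list, adding the sum of each inner list.
  val viaFold: Int = arr.foldLeft(0)((acc, inner) => acc + inner.sum)

  // Partial sums per inner list, which is what each partition would produce.
  val perList: List[Int] = arr.map(_.sum)

  def main(args: Array[String]): Unit = {
    println(viaFlatten) // 20
    println(viaFold)    // 20
    println(perList)    // List(6, 12, 2, 0)
  }
}
```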