Creating Key-Value RDDs and Basic Transformations (1)

1. Create a basic key-value RDD
scala> val kvPairRDD =
     |   sc.parallelize(Seq(("key1", "value1"), ("key2", "value2"), ("key3", "value3")))
kvPairRDD: org.apache.spark.rdd.RDD[(String, String)] = ParallelCollectionRDD[17] at parallelize at <console>:25

// Use collect to bring the RDD's elements from the cluster back to the driver
scala> kvPairRDD.collect
res21: Array[(String, String)] = Array((key1,value1), (key2,value2), (key3,value3))

2. An RDD can also be created from instances of a class
case class User(userId: String, amount: Int)
val personSeqRDD =
  sc.parallelize(Seq(User("jeffy", 30), User("kkk", 20), User("jeffy", 30), User("kkk", 30)))
scala> personSeqRDD.collect
res22: Array[User] = Array(User(jeffy,30), User(kkk,20), User(jeffy,30), User(kkk,30))


scala> // Turn the RDD into an RDD of two-element tuples, keyed by userId
scala> val keyByRDD = personSeqRDD.keyBy(x => x.userId)
keyByRDD: org.apache.spark.rdd.RDD[(String, User)] = MapPartitionsRDD[18] at keyBy at <console>:28

scala> keyByRDD.collect
res23: Array[(String, User)] = Array((jeffy,User(jeffy,30)), (kkk,User(kkk,20)), (jeffy,User(jeffy,30)), (kkk,User(kkk,30)))

// The same (userId, User) pairs can be built with an explicit map
val keyRDD2 = personSeqRDD.map(user => (user.userId, user))
scala> keyRDD2.collect
res24: Array[(String, User)] = Array((jeffy,User(jeffy,30)), (kkk,User(kkk,20)), (jeffy,User(jeffy,30)), (kkk,User(kkk,30)))
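
The keyBy and map results above are identical because keyBy(f) pairs each element with f(element), which is exactly what the explicit map does. The following is a minimal local sketch of that equivalence on a plain Scala Seq rather than an RDD, so it runs without a SparkContext. The helper name keyByLocal and the values users, viaKeyBy, viaMap and same are hypothetical, introduced only for illustration; they are not part of the Spark API.

```scala
// Local analogue only: plain Scala collections, no Spark required.
case class User(userId: String, amount: Int)

val users = Seq(User("jeffy", 30), User("kkk", 20), User("jeffy", 30), User("kkk", 30))

// Hypothetical helper that mirrors keyBy: pair each element with f(element).
def keyByLocal[K, V](xs: Seq[V])(f: V => K): Seq[(K, V)] = xs.map(x => (f(x), x))

val viaKeyBy = keyByLocal(users)(_.userId)
val viaMap   = users.map(user => (user.userId, user))

// Both approaches yield the same (userId, User) pairs in the same order.
val same = viaKeyBy == viaMap
```

Here same evaluates to true. On a real RDD the choice between keyBy and map is mostly one of readability: keyBy states the intent directly when the value should stay the whole element.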

// Keep only the amount as the value, giving (userId, amount) pairs
val keyRDD3 = personSeqRDD.map(user => (user.userId, user.amount))
scala> keyRDD3.collect
res25: Array[(String, Int)] = Array((jeffy,30), (kkk,20), (jeffy,30), (kkk,30))

// Group the users by userId; each key maps to an Iterable of its users
scala> val groupByRDD = personSeqRDD.groupBy(user => user.userId)
groupByRDD: org.apache.spark.rdd.RDD[(String, Iterable[User])] = ShuffledRDD[22] at groupBy at <console>:28
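
The transcript does not collect groupByRDD, so its contents are not shown. As a rough local illustration of the grouping, the sketch below applies groupBy to a plain Scala Seq. This is an analogue under stated assumptions, not Spark itself: the local version returns a Map[String, Seq[User]] rather than an RDD[(String, Iterable[User])], and Spark does not guarantee the order of elements within a group. The names users, grouped and groupSizes are hypothetical.

```scala
// Local analogue only: plain Scala collections, no Spark required.
case class User(userId: String, amount: Int)

val users = Seq(User("jeffy", 30), User("kkk", 20), User("jeffy", 30), User("kkk", 30))

// Group by userId: each key maps to all users sharing that id.
val grouped: Map[String, Seq[User]] = users.groupBy(user => user.userId)

// Count how many users fall into each group.
val groupSizes: Map[String, Int] = grouped.map { case (id, us) => (id, us.size) }
```

With the sample data, both "jeffy" and "kkk" end up with two users each.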

// Turn an RDD of strings into key-value pairs by pairing each string with 1
val rdd1 = sc.parallelize(Seq("test", "hell"))
scala> val a = rdd1.map(str => (str, 1))
a: org.apache.spark.rdd.RDD[(String, Int)] = MapPartitionsRDD[25] at map at <console>:26
scala> a.collect
res27: Array[(String, Int)] = Array((test,1), (hell,1))