1. persist() & cache()
def persist(): this.type = persist(StorageLevel.MEMORY_ONLY)
def cache(): this.type = persist()
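As the source shows, the no-argument persist() uses the storage level MEMORY_ONLY.
cache() simply calls the no-argument persist(), so the two behave identically.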
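The following is a minimal sketch of the point above, not code from the Spark source. It assumes spark-core is on the classpath and runs against a local master; the object name `PersistSketch`, the method `run`, and the sample data are hypothetical. It reads back the storage level with `getStorageLevel` after `cache()` and after `persist` with an explicit level.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

// Hypothetical example object; only the RDD calls come from Spark itself.
object PersistSketch {
  // Returns the storage level seen after cache() and after persist(MEMORY_AND_DISK).
  def run(): (StorageLevel, StorageLevel) = {
    // Assumption: a local master with 2 threads is enough for this illustration.
    val conf = new SparkConf().setAppName("persist-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // cache() goes through persist(), which uses MEMORY_ONLY.
      val cached = sc.parallelize(1 to 100).cache()
      // Only persist(level) lets the caller pick a different storage level.
      val spilled = sc.parallelize(1 to 100).persist(StorageLevel.MEMORY_AND_DISK)
      (cached.getStorageLevel, spilled.getStorageLevel)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (afterCache, afterPersist) = run()
    println(s"cache(): $afterCache, persist(MEMORY_AND_DISK): $afterPersist")
  }
}
```

When a level other than MEMORY_ONLY is needed, the overload that takes a StorageLevel is the one to call; cache() offers no such choice.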
2. countByKey & countByValue
def countByKey(): Map[K, Long] = self.withScope {
  self.mapValues(_ => 1L).reduceByKey(_ + _).collect().toMap
}
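countByKey calls reduceByKey, then collects the result to the driver and converts it into a Map. If the number of distinct keys is large, call this function with caution, because it can run the driver out of memory (OOM). In that case, use reduceByKey directly and skip the collect.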
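As a rough illustration of that advice, the sketch below repeats the first two steps of countByKey (mapValues, then reduceByKey) and stops before collect, so the counts stay in an RDD. It assumes spark-core on the classpath and a local master; `CountPerKeySketch`, `countPerKey`, `run`, and the sample pairs are hypothetical names and data.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

// Hypothetical example object; not part of Spark.
object CountPerKeySketch {
  // Same counting logic as countByKey, but the result stays distributed.
  def countPerKey(pairs: RDD[(String, Int)]): RDD[(String, Long)] =
    pairs.mapValues(_ => 1L).reduceByKey(_ + _)

  // Returns only the keys that occur more than once.
  def run(): List[(String, Long)] = {
    val conf = new SparkConf().setAppName("count-per-key-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val pairs = sc.parallelize(Seq(("a", 1), ("b", 2), ("a", 3)))
      val counts = countPerKey(pairs)
      // Filter on the executors first, so only a small result reaches the driver.
      counts.filter { case (_, n) => n > 1 }.collect().toList
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

Because `countPerKey` returns an RDD, later steps such as filtering, joining, or saving to storage can run on the executors without the full key set ever reaching the driver.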
def countByValue()(implicit ord: Ordering[T] = null): Map[T, Long] = withScope {
  map(value => (value, null)).countByKey()
}
countByValue maps each element to a (value, null) pair and delegates directly to countByKey.
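A small sketch of that delegation, assuming spark-core on the classpath and a local master: it calls countByValue and also performs the same map-then-countByKey step by hand, so the two results can be compared. `CountByValueSketch`, `run`, and the sample words are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of Spark.
object CountByValueSketch {
  // Returns (result of countByValue, result of the manual map + countByKey).
  def run(): (scala.collection.Map[String, Long], scala.collection.Map[String, Long]) = {
    val conf = new SparkConf().setAppName("count-by-value-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val words = sc.parallelize(Seq("a", "b", "a"))
      val direct = words.countByValue()
      // Mirrors the body of countByValue shown above.
      val manual = words.map(w => (w, null)).countByKey()
      (direct, manual)
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = {
    val (direct, manual) = run()
    println(s"countByValue: $direct, manual: $manual")
  }
}
```

Since countByValue ends in countByKey, its result is also collected to the driver, so the same caution about a large number of distinct values applies.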
3. isEmpty()
def isEmpty(): Boolean = withScope {
  partitions.length == 0 || take(1).length == 0
}
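isEmpty first checks the number of partitions. If it is 0, the result is true immediately. Otherwise it takes one element with take(1) and checks whether anything came back.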
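The two branches can be seen with a sketch like the one below, assuming spark-core on the classpath and a local master. `IsEmptySketch`, `run`, and the sample RDDs are hypothetical. It builds one RDD with no partitions, one with partitions but no elements, and one with elements, and reports the partition count next to the isEmpty result for each.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical example object; not part of Spark.
object IsEmptySketch {
  // Returns (number of partitions, isEmpty result) for three sample RDDs.
  def run(): List[(Int, Boolean)] = {
    val conf = new SparkConf().setAppName("is-empty-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Zero partitions: the first half of the || already yields true.
      val noPartitions = sc.emptyRDD[Int]
      // Four partitions but no elements: isEmpty has to fall through to take(1).
      val emptyPartitions = sc.parallelize(Seq.empty[Int], 4)
      // Two partitions with data: take(1) returns one element, so isEmpty is false.
      val nonEmpty = sc.parallelize(1 to 3, 2)
      List(noPartitions, emptyPartitions, nonEmpty)
        .map(rdd => (rdd.partitions.length, rdd.isEmpty()))
    } finally {
      sc.stop()
    }
  }

  def main(args: Array[String]): Unit = println(run())
}
```

The second case is the one that costs something: because partitions exist, take(1) has to run a job to confirm that none of them holds an element.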