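1. Note the difference between repartition and coalesce, which comes down to shuffling: repartition always performs a full shuffle, while coalesce by default merges existing partitions without one.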
https://blog.youkuaiyun.com/xianpanjia4616/article/details/82053196
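2. Note the difference between repartition and partitionBy: the latter distributes records by key, so records with the same key end up in the same partition.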
https://blog.youkuaiyun.com/xianpanjia4616/article/details/84328928
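3. combineByKey can be used in place of groupByKey: it pre-aggregates within each partition before the shuffle, so less data is moved and it is more efficient.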
https://www.jianshu.com/p/b77a6294f31c
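
To make point 3 concrete, here is a minimal plain-Python sketch of the idea. It does not call Spark. The functions group_by_key_model and combine_by_key_model are hypothetical helpers written for illustration, and the nested lists stand in for RDD partitions. Only the three callback names (createCombiner, mergeValue, mergeCombiners) mirror the real combineByKey parameters. The count of "shuffled" records is a rough proxy for shuffle volume, not a measurement.

```python
from collections import defaultdict


def group_by_key_model(partitions):
    """Model of groupByKey: every (key, value) record crosses the shuffle unchanged."""
    shuffled = [record for part in partitions for record in part]
    grouped = defaultdict(list)
    for key, value in shuffled:
        grouped[key].append(value)
    # Aggregation only happens after all records have been moved.
    sums = {key: sum(values) for key, values in grouped.items()}
    return sums, len(shuffled)


def combine_by_key_model(partitions, create_combiner, merge_value, merge_combiners):
    """Model of combineByKey: combine inside each partition, then merge partial results."""
    shuffled = []
    for part in partitions:
        local = {}
        for key, value in part:
            if key in local:
                local[key] = merge_value(local[key], value)
            else:
                local[key] = create_combiner(value)
        # Only one partial result per key per partition is sent on.
        shuffled.extend(local.items())
    merged = {}
    for key, combiner in shuffled:
        merged[key] = merge_combiners(merged[key], combiner) if key in merged else combiner
    return merged, len(shuffled)


# Two made-up partitions of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 2), ("a", 3)],
    [("a", 4), ("b", 5), ("b", 6)],
]

grouped_sums, grouped_shuffled = group_by_key_model(partitions)
combined_sums, combined_shuffled = combine_by_key_model(
    partitions,
    create_combiner=lambda v: v,
    merge_value=lambda acc, v: acc + v,
    merge_combiners=lambda x, y: x + y,
)

print(grouped_sums, grouped_shuffled)
print(combined_sums, combined_shuffled)
```

Both models produce the same per-key sums, but the combine-first version sends on fewer records because each partition contributes at most one partial result per key. The saving grows with the number of repeated keys per partition.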