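1. Getting the barrage (danmaku) comments that contain "!"
An exclamation mark may be full-width (Chinese "!") or half-width (ASCII "!"), so the filter needs an || to match both.
// Keep the comments that contain an exclamation mark, full-width or half-width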
var lines = sc.textFile("file:///root/Desktop/barrage.json")
var lines_after= lines.filter(line=>(line.contains("!")) || (line.contains("!")))
// Strip whitespace, then drop the first character and the last two characters
var result = lines_after.map(x=>x.trim.substring(1,x.trim.length-2))
// Save to a file
result.saveAsTextFile("file:///root/Desktop/result/result1")
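The filter and substring logic can be checked on a plain Scala collection before running it on the RDD. This is only a sketch: the sample lines are hypothetical (the notes do not show the real layout of barrage.json), and they assume each line looks like a quoted string followed by a comma, which is what dropping one leading and two trailing characters suggests.

```scala
// Hypothetical sample lines; the real file contents are an assumption.
val sample = Seq(
  "  \"Great!\",",
  "  \"太棒了\uFF01\",",
  "  \"no mark here\","
)

// Same predicate as the RDD filter: half-width "!" or full-width "!" (U+FF01).
val hits = sample.filter(line => line.contains("!") || line.contains("\uFF01"))

// Trim once, then drop the first character and the last two characters.
val cleaned = hits.map { x =>
  val t = x.trim
  t.substring(1, t.length - 2)
}

cleaned.foreach(println)
```

Trimming once into a local value avoids calling trim twice per line, which is the only difference from the one-liner used on the RDD.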
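2. Sorting a file
import org.apache.spark.HashPartitioner
var readline = sc.textFile("file:///disk4/bigdata")
// Drop blank lines, parse each line as an Int, merge into one partition, then sort by key
var result = readline.filter(x => x.trim().length>0).map(x=>(x.trim.toInt,"")).partitionBy(new HashPartitioner(1)).sortByKey().map(x =>x._1)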
result.collect.foreach(println)
result.saveAsTextFile("file:///disk4/bigdata2")
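As an alternative sketch (not the approach used above), sortBy accepts a numPartitions argument, so the sort and the merge into a single partition can be written as one call. The sample data and the local SparkContext setup are assumptions for illustration; inside spark-shell, getOrCreate simply returns the existing sc.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only; spark-shell already provides one.
val conf = new SparkConf().setAppName("sort-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical input: one number per line, with a blank line mixed in.
val nums = sc.parallelize(Seq("3", " ", "1", "2"), 2)

// Drop blank lines, parse, and sort ascending into exactly one partition.
val sorted = nums
  .filter(_.trim.nonEmpty)
  .map(_.trim.toInt)
  .sortBy(x => x, ascending = true, numPartitions = 1)

sorted.collect().foreach(println)
```

With one output partition, saveAsTextFile writes a single part file that holds the whole sorted sequence.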
3. Finding the maximum
var readline = sc.textFile("hdfs://node1:9000/bigdata")
var result = readline.map(x=>(x.toInt,x)).sortByKey(false).take(1).map(x=>x._2.toInt)
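// result is a local Array[Int] on the driver, not an RDD
for(x <- result ){println(x)}
// Turn the array back into an RDD so it can be saved
val rdd = sc.parallelize(result)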
rdd.saveAsTextFile("file:///disk/ssss")
Think about it: why does example 2 need to merge everything into one partition, while example 3 does not?
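One way to reason about the question above: take(1) on a sorted RDD only returns the first element of the global order to the driver, so the number of partitions does not affect the answer. A maximum can also be computed with no sort at all. The sketch below compares the three routes on hypothetical data; the local SparkContext setup is an assumption, and in spark-shell getOrCreate returns the existing sc.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Local context for illustration only; spark-shell already provides one.
val conf = new SparkConf().setAppName("max-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Hypothetical numbers spread over three partitions.
val nums = sc.parallelize(Seq(5, 42, 7, 19), 3)

// Route used in example 3: sort descending, take the first element.
val viaSort = nums.map(x => (x, x)).sortByKey(false).take(1).map(_._2)

// Alternatives that skip the sort: pairwise reduce, or the built-in max.
val viaReduce = nums.reduce((a, b) => math.max(a, b))
val viaMax = nums.max()

println(s"${viaSort.head} $viaReduce $viaMax")
```

Example 2 is different because its result is written out as files: each partition becomes its own part file, so a single sorted file requires a single partition.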
4. Processing the values of key-value pairs
1) mapValues
val text = sc.parallelize(Array(("a",1),("b",2)))
val result = text.mapValues(x => x+1)
2) map
val text = sc.parallelize(Array(("a",1),("b",2)))
val result = text.map(x => (x._1,x._2+1))
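Both versions produce the same pairs, but they differ in one respect worth knowing: mapValues keeps the RDD's partitioner, while map discards it because Spark cannot tell whether the keys changed. The sketch below illustrates this on the same two pairs; the HashPartitioner(2) step and the local SparkContext setup are assumptions added for the demonstration.

```scala
import org.apache.spark.{HashPartitioner, SparkConf, SparkContext}

// Local context for illustration only; spark-shell already provides one.
val conf = new SparkConf().setAppName("mapvalues-sketch").setMaster("local[2]")
val sc = SparkContext.getOrCreate(conf)

// Give the pair RDD an explicit partitioner so the difference is visible.
val pairs = sc.parallelize(Array(("a", 1), ("b", 2))).partitionBy(new HashPartitioner(2))

val viaMapValues = pairs.mapValues(_ + 1)
val viaMap = pairs.map { case (k, v) => (k, v + 1) }

// mapValues keeps the partitioner; map drops it.
println(viaMapValues.partitioner.isDefined)
println(viaMap.partitioner.isDefined)
```

Keeping the partitioner matters when a later step such as reduceByKey or join can reuse it and avoid a shuffle.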