First, the difference between flatMap and map:
sc.textFile("/Users/huxicong/Downloads/dd").flatMap(_.split(" ")).collect
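flatMap splits every line and puts all the resulting pieces into one flat collection, so the boundaries between lines disappear: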
Array(10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 206, 2393100, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1,
10.75.194.100, -, -, [06/Jun/2016:14:35:27, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1,
10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit…
)
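To see the flattening without a cluster, here is a minimal sketch that uses plain Scala collections instead of an RDD. It assumes that `RDD.flatMap` treats each element the same way `Seq.flatMap` does. The two log lines are hypothetical, shortened versions of the records above.

```scala
// Two hypothetical, shortened log lines in the same space-separated layout as above
val lines = Seq(
  "10.75.194.100 - - [06/Jun/2016:14:35:32 +0800] \"GET /video.flv HTTP/1.1\" 206 2393100",
  "10.75.194.100 - - [06/Jun/2016:14:35:27 +0800] \"GET /video.flv HTTP/1.1\" 403 680"
)

// flatMap: each line produces an Array[String], and all arrays are concatenated
val tokens: Seq[String] = lines.flatMap(_.split(" "))

println(tokens.length) // 20: 10 tokens per line, with no trace of where a line ended
println(tokens(6))     // /video.flv, the requested path of the first line
println(tokens(10))    // 10.75.194.100, the first token of the second line
```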
sc.textFile("/Users/huxicong/Downloads/dd").map(_.split(" ")).collect
map, by contrast, turns each line into its own array, so the result is an array of arrays (one per line):
Array(
Array(10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 206, 2393100, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1),
Array(10.75.194.100, -, -, [06/Jun/2016:14:35:27, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1),
Array(10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows…
)
)
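The same comparison on a local `Seq` (again a sketch with hypothetical sample lines, assuming RDD `map` behaves like `Seq.map` per element) shows that `map` keeps exactly one output per input line. It also shows that `flatMap` amounts to `map` followed by flattening.

```scala
// Two hypothetical, shortened log lines
val lines = Seq(
  "10.75.194.100 - - [06/Jun/2016:14:35:32 +0800] \"GET /video.flv HTTP/1.1\" 206 2393100",
  "10.75.194.100 - - [06/Jun/2016:14:35:27 +0800] \"GET /video.flv HTTP/1.1\" 403 680"
)

// map: one Array[String] per line, so the result is nested
val perLine: Seq[Array[String]] = lines.map(_.split(" "))

println(perLine.length) // 2: one array per input line
println(perLine(1)(8))  // 403, the status code of the second line

// Flattening the nested result gives the same tokens that flatMap produces
val flattened: Seq[String] = perLine.map(_.toList).flatten
println(flattened == lines.flatMap(_.split(" "))) // true
```

Keeping one array per line is what lets the analysis below address a field by position, such as `x(6)`.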
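A Spark analysis example: count how often each value in field 6 occurs (the requested path, such as /video.flv in the log above) and return the five most frequent values.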
sc.textFile("/Users/huxicong/Downloads/datas").map(_.split(" ")).map(x=>(x(6),1)).reduceByKey((x,y)=>(x+y)).map(x=>(x._2,x._1)).sortByKey(false).take(5)
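What each function does:
Function | What it does |
---|---|
`map(x=>(x(6),1))` | Turns each record into a pair `(x(6), 1)`: field 6 becomes the key and 1 the value |
`reduceByKey((x,y)=>(x+y))` | Adds up the values of all pairs that share the same key, giving one count per key |
`map(x=>(x._2,x._1))` | Swaps key and value in each pair, so every pair becomes `(count, value)` |
`sortByKey(false)` | Sorts by key, i.e. by count; `false` means descending order |
`take(5)` | Returns the first 5 records, i.e. the 5 most frequent values |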
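The steps in the table can be traced on local data with a small sketch. It uses plain Scala collections instead of an RDD, and the log lines, the `logLine` helper, and the path counts are all hypothetical. Scala collections have no `reduceByKey` or `sortByKey`, so this sketch swaps in `groupBy` plus `reduce` and `sortBy` with a reversed ordering as stand-ins. Spark's `reduceByKey` also merges values within each partition before shuffling, which this local version does not model.

```scala
// Hypothetical helper: builds a log line whose field 6 (after splitting on spaces) is `path`
def logLine(path: String): String =
  s"""10.75.194.100 - - [06/Jun/2016:14:35:32 +0800] "GET $path HTTP/1.1" 200 680"""

// Hypothetical traffic: 6 distinct paths with distinct request counts (no ties)
val lines: Seq[String] = Seq(
  "/video.flv" -> 6, "/index.html" -> 5, "/a.css" -> 4,
  "/b.js" -> 3, "/logo.png" -> 2, "/favicon.ico" -> 1
).flatMap { case (path, n) => Seq.fill(n)(logLine(path)) }

val top5: Seq[(Int, String)] = lines
  .map(_.split(" "))                                // one Array[String] per line
  .map(x => (x(6), 1))                              // (path, 1)
  .groupBy(_._1)                                    // stand-in for reduceByKey: group by key...
  .map { case (k, ps) => (k, ps.map(_._2).reduce((x, y) => x + y)) } // ...then add up the 1s
  .toSeq                                            // leave Map first, so equal counts cannot collide after the swap
  .map(x => (x._2, x._1))                           // swap to (count, path)
  .sortBy(_._1)(Ordering[Int].reverse)              // stand-in for sortByKey(false): descending by count
  .take(5)                                          // the five most frequent paths

top5.foreach(println)
// (6,/video.flv)
// (5,/index.html)
// (4,/a.css)
// (3,/b.js)
// (2,/logo.png)
```

The swap step exists because `sortByKey` can only order by key, so the count has to become the key before sorting. With ties, the relative order of equal counts is not guaranteed, either here or in Spark.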