Learning Spark Step by Step, Part 2

First, let's look at the difference between flatMap and map.

 
sc.textFile("/Users/huxicong/Downloads/dd").flatMap(_.split(" ")).collect
flatMap splits every line apart and puts all of the resulting fields into one single, flat collection:


Array(10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 206, 2393100, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1,
10.75.194.100, -, -, [06/Jun/2016:14:35:27, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1,
10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit…
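A minimal sketch of the same flattening that runs without Spark. It assumes plain Scala collections (`Seq`) as a stand-in for an RDD, since `Seq.flatMap` flattens in the same way, and uses two shortened, hypothetical log lines instead of the dd file.

```scala
object FlatMapDemo {
  // Two shortened, hypothetical log lines standing in for the dd file
  val lines: Seq[String] = Seq(
    "10.75.194.100 - - GET /video.flv 206",
    "10.75.194.100 - - GET /video.flv 403"
  )

  // flatMap: each line is split into tokens, and the tokens of all lines
  // are concatenated into one flat Seq[String]
  val tokens: Seq[String] = lines.flatMap(_.split(" "))

  def main(args: Array[String]): Unit = {
    println(tokens.length) // 12: 6 tokens per line, 2 lines
    println(tokens.mkString(", "))
  }
}
```

Note that the first token of the second line directly follows the last token of the first line; the line boundary is gone, just as in the collect output above.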


sc.textFile("/Users/huxicong/Downloads/dd").map(_.split(" ")).collect
map, by contrast, turns each line into its own array, so the result is an array of arrays:

Array(
  Array(10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 206, 2393100, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1),
  Array(10.75.194.100, -, -, [06/Jun/2016:14:35:27, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0", "http://o7o80z735.bkt.clouddn.com", V1),
  Array(10.75.194.100, -, -, [06/Jun/2016:14:35:32, +0800], "GET, /video.flv, HTTP/1.1", 403, 680, "http://www.tenglongdesign.com/", "Mozilla/5.0+(Windows…
  )
)
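A matching sketch for map, again with plain Scala collections standing in for an RDD and two shortened, hypothetical log lines. It shows that map keeps one array per line, and that flattening that nested result yields the same tokens flatMap produces.

```scala
object MapDemo {
  // Two shortened, hypothetical log lines standing in for the dd file
  val lines: Seq[String] = Seq(
    "10.75.194.100 - - GET /video.flv 206",
    "10.75.194.100 - - GET /video.flv 403"
  )

  // map: exactly one output element per input line, so the result is nested
  val perLine: Seq[Array[String]] = lines.map(_.split(" "))

  // Flattening the nested result gives the same tokens as flatMap
  val flattened: Seq[String] = perLine.flatten
  val viaFlatMap: Seq[String] = lines.flatMap(_.split(" "))

  def main(args: Array[String]): Unit = {
    println(perLine.length)          // 2: one array per line
    println(perLine.map(_.length))   // List(6, 6)
    println(flattened == viaFlatMap) // true
  }
}
```

In other words, flatMap behaves like map followed by a flatten, which is why its output has no per-line grouping.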

Spark analysis example

sc.textFile("/Users/huxicong/Downloads/datas").map(_.split(" ")).map(x=>(x(6),1)).reduceByKey((x,y)=>(x+y)).map(x=>(x._2,x._1)).sortByKey(false).take(5)
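To see which field x(6) picks, the sketch below rebuilds the first record of the dd sample from the tokens printed by collect above. It assumes the fields were separated by single spaces and that the datas file uses the same log format as dd. It splits the record the same way the job does and prints each index; with this layout, index 6 is the request path.

```scala
object FieldIndexDemo {
  // First dd record, reassembled from the tokens printed by collect
  // (assumes single spaces between fields in the raw file)
  val record: String =
    "10.75.194.100 - - [06/Jun/2016:14:35:32 +0800] \"GET /video.flv HTTP/1.1\" 206 2393100 " +
      "\"http://www.tenglongdesign.com/\" " +
      "\"Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/38.0.2125.122+Safari/537.36+SE+2.X+MetaSr+1.0\" " +
      "\"http://o7o80z735.bkt.clouddn.com\" V1"

  // Same split as the Spark job: map(_.split(" "))
  val x: Array[String] = record.split(" ")

  def main(args: Array[String]): Unit = {
    // Print each index next to its field to see what x(6) refers to
    x.zipWithIndex.foreach { case (field, i) => println(s"$i: $field") }
    println(s"x(6) = ${x(6)}") // x(6) = /video.flv
  }
}
```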

Function breakdown
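map(x=>(x(6),1)) turns each split record into a pair (x(6), 1): the field at index 6 becomes the key and 1 the value.
reduceByKey((x,y)=>(x+y)) adds up the values of all pairs that share the same key, giving a count per key.
map(x=>(x._2,x._1)) swaps the key and value of each pair, so the count becomes the key.
sortByKey(false) sorts by key; false means descending order, so the largest counts come first.
take(5) returns the first 5 elements, i.e. the five most frequent field-6 values together with their counts.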
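The five steps can be checked locally with a minimal sketch that replaces the RDD with plain Scala collections. This is a swapped-in equivalent, not Spark itself: reduceByKey becomes groupBy followed by a sum, and sortByKey(false) becomes sortBy on the negated count. The helper logLine and the input paths are hypothetical.

```scala
object TopPathsDemo {
  // Hypothetical helper: builds a line in the same layout as the sample log,
  // so that split(" ")(6) is the request path
  def logLine(path: String, status: Int): String =
    s"""10.75.194.100 - - [06/Jun/2016:14:35:32 +0800] "GET $path HTTP/1.1" $status 680"""

  // Hypothetical input standing in for the datas file
  val lines: Seq[String] = Seq(
    logLine("/video.flv", 206),
    logLine("/video.flv", 403),
    logLine("/video.flv", 403),
    logLine("/index.html", 200),
    logLine("/index.html", 200),
    logLine("/logo.png", 200)
  )

  // Plain-collection version of the Spark pipeline
  val top5: Seq[(Int, String)] =
    lines
      .map(_.split(" "))
      .map(x => (x(6), 1))                                 // (path, 1)
      .groupBy(_._1)                                       // group pairs by key
      .map { case (k, pairs) => (k, pairs.map(_._2).sum) } // like reduceByKey((x, y) => x + y)
      .toSeq
      .map(x => (x._2, x._1))                              // swap to (count, path)
      .sortBy(-_._1)                                       // descending by count, like sortByKey(false)
      .take(5)

  def main(args: Array[String]): Unit =
    top5.foreach(println)
}
```

With this input the result is (3,/video.flv), (2,/index.html), (1,/logo.png). When two values have the same count, their relative order should not be relied on, in this sketch or in Spark.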