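1. Property Operators
The property operators are mapVertices, mapEdges, and mapTriplets. They work like map on an RDD: each one transforms vertex or edge attributes and leaves the graph structure unchanged.
// Transform vertex attributes
def mapVertices[VD2](map: (VertexId, VD) => VD2): Graph[VD2, ED]
// Transform edge attributes
def mapEdges[ED2](map: Edge[ED] => ED2): Graph[VD, ED2]
// Transform edge attributes with access to the whole triplet (the edge plus both endpoint attributes)
def mapTriplets[ED2](map: EdgeTriplet[VD, ED] => ED2): Graph[VD, ED2]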
// Run in spark-shell, where sc is the predefined SparkContext
import org.apache.spark.graphx.{Edge, Graph}
val p = sc.parallelize(Array((1L,("Alice",28)),(2L,("Bob",27)),(3L,("Charlie",65)),(4L,("David",42)),(5L,("Ed",55)),(6L,("Fran",50))))
val re = sc.parallelize(Array(Edge(2L,1L,7),Edge(2L,4L,2),Edge(3L,2L,4),Edge(3L,6L,3),Edge(4L,1L,1),Edge(5L,2L,2),Edge(5L,3L,8),Edge(5L,6L,3)))
val graph=Graph(p,re)
// Change the vertex attribute from (name, age) to (id, name)
val gr =
graph.mapVertices{ case(id,(name,age)) => (id,name) }
// gr and gr2 are equivalent
val gr2 =
graph.mapVertices{case(id,attr)=>(id,attr._1)}
gr.vertices.collect.foreach(println)
Output:
(4,(4,David))
(1,(1,Alice))
(6,(6,Fran))
(3,(3,Charlie))
(5,(5,Ed))
(2,(2,Bob))
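// Transform edges. The function passed to mapEdges returns the new edge attribute,
// so returning an Edge here nests it inside the outer Edge, as the output shows.
// To double only the attribute, return x.attr*2 instead.
val gr3 =
graph.mapEdges(x=>Edge(x.srcId,x.dstId,x.attr*2))
gr3.edges.collect.foreach(println)
Output: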
Edge(2,1,Edge(2,1,14))
Edge(2,4,Edge(2,4,4))
Edge(3,2,Edge(3,2,8))
Edge(3,6,Edge(3,6,6))
Edge(4,1,Edge(4,1,2))
Edge(5,2,Edge(5,2,4))
Edge(5,3,Edge(5,3,16))
Edge(5,6,Edge(5,6,6))
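2. Structural Operators
// Reverse the direction of every edge
def reverse: Graph[VD, ED]
// Build a subgraph from the vertices and edges that satisfy the predicates.
// In GraphX both predicates default to always-true, so either one may be omitted.
def subgraph(epred: EdgeTriplet[VD,ED] => Boolean,
vpred: (VertexId, VD) => Boolean): Graph[VD, ED]
Example:
// reverse: flip the direction of every edge
val gr4 = graph.reverse
gr4.edges.collect.foreach(println)
Output: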
Edge(1,2,7)
Edge(1,4,1)
Edge(2,3,4)
Edge(2,5,2)
Edge(3,5,8)
Edge(4,2,2)
Edge(6,3,3)
Edge(6,5,3)
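One way to check what reverse does is to compare degrees: a vertex's out-degree in the original graph should equal its in-degree in the reversed graph. The following is a minimal sketch on the same data; the names outBefore and inAfter are illustrative, and the context setup is only there so the snippet runs outside spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the spark-shell context if there is one; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("reverse-sketch"))

val people = sc.parallelize(Seq(
  (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
  (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50))))
val relations = sc.parallelize(Seq(
  Edge(2L, 1L, 7), Edge(2L, 4L, 2), Edge(3L, 2L, 4), Edge(3L, 6L, 3),
  Edge(4L, 1L, 1), Edge(5L, 2L, 2), Edge(5L, 3L, 8), Edge(5L, 6L, 3)))
val graph = Graph(people, relations)

val reversed = graph.reverse

// outDegrees and inDegrees only list vertices whose degree is non-zero.
val outBefore = graph.outDegrees.collect().toMap
val inAfter = reversed.inDegrees.collect().toMap

println(outBefore == inAfter)
```

Vertex and edge attributes are untouched by reverse; only the source and destination of each edge swap.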
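// subgraph with vpred: keep only vertices whose age is below 65 (drops Charlie and every edge that touches him)
graph.subgraph(vpred=(id,t)=>t._2<65).triplets.collect.foreach(println)
Output: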
((2,(Bob,27)),(1,(Alice,28)),7)
((2,(Bob,27)),(4,(David,42)),2)
((4,(David,42)),(1,(Alice,28)),1)
((5,(Ed,55)),(2,(Bob,27)),2)
((5,(Ed,55)),(6,(Fran,50)),3)
// subgraph with epred: keep only edges whose source vertex has age below 65
graph.subgraph(epred = ep => ep.srcAttr._2 < 65).triplets.collect.foreach(println)
Output:
((2,(Bob,27)),(1,(Alice,28)),7)
((2,(Bob,27)),(4,(David,42)),2)
((4,(David,42)),(1,(Alice,28)),1)
((5,(Ed,55)),(2,(Bob,27)),2)
((5,(Ed,55)),(3,(Charlie,65)),8)
((5,(Ed,55)),(6,(Fran,50)),3)
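The two predicates can also be combined in a single call. The sketch below keeps vertices younger than 65 and, among the edges between those vertices, only the ones whose attribute is at least 3. The threshold 3 and the names sub, keptVertices, and keptEdges are chosen purely for illustration.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the spark-shell context if there is one; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("subgraph-sketch"))

val people = sc.parallelize(Seq(
  (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
  (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50))))
val relations = sc.parallelize(Seq(
  Edge(2L, 1L, 7), Edge(2L, 4L, 2), Edge(3L, 2L, 4), Edge(3L, 6L, 3),
  Edge(4L, 1L, 1), Edge(5L, 2L, 2), Edge(5L, 3L, 8), Edge(5L, 6L, 3)))
val graph = Graph(people, relations)

// An edge survives only if epred holds AND both of its endpoints pass vpred.
val sub = graph.subgraph(
  epred = t => t.attr >= 3,
  vpred = (id, attr) => attr._2 < 65)

val keptVertices = sub.vertices.collect().map(_._1).toSet
val keptEdges = sub.edges.collect().map(e => (e.srcId, e.dstId, e.attr)).toSet

println(keptVertices)
println(keptEdges)
```

On this data the result keeps vertices 1, 2, 4, 5, and 6 and the two edges 2->1 (attribute 7) and 5->6 (attribute 3). The edge 5->3 has attribute 8 but is dropped because Charlie fails vpred.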
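3. Join Operators
The join operators merge data from an external RDD into the graph's vertex attributes.
// Vertices without a match in the RDD keep their original attribute
def joinVertices[U](table: RDD[(VertexId, U)])(map: (VertexId, VD, U) => VD): Graph[VD, ED]
// map is called for every vertex; a vertex without a match in the RDD receives None
def outerJoinVertices[U, VD2](table: RDD[(VertexId, U)])(map: (VertexId, VD, Option[U]) => VD2)
: Graph[VD2, ED]
Example 1: append an email domain to each name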
val t =
sc.makeRDD(Array((1L,"qq.com"),(2L,"163.com"),(3L,"gmail.com")))
// joinVertices only updates the vertices that have a match in t
val g1 =
graph.joinVertices(t)((id,v,cmpy)=>(v._1+"@"+cmpy,v._2))
// outerJoinVertices calls the function for every vertex; cmpy is an Option and is None when there is no match
val g2 =
graph.outerJoinVertices(t)((id,v,cmpy)=>(v._1+"@"+cmpy,v._2))
g1.triplets.collect.foreach(println)
Output:
((2,(Bob@163.com,27)),(1,(Alice@qq.com,28)),7)
((2,(Bob@163.com,27)),(4,(David,42)),2)
((3,(Charlie@gmail.com,65)),(2,(Bob@163.com,27)),4)
((3,(Charlie@gmail.com,65)),(6,(Fran,50)),3)
((4,(David,42)),(1,(Alice@qq.com,28)),1)
((5,(Ed,55)),(2,(Bob@163.com,27)),2)
((5,(Ed,55)),(3,(Charlie@gmail.com,65)),8)
((5,(Ed,55)),(6,(Fran,50)),3)
g2.triplets.collect.foreach(println)
Output:
((2,(Bob@Some(163.com),27)),(1,(Alice@Some(qq.com),28)),7)
((2,(Bob@Some(163.com),27)),(4,(David@None,42)),2)
((3,(Charlie@Some(gmail.com),65)),(2,(Bob@Some(163.com),27)),4)
((3,(Charlie@Some(gmail.com),65)),(6,(Fran@None,50)),3)
((4,(David@None,42)),(1,(Alice@Some(qq.com),28)),1)
((5,(Ed@None,55)),(2,(Bob@Some(163.com),27)),2)
((5,(Ed@None,55)),(3,(Charlie@Some(gmail.com),65)),8)
((5,(Ed@None,55)),(6,(Fran@None,50)),3)
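The Some(...) and None wrappers in the g2 output appear because the Option is concatenated into the string as-is. A minimal sketch of handling the Option explicitly follows; it should give the same vertex attributes as the joinVertices call. The names domains, withEmail, and names are illustrative, and the context setup is only there so the snippet runs outside spark-shell.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.graphx.{Edge, Graph}

// Reuse the spark-shell context if there is one; otherwise start a local one.
val sc = SparkContext.getOrCreate(
  new SparkConf().setMaster("local[*]").setAppName("outer-join-sketch"))

val people = sc.parallelize(Seq(
  (1L, ("Alice", 28)), (2L, ("Bob", 27)), (3L, ("Charlie", 65)),
  (4L, ("David", 42)), (5L, ("Ed", 55)), (6L, ("Fran", 50))))
val relations = sc.parallelize(Seq(
  Edge(2L, 1L, 7), Edge(2L, 4L, 2), Edge(3L, 2L, 4), Edge(3L, 6L, 3),
  Edge(4L, 1L, 1), Edge(5L, 2L, 2), Edge(5L, 3L, 8), Edge(5L, 6L, 3)))
val graph = Graph(people, relations)

val domains = sc.parallelize(Seq((1L, "qq.com"), (2L, "163.com"), (3L, "gmail.com")))

// Unwrap the Option: append the domain when present, keep the attribute otherwise.
val withEmail = graph.outerJoinVertices(domains) { (id, attr, domainOpt) =>
  domainOpt match {
    case Some(domain) => (attr._1 + "@" + domain, attr._2)
    case None         => attr
  }
}

val names = withEmail.vertices.collect().map { case (id, (name, _)) => (id, name) }.toMap
names.toSeq.sortBy(_._1).foreach(println)
```

outerJoinVertices is the more general of the two operators because it may also change the vertex attribute type, which joinVertices cannot do.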
Example 2: count the likes each user has received (in-degree) and given (out-degree)
case class user(name:String,age:Int,inDeg:Int,outDeg:Int)
// inDegrees and outDegrees omit vertices whose degree is 0, hence getOrElse(0)
val g =
graph.outerJoinVertices(graph.inDegrees){
case(id,u,indeg)=>user(u._1,u._2,indeg.getOrElse(0),0)
}.outerJoinVertices(graph.outDegrees){
case (id,u,outdeg)=>user(u.name,u.age,u.inDeg,outdeg.getOrElse(0))}
g.vertices.collect.foreach(println)
Output:
(4,user(David,42,1,1))
(1,user(Alice,28,2,0))
(6,user(Fran,50,2,0))
(3,user(Charlie,65,1,2))
(5,user(Ed,55,0,3))
(2,user(Bob,27,2,2))
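This article covered three core kinds of operators for graph computation with Spark GraphX: property operators, structural operators, and join operators. The examples showed how to transform vertex and edge attributes, reverse edge directions, extract subgraphs, and join data from an external RDD into a graph.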