Algorithm Overview
PathPlanning is an algorithm implemented on top of the Pregel mechanism in Spark GraphX. For background on the Pregel mechanism, see the earlier article 【大数据分析】基于Graphx的shortestpath源码解析. Within a limited number of iterations, PathPlanning computes as many as possible of the paths from a start node S to a target node T in the graph.
Algorithm Walkthrough
Data Preparation
Code that creates the sample data:
import org.apache.spark.graphx.Edge

// sc is an existing SparkContext (e.g. the one provided by spark-shell).
val myVertices = sc.makeRDD(Array(
  (1L, "Dave"),
  (2L, "Faith"),
  (3L, "Harvey"),
  (4L, "Bob"),
  (5L, "Alice"),
  (6L, "Charlie"),
  (7L, "George"),
  (8L, "Ivy")
))
val myEdges = sc.makeRDD(Array(
  Edge(7L, 1L, "friend"),
  Edge(7L, 2L, "sister"),
  Edge(7L, 6L, "friend"),
  Edge(1L, 4L, "friend"),
  Edge(4L, 1L, "brother"),
  Edge(3L, 2L, "boss"),
  Edge(2L, 3L, "client"),
  Edge(2L, 4L, "client"),
  Edge(1L, 5L, "client"),
  Edge(4L, 5L, "coworker"),
  Edge(3L, 8L, "coworker"),
  Edge(5L, 8L, "father"),
  Edge(4L, 8L, "colleague")
))
A diagram of the sample data (in this example S is 1L and T is 8L) is shown below:

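Below is a minimal sketch (not part of the original post) of how the sample graph could be assembled and the algorithm invoked; it assumes the PathPlanning object defined later in this article is available and uses an arbitrary budget of 10 iterations.

import org.apache.spark.graphx.Graph

// Build the property graph from the sample RDDs above.
val myGraph = Graph(myVertices, myEdges)
// Compute paths from S = 1L to T = 8L within at most 10 Pregel supersteps.
val result = PathPlanning.run(myGraph, source = 1L, target = 8L, maxIterations = 10)
// Each vertex attribute maps an in-neighbour's id to the paths that reach this vertex.
result.vertices.collect().foreach(println)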
Questions to Consider for a Pregel-Based Algorithm
An algorithm built on the Pregel mechanism generally needs to answer a few questions:
(1) How should the vertex attribute be structured?
(2) How are the vertex attributes initialized?
(3) How are all vertices activated before the first iteration?
(4) How are messages passed (how vertex state changes, the direction of message passing, and how messages are updated)?
(5) How are multiple incoming messages combined (merge)?
(6) How is the final received message combined with the current vertex attribute (vertex_program)?
Defining the Vertex Attribute Type
The code defines two case classes and a MsgType type alias:
/**
* @desc A path instance: the length of the path and the path itself (a list of vertex ids)
*/
case class PathInstance(l: Double, p: List[VertexId])
/**
* @param dstId Used to store the target node T's ID
* @param pathInstances An ArrayBuffer of path instances, each path being a list of vertex ids
*/
case class MsgValue(dstId: VertexId, pathInstances: ArrayBuffer[PathInstance])
type MsgType = Map[VertexId, MsgValue]
(1) PathInstance represents a path instance: a path p together with its corresponding length l.
(2) MsgValue contains two fields, dstId and pathInstances. dstId is the target node T that the algorithm computes paths towards, and pathInstances is the list of path instances.
(3) MsgType is the attribute type of every vertex; it is a Map whose keys are VertexIds. Each key is an in-neighbour of the current node B, i.e. the A of a triplet A → B. In other words, node B stores A's vertex id as the key, together with all of A's paths extended with B's own id as the value (see the sketch after this list).
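As an illustration (hypothetical values, assuming S = 1L, T = 8L, and that PathPlanning._ and scala.collection.mutable.ArrayBuffer are imported), the attribute of node B = 8L after a few supersteps might look like this, with one entry per in-neighbour that has reached it:

// Key 4L holds the paths that arrive at 8L via 4L; key 5L holds those that arrive via 5L.
val attrOf8: MsgType = Map(
  4L -> MsgValue(dstId = 8L, pathInstances = ArrayBuffer(
    PathInstance(2.0, List(1L, 4L, 8L)))),
  5L -> MsgValue(dstId = 8L, pathInstances = ArrayBuffer(
    PathInstance(2.0, List(1L, 5L, 8L)),
    PathInstance(3.0, List(1L, 4L, 5L, 8L))))
)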
Initializing the Vertex Attributes
val PGraph = graph.mapVertices { (vid, attr) =>
  if (vid == source) {
    val dstId = target
    val pi = PathInstance(l = 0, p = List(vid))
    val pis = new ArrayBuffer[PathInstance]()
    pis += pi
    val msgValue = MsgValue(dstId = dstId, pathInstances = pis)
    makeMsg(vid -> msgValue)
  } else {
    makeMsg()
  }
}
The result is shown in the figure below.

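For reference, a sketch of what mapVertices produces (assuming S = 1L and T = 8L, imports as above): the source holds a single zero-length path to itself, and every other vertex starts with an empty Map.

// Initial attribute of the source vertex 1L:
val initialSourceAttr: MsgType =
  Map(1L -> MsgValue(dstId = 8L, pathInstances = ArrayBuffer(PathInstance(0.0, List(1L)))))
// Initial attribute of every other vertex:
val initialOtherAttr: MsgType = Map()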
Activating All Vertices at Initialization

Activating all vertices requires an initial message; here it is a Map containing a single element:
val initialMessage = makeMsg(-1L -> null)
When all vertices are activated, the vertexProgram method is triggered directly, and initialMessage is passed in as the msg parameter.
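A small sketch of this first superstep (not from the original post): because the initial message carries the sentinel key -1L, vertexProgram returns the attribute unchanged, so the only effect is that every vertex becomes active.

// vertexProgram detects the sentinel key -1L and returns the attribute as-is.
val attrBefore: MsgType = Map()                             // e.g. a non-source vertex
val attrAfter = PathPlanning.vertexProgram(2L, attrBefore, Map(-1L -> null))
assert(attrAfter == attrBefore)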
Message Passing
Message passing is controlled by sendMsg. Writing every triplet as A → B, the message-passing logic works as follows (a concrete example follows the four steps below):
1. Filtering triplets by active/inactive state
Select the triplets A → B in which A or B is active.
2. Building the message
Collect the path instances stored under the different keys of A, add the edge weight to each path length, and append B to the end of each path.
3. Deciding whether a message is actually sent
For a triplet A → B in which A or B is active, no message is sent in any of the following cases:
(1) A's Map has no elements.
(2) A path in the message built in step 2 contains a repeated node.
(3) A is the target node T that we are computing paths to.
(4) The message built in step 2 is identical to what B already stores under A's key.
For (1) and (3), see the figure below: when 2, 3, 4, 5, 6 or 7 acts as A, no message is sent to the corresponding B (their Maps are empty); when 8, which is T, acts as A, no message is sent either.

For (2) and (4), see the figure below: 4 → 1 would produce a path with a repeated node, so it falls under (2); 1 → 4 and 1 → 5 fall under (4).

4. Direction of message passing
Overall, messages flow from A to B: the message built in step 2 (if step 3 allows it) is sent to B.
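A concrete sketch of steps 2 and 4 for the triplet 1L → 4L on the first superstep, assuming S = 1L, T = 8L and the fixed edge weight of 1 used in the code:

// A = 1L currently holds only the zero-length path to itself.
val srcAttrOf1: MsgType =
  Map(1L -> MsgValue(dstId = 8L, pathInstances = ArrayBuffer(PathInstance(0.0, List(1L)))))
// Step 2 adds the edge weight to each length and appends B = 4L to each path;
// step 4 sends the result to B, keyed by A's id:
val msgSentTo4: MsgType =
  Map(1L -> MsgValue(dstId = 8L, pathInstances = ArrayBuffer(PathInstance(1.0, List(1L, 4L)))))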
Merging Messages
Given two triplets 1 → 4 and 2 → 4, both send their message to vertex 4. Before vertex 4 receives them, the two messages must be merged; mergeMsg is responsible for combining them. Here mergeMsg simply unions the two Maps, because their keys are distinct and each message contains exactly one element.
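A sketch of the merge with hypothetical message contents (chosen only for illustration): vertex 4L receives one message keyed by 1L and another keyed by 2L in the same superstep.

val msgFrom1: MsgType = Map(1L -> MsgValue(8L, ArrayBuffer(PathInstance(1.0, List(1L, 4L)))))
val msgFrom2: MsgType = Map(2L -> MsgValue(8L, ArrayBuffer(PathInstance(1.0, List(2L, 4L)))))
// mergeMsg is a plain Map union; the keys differ, so both entries survive.
val merged: MsgType = msgFrom1 ++ msgFrom2   // Map(1L -> ..., 2L -> ...)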
Combining the Message with the Vertex Attribute
vertexProgram combines the merged message with the attribute of the vertex that receives it. Given the current attribute attr and the received message msg, the key sets of both are unioned; for a key present in both, the entry from msg replaces the one in attr, while keys that appear only in attr keep their old entries.
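A sketch of this combination at vertex 8L (assuming S = 1L, T = 8L): the entry under a key present in the message is replaced, while a key present only in the old attribute is kept.

val oldAttr: MsgType = Map(
  4L -> MsgValue(8L, ArrayBuffer(PathInstance(2.0, List(1L, 4L, 8L)))),
  5L -> MsgValue(8L, ArrayBuffer(PathInstance(2.0, List(1L, 5L, 8L)))))
val newMsg: MsgType = Map(
  5L -> MsgValue(8L, ArrayBuffer(PathInstance(2.0, List(1L, 5L, 8L)),
                                 PathInstance(3.0, List(1L, 4L, 5L, 8L)))))
val updated = PathPlanning.vertexProgram(8L, oldAttr, newMsg)
// 'updated' keeps key 4L from oldAttr and takes the richer entry for key 5L from newMsg.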
The Iteration Process to Completion

Complete Code
package com.edata.bigdata.algorithm.networks.approximation

import org.apache.spark.graphx.{EdgeTriplet, Graph, Pregel, VertexId}
import scala.collection.mutable.ArrayBuffer
import scala.reflect.ClassTag

/**
 * @Description: Given a source node S and a target node T, calculate the possible paths
 *               and the corresponding lengths from S to T.
 *               The result is only an estimate; as the iterations continue, more and more
 *               correct results are computed.
 * @Author: Alan Sword
 * @Date 10:42
 * @Version 1.0
 */
object PathPlanning extends Serializable {

  /**
   * @desc A path instance: the length of the path and the path itself (a list of vertex ids)
   */
  case class PathInstance(l: Double, p: List[VertexId])

  /**
   * @param dstId         Used to store the target node T's ID
   * @param pathInstances An ArrayBuffer of path instances, each path being a list of vertex ids
   */
  case class MsgValue(dstId: VertexId, pathInstances: ArrayBuffer[PathInstance])

  type MsgType = Map[VertexId, MsgValue]

  /**
   * @param x zero or more 'key -> value' elements.
   * @return a 'MsgType' Map.
   * @Description Builds a vertex attribute (or message) from the given pairs, where each key is a neighbor of the current node
   */
  private def makeMsg(x: (VertexId, MsgValue)*) = Map(x: _*)

  /**
   * @param edge an edge triplet (A->B).
   * @return the new MsgValue built from A's attribute.
   * @Description Uses A's attribute to build the message content sent towards B
   */
  private def updateMsgValue(edge: EdgeTriplet[MsgType, _]): MsgValue = {
    val edgeDstId = edge.dstId
    // The edge weight is fixed at 1 here.
    val edgeWeight = 1
    // Every entry in A's attribute carries the same target id, so taking the first one is enough.
    val dstId = edge.srcAttr.values.map(data => data.dstId).reduce((x, y) => x)
    // Collect the path instances stored under all of A's keys, add the edge weight to each
    // length and append B's id to each path.
    val pathInstances = edge.srcAttr.values.map(data => data.pathInstances).reduce((x, y) => x ++ y).map(pi => {
      val l = pi.l + edgeWeight
      val p = pi.p :+ edgeDstId
      PathInstance(l, p)
    }).distinct
    MsgValue(dstId, pathInstances)
  }

  /**
   * @param msg1 a message from a neighbor.
   * @param msg2 a message from another neighbor.
   * @return a 'MsgType' Map that is made from a combination of msg1 and msg2.
   * @Description Merges two messages by a plain Map union; their keys never collide.
   */
  private def mergeMsg(msg1: MsgType, msg2: MsgType): MsgType = {
    msg1 ++ msg2
  }

  /**
   * @param vid  The vertex id of the node that receives the message.
   * @param attr The current attribute of that node.
   * @param msg  The message after 'mergeMsg'.
   * @return the updated attribute.
   * @Description Updates the attribute of the receiving node.
   */
  def vertexProgram(vid: VertexId, attr: MsgType, msg: MsgType): MsgType = {
    if (msg.keySet.contains(-1L)) {
      // The initial message carries the sentinel key -1L: leave the attribute unchanged.
      attr
    } else {
      // Union of both key sets; for shared keys the value from msg wins.
      val attr_msg = (attr.keySet ++ msg.keySet).map {
        k => k -> msg.getOrElse(k, attr.getOrElse(k, null))
      }.toMap
      attr_msg
    }
  }

  /**
   * @param edge an edge triplet (A->B).
   * @return an iterator over the messages to send, possibly empty.
   * @Description Sends messages in 'Iterator[(VertexId, MsgType)]' format from node to node.
   */
  private def sendMsg(edge: EdgeTriplet[MsgType, _]): Iterator[(VertexId, MsgType)] = {
    // Rule (1): A's Map has no elements, so there is nothing to propagate.
    if (edge.srcAttr.isEmpty) return Iterator.empty
    val msg_value_new = updateMsgValue(edge)
    val path_instances_new = msg_value_new.pathInstances
    // Rule (2): drop the message if any constructed path contains a repeated node.
    if (path_instances_new.exists(p => p.p.distinct.length < p.p.length)) return Iterator.empty
    val srcId = edge.srcId
    val msg_value_dst = edge.dstAttr.getOrElse(srcId, MsgValue(dstId = 0, pathInstances = new ArrayBuffer[PathInstance]()))
    // Rule (3): A is the target node T (B's stored entry for A carries A's own id as dstId).
    if (msg_value_dst.dstId == srcId) return Iterator.empty
    val path_instances_dst = msg_value_dst.pathInstances
    // Rule (4): the constructed message is identical to what B already stores for A.
    if (path_instances_new.containsSlice(path_instances_dst) && path_instances_dst.containsSlice(path_instances_new))
      return Iterator.empty
    val msg = makeMsg(edge.srcId -> msg_value_new)
    Iterator((edge.dstId, msg))
  }

  /**
   * @param graph         The graph that needs to be calculated
   * @param source        The starting point S
   * @param target        The ending point T
   * @param maxIterations The maximum number of Pregel supersteps
   * @tparam VD the original vertex attribute type
   * @tparam ED the edge attribute type
   * @Description Initializes the vertex attributes and runs Pregel for at most maxIterations supersteps.
   */
  def run[VD, ED: ClassTag](graph: Graph[VD, ED], source: VertexId, target: VertexId, maxIterations: Int): Graph[MsgType, ED] = {
    val PGraph = graph.mapVertices { (vid, attr) =>
      if (vid == source) {
        // The source starts with a single zero-length path to itself, keyed by its own id.
        val dstId = target
        val pi = PathInstance(l = 0, p = List(vid))
        val pis = new ArrayBuffer[PathInstance]()
        pis += pi
        val msgValue = MsgValue(dstId = dstId, pathInstances = pis)
        makeMsg(vid -> msgValue)
      } else {
        // Every other vertex starts with an empty Map.
        makeMsg()
      }
    }
    // The sentinel key -1L marks the initial message that activates all vertices.
    val initialMessage = makeMsg(-1L -> null)
    Pregel(PGraph, initialMessage, maxIterations = maxIterations)(vertexProgram, sendMsg, mergeMsg)
  }
}
The output of the run is as follows:
(6,Map(7 -> MsgValue(7,ArrayBuffer(PathInstance(1.0,List(7, 6))))))
(2,Map(7 -> MsgValue(7,ArrayBuffer(PathInstance(1.0,List(7, 2)))), 3 -> MsgValue(7,ArrayBuffer())))
(8,Map(3 -> MsgValue(7,ArrayBuffer(PathInstance(3.0,List(7, 2, 3, 8)))), 5 -> MsgValue(7,ArrayBuffer(PathInstance(4.0,List(7, 1, 4, 5, 8)), PathInstance(4.0,List(7, 2, 4, 5, 8)))), 4 -> MsgValue(7,ArrayBuffer(PathInstance(3.0,List(7, 1, 4, 8)), PathInstance(3.0,List(7, 2, 4, 8))))))
(7,Map(7 -> MsgValue(7,ArrayBuffer(PathInstance(0.0,List(7))))))
(4,Map(1 -> MsgValue(7,ArrayBuffer(PathInstance(2.0,List(7, 1, 4)))), 2 -> MsgValue(7,ArrayBuffer(PathInstance(2.0,List(7, 2, 4))))))
(3,Map(2 -> MsgValue(7,ArrayBuffer(PathInstance(2.0,List(7, 2, 3))))))
(1,Map(7 -> MsgValue(7,ArrayBuffer(PathInstance(1.0,List(7, 1))))))
(5,Map(4 -> MsgValue(7,ArrayBuffer(PathInstance(3.0,List(7, 1, 4, 5)), PathInstance(3.0,List(7, 2, 4, 5))))))
This article introduced the PathPlanning algorithm based on Spark GraphX. Using the Pregel mechanism, the algorithm finds, within a limited number of iterations, paths from a start node S to a target node T in the graph. It walked through data preparation, the definition of vertex attributes, initialization, message passing and merging, and provided the complete code.