Hama: Design of Graph Module

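This article introduces the graph computing module in Apache Hama, which lets you write Google Pregel-style applications through a simple programming interface. It explains the internal implementation of the Graph API, including the roles and workflow of its three core components: VertexInputReader, GraphJob, and GraphJobRunner.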

http://wiki.apache.org/hama/GraphModuleInternals

 

Hama includes the Graph module for vertex-centric graph computations. Hama's Graph APIs allow you to write Google Pregel-style applications with a simple programming interface.

 

Internals

 

The Graph APIs are implemented on top of the Hama BSP framework and consist of three major classes: VertexInputReader, GraphJob, and GraphJobRunner.

  • VertexInputReader: parses and extracts the Vertex structure from arbitrary text or binary data.

  • GraphJob: the primary interface for a user to describe a Graph job to the Hama BSP framework for execution.

  • GraphJobRunner: the BSP program that executes each Vertex's compute() method.

 

VertexInputReader

 

VertexInputReader is the interface users implement to parse and extract the Vertex structure from arbitrary text or binary data. Internally, the loadVertices() method reads the records from its assigned split, converts each record into a Vertex object with the user-defined VertexInputReader.parseVertex() method, and loads the resulting vertices into in-memory vertex storage.
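A minimal sketch of the parse-and-load step, assuming a hypothetical tab-separated text format `vertexID<TAB>value<TAB>neighbor1,neighbor2,...`. The classes `SimpleVertex` and `TextVertexParser` and the helper `loadSplit` are illustrative names, not Hama's actual classes or signatures. Returning `false` for a malformed record so the caller can skip it is a design choice of this sketch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical vertex holder: an ID, a numeric value, and outgoing edge targets.
class SimpleVertex {
    String id;
    double value;
    final List<String> edges = new ArrayList<>();
}

// Hypothetical parser for records of the form "id<TAB>value<TAB>n1,n2,...".
class TextVertexParser {
    // Fills `vertex` from one text record; returns false if the record is malformed.
    static boolean parseVertex(String line, SimpleVertex vertex) {
        String[] fields = line.split("\t");
        if (fields.length < 2) {
            return false;
        }
        try {
            vertex.value = Double.parseDouble(fields[1]);
        } catch (NumberFormatException e) {
            return false;
        }
        vertex.id = fields[0];
        if (fields.length > 2 && !fields[2].isEmpty()) {
            for (String target : fields[2].split(",")) {
                vertex.edges.add(target);
            }
        }
        return true;
    }

    // Converts every parseable record of one split into an in-memory vertex.
    static List<SimpleVertex> loadSplit(List<String> records) {
        List<SimpleVertex> vertices = new ArrayList<>();
        for (String record : records) {
            SimpleVertex v = new SimpleVertex();
            if (parseVertex(record, v)) {
                vertices.add(v);
            }
        }
        return vertices;
    }
}
```

For example, the records `"a\t1.5\tb,c"`, `"b\t2.0"`, and `"broken"` yield two vertices; the last record is skipped.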

 

GraphJob

 

GraphJob extends the core BSPJob interface with additional getter/setter methods for graph-specific configuration, such as setMaxIteration, setAggregatorClass, setVertexInputReaderClass, and setVertexOutputWriterClass. The remaining APIs, e.g., InputFormat and OutputFormat, are the same as in the core BSPJob interface.
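The layering described above (a generic job plus graph-specific setters) can be sketched with two hypothetical classes. `BspJobSketch`, `GraphJobSketch`, and the configuration key strings are assumptions for illustration only; they reuse the setter names from the text but are not Hama's BSPJob or GraphJob implementations.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the core job: generic settings such as input/output formats.
class BspJobSketch {
    protected final Map<String, String> conf = new HashMap<>();

    void setInputFormat(Class<?> format) { conf.put("input.format", format.getName()); }
    void setOutputFormat(Class<?> format) { conf.put("output.format", format.getName()); }
    String get(String key) { return conf.get(key); }
}

// Hypothetical graph job: inherits the generic setters and adds graph-specific ones.
class GraphJobSketch extends BspJobSketch {
    void setMaxIteration(int maxIteration) {
        conf.put("graph.max.iteration", Integer.toString(maxIteration));
    }
    void setAggregatorClass(Class<?> aggregator) {
        conf.put("graph.aggregator.class", aggregator.getName());
    }
    void setVertexInputReaderClass(Class<?> reader) {
        conf.put("graph.vertex.reader.class", reader.getName());
    }
    void setVertexOutputWriterClass(Class<?> writer) {
        conf.put("graph.vertex.writer.class", writer.getName());
    }

    // Returns -1 when no iteration limit has been configured.
    int getMaxIteration() {
        String v = conf.get("graph.max.iteration");
        return v == null ? -1 : Integer.parseInt(v);
    }
}
```

Because the graph job inherits from the generic job, a single object carries both the graph-specific settings and the unchanged generic ones such as the input and output formats.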

 

GraphJobRunner

 

GraphJobRunner is the core internal BSP program that performs the vertex computations defined in the Vertex.compute() method and writes the output. Like other BSP programs, it consists of three methods: setup(), bsp(), and cleanup().

  • setup() phase: the initialization phase for vertex computations.
  • bsp() phase: performs the main vertex computations. Messages between vertices are also exchanged through the BSP communication interface in this phase.
  • cleanup() phase: writes the output after the vertex computations complete.
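To make the three phases concrete, here is a hypothetical single-process simulation; it is not GraphJobRunner's code. It assumes a space-separated record format, uses Pregel's "propagate the maximum value" example as the vertex computation, and replaces BSP peer messaging with two in-memory maps swapped at each barrier. Messages sent in superstep n are read in superstep n + 1, and the loop stops when no messages remain or `maxIteration` is reached.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process simulation of a setup/bsp/cleanup graph runner.
// The vertex computation is Pregel's "propagate the maximum value" example.
class MiniGraphRunner {
    final Map<String, Integer> values = new HashMap<>();
    final Map<String, List<String>> edges = new HashMap<>();
    final List<String> output = new ArrayList<>();
    int supersteps = 0;

    // setup(): load vertices from records of the form "id value n1,n2,...".
    void setup(List<String> records) {
        for (String record : records) {
            String[] f = record.split(" ");
            values.put(f[0], Integer.parseInt(f[1]));
            List<String> targets = new ArrayList<>();
            if (f.length > 2) {
                for (String t : f[2].split(",")) {
                    targets.add(t);
                }
            }
            edges.put(f[0], targets);
        }
    }

    // bsp(): run supersteps; messages sent in superstep n are read in superstep n + 1.
    void bsp(int maxIteration) {
        Map<String, List<Integer>> inbox = new HashMap<>();
        boolean initial = true; // superstep 0: every vertex computes once
        while (supersteps < maxIteration && (initial || !inbox.isEmpty())) {
            Map<String, List<Integer>> outbox = new HashMap<>();
            for (String id : new ArrayList<>(values.keySet())) {
                List<Integer> msgs = inbox.getOrDefault(id, new ArrayList<>());
                if (!initial && msgs.isEmpty()) {
                    continue; // no incoming messages: the vertex stays inactive
                }
                int current = values.get(id);
                int max = current;
                for (int m : msgs) {
                    max = Math.max(max, m);
                }
                // Send only on the initial superstep or when the value grew.
                if (initial || max > current) {
                    values.put(id, max);
                    for (String target : edges.get(id)) {
                        outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(max);
                    }
                }
            }
            inbox = outbox; // the barrier: swap message buffers
            initial = false;
            supersteps++;
        }
    }

    // cleanup(): write "id<TAB>value" records, sorted by ID for stable output.
    void cleanup() {
        for (Map.Entry<String, Integer> e : new TreeMap<>(values).entrySet()) {
            output.add(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

With the records `"a 3 b"`, `"b 6 a,c"`, and `"c 2 b"`, calling `setup`, `bsp(30)`, and `cleanup` in that order leaves every vertex with the value 6 after three supersteps.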

More specifically, the two core methods described below, loadVertices() and doSuperstep(), load and process the vertices.

 

loadVertices

 

As you might guess, loadVertices() is called in the setup() initialization phase. It reads the assigned split, parses each record into a Vertex, and loads it into VerticesInfo. For memory efficiency, the current implementation of vertex computation assumes that the vertices are already sorted by vertex ID.
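The page does not spell out how sorted order saves memory. One plausible reading, sketched here under that assumption, is that when both the vertices and the incoming messages are sorted by vertex ID, the runner can pair them in a single linear merge pass instead of building a per-vertex lookup map. `IdVertex`, `Msg`, and `SortedMerge.pair` are hypothetical names, and summing the message payloads stands in for whatever compute() would do with them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical vertex whose natural ordering is by vertex ID.
class IdVertex implements Comparable<IdVertex> {
    final String id;
    IdVertex(String id) { this.id = id; }
    @Override public int compareTo(IdVertex other) { return id.compareTo(other.id); }
}

// Hypothetical message addressed to a vertex ID.
class Msg implements Comparable<Msg> {
    final String target;
    final int payload;
    Msg(String target, int payload) { this.target = target; this.payload = payload; }
    @Override public int compareTo(Msg other) { return target.compareTo(other.target); }
}

class SortedMerge {
    // Sorts both lists in place (as with Collections.sort(vertices)), then pairs
    // each vertex with the sum of its messages in one pass over both lists.
    static List<String> pair(List<IdVertex> vertices, List<Msg> messages) {
        Collections.sort(vertices);
        Collections.sort(messages);
        List<String> result = new ArrayList<>();
        int m = 0;
        for (IdVertex v : vertices) {
            // Skip messages for IDs smaller than v.id (no such local vertex).
            while (m < messages.size() && messages.get(m).target.compareTo(v.id) < 0) {
                m++;
            }
            int sum = 0;
            while (m < messages.size() && messages.get(m).target.equals(v.id)) {
                sum += messages.get(m).payload;
                m++;
            }
            result.add(v.id + "=" + sum);
        }
        return result;
    }
}
```

The merge needs only one cursor into the message list, so no extra structure proportional to the number of vertices is built beyond the result itself.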

 

doInitialSuperstep and doSuperstep

 

  • Work In Progress.

 

List of Future Ideas and Challenges

 

Currently, we use ListVerticesInfo and Collections.sort(vertices).

Along with improving the memory-based vertex storage, HBase's Scanner or disk-based vertex storage should also be considered in the future.
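One way to leave room for the disk-based or HBase-backed storage mentioned above is to hide vertex storage behind a small interface. The names `VertexStore` and `ListVertexStore` are assumptions for illustration, not Hama's VerticesInfo API. The in-memory variant mirrors the current list-plus-`Collections.sort` approach.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

// Hypothetical storage abstraction: the runner only needs to add vertices,
// finish loading, and iterate them in vertex-ID order.
interface VertexStore<V extends Comparable<V>> extends Iterable<V> {
    void add(V vertex);
    void finishLoading(); // e.g. sort in memory, or sort/flush on disk
    int size();
}

// In-memory variant, analogous to using a list and Collections.sort(vertices).
class ListVertexStore<V extends Comparable<V>> implements VertexStore<V> {
    private final List<V> vertices = new ArrayList<>();

    @Override public void add(V vertex) { vertices.add(vertex); }
    @Override public void finishLoading() { Collections.sort(vertices); }
    @Override public int size() { return vertices.size(); }
    // Read-only iteration so callers cannot reorder the sorted list.
    @Override public Iterator<V> iterator() {
        return Collections.unmodifiableList(vertices).iterator();
    }
}
```

A disk-based or HBase Scanner-backed store could implement the same three operations, so the superstep loop would not need to know where the vertices live.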
