BSP-Apache HAMA-Running a Graph Job (1)

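This article walks through the workflow of the BSP (Bulk Synchronous Parallel) framework in Apache HAMA: how a job is submitted, how the input data is loaded, and how data is sent between peers. Starting from GraphJob.submit(), it shows how vertex data is loaded into memory and how data is then sent with BSPPeer.send().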


Figure: Apache HAMA framework diagram


1. Job submission is handled by BSPJobClient; the method that triggers it is GraphJob.submit().


What submit() mainly deals with is the VertexID, VertexValue and EdgeValue information, along with related settings.
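A minimal sketch of the kind of check a submit step can perform before handing the job over, assuming submit() refuses to proceed when the vertex ID, vertex value or edge value type has not been configured. The configuration keys and the GraphJobSubmitSketch class are hypothetical stand-ins, not HAMA's actual constants or API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a pre-submission check on the graph type settings.
// The keys below are illustrative only; they are not HAMA's real constants.
class GraphJobSubmitSketch {
  static final List<String> REQUIRED_KEYS = List.of(
      "graph.vertex.id.class",
      "graph.vertex.value.class",
      "graph.edge.value.class");

  /** Throws if any required type setting is missing; otherwise "submits". */
  static String submit(Map<String, String> conf) {
    for (String key : REQUIRED_KEYS) {
      if (conf.get(key) == null) {
        throw new IllegalArgumentException("Missing setting: " + key);
      }
    }
    // In HAMA, this is roughly where control would pass on to BSPJobClient.
    return "submitted";
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put("graph.vertex.id.class", "org.apache.hadoop.io.Text");
    conf.put("graph.vertex.value.class", "org.apache.hadoop.io.DoubleWritable");
    conf.put("graph.edge.value.class", "org.apache.hadoop.io.NullWritable");
    System.out.println(submit(conf)); // prints "submitted"
  }
}
```

Failing fast at submit time means a misconfigured job is rejected on the client instead of failing later on every peer.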

2. Loading the data

As its Javadoc puts it ("Loads vertices into memory of each peer"), GraphJobRunner.loadVertices() loads the parsed vertices into the memory of each peer. The code of loadVertices() is:

  private void loadVertices(
      BSPPeer<Writable, Writable, Writable, Writable, GraphJobMessage> peer)
      throws IOException, SyncException, InterruptedException {
    // Phase 1: create one outgoing GraphJobMessage (partition bucket) per peer.
    for (int i = 0; i < peer.getNumPeers(); i++) {
      partitionMessages.put(i, new GraphJobMessage());
    }
    // The configured VertexInputReader turns raw input records into vertices.
    VertexInputReader<Writable, Writable, V, E, M> reader = (VertexInputReader<Writable, Writable, V, E, M>) ReflectionUtils
        .newInstance(conf.getClass(Constants.RUNTIME_PARTITION_RECORDCONVERTER,
            VertexInputReader.class));
    // Thread pool capped by the DEFAULT_THREAD_POOL_SIZE setting (64 if unset);
    // tasks rejected by a saturated pool are handed to retryHandler.
    ThreadPoolExecutor executor = (ThreadPoolExecutor) Executors
        .newCachedThreadPool();
    executor.setMaximumPoolSize(conf.getInt(DEFAULT_THREAD_POOL_SIZE, 64));
    executor.setRejectedExecutionHandler(retryHandler);
    KeyValuePair<Writable, Writable> next = null;
    while ((next = peer.readNext()) != null) {
      Vertex<V, E, M> vertex = GraphJobRunner
          .<V, E, M> newVertexInstance(VERTEX_CLASS);
      boolean vertexFinished = false;
      try {
        vertexFinished = reader.parseVertex(next.getKey(), next.getValue(),
            vertex);
      } catch (Exception e) {
        throw new IOException("Parse exception occured: " + e);
      }
      // Skip records that did not yield a complete vertex.
      if (!vertexFinished) {
        continue;
      }
      // A Parser worker adds the vertex to its target bucket in partitionMessages.
      Runnable worker = new Parser(vertex);
      executor.execute(worker);
    }
    executor.shutdown();
    executor.awaitTermination(60, TimeUnit.SECONDS);
    // Send every partition bucket to the peer that owns that partition.
    Iterator<Entry<Integer, GraphJobMessage>> it;
    it = partitionMessages.entrySet().iterator();
    while (it.hasNext()) {
      Entry<Integer, GraphJobMessage> e = it.next();
      it.remove();
      GraphJobMessage msg = e.getValue();
      msg.setFlag(GraphJobMessage.PARTITION_FLAG);
      peer.send(getHostName(e.getKey()), msg);
    }
    // Barrier: once sync() returns, each peer can read the partitions sent to it.
    peer.sync();
    // Phase 2: unpack the received messages and add the vertices locally.
    executor = (ThreadPoolExecutor) Executors.newCachedThreadPool();
    executor.setMaximumPoolSize(conf.getInt(DEFAULT_THREAD_POOL_SIZE, 64));
    executor.setRejectedExecutionHandler(retryHandler);

    GraphJobMessage msg;
    while ((msg = peer.getCurrentMessage()) != null) {
      executor.execute(new AddVertex(msg));
    }
    executor.shutdown();
    executor.awaitTermination(60, TimeUnit.SECONDS);
    LOG.info(vertices.size() + " vertices are loaded into "
        + peer.getPeerName());
  }
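Once every vertex has been parsed (parseVertex() reports it as finished), loadVertices() splits the vertex information into per-peer partitions held in a ConcurrentHashMap and sends each partition to its owning node. After peer.sync(), each peer drains the GraphJobMessages it received and adds the contained vertices through AddVertex workers.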
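The two phases described above can be modeled in plain Java as a rough sketch. Assumptions: vertex IDs are Strings rather than Writables, ownership is decided by hash partitioning (in a real HAMA job the partitioner is configurable), and LoadVerticesSketch, partitionOf and partition are hypothetical names. The pool tasks play the role of the Parser workers, and each per-peer list corresponds to one entry of partitionMessages.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Single-JVM model of phase 1 of loadVertices(): worker threads assign
// parsed vertices to per-peer buckets that would then be sent to their owners.
class LoadVerticesSketch {

  // Assumed hash partitioning: owner = |hash % numPeers|.
  static int partitionOf(String vertexId, int numPeers) {
    return Math.abs(vertexId.hashCode() % numPeers);
  }

  static Map<Integer, List<String>> partition(List<String> vertexIds, int numPeers) {
    // One bucket per peer, mirroring partitionMessages.put(i, new GraphJobMessage()).
    Map<Integer, List<String>> buckets = new ConcurrentHashMap<>();
    for (int i = 0; i < numPeers; i++) {
      buckets.put(i, Collections.synchronizedList(new ArrayList<>()));
    }
    ExecutorService executor = Executors.newFixedThreadPool(4);
    for (String id : vertexIds) {
      // Each task stands in for a Parser worker.
      executor.execute(() -> buckets.get(partitionOf(id, numPeers)).add(id));
    }
    executor.shutdown();
    try {
      // awaitTermination returns false if the timeout elapses first.
      if (!executor.awaitTermination(60, TimeUnit.SECONDS)) {
        throw new IllegalStateException("partitioning did not finish in time");
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while partitioning", e);
    }
    return buckets;
  }

  public static void main(String[] args) {
    Map<Integer, List<String>> buckets = partition(List.of("a", "b", "c", "d", "e", "f"), 3);
    // In HAMA each bucket would now go out via peer.send(); here we just print them.
    buckets.forEach((peer, vs) -> System.out.println("peer " + peer + " <- " + vs));
  }
}
```

Unlike the excerpt above, the sketch checks the boolean returned by awaitTermination(), because a timed-out wait would otherwise go unnoticed and some vertices could be missing from the buckets that get sent.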

3. Sending the data

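The Javadoc of BSPPeer.send() reads: "Send a data with a tag to another BSPSlave corresponding to hostname. Messages sent by this method are not guaranteed to be received in a sent order." Note: in HAMA BSP, sent and received messages are not matched strictly one-to-one in order; a peer may receive messages in a different order from the one in which they were sent.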
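Because receive order is not guaranteed, code on the receiving side should not depend on message position. The sketch below simulates several senders writing into one incoming queue from separate threads and uses an order-insensitive sum on the receiver. UnorderedReceiveSketch and its parameters are hypothetical, and joining the threads merely stands in for the BSP barrier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentLinkedQueue;

// Several senders push into one peer's incoming queue concurrently, so the
// arrival order depends on thread scheduling. The receiver therefore uses an
// aggregation (a sum) whose result does not depend on that order.
class UnorderedReceiveSketch {

  static long receiveAndSum(int senders, int msgsPerSender) {
    ConcurrentLinkedQueue<Integer> incoming = new ConcurrentLinkedQueue<>();
    List<Thread> threads = new ArrayList<>();
    for (int s = 0; s < senders; s++) {
      final int sender = s;
      Thread t = new Thread(() -> {
        for (int i = 0; i < msgsPerSender; i++) {
          incoming.add(sender * msgsPerSender + i);
        }
      });
      threads.add(t);
      t.start();
    }
    for (Thread t : threads) {
      try {
        t.join(); // stand-in for the barrier: wait until every sender is done
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting for senders", e);
      }
    }
    long sum = 0;
    Integer msg;
    while ((msg = incoming.poll()) != null) { // drained like peer.getCurrentMessage()
      sum += msg;
    }
    return sum;
  }

  public static void main(String[] args) {
    System.out.println(receiveAndSum(4, 1000)); // prints 7998000 on every run
  }
}
```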


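The Javadoc of AbstractMessageManager reads: "Abstract baseclass that should contain all information and services needed for the concrete RPC subclasses. For example it manages how the queues are managed and it maintains a cache for socket addresses." In other words, the abstract class AbstractMessageManager is the RPC superclass that holds all the information and services a node needs for messaging; its main job is to manage and maintain GraphJobMessage objects in the queues.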
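The "cache for socket addresses" mentioned in the Javadoc can be pictured as a map from peer name to address that is filled on first use. This is a hypothetical sketch, not AbstractMessageManager's code; it assumes peer names of the form host:port, and the host and port in the example are made up.

```java
import java.net.InetSocketAddress;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache from a peer name ("host:port") to its socket address.
class PeerAddressCache {
  private final Map<String, InetSocketAddress> cache = new ConcurrentHashMap<>();

  // Parses the peer name only on the first lookup; later calls hit the cache.
  InetSocketAddress lookup(String peerName) {
    return cache.computeIfAbsent(peerName, PeerAddressCache::parse);
  }

  int size() {
    return cache.size();
  }

  private static InetSocketAddress parse(String peerName) {
    int colon = peerName.lastIndexOf(':');
    if (colon < 0) {
      throw new IllegalArgumentException("expected host:port, got " + peerName);
    }
    String host = peerName.substring(0, colon);
    int port = Integer.parseInt(peerName.substring(colon + 1));
    // createUnresolved skips the DNS lookup, which keeps the sketch offline.
    return InetSocketAddress.createUnresolved(host, port);
  }

  public static void main(String[] args) {
    PeerAddressCache c = new PeerAddressCache();
    InetSocketAddress a1 = c.lookup("slave1:61000");
    InetSocketAddress a2 = c.lookup("slave1:61000");
    System.out.println(a1 == a2); // prints true: the second lookup is a cache hit
    System.out.println(a1.getHostString() + " " + a1.getPort()); // prints slave1 61000
  }
}
```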

Stepping into messenger.send() leads to the send() method of AbstractMessageManager. It adds the peerName and the value (a GraphJobMessage) to the outgoing side by calling outgoingMessageManager.addMessage(peerName, msg), where peerName is the host name of the target peer.



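Therefore, outgoingBundles.put(targetPeerAddress, bundle) and outgoingBundles.get(targetPeerAddress).add(msg) show that the message is only put into a HashMap; nothing is actually sent at this point. In other words, BSPPeer.send() only records the target peer and the GraphJobMessage in a HashMap, and the buffered messages are delivered to the other BSPPeers later, at the barrier synchronization (peer.sync()).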
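The buffering behavior can be modeled as follows: send() appends to a per-target bundle, and only sync() moves the bundles into the targets' inboxes. BufferedMessenger, its String messages and the simulated inboxes map are hypothetical simplifications of the outgoingBundles logic, and flushing inside sync() reflects the assumption that delivery happens at the barrier.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model: send() only buffers a message in a per-target bundle;
// nothing reaches the target until sync() flushes all bundles.
class BufferedMessenger {
  // Target peer name -> messages queued for it during the current superstep.
  private final Map<String, List<String>> outgoingBundles = new HashMap<>();
  // Simulated inboxes of all peers, keyed by peer name.
  private final Map<String, List<String>> inboxes;

  BufferedMessenger(Map<String, List<String>> inboxes) {
    this.inboxes = inboxes;
  }

  void send(String peerName, String msg) {
    // Same effect as: put(peer, new bundle) if absent, then get(peer).add(msg).
    outgoingBundles.computeIfAbsent(peerName, k -> new ArrayList<>()).add(msg);
  }

  int pending() {
    int n = 0;
    for (List<String> bundle : outgoingBundles.values()) {
      n += bundle.size();
    }
    return n;
  }

  // Stand-in for the transfer at the barrier: deliver every bundle, then clear.
  void sync() {
    for (Map.Entry<String, List<String>> e : outgoingBundles.entrySet()) {
      inboxes.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).addAll(e.getValue());
    }
    outgoingBundles.clear();
  }

  public static void main(String[] args) {
    Map<String, List<String>> inboxes = new HashMap<>();
    BufferedMessenger m = new BufferedMessenger(inboxes);
    m.send("peerB:61001", "hello");
    m.send("peerB:61001", "world");
    System.out.println(inboxes); // prints {} because nothing is delivered yet
    m.sync();
    System.out.println(inboxes); // prints {peerB:61001=[hello, world]}
  }
}
```

Batching per target means one transfer per destination peer per superstep instead of one network call per message, which is the usual motivation for this design in BSP systems.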

