Giraph V1.2: DiskMessage Runtime Setup

This post describes how to enable the out-of-core (OOC) optimization in Giraph V1.2, including how to resolve a flow-control mismatch between the two ends of the communication channel and how to avoid out-of-memory errors. With both ends configured to use the same flow-control strategy and a suitable number of workers, a PageRank job runs to completion.


V1.2 added dedicated optimizations for out-of-core (OOC) processing.


First, enable the option:

<property>
 <name>giraph.useOutOfCoreGraph</name>
 <value>true</value>
</property>
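
Alternatively, if you prefer not to edit the configuration file, the same option can be passed per job through Giraph's -ca (custom arguments) flag; a minimal sketch, appended to whatever run command you already use (the flow-control option introduced below can be passed the same way):

 giraph ../giraph-core-1.2.0.jar org.apache.giraph.benchmark.PageRankComputation ... -ca giraph.useOutOfCoreGraph=true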

A common pitfall here lies in the flowcontrol package: the server side defaults to NoOpFlowControl, while the worker side defaults to CreditBasedFlowControl. Because the two ends handle responses differently, the responseIds they generate do not match and the job fails with an error. You therefore also need to set:

<property>
 <name>giraph.waitForPerWorkerRequests</name>
 <value>true</value>
</property>

This ensures both ends use CreditBasedFlowControl.
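
For completeness, both options can also be set programmatically when the job is launched from your own Java driver. The sketch below is hypothetical (it is not from the original run) and simply mirrors the command-line invocation shown later in this post, using the standard Giraph 1.2 classes and setters:

import org.apache.giraph.benchmark.PageRankComputation;
import org.apache.giraph.conf.GiraphConfiguration;
import org.apache.giraph.io.formats.GiraphFileInputFormat;
import org.apache.giraph.io.formats.IdWithValueTextOutputFormat;
import org.apache.giraph.io.formats.IntFloatNullTextInputFormat;
import org.apache.giraph.job.GiraphJob;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class OocPageRankDriver {
  public static void main(String[] args) throws Exception {
    GiraphConfiguration conf = new GiraphConfiguration();

    // Spill partition/message data to local disk under memory pressure.
    conf.setBoolean("giraph.useOutOfCoreGraph", true);
    // Use CreditBasedFlowControl on both ends so the responseIds match.
    conf.setBoolean("giraph.waitForPerWorkerRequests", true);

    conf.setComputationClass(PageRankComputation.class);
    conf.setVertexInputFormatClass(IntFloatNullTextInputFormat.class);
    conf.setVertexOutputFormatClass(IdWithValueTextOutputFormat.class);
    // 3 workers, all of which must respond before the job starts.
    conf.setWorkerConfiguration(3, 3, 100.0f);

    GiraphJob job = new GiraphJob(conf, "PageRank with out-of-core messages");
    GiraphFileInputFormat.addVertexInputPath(conf, new Path("/test/youTube.txt"));
    FileOutputFormat.setOutputPath(job.getInternalJob(), new Path("/output"));
    System.exit(job.run(true) ? 0 : 1);
  }
}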

In addition, if the configured number of workers is too small, the job dies with an out-of-memory (OOM) error:

java.lang.IllegalStateException: Exception occurred
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:274)
	at org.apache.giraph.graph.GraphTaskManager.processGraphPartitions(GraphTaskManager.java:821)
	at org.apache.giraph.graph.GraphTaskManager.execute(GraphTaskManager.java:365)
	at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:92)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
	at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.util.concurrent.FutureTask.get(FutureTask.java:202)
	at org.apache.giraph.utils.ProgressableUtils.getResultsWithNCallables(ProgressableUtils.java:271)
	... 10 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:82)
	at org.apache.giraph.utils.UnsafeByteArrayOutputStream.<init>(UnsafeByteArrayOutputStream.java:73)
	at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createExtendedDataOutput(ImmutableClassesGiraphConfiguration.java:1188)
	at org.apache.giraph.utils.io.ExtendedDataInputOutput.<init>(ExtendedDataInputOutput.java:47)
	at org.apache.giraph.conf.ImmutableClassesGiraphConfiguration.createMessagesInputOutput(ImmutableClassesGiraphConfiguration.java:1177)
	at org.apache.giraph.comm.messages.primitives.IntByteArrayMessageStore.getDataInputOutput(IntByteArrayMessageStore.java:124)
	at org.apache.giraph.comm.messages.primitives.IntByteArrayMessageStore.addPartitionMessages(IntByteArrayMessageStore.java:181)
	at org.apache.giraph.ooc.data.DiskBackedMessageStore.addEntryToInMemoryPartitionData(DiskBackedMessageStore.java:283)
	at org.apache.giraph.ooc.data.DiskBackedMessageStore.addEntryToInMemoryPartitionData(DiskBackedMessageStore.java:1)
	at org.apache.giraph.ooc.data.DiskBackedDataStore.addEntry(DiskBackedDataStore.java:200)
	at org.apache.giraph.ooc.data.DiskBackedMessageStore.addPartitionMessages(DiskBackedMessageStore.java:136)
	at org.apache.giraph.comm.requests.SendWorkerMessagesRequest.doRequest(SendWorkerMessagesRequest.java:94)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.doRequest(NettyWorkerClientRequestProcessor.java:472)
	at org.apache.giraph.comm.SendMessageCache.flush(SendMessageCache.java:257)
	at org.apache.giraph.comm.netty.NettyWorkerClientRequestProcessor.flush(NettyWorkerClientRequestProcessor.java:404)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:253)
	at org.apache.giraph.graph.ComputeCallable.call(ComputeCallable.java:1)
	at org.apache.giraph.utils.LogStacktraceCallable.call(LogStacktraceCallable.java:67)
	at java.util.concurrent.FutureTask.run(FutureTask.java:262)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)

As the stack trace shows, the worker runs out of memory while buffering incoming local messages in the disk-backed message store.
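
Besides raising the worker count with -w, giving each mapper a larger heap also relieves this pressure. A hedged example for Hadoop 1.x (mapred-site.xml); the 2 GB value is purely illustrative and should be sized to your cluster:

<property>
 <name>mapred.child.java.opts</name>
 <value>-Xmx2048m</value>
</property>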


With the options in place, run:

 giraph ../giraph-core-1.2.0.jar  org.apache.giraph.benchmark.PageRankComputation -vif  org.apache.giraph.io.formats.IntFloatNullTextInputFormat -vip /test/youTube.txt  -vof org.apache.giraph.io.formats.IdWithValueTextOutputFormat -op /output  -w 3

Result:

No HADOOP_CONF_DIR set, using /opt/hadoop-1.2.1/conf 
16/12/12 00:02:26 INFO utils.ConfigurationUtils: No edge input format specified. Ensure your InputFormat does not require one.
16/12/12 00:02:26 INFO utils.ConfigurationUtils: No edge output format specified. Ensure your OutputFormat does not require one.
16/12/12 00:02:27 INFO job.GiraphJob: run: Since checkpointing is disabled (default), do not allow any task retries (setting mapred.map.max.attempts = 1, old value = 4)
16/12/12 00:02:32 INFO job.GiraphJob: Tracking URL: http://mu02:50030/jobdetails.jsp?jobid=job_201612092054_0044
16/12/12 00:02:32 INFO job.GiraphJob: Waiting for resources... Job will start only when it gets all 4 mappers
16/12/12 00:03:19 INFO job.HaltApplicationUtils$DefaultHaltInstructionsWriter: writeHaltInstructions: To halt after next superstep execute: 'bin/halt-application --zkServer c02b13:22181 --zkNode /_hadoopBsp/job_201612092054_0044/_haltComputation'
16/12/12 00:03:19 INFO mapred.JobClient: Running job: job_201612092054_0044
16/12/12 00:03:20 INFO mapred.JobClient:  map 100% reduce 0%
16/12/12 00:03:34 INFO mapred.JobClient: Job complete: job_201612092054_0044
16/12/12 00:03:34 INFO mapred.JobClient: Counters: 47
16/12/12 00:03:34 INFO mapred.JobClient:   Zookeeper halt node
16/12/12 00:03:34 INFO mapred.JobClient:     /_hadoopBsp/job_201612092054_0044/_haltComputation=0
16/12/12 00:03:34 INFO mapred.JobClient:   Zookeeper base path
16/12/12 00:03:34 INFO mapred.JobClient:     /_hadoopBsp/job_201612092054_0044=0
16/12/12 00:03:34 INFO mapred.JobClient:   Job Counters 
16/12/12 00:03:34 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=135763
16/12/12 00:03:34 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Launched map tasks=4
16/12/12 00:03:34 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=0
16/12/12 00:03:34 INFO mapred.JobClient:   Giraph Timers
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 5 PageRankComputation (ms)=1325
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 0 PageRankComputation (ms)=1086
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 3 PageRankComputation (ms)=1685
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 1 PageRankComputation (ms)=2478
16/12/12 00:03:34 INFO mapred.JobClient:     Input superstep (ms)=5188
16/12/12 00:03:34 INFO mapred.JobClient:     Total (ms)=26390
16/12/12 00:03:34 INFO mapred.JobClient:     Shutdown (ms)=10014
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 4 PageRankComputation (ms)=2016
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 2 PageRankComputation (ms)=1958
16/12/12 00:03:34 INFO mapred.JobClient:     Initialize (ms)=14028
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep 6 PageRankComputation (ms)=567
16/12/12 00:03:34 INFO mapred.JobClient:     Setup (ms)=69
16/12/12 00:03:34 INFO mapred.JobClient:   Zookeeper server:port
16/12/12 00:03:34 INFO mapred.JobClient:     c02b13:22181=0
16/12/12 00:03:34 INFO mapred.JobClient:   Giraph Stats
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate bytes loaded from local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Sent message bytes=0
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate bytes stored to local disks (out-of-core)=0
16/12/12 00:03:34 INFO mapred.JobClient:     Current workers=3
16/12/12 00:03:34 INFO mapred.JobClient:     Last checkpointed superstep=0
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate sent messages=17925744
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate finished vertices=1134890
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate vertices=1134890
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate edges=2987624
16/12/12 00:03:34 INFO mapred.JobClient:     Superstep=7
16/12/12 00:03:34 INFO mapred.JobClient:     Aggregate sent message bytes=143419884
16/12/12 00:03:34 INFO mapred.JobClient:     Current master task partition=0
16/12/12 00:03:34 INFO mapred.JobClient:     Sent messages=0
16/12/12 00:03:34 INFO mapred.JobClient:     Lowest percentage of graph in memory so far (out-of-core)=100
16/12/12 00:03:34 INFO mapred.JobClient:   File Output Format Counters 
16/12/12 00:03:34 INFO mapred.JobClient:     Bytes Written=0
16/12/12 00:03:34 INFO mapred.JobClient:   FileSystemCounters
16/12/12 00:03:34 INFO mapred.JobClient:     HDFS_BYTES_READ=29531257
16/12/12 00:03:34 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=490099
16/12/12 00:03:34 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=20394140
16/12/12 00:03:34 INFO mapred.JobClient:   File Input Format Counters 
16/12/12 00:03:34 INFO mapred.JobClient:     Bytes Read=0
16/12/12 00:03:34 INFO mapred.JobClient:   Map-Reduce Framework
16/12/12 00:03:34 INFO mapred.JobClient:     Map input records=4
16/12/12 00:03:34 INFO mapred.JobClient:     Physical memory (bytes) snapshot=1087893504
16/12/12 00:03:34 INFO mapred.JobClient:     Spilled Records=0
16/12/12 00:03:34 INFO mapred.JobClient:     CPU time spent (ms)=184690
16/12/12 00:03:34 INFO mapred.JobClient:     Total committed heap usage (bytes)=722993152
16/12/12 00:03:34 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=3353665536
16/12/12 00:03:34 INFO mapred.JobClient:     Map output records=0
16/12/12 00:03:34 INFO mapred.JobClient:     SPLIT_RAW_BYTES=176


Each superstep takes noticeably longer than an in-memory run. Note, though, that the counters above (Aggregate bytes stored to local disks (out-of-core)=0, Lowest percentage of graph in memory so far (out-of-core)=100) show that nothing was actually spilled in this run, so the overhead here comes from the out-of-core machinery itself rather than from actual disk writes.
