Using and testing hadoop archive to merge files.
archive usage:
[hadoop@hadoop9 jars]$ hadoop archive --help
archive -archiveName NAME -p <parent path> <src>* <dest>
When archive merges files, it launches an M/R job to do the work, as shown below:
[hadoop@hadoop9 jars]$ hadoop archive -archiveName tables.har -p /root/src/(/* about 72 GB of data here, spread across multiple directories */) /root/dst/
12/05/15 10:23:03 INFO mapred.JobClient: map 0% reduce 0%
12/05/15 10:23:13 INFO mapred.JobClient: map 40% reduce 0%
12/05/15 10:23:14 INFO mapred.JobClient: map 50% reduce 0%
12/05/15 10:23:19 INFO mapred.JobClient: map 51% reduce 0%
12/05/15 10:23:40 INFO mapred.JobClient: map 52% reduce 0%
12/05/15 10:24:04 INFO mapred.JobClient: map 53% reduce 0%
12/05/15 10:24:28 INFO mapred.JobClient: map 55% reduce 0%
12/05/15 10:24:34 INFO mapred.JobClient: map 57% reduce 0%
12/05/15 10:24:47 INFO mapred.JobClient: map 58% reduce 0%
12/05/15 10:24:53 INFO mapred.JobClient: map 59% reduce 0%
12/05/15 10:25:02 INFO mapred.JobClient: map 60% reduce 0%
12/05/15 10:25:13 INFO mapred.JobClient: map 61% reduce 0%
12/05/15 10:25:17 INFO mapred.JobClient: map 62% reduce 0%
12/05/15 10:25:23 INFO mapred.JobClient: map 63% reduce 0%
12/05/15 10:25:48 INFO mapred.JobClient: map 64% reduce 0%
12/05/15 10:25:59 INFO mapred.JobClient: map 65% reduce 0%
12/05/15 10:26:27 INFO mapred.JobClient: map 67% reduce 0%
12/05/15 10:26:36 INFO mapred.JobClient: map 68% reduce 0%
12/05/15 10:26:38 INFO mapred.JobClient: map 69% reduce 0%
12/05/15 10:26:41 INFO mapred.JobClient: map 70% reduce 0%
12/05/15 10:26:47 INFO mapred.JobClient: map 71% reduce 0%
12/05/15 10:27:02 INFO mapred.JobClient: map 73% reduce 0%
12/05/15 10:27:13 INFO mapred.JobClient: map 74% reduce 0%
12/05/15 10:27:56 INFO mapred.JobClient: map 75% reduce 0%
12/05/15 10:28:24 INFO mapred.JobClient: map 76% reduce 0%
12/05/15 10:28:30 INFO mapred.JobClient: map 77% reduce 0%
12/05/15 10:28:46 INFO mapred.JobClient: map 78% reduce 0%
12/05/15 10:29:16 INFO mapred.JobClient: map 79% reduce 0%
12/05/15 10:29:21 INFO mapred.JobClient: map 81% reduce 0%
12/05/15 10:29:24 INFO mapred.JobClient: map 82% reduce 0%
12/05/15 10:29:43 INFO mapred.JobClient: map 83% reduce 0%
12/05/15 10:29:48 INFO mapred.JobClient: map 84% reduce 0%
12/05/15 10:30:03 INFO mapred.JobClient: map 87% reduce 0%
12/05/15 10:30:15 INFO mapred.JobClient: map 89% reduce 0%
12/05/15 10:30:34 INFO mapred.JobClient: map 90% reduce 0%
12/05/15 10:30:42 INFO mapred.JobClient: map 91% reduce 0%
12/05/15 10:30:55 INFO mapred.JobClient: map 92% reduce 0%
12/05/15 10:31:01 INFO mapred.JobClient: map 94% reduce 0%
12/05/15 10:31:06 INFO mapred.JobClient: map 95% reduce 0%
12/05/15 10:31:10 INFO mapred.JobClient: map 96% reduce 0%
12/05/15 10:31:18 INFO mapred.JobClient: map 97% reduce 0%
12/05/15 10:31:31 INFO mapred.JobClient: map 98% reduce 0%
12/05/15 10:31:52 INFO mapred.JobClient: map 99% reduce 0%
12/05/15 10:32:31 INFO mapred.JobClient: map 100% reduce 0%
12/05/15 10:32:41 INFO mapred.JobClient: map 100% reduce 8%
12/05/15 10:32:44 INFO mapred.JobClient: map 100% reduce 25%
12/05/15 10:32:56 INFO mapred.JobClient: map 100% reduce 26%
12/05/15 10:33:26 INFO mapred.JobClient: map 100% reduce 28%
12/05/15 10:33:41 INFO mapred.JobClient: map 100% reduce 31%
12/05/15 10:34:08 INFO mapred.JobClient: map 100% reduce 100%
12/05/15 10:34:08 INFO mapred.JobClient: Job complete: job_201205141650_0029
12/05/15 10:34:08 INFO mapred.JobClient: Counters: 26
12/05/15 10:34:08 INFO mapred.JobClient: Job Counters
12/05/15 10:34:08 INFO mapred.JobClient: Launched reduce tasks=1
12/05/15 10:34:08 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=14908205
12/05/15 10:34:08 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/05/15 10:34:08 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/05/15 10:34:08 INFO mapred.JobClient: Launched map tasks=29
12/05/15 10:34:08 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=94163
12/05/15 10:34:08 INFO mapred.JobClient: FileSystemCounters
12/05/15 10:34:08 INFO mapred.JobClient: FILE_BYTES_READ=158845
12/05/15 10:34:08 INFO mapred.JobClient: HDFS_BYTES_READ=73761329524
12/05/15 10:34:08 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1953067
12/05/15 10:34:08 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=73761294738
12/05/15 10:34:08 INFO mapred.JobClient: Map-Reduce Framework
12/05/15 10:34:08 INFO mapred.JobClient: Map input records=1794
12/05/15 10:34:08 INFO mapred.JobClient: Reduce shuffle bytes=159013
12/05/15 10:34:08 INFO mapred.JobClient: Spilled Records=3588
12/05/15 10:34:08 INFO mapred.JobClient: Map output bytes=155025
12/05/15 10:34:08 INFO mapred.JobClient: CPU time spent (ms)=1150680
12/05/15 10:34:08 INFO mapred.JobClient: Total committed heap usage (bytes)=55161126912
12/05/15 10:34:08 INFO mapred.JobClient: Map input bytes=172632
12/05/15 10:34:08 INFO mapred.JobClient: Combine input records=0
12/05/15 10:34:08 INFO mapred.JobClient: SPLIT_RAW_BYTES=4553
12/05/15 10:34:08 INFO mapred.JobClient: Reduce input records=1794
12/05/15 10:34:08 INFO mapred.JobClient: Reduce input groups=1794
12/05/15 10:34:08 INFO mapred.JobClient: Combine output records=0
12/05/15 10:34:08 INFO mapred.JobClient: Physical memory (bytes) snapshot=41620033536
12/05/15 10:34:08 INFO mapred.JobClient: Reduce output records=0
12/05/15 10:34:08 INFO mapred.JobClient: Virtual memory (bytes) snapshot=141252935680
12/05/15 10:34:08 INFO mapred.JobClient: Map output records=1794
[hadoop@hadoop9 jars]$
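The timestamps and counters in the log above are enough for a rough throughput estimate. The following is a minimal sketch, not part of the original test: it assumes the first "map 0% reduce 0%" line approximates the job start (the real start is slightly earlier), and the helper name to_seconds is hypothetical.

```shell
#!/bin/sh
# Convert a two-digit-per-field HH:MM:SS timestamp to seconds since midnight.
to_seconds() {
    old_ifs=$IFS
    IFS=:
    set -- $1            # split "HH:MM:SS" into $1 $2 $3
    IFS=$old_ifs
    # ${n#0} strips one leading zero so that "08" is not parsed as octal.
    echo $(( ${1#0} * 3600 + ${2#0} * 60 + ${3#0} ))
}

start=$(to_seconds 10:23:03)   # first "map 0% reduce 0%" line (assumed job start)
end=$(to_seconds 10:34:08)     # "Job complete" line
bytes_read=73761329524         # HDFS_BYTES_READ counter from the log

elapsed=$(( end - start ))
echo "elapsed seconds: $elapsed"
# Integer division, so the result is rounded down to whole MiB/s.
echo "approx MiB/s:    $(( bytes_read / elapsed / 1048576 ))"
```

The figure is an aggregate across all 29 map tasks and the single reduce task, so it describes the cluster as a whole rather than any one node.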
[root@hadoop9 jars]# hadoop fs -ls /root/dst/tables.har
Found 32 items
-rw-r--r-- 3 hadoop supergroup 0 2012-05-15 10:33 /root/dst/tables.har/_SUCCESS
-rw-r--r-- 5 hadoop supergroup 147623 2012-05-15 10:33 /root/dst/tables.har/_index
-rw-r--r-- 5 hadoop supergroup 61 2012-05-15 10:33 /root/dst/tables.har/_masterindex
-rw-r--r-- 3 hadoop supergroup 1863349328 2012-05-15 10:22 /root/dst/tables.har/part-0
-rw-r--r-- 3 hadoop supergroup 1819888235 2012-05-15 10:22 /root/dst/tables.har/part-1
-rw-r--r-- 3 hadoop supergroup 1890091347 2012-05-15 10:22 /root/dst/tables.har/part-10
-rw-r--r-- 3 hadoop supergroup 1697144739 2012-05-15 10:22 /root/dst/tables.har/part-11
-rw-r--r-- 3 hadoop supergroup 2148543927 2012-05-15 10:22 /root/dst/tables.har/part-12
-rw-r--r-- 3 hadoop supergroup 1636754927 2012-05-15 10:22 /root/dst/tables.har/part-13
-rw-r--r-- 3 hadoop supergroup 2105077448 2012-05-15 10:22 /root/dst/tables.har/part-14
-rw-r--r-- 3 hadoop supergroup 1951327272 2012-05-15 10:22 /root/dst/tables.har/part-15
-rw-r--r-- 3 hadoop supergroup 1905061075 2012-05-15 10:22 /root/dst/tables.har/part-16
-rw-r--r-- 3 hadoop supergroup 1538641658 2012-05-15 10:22 /root/dst/tables.har/part-17
-rw-r--r-- 3 hadoop supergroup 2786014449 2012-05-15 10:22 /root/dst/tables.har/part-18
-rw-r--r-- 3 hadoop supergroup 3087401727 2012-05-15 10:22 /root/dst/tables.har/part-19
-rw-r--r-- 3 hadoop supergroup 2166233369 2012-05-15 10:22 /root/dst/tables.har/part-2
-rw-r--r-- 3 hadoop supergroup 4154808891 2012-05-15 10:22 /root/dst/tables.har/part-20
-rw-r--r-- 3 hadoop supergroup 3504078648 2012-05-15 10:22 /root/dst/tables.har/part-21
-rw-r--r-- 3 hadoop supergroup 3107759125 2012-05-15 10:22 /root/dst/tables.har/part-22
-rw-r--r-- 3 hadoop supergroup 3872013220 2012-05-15 10:22 /root/dst/tables.har/part-23
-rw-r--r-- 3 hadoop supergroup 3706866633 2012-05-15 10:22 /root/dst/tables.har/part-24
-rw-r--r-- 3 hadoop supergroup 4699050623 2012-05-15 10:22 /root/dst/tables.har/part-25
-rw-r--r-- 3 hadoop supergroup 4932266111 2012-05-15 10:22 /root/dst/tables.har/part-26
-rw-r--r-- 3 hadoop supergroup 3548633205 2012-05-15 10:22 /root/dst/tables.har/part-27
-rw-r--r-- 3 hadoop supergroup 3532625555 2012-05-15 10:22 /root/dst/tables.har/part-28
-rw-r--r-- 3 hadoop supergroup 2080168915 2012-05-15 10:22 /root/dst/tables.har/part-3
-rw-r--r-- 3 hadoop supergroup 2093903549 2012-05-15 10:22 /root/dst/tables.har/part-4
-rw-r--r-- 3 hadoop supergroup 2028814169 2012-05-15 10:22 /root/dst/tables.har/part-5
-rw-r--r-- 3 hadoop supergroup 1534977869 2012-05-15 10:22 /root/dst/tables.har/part-6
-rw-r--r-- 3 hadoop supergroup 634753410 2012-05-15 10:22 /root/dst/tables.har/part-7
-rw-r--r-- 3 hadoop supergroup 2081953097 2012-05-15 10:22 /root/dst/tables.har/part-8
-rw-r--r-- 3 hadoop supergroup 1652944533 2012-05-15 10:22 /root/dst/tables.har/part-9
[root@hadoop9 jars]#
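As a sanity check, the sizes in the listing above can be added up and compared with the HDFS_BYTES_WRITTEN counter reported by the job. The sketch below is an illustration under assumptions: sum_sizes is a hypothetical helper, and it assumes the size is the fifth whitespace-separated column of `hadoop fs -ls` output, as it is in the listing shown here.

```shell
#!/bin/sh
# Sum the size column (5th field) of `hadoop fs -ls` lines read from stdin.
# Lines without a numeric 5th field, such as "Found 32 items", are skipped.
sum_sizes() {
    total=0
    while read -r perms repl owner group size rest; do
        case $size in
            ''|*[!0-9]*) continue ;;
        esac
        total=$(( total + size ))
    done
    echo "$total"
}

# Listing copied from the transcript above.
total=$(sum_sizes <<'EOF'
Found 32 items
-rw-r--r-- 3 hadoop supergroup 0 2012-05-15 10:33 /root/dst/tables.har/_SUCCESS
-rw-r--r-- 5 hadoop supergroup 147623 2012-05-15 10:33 /root/dst/tables.har/_index
-rw-r--r-- 5 hadoop supergroup 61 2012-05-15 10:33 /root/dst/tables.har/_masterindex
-rw-r--r-- 3 hadoop supergroup 1863349328 2012-05-15 10:22 /root/dst/tables.har/part-0
-rw-r--r-- 3 hadoop supergroup 1819888235 2012-05-15 10:22 /root/dst/tables.har/part-1
-rw-r--r-- 3 hadoop supergroup 1890091347 2012-05-15 10:22 /root/dst/tables.har/part-10
-rw-r--r-- 3 hadoop supergroup 1697144739 2012-05-15 10:22 /root/dst/tables.har/part-11
-rw-r--r-- 3 hadoop supergroup 2148543927 2012-05-15 10:22 /root/dst/tables.har/part-12
-rw-r--r-- 3 hadoop supergroup 1636754927 2012-05-15 10:22 /root/dst/tables.har/part-13
-rw-r--r-- 3 hadoop supergroup 2105077448 2012-05-15 10:22 /root/dst/tables.har/part-14
-rw-r--r-- 3 hadoop supergroup 1951327272 2012-05-15 10:22 /root/dst/tables.har/part-15
-rw-r--r-- 3 hadoop supergroup 1905061075 2012-05-15 10:22 /root/dst/tables.har/part-16
-rw-r--r-- 3 hadoop supergroup 1538641658 2012-05-15 10:22 /root/dst/tables.har/part-17
-rw-r--r-- 3 hadoop supergroup 2786014449 2012-05-15 10:22 /root/dst/tables.har/part-18
-rw-r--r-- 3 hadoop supergroup 3087401727 2012-05-15 10:22 /root/dst/tables.har/part-19
-rw-r--r-- 3 hadoop supergroup 2166233369 2012-05-15 10:22 /root/dst/tables.har/part-2
-rw-r--r-- 3 hadoop supergroup 4154808891 2012-05-15 10:22 /root/dst/tables.har/part-20
-rw-r--r-- 3 hadoop supergroup 3504078648 2012-05-15 10:22 /root/dst/tables.har/part-21
-rw-r--r-- 3 hadoop supergroup 3107759125 2012-05-15 10:22 /root/dst/tables.har/part-22
-rw-r--r-- 3 hadoop supergroup 3872013220 2012-05-15 10:22 /root/dst/tables.har/part-23
-rw-r--r-- 3 hadoop supergroup 3706866633 2012-05-15 10:22 /root/dst/tables.har/part-24
-rw-r--r-- 3 hadoop supergroup 4699050623 2012-05-15 10:22 /root/dst/tables.har/part-25
-rw-r--r-- 3 hadoop supergroup 4932266111 2012-05-15 10:22 /root/dst/tables.har/part-26
-rw-r--r-- 3 hadoop supergroup 3548633205 2012-05-15 10:22 /root/dst/tables.har/part-27
-rw-r--r-- 3 hadoop supergroup 3532625555 2012-05-15 10:22 /root/dst/tables.har/part-28
-rw-r--r-- 3 hadoop supergroup 2080168915 2012-05-15 10:22 /root/dst/tables.har/part-3
-rw-r--r-- 3 hadoop supergroup 2093903549 2012-05-15 10:22 /root/dst/tables.har/part-4
-rw-r--r-- 3 hadoop supergroup 2028814169 2012-05-15 10:22 /root/dst/tables.har/part-5
-rw-r--r-- 3 hadoop supergroup 1534977869 2012-05-15 10:22 /root/dst/tables.har/part-6
-rw-r--r-- 3 hadoop supergroup 634753410 2012-05-15 10:22 /root/dst/tables.har/part-7
-rw-r--r-- 3 hadoop supergroup 2081953097 2012-05-15 10:22 /root/dst/tables.har/part-8
-rw-r--r-- 3 hadoop supergroup 1652944533 2012-05-15 10:22 /root/dst/tables.har/part-9
EOF
)

bytes_written=73761294738   # HDFS_BYTES_WRITTEN counter from the job log
echo "listing total: $total"
echo "job counter:   $bytes_written"
```

On a live cluster the same helper could be fed directly, for example `hadoop fs -ls /root/dst/tables.har | sum_sizes`. The 29 part files also line up with the 29 map tasks reported by the job.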
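This article shows in detail how to use Hadoop Archive to merge large files spread across multiple directories, and uses a worked example to show how the job runs during the merge and what the resulting archive looks like.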