Hadoop: Distributed Storage and Distributed Computing

This article records, step by step, the process of setting up a Hadoop cluster, then uses the WordCount example program to demonstrate a simple text word-count job.

http://hadoop.apache.org/core/docs/current/cluster_setup.html

http://www.ibm.com/developerworks/cn/opensource/os-cn-hadoop2/index.html

Hands-on steps:

vi /etc/hosts
192.168.1.212 web02
192.168.1.214 lvs
192.168.1.215 nq
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys lvs:~/.ssh/
scp ~/.ssh/authorized_keys nq:~/.ssh/
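Passwordless SSH from the master to each slave is what the start scripts rely on. A quick sanity check (assuming the hostnames from /etc/hosts above; these commands are illustrative, not from the original log):
ssh lvs date
ssh nq date
Both should print the remote date without asking for a password.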
vi conf/masters
web02
vi conf/slaves
lvs
nq
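In this generation of Hadoop, conf/masters names the host that runs the secondary namenode, and conf/slaves lists the datanode/tasktracker hosts; bin/start-all.sh reads both files and launches the daemons over SSH, which is why the passwordless keys above are required.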
vi conf/hadoop-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://192.168.1.212:9000</value>
  </property>
  <property>
    <name>mapred.job.tracker</name>
    <value>192.168.1.212:9001</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/backup/hadoop/name</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/backup/hadoop/data</value>
    <description>Determines where on the local filesystem a DFS data node
      should store its blocks. If this is a comma-delimited list of directories,
      then data will be stored in all named directories, typically on different devices.
      Directories that do not exist are ignored.</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/backup/hadoop/tmp/</value>
  </property>
</configuration>
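In short: fs.default.name points every HDFS client at the namenode on web02 (port 9000), mapred.job.tracker points MapReduce clients at the jobtracker (port 9001), and dfs.replication of 2 stores each block on both datanodes. The dfs.name.dir, dfs.data.dir and hadoop.tmp.dir paths all live under /backup/hadoop on each node.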

bin/hadoop namenode -format
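Formatting initializes an empty HDFS namespace in dfs.name.dir on the namenode; it is a one-time step that must happen before the daemons are first started.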
The Hadoop installation (binaries plus conf) must be identical on every node, so the whole directory is copied to both slaves:
scp -r /backup/hadoop lvs:/backup
scp -r /backup/hadoop nq:/backup
[root@web02 hadoop]# bin/start-all.sh
namenode running as process 25305. Stop it first.
nq-data-center: starting datanode, logging to /backup/hadoop/bin/../logs/hadoop-root-datanode-nq-data-center.out
lvs: starting datanode, logging to /backup/hadoop/bin/../logs/hadoop-root-datanode-lvs.out
web02: secondarynamenode running as process 25471. Stop it first.
jobtracker running as process 25547. Stop it first.
nq-data-center: starting tasktracker, logging to /backup/hadoop/bin/../logs/hadoop-root-tasktracker-nq-data-center.out
lvs: starting tasktracker, logging to /backup/hadoop/bin/../logs/hadoop-root-tasktracker-lvs.out
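At this point, one way to verify that every daemon is up is the JDK's jps tool (a quick check, assuming jps is on the PATH; not part of the original log):
jps
On the master it should list NameNode, SecondaryNameNode and JobTracker; on each slave, DataNode and TaskTracker. The "running as process ... Stop it first." messages above simply mean those master daemons were already running from an earlier start attempt.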
[root@web02 hadoop]# mkdir test-in
[root@web02 hadoop]# cd test-in
Next, create two text files in the test-in directory; the WordCount program will count how many times each word occurs in them:
[root@web02 test-in]# echo "hello world bye world" >file1.txt
[root@web02 test-in]# echo "hello hadoop goodbye hadoop" >file2.txt
[root@web02 test-in]# cd ..
[root@web02 hadoop]# bin/hadoop jar hadoop-0.16.0-examples.jar wordcount test-in test-out
The tutorial says that once the job finishes, the results can be viewed with:
cd test-out
cat part-00000
Here, however, the job never started: opening the jar failed, apparently because the literal file name hadoop-0.16.0-examples.jar did not match the jar actually shipped with this installation (the wildcard form used below does work):
java.io.IOException: Error opening job jar: hadoop-0.16.0-examples.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
... 4 more
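A simple way to find out which examples jar is actually present in the installation directory (an illustrative check, not from the original log):
ls -l hadoop-*-examples.jar
Whatever file name that prints is the one to pass to bin/hadoop jar; the wildcard form used below sidesteps the version mismatch.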
Trying to view the results at this point fails, because the job never ran and no test-out directory was created:
[root@web02 hadoop]# cd test-out
-bash: cd: test-out: No such file or directory
[root@web02 hadoop]# bin/hadoop jar hadoop-*-examples.jar wordcount test-in test-out
org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : hdfs://192.168.1.212:9000/user/root/test-in
at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:215)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:705)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
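The wildcard fixed the jar problem, but the input path is still wrong: a relative path such as test-in is resolved inside HDFS, under /user/root/, and nothing has been uploaded there yet; the files created above exist only on the local filesystem.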
[root@web02 hadoop]# bin/hadoop jar hadoop-0.16.0-examples.jar wordcount /backup/hadoop/test-in test-out
java.io.IOException: Error opening job jar: hadoop-0.16.0-examples.jar
at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
Caused by: java.util.zip.ZipException: error in opening zip file
at java.util.zip.ZipFile.open(Native Method)
at java.util.zip.ZipFile.<init>(ZipFile.java:114)
at java.util.jar.JarFile.<init>(JarFile.java:133)
at java.util.jar.JarFile.<init>(JarFile.java:70)
at org.apache.hadoop.util.RunJar.main(RunJar.java:88)
... 4 more
[root@web02 hadoop]# bin/hadoop jar hadoop-*-examples.jar wordcount /backup/hadoop/test-in test-out
org.apache.hadoop.mapred.InvalidInputException: Input path doesnt exist : hdfs://192.168.1.212:9000/backup/hadoop/test-in
at org.apache.hadoop.mapred.FileInputFormat.validateInput(FileInputFormat.java:215)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:705)
at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:973)
at org.apache.hadoop.examples.WordCount.run(WordCount.java:149)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.examples.WordCount.main(WordCount.java:155)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:53)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:194)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:220)
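An absolute local path does not help either, since the job reads its input from HDFS rather than the local disk. The fix is to upload the local directory into HDFS first with dfs -put (note that the HDFS copy is, somewhat confusingly, named test-out here) and then point wordcount at it: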
[root@web02 hadoop]# bin/hadoop dfs -put /backup/hadoop/test-in test-out
[root@web02 hadoop]# bin/hadoop jar hadoop-*-examples.jar wordcount test-out output
08/08/08 09:36:03 INFO mapred.FileInputFormat: Total input paths to process : 2
08/08/08 09:36:03 INFO mapred.JobClient: Running job: job_200808080926_0003
08/08/08 09:36:04 INFO mapred.JobClient: map 0% reduce 0%
08/08/08 09:36:09 INFO mapred.JobClient: map 100% reduce 0%
08/08/08 09:36:14 INFO mapred.JobClient: map 100% reduce 22%
08/08/08 09:36:16 INFO mapred.JobClient: map 100% reduce 100%
08/08/08 09:36:17 INFO mapred.JobClient: Job complete: job_200808080926_0003
08/08/08 09:36:17 INFO mapred.JobClient: Counters: 16
08/08/08 09:36:17 INFO mapred.JobClient: File Systems
08/08/08 09:36:17 INFO mapred.JobClient: Local bytes read=226
08/08/08 09:36:17 INFO mapred.JobClient: Local bytes written=710
08/08/08 09:36:17 INFO mapred.JobClient: HDFS bytes read=54
08/08/08 09:36:17 INFO mapred.JobClient: HDFS bytes written=41
08/08/08 09:36:17 INFO mapred.JobClient: Job Counters
08/08/08 09:36:17 INFO mapred.JobClient: Launched map tasks=3
08/08/08 09:36:17 INFO mapred.JobClient: Launched reduce tasks=1
08/08/08 09:36:17 INFO mapred.JobClient: Data-local map tasks=3
08/08/08 09:36:17 INFO mapred.JobClient: Map-Reduce Framework
08/08/08 09:36:17 INFO mapred.JobClient: Map input records=2
08/08/08 09:36:17 INFO mapred.JobClient: Map output records=8
08/08/08 09:36:17 INFO mapred.JobClient: Map input bytes=50
08/08/08 09:36:17 INFO mapred.JobClient: Map output bytes=82
08/08/08 09:36:17 INFO mapred.JobClient: Combine input records=8
08/08/08 09:36:17 INFO mapred.JobClient: Combine output records=6
08/08/08 09:36:17 INFO mapred.JobClient: Reduce input groups=5
08/08/08 09:36:17 INFO mapred.JobClient: Reduce input records=6
08/08/08 09:36:17 INFO mapred.JobClient: Reduce output records=5
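The counters are consistent with the two sample files: 8 map output records (one per word) reduce to 5 distinct words. The counts can then be read back directly from HDFS (the part file name may vary):
bin/hadoop dfs -cat output/part-00000
For these two input files this should print:
bye 1
goodbye 1
hadoop 2
hello 2
world 2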
[root@web02 hadoop]# bin/hadoop dfs -mkdir testdir
[root@web02 hadoop]# bin/hadoop dfs -rm testdir
rm: Cannot remove directory "/user/root/testdir", use -rmr instead
[root@web02 hadoop]# bin/hadoop dfs -rmr testdir
Deleted /user/root/testdir
[root@web02 hadoop]# bin/hadoop dfs -put /usr/local/heming package
This creates the /user/root/package subdirectory in HDFS.
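The reverse direction works with dfs -get (the destination path here is just an example):
bin/hadoop dfs -get package /tmp/package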

Source: ITPUB blog, http://blog.itpub.net/9614263/viewspace-1008736/ (please credit the original when reposting).
