Set up hadoop-2.5.0.tar.gz; download link: http://pan.baidu.com/s/1pKWe1L5
- Cluster plan: three servers: hadoop-senior.orange.com, hadoop-senior.banana.com, and hadoop-senior.pear.com
| Hostname | Master role | HDFS worker | YARN worker | Other |
|----------|-------------|-------------|-------------|-------|
| banana | resourcemanager | datanode | nodemanager | |
| orange | namenode | datanode | nodemanager | |
| pear | secondarynamenode | datanode | nodemanager | historyserver |
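All three hostnames must resolve on every node; a minimal /etc/hosts sketch (the IP addresses are placeholders on the 192.168.181.0/24 subnet used below):
192.168.181.10 hadoop-senior.orange.com
192.168.181.11 hadoop-senior.banana.com
192.168.181.12 hadoop-senior.pear.com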
- Change the time zone (run on orange):
# mv /etc/localtime /etc/localtime_bak
# ln -s /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
- Sync the time and save it to the hardware clock:
# ntpdate asia.pool.ntp.org
# hwclock -w
- Edit ntp.conf so orange serves time on the local subnet:
# vi /etc/ntp.conf
Remove the leading # and change the subnet to your own:
restrict 192.168.181.0 mask 255.255.255.0 nomodify notrap
Add a # in front of the following three lines:
#server 0.centos.pool.ntp.org
#server 1.centos.pool.ntp.org
#server 2.centos.pool.ntp.org
Remove the leading # from these two lines:
server 127.127.1.0 # local clock
fudge 127.127.1.0 stratum 10
# service ntpd restart
# chkconfig ntpd on
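To confirm ntpd is up and serving the local clock, you can query its peer list:
# ntpq -p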
- Set up time sync on the slaves (run on banana and pear):
# crontab -e
*/10 * * * * /usr/sbin/ntpdate hadoop-senior.orange.com
# service crond restart
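Before relying on the cron job, you can trigger one sync by hand to confirm the slave can reach orange:
# /usr/sbin/ntpdate hadoop-senior.orange.com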
- Configure passwordless SSH login, first on orange:
# ssh-keygen -t rsa
Press Enter at every prompt.
# ssh-copy-id beifeng@hadoop-senior.banana.com
Type yes at the first prompt, then enter the password at the second.
Likewise run:
# ssh-copy-id beifeng@hadoop-senior.pear.com
# ssh-copy-id beifeng@hadoop-senior.orange.com
Repeat the same steps on banana and pear.
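To verify, a remote command should now run without a password prompt, e.g.:
# ssh beifeng@hadoop-senior.banana.com hostname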
- Configure the Hadoop distributed environment. First confirm JAVA_HOME is set in the three env files under etc/hadoop:
[root@hadoop-senior hadoop]# grep -n "/usr/java/jdk1.7.0_79" *
hadoop-env.sh:25:export JAVA_HOME=/usr/java/jdk1.7.0_79
mapred-env.sh:16:export JAVA_HOME=/usr/java/jdk1.7.0_79
yarn-env.sh:23:export JAVA_HOME=/usr/java/jdk1.7.0_79
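If any of these files still pointed elsewhere, a one-line sketch to set them all (assuming the same JDK path, run from etc/hadoop) would be:
# sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/java/jdk1.7.0_79|' hadoop-env.sh mapred-env.sh yarn-env.sh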
Then modify the .xml configuration files per the cluster plan, the following four files:
core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-senior.orange.com:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/modules/hadoop-2.5.0/data</value>
</property>
</configuration>
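The directory set in hadoop.tmp.dir must be writable by the user running Hadoop; you can pre-create it on each node to avoid permission surprises:
# mkdir -p /opt/modules/hadoop-2.5.0/data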
hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.http-address</name>
<value>hadoop-senior.orange.com:50070</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hadoop-senior.pear.com:50090</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>true</value>
</property>
</configuration>
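Once the file is distributed you can spot-check any key with getconf, for example:
$ bin/hdfs getconf -confKey dfs.replication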
yarn-site.xml:
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hadoop-senior.banana.com</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
</configuration>
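Note that Hadoop 2.5.0 ships only mapred-site.xml.template in etc/hadoop, so create the real file from it before editing:
# cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml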
mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hadoop-senior.pear.com:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hadoop-senior.pear.com:19888</value>
</property>
</configuration>
Add to the slaves file:
hadoop-senior.orange.com
hadoop-senior.banana.com
hadoop-senior.pear.com
- Copy the Hadoop installation directory to banana and pear:
# scp -r hadoop-2.5.0/ beifeng@hadoop-senior.banana.com:/opt/modules/
# scp -r hadoop-2.5.0/ beifeng@hadoop-senior.pear.com:/opt/modules/
Format the NameNode (on orange):
# /opt/modules/hadoop-2.5.0/bin/hdfs namenode -format
Start the corresponding services per the cluster plan.
On orange run:
# /opt/modules/hadoop-2.5.0/sbin/start-dfs.sh
On banana run:
# /opt/modules/hadoop-2.5.0/sbin/start-yarn.sh
On pear run:
# /opt/modules/hadoop-2.5.0/sbin/mr-jobhistory-daemon.sh start historyserver
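As a sanity check, run jps on each node and compare the running daemons against the cluster plan; on pear, for example, you should see DataNode, NodeManager, SecondaryNameNode, and JobHistoryServer:
# jps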
- Test: upload a file and run wordcount (create /input first, since -put into a missing directory fails):
# /opt/modules/hadoop-2.5.0/bin/hdfs dfs -mkdir -p /input
# /opt/modules/hadoop-2.5.0/bin/hdfs dfs -put sort.txt /input/
# /opt/modules/hadoop-2.5.0/bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/sort.txt /output
[beifeng@hadoop-senior hadoop-2.5.0]$ ./bin/hdfs dfs -text /output/part*
abcs 1
ddfs 1
hadoop 3
word 1
- Benchmark HDFS:
Write test (writes 10 files of 1000 MB each; output goes to /benchmarks/TestDFSIO on HDFS by default):
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -write -nrFiles 10 -fileSize 1000
Read test:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -read -nrFiles 10 -fileSize 1000
Clean up the test data:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.5.0-tests.jar TestDFSIO -clean
- Benchmark MapReduce:
1. Generate some random data with teragen:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar teragen 100 /input/
2. Run terasort to sort the data:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar terasort /input /output
3. Run teravalidate to verify the sorted output:
$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar teravalidate /output /output2
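Besides the console summary, TestDFSIO appends its throughput and average I/O rate figures to TestDFSIO_results.log in the local working directory:
$ cat TestDFSIO_results.log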