1. What is Hadoop?
- Hadoop is a foundational architecture for distributed systems;
- It consists of three main components: HDFS (distributed file system), YARN (resource scheduling system), and MapReduce (distributed computing framework).
2. What can Hadoop do?
- It lets users develop distributed programs without understanding the low-level details of distribution;
- It harnesses the full power of a cluster for high-speed computation and storage over large-scale data.
3. Hadoop HA (ZooKeeper and SSH already configured)
3.1 Machine layout
master | NameNode | DataNode | Zookeeper | JournalNode
slave1 | NameNode | DataNode | Zookeeper | JournalNode
slave2 |          | DataNode | Zookeeper | JournalNode
3.2 Configuration
1. core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:///opt/app/hadoop-2.6.0-cdh5.7.0/tmp</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>master:2181,slave1:2181,slave2:2181</value>
</property>
</configuration>
2. hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///opt/app/hadoop-2.6.0-cdh5.7.0/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///opt/app/hadoop-2.6.0-cdh5.7.0/data</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>master:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>slave1:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>master:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>slave1:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://master:8485;slave1:8485;slave2:8485/mycluster</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/app/hadoop-2.6.0-cdh5.7.0/journalnode</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
3. mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
4. yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yrc</value>
</property>
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>master</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>slave1</value>
</property>
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>master:2181,slave1:2181,slave2:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
3.3 Startup
1. Start the ZooKeeper cluster (on every machine)
zkServer.sh start /opt/app/zookeeper-3.4.5-cdh5.7.0/conf/zoo_cluster.cfg
2. Start the JournalNodes (on every machine)
hadoop-daemon.sh start journalnode
3. Format HDFS
hdfs namenode -format
# Error:
java.io.IOException: Invalid configuration: a shared edits dir must not be specified if HA is not enabled.
# Cause:
Following a blog post, the NameNode group was declared in hdfs-site.xml as:
<property>
<name>dfs.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
# Formatting kept failing with the error above. The documentation shows that in Hadoop 2.6.0 the NameNode group must be declared with this property instead:
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
4. Reformat; this time it succeeds
log: 20/02/23 12:24:17 INFO common.Storage: Storage directory /opt/app/hadoop-2.6.0-cdh5.7.0/name has been successfully formatted.
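Since the misnamed property is easy to carry over from old blog posts, a quick sanity check can catch it before formatting. A minimal sketch (the nameservice name `mycluster` matches this cluster; the function and file path are illustrative):

```shell
#!/bin/sh
# Sketch: before running `hdfs namenode -format`, verify that hdfs-site.xml
# declares the NameNode group under the correct HA property name.
# Returns 0 when dfs.ha.namenodes.mycluster is present, 1 otherwise.
check_ha_namenodes() {
    hdfs_site="$1"
    if grep -q '<name>dfs.ha.namenodes.mycluster</name>' "$hdfs_site"; then
        echo "OK: dfs.ha.namenodes.mycluster is set"
        return 0
    elif grep -q '<name>dfs.namenodes.mycluster</name>' "$hdfs_site"; then
        echo "ERROR: found dfs.namenodes.mycluster; rename it to dfs.ha.namenodes.mycluster" >&2
        return 1
    else
        echo "ERROR: no NameNode group declared for nameservice mycluster" >&2
        return 1
    fi
}

# Example: check_ha_namenodes /opt/app/hadoop-2.6.0-cdh5.7.0/etc/hadoop/hdfs-site.xml
```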
5. Copy the metadata from master to slave1
scp -r /opt/app/hadoop-2.6.0-cdh5.7.0/name/* slave1:/opt/app/hadoop-2.6.0-cdh5.7.0/name/
6. Format ZKFC on the master node. Do not skip this step, or every NameNode will later sit in standby mode.
hdfs zkfc -formatZK
log: 20/02/23 13:13:29 INFO ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.
Then start DFS: start-dfs.sh
Run a Hadoop command: hadoop fs -ls /
# Error
Operation category READ is not supported in state standby. Visit https://s.a
# Cause: the NameNode is in standby mode
# Fix
# Stop DFS: stop-dfs.sh
# Reformat ZK: hdfs zkfc -formatZK
7. Start DFS: start-dfs.sh
DFS starts, but the DataNodes do not come up.
The logs show:
the clusterID in the NameNode's VERSION file does not match the DataNode's clusterID.
Fix: set clusterID in data/current/VERSION to the clusterID in name/current/VERSION.
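That manual VERSION edit can be scripted. A minimal sketch, assuming the name and data directories configured in hdfs-site.xml above; the helper name and paths are illustrative:

```shell
#!/bin/sh
# Sketch: copy the NameNode's clusterID into the DataNode's VERSION file
# so the DataNode can register again. Keeps a .bak copy of the original.
sync_cluster_id() {
    name_version="$1"   # e.g. /opt/app/hadoop-2.6.0-cdh5.7.0/name/current/VERSION
    data_version="$2"   # e.g. /opt/app/hadoop-2.6.0-cdh5.7.0/data/current/VERSION
    # Extract the clusterID recorded by the NameNode.
    cid=$(grep '^clusterID=' "$name_version" | cut -d= -f2)
    [ -n "$cid" ] || { echo "no clusterID in $name_version" >&2; return 1; }
    # Back up, then rewrite the DataNode's clusterID line in place.
    cp "$data_version" "$data_version.bak"
    sed -i "s/^clusterID=.*/clusterID=$cid/" "$data_version"
}
```

Run it on each DataNode host, then start DFS again.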
8. Start DFS again. OK!
3.4 HA tests
3.4.1 NameNode HA test
1. Current state:
- hdfs haadmin -getServiceState nn1
- nn1:active
- hdfs haadmin -getServiceState nn2
- nn2:standby
2. Simulate a crash of the active NameNode
- kill the NameNode process on master
3. Check the NameNode on slave1
It is still in standby mode, so the HDFS cluster is unusable.
In other words, automatic failover is not happening. Per other blog posts, the cause shows up in the ZKFC log: fuser: command not found.
4. Install the missing tool: yum install psmisc
5. Start DFS and test again; failover now works.
6. Commands to start or stop only a NameNode or DataNode. In production you cannot keep reformatting or restarting everything, so fixes are usually targeted at a single daemon:
hadoop-daemon.sh start|stop namenode|datanode|journalnode
yarn-daemon.sh start|stop resourcemanager|nodemanager
3.4.2 YARN HA test
1. The ResourceManager on slave1 must be started manually
- yarn-daemon.sh start resourcemanager
2. Get the YARN state
- yarn rmadmin -getServiceState rm1
- rm1:active
- yarn rmadmin -getServiceState rm2
- rm2:standby
3. Simulate master going down
- kill rm1
- yarn rmadmin -getServiceState rm2
- rm2:active
3.5 How does a Java client access HDFS remotely in a Hadoop HA setup?
Option 1: put all the NameNode-related parameters into the client's Configuration object. (option 1 is recommended)
Option 2: copy the cluster's core-site.xml and hdfs-site.xml into the project's src directory.
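For option 1, the client's Configuration must carry the same HA settings as the hdfs-site.xml above. A sketch of the minimal key set, with values mirroring this cluster's config (adjust host names for yours):

```
fs.defaultFS                                 = hdfs://mycluster
dfs.nameservices                             = mycluster
dfs.ha.namenodes.mycluster                   = nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1       = master:8020
dfs.namenode.rpc-address.mycluster.nn2       = slave1:8020
dfs.client.failover.proxy.provider.mycluster = org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider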
Reposted from: https://blog.youkuaiyun.com/wo198711203217/article/details/80528860
https://blog.youkuaiyun.com/twj0823/article/details/84346176