1. Single-node mode
a. Configure passwordless SSH login from the local machine to itself
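A minimal sketch of the key setup (assumes an RSA key and the default ~/.ssh paths; run as the user that will start Hadoop):
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# verify: this should log in without a password prompt
ssh localhost exit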
b. Unpack the Hadoop tarball and set JAVA_HOME in hadoop-env.sh
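For example, in etc/hadoop/hadoop-env.sh (the JDK path below is a placeholder; point it at your actual install):
export JAVA_HOME=/usr/java/jdk1.8.0_151    # hypothetical path, use your own JDK directory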
c. Edit core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.110.222</value>
</property>
</configuration>
d. Edit hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/root/softs/hadoop-2.7.3/namelog</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/root/softs/hadoop-2.7.3/datalog</value>
</property>
</configuration>
e. Format the NameNode and start HDFS
bin/hdfs namenode -format
sbin/start-dfs.sh
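If the daemons started cleanly, jps should list something like the following for this non-HA, single-node setup:
jps
# expected processes: NameNode, DataNode, SecondaryNameNode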
2. High-availability (HA) configuration
a. Node plan
hostname | ip             | software             | jps
hbase1   | 192.168.110.51 | jdk/hadoop           | NameNode/ResourceManager/zkfc
hbase2   | 192.168.110.52 | jdk/hadoop           | NameNode/ResourceManager/zkfc
hbase3   | 192.168.110.53 | jdk/hadoop/zookeeper | DataNode/NodeManager/JournalNode/QuorumPeerMain
hbase4   | 192.168.110.54 | jdk/hadoop/zookeeper | DataNode/NodeManager/JournalNode/QuorumPeerMain
hbase5   | 192.168.110.55 | jdk/hadoop/zookeeper | DataNode/NodeManager/JournalNode/QuorumPeerMain
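For these hostnames to resolve, every node's /etc/hosts (or DNS) should map them to the IPs in the plan, e.g.:
192.168.110.51 hbase1
192.168.110.52 hbase2
192.168.110.53 hbase3
192.168.110.54 hbase4
192.168.110.55 hbase5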
b. Configure passwordless SSH login between the nodes
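A sketch assuming the root user and RSA keys; hbase1 and hbase2 in particular need passwordless SSH to all nodes (for the start scripts) and to each other (for sshfence):
# run on each node that must log in elsewhere (at least hbase1 and hbase2)
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
for host in hbase1 hbase2 hbase3 hbase4 hbase5; do ssh-copy-id root@$host; done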
c. Main Hadoop configuration files
core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://myCluster</value>
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>1</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>hbase3:2181,hbase4:2181,hbase5:2181</value>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///root/softs/hadoop-2.7.3/namelogs</value>
</property>
<property>
<name>dfs.blocksize</name>
<value>128m</value>
</property>
<property>
<name>dfs.namenode.handler.count</name>
<value>100</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///root/softs/hadoop-2.7.3/data</value>
</property>
<!-- the max number of files a datanode will serve at any one time -->
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>4096</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>myCluster</value>
</property>
<property>
<name>dfs.ha.namenodes.myCluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.myCluster.nn1</name>
<value>hbase1:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.myCluster.nn2</name>
<value>hbase2:8020</value>
</property>
<!-- HTTP addresses of nn1 and nn2 -->
<property>
<name>dfs.namenode.http-address.myCluster.nn1</name>
<value>hbase1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.myCluster.nn2</name>
<value>hbase2:50070</value>
</property>
<!-- Shared edits directory on the JournalNodes where NameNode metadata is stored -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hbase3:8485;hbase4:8485;hbase5:8485/myCluster</value>
</property>
<!-- Local directory where the JournalNodes store their edit logs -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/root/softs/hadoop-2.7.3/journallog</value>
</property>
<!-- Java class HDFS clients use to locate the active NameNode -->
<property>
<name>dfs.client.failover.proxy.provider.myCluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Use SSH as the fencing method -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- Location of the SSH private key -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_rsa</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>1536</value>
</property>
<property>
<name>mapreduce.map.java.opts</name>
<value>-Xmx1024M</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>3072</value>
</property>
<property>
<name>mapreduce.reduce.java.opts</name>
<value>-Xmx2560M</value>
</property>
<property>
<name>mapreduce.task.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>mapreduce.task.io.sort.factor</name>
<value>100</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.parallelcopies</name>
<value>50</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hbase1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hbase1:19888</value>
</property>
<!-- Enable uber mode (an optimization for small jobs) -->
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<!-- Maximum number of maps for a job to run in uber mode -->
<property>
<name>mapreduce.job.ubertask.maxmaps</name>
<value>9</value>
</property>
<!-- Maximum number of reduces for a job to run in uber mode -->
<property>
<name>mapreduce.job.ubertask.maxreduces</name>
<value>1</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Name of the YARN HA cluster -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>yarncluster</value>
</property>
<!-- Logical IDs of the two ResourceManagers -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hosts for rm1 and rm2 -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>hbase1</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>hbase2</value>
</property>
<!-- HTTP addresses of the ResourceManagers -->
<property>
<name>yarn.resourcemanager.webapp.address.rm1</name>
<value>hbase1:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address.rm2</name>
<value>hbase2:8088</value>
</property>
<!-- ZooKeeper quorum address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>hbase3:2181,hbase4:2181,hbase5:2181</value>
</property>
<!-- ZooKeeper path where ResourceManager state is stored -->
<property>
<name>yarn.resourcemanager.zk-state-store.parent-path</name>
<value>/rmstore</value>
</property>
<!-- Enable ResourceManager restart/recovery -->
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- Store ResourceManager state in ZooKeeper -->
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- Enable NodeManager restart/recovery -->
<property>
<name>yarn.nodemanager.recovery.enabled</name>
<value>true</value>
</property>
<!-- NodeManager IPC port -->
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:45454</value>
</property>
<!-- Web Application Proxy (protects YARN from attacks through the web UI) -->
<property>
<name>yarn.web-proxy.address</name>
<value>hbase2:8888</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
slaves
hbase3
hbase4
hbase5
d. Cluster initialization (steps below; a command sketch follows the list)
// Start ZooKeeper
// Format the ZKFC (create the HA znode in ZooKeeper)
// Start the JournalNodes
// Format HDFS (the NameNode)
// Copy the NameNode metadata directory to the standby node
// Stop the JournalNodes
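One way to run these steps, using the paths from the configs above (the ZooKeeper install location and the use of scp rather than -bootstrapStandby are assumptions):
# on hbase3, hbase4, hbase5: start ZooKeeper (adjust the path to your ZooKeeper install)
zkServer.sh start
# on hbase1: format the ZKFC, i.e. create the HA znode in ZooKeeper
bin/hdfs zkfc -formatZK
# on hbase3, hbase4, hbase5: start the JournalNodes
sbin/hadoop-daemon.sh start journalnode
# on hbase1: format HDFS (the NameNode)
bin/hdfs namenode -format
# copy the NameNode metadata directory to the standby NameNode (hbase2);
# running "bin/hdfs namenode -bootstrapStandby" on hbase2 achieves the same thing
scp -r /root/softs/hadoop-2.7.3/namelogs root@hbase2:/root/softs/hadoop-2.7.3/
# on hbase3, hbase4, hbase5: stop the JournalNodes (start-dfs.sh will start them again)
sbin/hadoop-daemon.sh stop journalnode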
e. Start the cluster
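A start-up sequence matching the node plan above (a sketch; commands are run from the Hadoop home directory):
# on hbase1: start HDFS (NameNodes, DataNodes, JournalNodes, ZKFCs) and YARN
sbin/start-dfs.sh
sbin/start-yarn.sh
# on hbase2: start-yarn.sh does not start the second ResourceManager, so start it by hand
sbin/yarn-daemon.sh start resourcemanager
# on hbase2: start the Web Application Proxy configured in yarn-site.xml
sbin/yarn-daemon.sh start proxyserver
# on hbase1: start the MapReduce job history server
sbin/mr-jobhistory-daemon.sh start historyserver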
(Reference: https://blog.youkuaiyun.com/carl810224/article/details/52160418)