Setting up a Hadoop cluster on Linux
Software to download:
(1) JDK
(2) Hadoop package
Steps:
1. Edit /etc/hosts
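A minimal /etc/hosts line for this single-node setup maps the hostname used throughout these notes to the node's IP (the address below is a placeholder; use the node's real address):

```
192.168.1.100   bigdata
```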
2. Install and configure the JDK
Extract: tar -zxvf jdk-8u172-ea-bin-b03-linux-x64-18_jan_2018.tar.gz
cd /opt/    change into /opt
vi /etc/profile    edit /etc/profile and set JAVA_HOME
echo $JAVA_HOME    check JAVA_HOME; it prints nothing, so the change has not taken effect yet
source /etc/profile; echo $JAVA_HOME    check again: /usr/java/jdk1.8.0_161, the change is now in effect
Verify the JDK installation: java -version
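The /etc/profile lines for this step look roughly like the following (the JDK path must match where the archive was actually extracted; /usr/java/jdk1.8.0_161 follows the path shown above):

```
export JAVA_HOME=/usr/java/jdk1.8.0_161
export PATH=$JAVA_HOME/bin:$PATH
```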
3. Configure SSH (passwordless login)
ssh-keygen -t rsa; ll .ssh/    generates the private and public key; append the public key to authorized_keys and set the required permissions
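The key generation and public-key installation above can be sketched as below; the scratch directory stands in for ~/.ssh on a real node:

```shell
# Generate an RSA key pair with no passphrase; on a real node the files live in ~/.ssh
DEMO=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$DEMO/id_rsa" -q
# Append the public key to authorized_keys to allow passwordless login
cat "$DEMO/id_rsa.pub" >> "$DEMO/authorized_keys"
# sshd rejects keys with loose permissions: 700 for the directory, 600 for the file
chmod 700 "$DEMO" && chmod 600 "$DEMO/authorized_keys"
```

On the real host, install the key with ssh-copy-id bigdata, or cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys, then confirm that ssh bigdata logs in without a password.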
tar zxf hadoop-2.7.4.tar.gz    extract the Hadoop tarball; check with ll
cd hadoop-2.7.4; ll
pwd    shows the path /home/hadoop/opt/hadoop-2.7.4
vi /etc/profile    set HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.4
source /etc/profile
echo $HADOOP_HOME
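The corresponding /etc/profile lines, assuming the install path shown above:

```
export HADOOP_HOME=/home/hadoop/opt/hadoop-2.7.4
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
```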
cd /home/hadoop/opt/hadoop-2.7.4/etc/hadoop/
Edit the Hadoop configuration files:
vi core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://bigdata:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/opt/hadoop-2.7.4/current/tmp</value>
</property>
<property>
<name>fs.trash.interval</name>
<value>4328</value>
</property>
</configuration>
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/opt/hadoop-2.7.4/current/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/opt/hadoop-2.7.4/current/data</value>
</property>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>staff</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
vi yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>bigdata</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>bigdata:18040</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>bigdata:18030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>bigdata:18025</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>bigdata:18141</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>bigdata:18088</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>86400</value>
</property>
<property>
<name>yarn.log-aggregation.retain-check-interval-seconds</name>
<value>86400</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir-suffix</name>
<value>logs</value>
</property>
</configuration>
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobtracker.http.address</name>
<value>bigdata:50030</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>bigdata:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>bigdata:19888</value>
</property>
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/jobhistory/done</value>
</property>
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/jobhistory/done_intermediate</value>
</property>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
</configuration>
vi slaves
bigdata
vi hadoop-env.sh    set JAVA_HOME to the absolute JDK path
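In hadoop-env.sh, replace the default export with an absolute path; the ${JAVA_HOME} inherited from the environment is not reliably visible when the daemons are started over SSH:

```
# default line: export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/usr/java/jdk1.8.0_161
```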
4. Format HDFS
hdfs namenode -format
Note: formatting the namenode may fail with: /home/hadoop/opt/hadoop-2.7.4/bin/hdfs: line 304: /usr/bin/java/bin/java: Not a directory; /home/hadoop/opt/hadoop-2.7.4/bin/hdfs: line 304: exec: /usr/bin/java/bin/java: cannot execute: Not a directory
The cause is a wrong JDK path: the path printed by `which java` is a symlink and must not be written into /etc/profile as JAVA_HOME; use the actual JDK installation path instead
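To find the real JDK path, resolve the symlink chain with readlink -f. The sketch below builds a fake chain under /tmp/jdkdemo (an illustrative path) to show how /usr/bin/java typically resolves through /etc/alternatives to the real binary:

```shell
# Build a fake chain like /usr/bin/java -> /etc/alternatives/java -> real JDK binary
mkdir -p /tmp/jdkdemo/jdk1.8.0_161/bin
touch /tmp/jdkdemo/jdk1.8.0_161/bin/java
ln -sf /tmp/jdkdemo/jdk1.8.0_161/bin/java /tmp/jdkdemo/alternatives-java
ln -sf /tmp/jdkdemo/alternatives-java /tmp/jdkdemo/java
# readlink -f follows every link to the final target
readlink -f /tmp/jdkdemo/java   # -> /tmp/jdkdemo/jdk1.8.0_161/bin/java
```

On the real machine run readlink -f "$(which java)", then strip the trailing /bin/java from the result to get the value for JAVA_HOME.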
5. Start the Hadoop cluster
/home/hadoop/opt/hadoop-2.7.4/sbin/start-all.sh
6. Verify the Hadoop cluster
(1) jps
(2) Turn off the firewall, or open the ports below in the firewall rules
HDFS web UI: http://bigdata:50070/
YARN web UI: http://bigdata:18088/
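On this single node, which runs both the HDFS and YARN daemons, jps should list roughly the following processes (PIDs omitted):

```
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
Jps
```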