B3 - Installing and Deploying Apache Hadoop 3.1.1 on CentOS 7

1. Virtual Machine Installation

Reference: https://blog.youkuaiyun.com/chirs_chen/article/details/84978941

Virtualization software: Oracle VM VirtualBox

Guest OS image: CentOS-7-x86_64-Minimal-1810.iso

After the OS is installed, enable the NAT and Host-Only network adapters by setting ONBOOT to yes in their configuration files.

Configuration file paths:
NAT adapter:       /etc/sysconfig/network-scripts/ifcfg-enp0s3
Host-Only adapter: /etc/sysconfig/network-scripts/ifcfg-enp0s8
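
For example, the ONBOOT change can be applied in place with sed (a minimal sketch; it assumes the files still contain the default ONBOOT=no line):

sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-enp0s3
sed -i 's/^ONBOOT=no/ONBOOT=yes/' /etc/sysconfig/network-scripts/ifcfg-enp0s8

Note that the Host-Only adapter also needs the static address used later in this guide (e.g. BOOTPROTO=static and IPADDR=192.168.56.5 on the first host); see the referenced article for the full ifcfg contents.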

Restart the network service

systemctl restart network

2. Cluster Plan

Component / Host    apache-hadoop-5               apache-hadoop-6    apache-hadoop-7
HDFS                NameNode, SecondaryNameNode   DataNode           DataNode
YARN                ResourceManager               NodeManager        NodeManager
JDK                 1.8.0_192                     1.8.0_192          1.8.0_192

3. Download the Packages

Apache Hadoop download page: http://hadoop.apache.org/releases.html

JDK download page:

https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
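
The release pages above only link to mirrors; a typical command-line download of the Hadoop tarball looks like the following (the archive.apache.org mirror URL is an assumption, and the Oracle JDK usually has to be downloaded through a browser after accepting the license):

wget https://archive.apache.org/dist/hadoop/common/hadoop-3.1.1/hadoop-3.1.1.tar.gz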

4. Utility Shell Scripts (all hosts)

  1. Create a node list file (it does not include the local host; the example below is on apache-hadoop-5)

    vi nodes
    
    apache-hadoop-6
    apache-hadoop-7
    
  2. Batch script for passwordless SSH login to the local host and remote hosts (requires the nodes file in the same directory)

    Install the expect tool

    yum -y install expect
    

    Create the passwordless-login script

    vi freepw.sh
    
    #!/bin/bash
    PASSWORD=<root password of the servers>
    
    auto_ssh_copy_id() {
        expect -c "set timeout -1;
            spawn ssh-copy-id $1;
            expect {
                *(yes/no)* {send -- yes\r;exp_continue;}
                *assword:* {send -- $2\r;exp_continue;}
                eof        {exit 0;}
            }";
    }
    
    auto_ssh_copy_id localhost $PASSWORD
    cat nodes | while read host
    do
    {
        auto_ssh_copy_id $host $PASSWORD
    }&wait
    done
    
  3. Batch script for copying files to remote hosts with scp (requires the nodes file in the same directory)

    vi scp.sh
    
    #!/bin/bash
    cat nodes | while read host
    do
    {
        scp -r $1 $host:$2
    }&wait
    done
    
  4. Batch script for running a shell command on the local host and remote hosts (requires the nodes file in the same directory)

    vi run.sh
    
    #!/bin/bash
    $1
    cat nodes | while read host
    do
    {
        ssh $host $1
    }&wait
    done
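
The three helpers above are used throughout the rest of this guide. Their invocation pattern, for reference (the paths and command shown are just examples taken from later sections):

sh freepw.sh                                   # push the local SSH public key to localhost and to every host in nodes
sh scp.sh /etc/hosts /etc                      # copy a local file or directory to the given directory on every host in nodes
sh run.sh "systemctl stop firewalld.service"   # run a command locally and on every host in nodes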
    

5. Virtual Machine Setup

Unless stated otherwise, the following steps are performed on apache-hadoop-5.

  1. Change the hostname (all hosts)

    hostnamectl set-hostname <hostname>
    
    IP address       Hostname
    192.168.56.5     apache-hadoop-5
    192.168.56.6     apache-hadoop-6
    192.168.56.7     apache-hadoop-7
  2. Set up hostname mappings

    Add the following entries to /etc/hosts:

    192.168.56.5 apache-hadoop-5
    192.168.56.6 apache-hadoop-6
    192.168.56.7 apache-hadoop-7
    

    Sync /etc/hosts to the other hosts

    sh scp.sh /etc/hosts /etc
    
  3. Set up passwordless login (all hosts)

    Generate an RSA key pair

    ssh-keygen -t rsa -P ""
    

    Push the public key to the local host and the other hosts

    sh freepw.sh
    
  4. Disable the firewall and SELinux

    Disable the firewall

    sh run.sh "systemctl stop firewalld.service"
    sh run.sh "systemctl disable firewalld.service"
    

    Disable SELinux

    vi /etc/sysconfig/selinux
    
    Change SELINUX=enforcing to SELINUX=disabled (the change takes effect after a reboot; run setenforce 0 to stop enforcement immediately), then sync the file to the other hosts:
    
    sh scp.sh /etc/sysconfig/selinux /etc/sysconfig
    
  5. Clock synchronization

    Install NTP

    sh run.sh "yum -y install ntp"
    

    Add NTP servers

    vi /etc/ntp.conf
    
    Add the following lines, then sync the file to the other hosts:
    server 0.cn.pool.ntp.org
    server 1.cn.pool.ntp.org
    server 2.cn.pool.ntp.org
    server 3.cn.pool.ntp.org
    
    sh scp.sh /etc/ntp.conf /etc
    

    Enable and start the ntpd service

    sh run.sh "systemctl enable ntpd"
    sh run.sh "systemctl start ntpd"
    

    Manually sync the time from the network

    sh run.sh "ntpdate -u 0.cn.pool.ntp.org"
    

    Write the system time to the hardware clock

    sh run.sh "hwclock --systohc"
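
    To verify that each host has actually synchronized (a quick check; an asterisk in front of a peer in the output marks the currently selected time source):

    sh run.sh "ntpq -p"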
    

6. Install and Configure the JDK

Copy the JDK archive to the other hosts

sh scp.sh ~/jdk-8u192-linux-x64.tar.gz ~/

Extract on all hosts

sh run.sh "tar -zxf ~/jdk-8u192-linux-x64.tar.gz -C /opt"

Configure environment variables

vi /etc/profile

Add the following lines:

#set jdk environment
export JAVA_HOME=/opt/jdk1.8.0_192
export CLASSPATH=.:$JAVA_HOME/lib:$JAVA_HOME/jre/lib:$CLASSPATH
export PATH=$PATH:$JAVA_HOME/bin:$JAVA_HOME/jre/bin

Sync /etc/profile to the other hosts:

sh scp.sh /etc/profile /etc

Run on all hosts

source /etc/profile
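
To confirm the JDK is usable on every host, a quick check (the full path is used because the non-interactive ssh sessions opened by run.sh do not source /etc/profile):

sh run.sh "/opt/jdk1.8.0_192/bin/java -version"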

7. Install Hadoop

Set up the environment variables first, install and configure Hadoop on one host, then copy it to the other hosts.

  1. Extract

    tar -zxvf ~/hadoop-3.1.1.tar.gz -C /opt
    
  2. Configure environment variables

    vi /etc/profile
    
    Add the following lines:
    #set hadoop environment
    export HADOOP_HOME=/opt/hadoop-3.1.1
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
    Sync /etc/profile to the other hosts:

    sh scp.sh /etc/profile /etc
    

    Run on all hosts

    source /etc/profile
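
    A quick sanity check that the Hadoop commands are on the PATH:

    hadoop version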
    

8. Configure Hadoop

Configuration file overview:
$HADOOP_HOME/etc/hadoop/hadoop-env.sh      Hadoop environment variables
$HADOOP_HOME/etc/hadoop/yarn-env.sh        YARN environment variables
$HADOOP_HOME/etc/hadoop/workers            DataNode hosts
$HADOOP_HOME/etc/hadoop/core-site.xml      core parameters
$HADOOP_HOME/etc/hadoop/hdfs-site.xml      HDFS parameters
$HADOOP_HOME/etc/hadoop/mapred-site.xml    MapReduce parameters
$HADOOP_HOME/etc/hadoop/yarn-site.xml      YARN parameters
  1. $HADOOP_HOME/etc/hadoop/hadoop-env.sh

    Set the JDK and Hadoop root directories

    export JAVA_HOME=/opt/jdk1.8.0_192
    export HADOOP_HOME=/opt/hadoop-3.1.1
    
  2. $HADOOP_HOME/etc/hadoop/yarn-env.sh

    Set the JDK root directory

    export JAVA_HOME=/opt/jdk1.8.0_192
    
  3. $HADOOP_HOME/etc/hadoop/workers

    List the DataNode hostnames

    apache-hadoop-6
    apache-hadoop-7
    
  4. $HADOOP_HOME/etc/hadoop/core-site.xml

    Create the Hadoop temp directory

    mkdir -p $HADOOP_HOME/tmp
    

    Configuration:

    <configuration>
            <property>
                    <!-- hdfs nameservice -->
                    <name>fs.defaultFS</name>
                    <value>hdfs://apache-hadoop-5:8020</value>
            </property>
            <property>
                    <name>hadoop.tmp.dir</name>
                    <value>/opt/hadoop-3.1.1/tmp</value>
            </property>
    </configuration>
    
  5. $HADOOP_HOME/etc/hadoop/hdfs-site.xml

    Create the NameNode and DataNode directories

    mkdir -p $HADOOP_HOME/hdfs/name $HADOOP_HOME/hdfs/data
    

    Configuration:

    <configuration>
            <property>
                    <name>dfs.namenode.http-address</name>
                    <value>apache-hadoop-5:50070</value>
            </property>
            <property>
                    <name>dfs.namenode.secondary.http-address</name>
                    <value>apache-hadoop-5:50090</value>
            </property>
            <property>
                    <name>dfs.namenode.name.dir</name>
                    <value>/opt/hadoop-3.1.1/hdfs/name</value>
            </property>
            <property>
                    <name>dfs.datanode.data.dir</name>
                    <value>/opt/hadoop-3.1.1/hdfs/data</value>
            </property>
            <property>
                    <!-- replication factor -->
                    <name>dfs.replication</name>
                    <value>2</value>
            </property>
    </configuration>
    
  6. $HADOOP_HOME/etc/hadoop/mapred-site.xml

    <configuration>
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>
            <property>
                    <name>mapreduce.jobhistory.address</name>
                    <value>apache-hadoop-5:10020</value>
            </property>
            <property>
                    <name>mapreduce.jobhistory.webapp.address</name>
                    <value>apache-hadoop-5:19888</value>
            </property>
    </configuration>
    
  7. $HADOOP_HOME/etc/hadoop/yarn-site.xml

    <configuration>
    <!-- Site specific YARN configuration properties -->
            <property>
                    <name>yarn.resourcemanager.hostname</name>
                    <value>apache-hadoop-5</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services</name>
                    <value>mapreduce_shuffle</value>
            </property>
            <property>
                    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.address</name>
                    <value>${yarn.resourcemanager.hostname}:8032</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.scheduler.address</name>
                    <value>${yarn.resourcemanager.hostname}:8030</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.resource-tracker.address</name>
                    <value>${yarn.resourcemanager.hostname}:8035</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.admin.address</name>
                    <value>${yarn.resourcemanager.hostname}:8033</value>
            </property>
            <property>
                    <name>yarn.resourcemanager.webapp.address</name>
                    <value>${yarn.resourcemanager.hostname}:8088</value>
            </property>
    </configuration>
    

9. Add Hadoop User Definitions

Without these user definitions, starting Hadoop as the root user fails with errors like the following:

Starting namenodes on [apache-hadoop-5]
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [apache-hadoop-5]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.
Starting resourcemanager
ERROR: Attempting to operate on yarn resourcemanager as root
ERROR: but there is no YARN_RESOURCEMANAGER_USER defined. Aborting operation.
Starting nodemanagers
ERROR: Attempting to operate on yarn nodemanager as root
ERROR: but there is no YARN_NODEMANAGER_USER defined. Aborting operation.
  1. Edit $HADOOP_HOME/sbin/start-dfs.sh and $HADOOP_HOME/sbin/stop-dfs.sh and add the following at the top of each file:

    #!/usr/bin/env bash
    
    # Add user defined
    HDFS_DATANODE_USER=root
    HADOOP_SECURE_DN_USER=hdfs
    HDFS_NAMENODE_USER=root
    HDFS_SECONDARYNAMENODE_USER=root
    
  2. Edit $HADOOP_HOME/sbin/start-yarn.sh and $HADOOP_HOME/sbin/stop-yarn.sh and add the following at the top of each file:

    #!/usr/bin/env bash
    
    # Add user defined
    YARN_RESOURCEMANAGER_USER=root
    HADOOP_SECURE_DN_USER=yarn
    YARN_NODEMANAGER_USER=root
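
An equivalent alternative, not used in this guide but supported by Hadoop 3.x, is to export the same variables once in $HADOOP_HOME/etc/hadoop/hadoop-env.sh instead of patching the start/stop scripts:

    export HDFS_NAMENODE_USER=root
    export HDFS_DATANODE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export YARN_RESOURCEMANAGER_USER=root
    export YARN_NODEMANAGER_USER=root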
    

10. Copy Hadoop to the Other Hosts

sh scp.sh /opt/hadoop-3.1.1 /opt/hadoop-3.1.1

11. Start and Stop the Cluster

  1. Format HDFS

    hdfs namenode -format
    
  2. Start Hadoop

    start-all.sh
    
  3. Check the running processes

    Run jps on each host to confirm that the daemons started successfully (see the sketch after this list for the expected processes).

    jps
    
  4. Stop Hadoop

    stop-all.sh
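
Based on the cluster plan in Section 2, the jps check in step 3 should show roughly the following processes (process IDs omitted):

    apache-hadoop-5:
    NameNode
    SecondaryNameNode
    ResourceManager
    Jps

    apache-hadoop-6 / apache-hadoop-7:
    DataNode
    NodeManager
    Jps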
    

12. HDFS Operations

List an HDFS directory: hadoop fs -ls /

Create an HDFS directory: hadoop fs -mkdir -p /hdfs

Upload a file to HDFS: hadoop fs -put freepw.sh /hdfs

Rename an HDFS directory: hadoop fs -mv /hdfs /dfs

View an HDFS file: hadoop fs -cat /dfs/freepw.sh

Delete an HDFS directory: hadoop fs -rm -r /dfs

13. Run a Wordcount Job

Create an HDFS directory: hadoop fs -mkdir -p /hdfs

Upload a file to HDFS: hadoop fs -put freepw.sh /hdfs

Run a wordcount job on freepw.sh with the following command:

yarn jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.1.jar wordcount /hdfs/freepw.sh /hdfs/temp-wordcount-out

Two errors came up while running the job; they are covered in the Problems and Solutions section below. After fixing them, the result can be viewed with:

hadoop fs -cat /hdfs/temp-wordcount-out/part-r-00000

Result:

"set    1
#!/bin/bash     1
$1;     1
$2\r;exp_continue;}     1
$PASSWORD       2
$host   1
*(yes/no)*      1
*assword:*      1
--      2
-1;     1
-c      1
0;}     1
PASSWORD=chenpanyu      1
auto_ssh_copy_id        2
auto_ssh_copy_id()      1
cat     1
do      1
done    1
eof     1
expect  2
host    1
localhost       1
nodes   1
read    1
spawn   1
ssh-copy-id     1
timeout 1
while   1
yes\r;exp_continue;}    1
{       3
{exit   1
{send   2
|       1
}       1
}";     1
}&wait  1

14. Problems and Solutions

  1. Problem 1

    • Error:

      [2018-12-28 15:45:42.628]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
      Last 4096 bytes of prelaunch.err :
      Last 4096 bytes of stderr :
      Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
      
      Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
      <property>
        <name>yarn.app.mapreduce.am.env</name>
        <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
      </property>
      <property>
        <name>mapreduce.map.env</name>
        <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
      </property>
      <property>
        <name>mapreduce.reduce.env</name>
        <value>HADOOP_MAPRED_HOME=${full path of your hadoop distribution directory}</value>
      </property>
      
    • Analysis:

      Since Hadoop 3.0.0, the services no longer inherit each other's environment variables; they must be set explicitly through the configuration.

    • Solution:

      Add the following properties to $HADOOP_HOME/etc/hadoop/mapred-site.xml and sync the file to all hosts (see the sync sketch after this list):

      <property>
              <name>hadoop.mapreduce.home</name>
              <value>/opt/hadoop-3.1.1</value>
      </property>
      <property>
              <name>yarn.app.mapreduce.am.env</name>
              <value>HADOOP_MAPRED_HOME=${hadoop.mapreduce.home}</value>
      </property>
      <property>
              <name>mapreduce.map.env</name>
              <value>HADOOP_MAPRED_HOME=${hadoop.mapreduce.home}</value>
      </property>
      <property>
              <name>mapreduce.reduce.env</name>
              <value>HADOOP_MAPRED_HOME=${hadoop.mapreduce.home}</value>
      </property>
      
  2. Problem 2

    • Error:

      [2018-12-28 16:48:24.606]Container [pid=15778,containerID=container_1545983029903_0003_01_000002] is running 463743488B beyond the 'VIRTUAL' memory limit. Current usage: 85.1 MB of 1 GB physical memory used; 2.5 GB of 2.1 GB virtual memory used. Killing container.
      Dump of the process-tree for container_1545983029903_0003_01_000002 :
              |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
              |- 15787 15778 15778 15778 (java) 815 22 2602704896 21470 /opt/jdk1.8.0_192/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx820m -Djava.io.tmpdir=/opt/hadoop-3.1.1/tmp/nm-local-dir/usercache/root/appcache/application_1545983029903_0003/container_1545983029903_0003_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop-3.1.1/logs/userlogs/application_1545983029903_0003/container_1545983029903_0003_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.56.6 40118 attempt_1545983029903_0003_m_000000_0 2 
              |- 15778 15775 15778 15778 (bash) 1 0 115896320 306 /bin/bash -c /opt/jdk1.8.0_192/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN   -Xmx820m -Djava.io.tmpdir=/opt/hadoop-3.1.1/tmp/nm-local-dir/usercache/root/appcache/application_1545983029903_0003/container_1545983029903_0003_01_000002/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/hadoop-3.1.1/logs/userlogs/application_1545983029903_0003/container_1545983029903_0003_01_000002 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.56.6 40118 attempt_1545983029903_0003_m_000000_0 2 1>/opt/hadoop-3.1.1/logs/userlogs/application_1545983029903_0003/container_1545983029903_0003_01_000002/stdout 2>/opt/hadoop-3.1.1/logs/userlogs/application_1545983029903_0003/container_1545983029903_0003_01_000002/stderr  
      
      [2018-12-28 16:48:24.871]Container killed on request. Exit code is 143
      [2018-12-28 16:48:24.900]Container exited with a non-zero exit code 143. 
      
    • Solution:

      Add the following properties to $HADOOP_HOME/etc/hadoop/yarn-site.xml and sync the file to all hosts (see the sync sketch after this list):

      <property>
              <name>yarn.nodemanager.vmem-check-enabled</name>
              <value>false</value>
              <description>Whether virtual memory limits will be enforced for containers</description>
      </property>
      <property>
              <name>yarn.nodemanager.vmem-pmem-ratio</name>
              <value>4</value>
              <description>Ratio between virtual memory to physical memory when setting memory limits for containers</description>
      </property>
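
In both cases the edit is made on apache-hadoop-5 and then pushed to the other hosts with the scp.sh helper, for example:

sh scp.sh /opt/hadoop-3.1.1/etc/hadoop/mapred-site.xml /opt/hadoop-3.1.1/etc/hadoop
sh scp.sh /opt/hadoop-3.1.1/etc/hadoop/yarn-site.xml /opt/hadoop-3.1.1/etc/hadoop

Restart the cluster (stop-all.sh, then start-all.sh) so the new settings take effect.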
      
