Environment preparation:
The deployment uses one primary NameNode, one secondary NameNode, and two DataNodes. OS: Red Hat Enterprise Linux Server release 6.6 (Santiago)
Configure /etc/hosts:
192.168.83.11 hd1
192.168.83.22 hd2
192.168.83.33 hd3
192.168.83.44 hd4
Change the hostname (repeat on each node with its own name):
vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=hd1
Create the hadoop user that will run Hadoop:
groupadd -g 10010 hadoop
useradd -u 1001 -g 10010 -d /home/hadoop hadoop
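Presumably you will also want to set a password for the new user and create the installation directory owned by it; a minimal sketch (the /usr/hadoop path matches the install location used later):
passwd hadoop                       # set a login password for the hadoop user
mkdir -p /usr/hadoop                # directory that will hold the Hadoop distribution
chown -R hadoop:hadoop /usr/hadoop  # make it writable by the hadoop user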
Passwordless SSH trust setup:
This mainly covers passwordless SSH from hd1 to every node. Generate a key pair on each node and write its public key into authorized_keys, copy each node's authorized_keys to hd1 and merge them there, then distribute the merged authorized_keys from hd1 back to every node.
hd1:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
hd2:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd1:~/.ssh/authorized_keys2
hd3:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd1:~/.ssh/authorized_keys3
hd4:
ssh-keygen -t rsa
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
scp ~/.ssh/authorized_keys hd1:~/.ssh/authorized_keys4
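The merge-and-distribute step described above is not spelled out in commands; a minimal sketch of what it might look like on hd1 (file names follow the scp targets above):
cat ~/.ssh/authorized_keys2 ~/.ssh/authorized_keys3 ~/.ssh/authorized_keys4 >> ~/.ssh/authorized_keys   # merge all public keys on hd1
chmod 600 ~/.ssh/authorized_keys
for h in hd2 hd3 hd4; do scp ~/.ssh/authorized_keys $h:~/.ssh/authorized_keys; done                     # push the merged file back out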
Configure the hadoop user's environment variables:
vi ~/.bash_profile
export JAVA_HOME=/usr/java/jdk1.8.0_11
export JRE_HOME=/usr/java/jdk1.8.0_11/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export HADOOP_INSTALL=/usr/hadoop/hadoop-2.7.1
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$HADOOP_INSTALL/bin:$HADOOP_INSTALL/sbin
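After editing, reload the profile so the variables take effect in the current shell:
source ~/.bash_profile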
Install the JDK and verify it:
[hadoop@hd1 ~]$ java -version
java version "1.8.0_11"
Java(TM) SE Runtime Environment (build 1.8.0_11-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.11-b03, mixed mode)
Configure Hadoop's basic environment settings, such as the JDK location and the paths for Hadoop configuration files and logs. These live in /usr/hadoop/hadoop-2.7.1/etc/hadoop/hadoop-env.sh; modify the following line:
export JAVA_HOME=/usr/java/jdk1.8.0_11
Install Hadoop:
Download hadoop-2.7.1.tar.gz from the official site http://hadoop.apache.org and extract it only on the master node. My extraction path is /usr/hadoop/hadoop-2.7.1/.
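A minimal sketch of the download and extraction step; the archive mirror URL is an assumption, any Apache mirror carrying hadoop-2.7.1 works:
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz
tar -xzf hadoop-2.7.1.tar.gz -C /usr/hadoop/       # extract into the install directory
chown -R hadoop:hadoop /usr/hadoop/hadoop-2.7.1    # make sure the hadoop user owns it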
Configure Hadoop:
Hadoop has three run modes: standalone, pseudo-distributed, and fully distributed (cluster); the three are configured separately below. You can pass the --config option at startup to point to a specific configuration directory, which lets several modes coexist. We usually set the HADOOP_INSTALL variable in the environment to point to the Hadoop root directory.
Standalone mode: runs on a single node; Hadoop's default configuration is standalone, so it can be started directly.
Pseudo-distributed mode: simulates a cluster on a single host.
Hadoop is divided into core, HDFS, and MapReduce/YARN parts, and the configuration is split accordingly into core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
Now configure pseudo-distributed mode:
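The start commands later in this section pass --config .../etc/hadoop_pseudo, so presumably a dedicated config directory was created by copying the default one, e.g.:
cp -p -r $HADOOP_INSTALL/etc/hadoop $HADOOP_INSTALL/etc/hadoop_pseudo
The files edited below then live in that directory.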
Modify Hadoop's core configuration file core-site.xml; this sets the address of the HDFS master (i.e., the NameNode).
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost/</value>
</property>
</configuration>
Configure hdfs-site.xml
Modify the HDFS settings. The replication factor defaults to 3; with only one DataNode in pseudo-distributed mode it is set to 1.
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
Configure mapred-site.xml
Modify the MapReduce settings. In Hadoop 2.x there is no JobTracker any more; instead we tell MapReduce to run on the YARN framework.
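Note that Hadoop 2.7.1 ships only mapred-site.xml.template by default, so the file presumably needs to be created first:
cp mapred-site.xml.template mapred-site.xml    # run inside the config directory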
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Configure yarn-site.xml (note the auxiliary-service property belongs to the NodeManager):
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>localhost</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Format the HDFS filesystem:
hadoop namenode -format
STARTUP_MSG: java = 1.8.0_11
************************************************************/
18/07/23 17:04:32 INFO namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
18/07/23 17:04:32 INFO namenode.NameNode: createNameNode [-format]
18/07/23 17:04:33 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Formatting using clusterid: CID-7ebdb3d2-19c1-4c1a-a64f-c3c149d1c07f
18/07/23 17:04:34 INFO namenode.FSNamesystem: No KeyProvider found.
18/07/23 17:04:34 INFO namenode.FSNamesystem: fsLock is fair:true
18/07/23 17:04:34 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000
18/07/23 17:04:34 INFO blockmanagement.DatanodeManager: dfs.namenode.datanode.registration.ip-hostname-check=true
18/07/23 17:04:34 INFO blockmanagement.BlockManager: dfs.namenode.startup.delay.block.deletion.sec is set to 000:00:00:00.000
18/07/23 17:04:34 INFO blockmanagement.BlockManager: The block deletion will start around 2018 Jul 23 17:04:34
18/07/23 17:04:34 INFO util.GSet: Computing capacity for map BlocksMap
18/07/23 17:04:34 INFO util.GSet: VM type = 64-bit
18/07/23 17:04:34 INFO util.GSet: 2.0% max memory 966.7 MB = 19.3 MB
18/07/23 17:04:34 INFO util.GSet: capacity = 2^21 = 2097152 entries
18/07/23 17:04:34 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false
18/07/23 17:04:34 INFO blockmanagement.BlockManager: defaultReplication = 3
18/07/23 17:04:34 INFO blockmanagement.BlockManager: maxReplication = 512
18/07/23 17:04:34 INFO blockmanagement.BlockManager: minReplication = 1
18/07/23 17:04:34 INFO blockmanagement.BlockManager: maxReplicationStreams = 2
18/07/23 17:04:34 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks = false
18/07/23 17:04:34 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000
18/07/23 17:04:34 INFO blockmanagement.BlockManager: encryptDataTransfer = false
18/07/23 17:04:34 INFO blockmanagement.BlockManager: maxNumBlocksToLog = 1000
18/07/23 17:04:34 INFO namenode.FSNamesystem: fsOwner = hadoop (auth:SIMPLE)
18/07/23 17:04:34 INFO namenode.FSNamesystem: supergroup = supergroup
18/07/23 17:04:34 INFO namenode.FSNamesystem: isPermissionEnabled = true
18/07/23 17:04:34 INFO namenode.FSNamesystem: HA Enabled: false
18/07/23 17:04:34 INFO namenode.FSNamesystem: Append Enabled: true
18/07/23 17:04:35 INFO util.GSet: Computing capacity for map INodeMap
18/07/23 17:04:35 INFO util.GSet: VM type = 64-bit
18/07/23 17:04:35 INFO util.GSet: 1.0% max memory 966.7 MB = 9.7 MB
18/07/23 17:04:35 INFO util.GSet: capacity = 2^20 = 1048576 entries
18/07/23 17:04:35 INFO namenode.FSDirectory: ACLs enabled? false
18/07/23 17:04:35 INFO namenode.FSDirectory: XAttrs enabled? true
18/07/23 17:04:35 INFO namenode.FSDirectory: Maximum size of an xattr: 16384
18/07/23 17:04:35 INFO namenode.NameNode: Caching file names occuring more than 10 times
18/07/23 17:04:35 INFO util.GSet: Computing capacity for map cachedBlocks
18/07/23 17:04:35 INFO util.GSet: VM type = 64-bit
18/07/23 17:04:35 INFO util.GSet: 0.25% max memory 966.7 MB = 2.4 MB
18/07/23 17:04:35 INFO util.GSet: capacity = 2^18 = 262144 entries
18/07/23 17:04:35 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033
18/07/23 17:04:35 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0
18/07/23 17:04:35 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension = 30000
18/07/23 17:04:35 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.window.num.buckets = 10
18/07/23 17:04:35 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.num.users = 10
18/07/23 17:04:35 INFO metrics.TopMetrics: NNTop conf: dfs.namenode.top.windows.minutes = 1,5,25
18/07/23 17:04:35 INFO namenode.FSNamesystem: Retry cache on namenode is enabled
18/07/23 17:04:35 INFO namenode.FSNamesystem: Retry cache will use 0.03 of total heap and retry cache entry expiry time is 600000 millis
18/07/23 17:04:35 INFO util.GSet: Computing capacity for map NameNodeRetryCache
18/07/23 17:04:35 INFO util.GSet: VM type = 64-bit
18/07/23 17:04:35 INFO util.GSet: 0.029999999329447746% max memory 966.7 MB = 297.0 KB
18/07/23 17:04:35 INFO util.GSet: capacity = 2^15 = 32768 entries
Re-format filesystem in Storage Directory /tmp/hadoop-hadoop/dfs/name ? (Y or N) Y
18/07/23 17:07:57 INFO namenode.FSImage: Allocated new BlockPoolId: BP-1239596151-192.168.83.11-1532336877181
18/07/23 17:07:57 INFO common.Storage: Storage directory /tmp/hadoop-hadoop/dfs/name has been successfully formatted.
18/07/23 17:07:57 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
18/07/23 17:07:57 INFO util.ExitUtil: Exiting with status 0
18/07/23 17:07:57 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at hd1/192.168.83.11
************************************************************/
Start HDFS:
[hadoop@hd1 hadoop_pseudo]$ start-dfs.sh --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_pseudo/
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-namenode-hd1.out
localhost: starting datanode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-datanode-hd1.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop/hadoop-2.7.1/logs/hadoop-hadoop-secondarynamenode-hd1.out
Start YARN:
[hadoop@hd1 hadoop_pseudo]$ start-yarn.sh --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_pseudo/
starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/hadoop-2.7.1/logs/yarn-hadoop-resourcemanager-hd1.out
localhost: starting nodemanager, logging to /usr/hadoop/hadoop-2.7.1/logs/yarn-hadoop-nodemanager-hd1.out
Start the MapReduce JobHistory server: mr-jobhistory-daemon.sh start historyserver
Alternatively, start all of the pseudo-distributed components with a single command:
[hadoop@hd1 ~]$ start-all.sh --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_pseudo
You can also use HADOOP_CONF_DIR to specify the Hadoop configuration directory, as follows:
export HADOOP_CONF_DIR=$HADOOP_INSTALL/etc/hadoop_pseudo
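At this point the running daemons can be checked with jps; in pseudo-distributed mode one would expect to see roughly the following processes (PIDs will differ):
jps
# NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager, JobHistoryServer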
[hadoop@hd1 ~]$ hadoop fs -ls /
18/07/24 07:29:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 24 items
-rw-r--r-- 1 root root 0 2018-07-23 16:37 /.autofsck
dr-xr-xr-x - root root 4096 2018-07-19 20:35 /bin
dr-xr-xr-x - root root 1024 2018-07-19 19:06 /boot
drwxr-xr-x - root root 4096 2014-08-07 13:29 /cgroup
drwxr-xr-x - root root 3800 2018-07-23 16:37 /dev
drwxr-xr-x - root root 12288 2018-07-23 16:37 /etc
drwxr-xr-x - root root 4096 2018-07-20 16:37 /home
dr-xr-xr-x - root root 4096 2018-07-19 19:05 /lib
dr-xr-xr-x - root root 12288 2018-07-19 20:35 /lib64
drwx------ - root root 16384 2018-07-19 19:00 /lost+found
drwxr-xr-x - root root 4096 2011-06-28 22:13 /media
drwxr-xr-x - root root 0 2018-07-23 16:37 /misc
drwxr-xr-x - root root 4096 2011-06-28 22:13 /mnt
drwxr-xr-x - root root 0 2018-07-23 16:37 /net
drwxr-xr-x - root root 4096 2018-07-19 19:05 /opt
dr-xr-xr-x - root root 0 2018-07-23 16:37 /proc
dr-xr-x--- - root root 4096 2018-07-19 22:53 /root
dr-xr-xr-x - root root 12288 2018-07-19 20:35 /sbin
drwxr-xr-x - root root 0 2018-07-23 16:37 /selinux
drwxr-xr-x - root root 4096 2011-06-28 22:13 /srv
drwxr-xr-x - root root 0 2018-07-23 16:37 /sys
drwxrwxrwt - root root 4096 2018-07-24 07:27 /tmp
drwxr-xr-x - root root 4096 2018-07-20 18:29 /usr
drwxr-xr-x - root root 4096 2018-07-19 19:05 /var
Postscript: if you install Hadoop 2.7.1 on Windows, a few extra files are needed (link: https://pan.baidu.com/s/1w1-cmTDTLWC_sFNWpxrOQA, password: ozzw); otherwise startup will fail with errors. Just copy those files into the bin directory under the Hadoop installation directory.
Cluster (fully distributed) mode deployment:
cp -p -r $HADOOP_INSTALL/etc/hadoop $HADOOP_INSTALL/etc/hadoop_cluster
mv /usr/hadoop/hadoop-2.7.1/etc/hadoop /usr/hadoop/hadoop-2.7.1/etc/hadoop.orig   # move the original config directory aside first (any backup name works), otherwise the symlink would be created inside it
ln -s /usr/hadoop/hadoop-2.7.1/etc/hadoop_cluster /usr/hadoop/hadoop-2.7.1/etc/hadoop
Five files need to be modified: slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml.
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hd1:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>file:/usr/hadoop/hadoop-2.7.1/tmp</value>
<description>Abase for other temporary directories.</description>
</property>
</configuration>
hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>hd1:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:/usr/hadoop/hadoop-2.7.1/tmp/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:/usr/hadoop/hadoop-2.7.1/tmp/dfs/data</value>
</property>
</configuration>
mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.jobhistory.address</name>
<value>hd1:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>hd1:19888</value>
</property>
</configuration>
yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>hd1</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
slaves
vi slaves
hd3
hd4
Copy the hadoop_cluster directory to hd2, hd3, and hd4:
[hadoop@hd1 etc]$ scp -p -r hadoop_cluster/ hadoop@hd2:/usr/hadoop/hadoop-2.7.1/etc/
[hadoop@hd1 etc]$ scp -p -r hadoop_cluster/ hadoop@hd3:/usr/hadoop/hadoop-2.7.1/etc/
[hadoop@hd1 etc]$ scp -p -r hadoop_cluster/ hadoop@hd4:/usr/hadoop/hadoop-2.7.1/etc/
Format the filesystem:
hadoop namenode -format --config /usr/hadoop/hadoop-2.7.1/etc/hadoop_cluster/
Start the cluster:
[hadoop@hd1 ~]$ start-dfs.sh
[hadoop@hd1 ~]$ start-yarn.sh
[hadoop@hd1 ~]$ mr-jobhistory-daemon.sh start historyserver
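As an optional check, you can ask HDFS for a DataNode report or open the web UIs (the NameNode UI defaults to port 50070 and the ResourceManager UI to port 8088 in Hadoop 2.x):
hdfs dfsadmin -report        # should list the live DataNodes
# Web UIs: http://hd1:50070 (HDFS) and http://hd1:8088 (YARN)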
hd1 runs the NameNode instance; hd2 runs the SecondaryNameNode and a DataNode instance;
hd3 and hd4 run DataNode instances.
    | NameNode | SecondaryNameNode | DataNode |
hd1 | Y        |                   |          |
hd2 |          | Y                 | Y        |
hd3 |          |                   | Y        |
hd4 |          |                   | Y        |
hd1:
[hadoop@hd1 sbin]$ jps
7764 Jps
7017 ResourceManager
6734 NameNode
hd2:
[root@hd2 ~]# jps
3222 NodeManager
3142 SecondaryNameNode
3962 Jps
3035 DataNode
hd3:
[root@hd3 ~]# jps
3600 DataNode
3714 NodeManager
4086 Jps
hd4:
[root@hd4 ~]# jps
3024 NodeManager
3373 Jps
2909 DataNode
YARN is the resource management framework and consists of the NodeManager and ResourceManager processes: the NodeManagers run on the DataNode hosts, while the ResourceManager runs on the NameNode host.
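As an optional end-to-end check, the example MapReduce job shipped with the distribution can be submitted and watched running on YARN (2 map tasks, 5 samples per map):
hadoop jar $HADOOP_INSTALL/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 2 5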