Set up a pseudo-distributed Hadoop node on EC2
1. Set a root password: sudo passwd root (the slave nodes' password here is set to hadoop by default)
2. Switch to root: su root
------------------------------------------------------------------
3. Install the Java environment
apt-get install openjdk-7-jdk
4. Set environment variables (how to find the Java install path on Linux: # https://www.cnblogs.com/hanshuai/p/9604730.html )
vim /etc/profile
-----------------
export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre
export PATH=$JAVA_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
--------------------
Reload the environment variables:
5. source /etc/profile
--------------------
Verify JAVA_HOME:
6. echo $JAVA_HOME
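If you are not sure where apt put the JDK (the blog link above covers this), the path can also be resolved from the java binary itself; a quick check:
readlink -f "$(which java)"   # e.g. .../java-7-openjdk-amd64/jre/bin/java -> JAVA_HOME is the part before /bin/java
echo $JAVA_HOME               # should print the path exported in /etc/profile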
------------------------------------------------------------------
Other environment prerequisites
sudo apt-get install ssh
sudo apt-get install rsync
-------------------------------------------------------------------
Download and install Hadoop:
7. Download (hadoop-2.9.2 download URL # http://ftp.cuhk.edu.hk/pub/packages/apache.org/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz )
wget http://ftp.cuhk.edu.hk/pub/packages/apache.org/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
---------------------
ls -a to check that the tarball was downloaded
----------------------
8. Extract
tar -zxvf hadoop-2.9.2.tar.gz
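To confirm the archive unpacked cleanly, the bundled hadoop script can report its version (a quick check, assuming the tarball was extracted into the home directory):
cd ~/hadoop-2.9.2
bin/hadoop version   # should report Hadoop 2.9.2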
------------------------
9. Edit the configuration files
Add HADOOP_HOME to /etc/profile (then source /etc/profile again):
--------------------
export HADOOP_HOME=/home/ubuntu/hadoop-2.9.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
-------------
etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
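Once core-site.xml is saved, the effective value can be read back with getconf (a sanity check; assumes /etc/profile has been sourced so the hdfs command is on PATH):
hdfs getconf -confKey fs.defaultFS   # expect hdfs://localhost:9000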
---------------------------------------------
etc/hadoop/hdfs-site.xml:
---------------
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
--------------------------------------------
etc/hadoop/hadoop-env.sh
--------------
Change the Java path: export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64/jre"
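Editing hadoop-env.sh by hand works; an equivalent one-liner, run from the hadoop-2.9.2 directory, is sketched below (the JDK path is the one installed above):
sed -i 's|^export JAVA_HOME=.*|export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64/jre|' etc/hadoop/hadoop-env.sh
grep '^export JAVA_HOME' etc/hadoop/hadoop-env.sh   # verify the change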
------------------------------------------------------------------------
Other preparation before running
ssh localhost
--------------------- if ssh localhost fails, run these first:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
----------------------------------------------------
ssh-keygen -f "/home/ubuntu/.ssh/known_hosts" -R localhost
-----------------------------------------------------------------------
Run: (step 10 is a prerequisite for step 11)
------------------------------
10. Format the filesystem
bin/hdfs namenode -format
------------------------------
Start
11. sbin/start-dfs.sh
(to stop: stop-all.sh)
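A quick way to confirm the daemons came up is jps (shipped with the JDK); on a pseudo-distributed node it should roughly show:
jps   # expect NameNode, DataNode and SecondaryNameNode processes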
----------------------------------------------------------------------------
12. Verify it is running:
curl http://localhost:50070/
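If port 50070 is not reachable (e.g. not opened in the security group), the same information is available from the command line:
bin/hdfs dfsadmin -report   # prints capacity and the number of live datanodes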
refer to https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/SingleCluster.html
===========================================================================================================================
Set up multi-node Hadoop
1. Change the machine hostnames
vi /etc/hostname
- So that hostnames resolve correctly, also update the corresponding entries in /etc/hosts
----------------------------------------------------------
Add the following entries to /etc/hosts
172.31.43.232 Master
172.31.41.121 Slave1
172.31.34.186 Slave2
172.31.34.139 Slave3
Check connectivity between every pair: Master-Slave1, Master-Slave2, Master-Slave3, Slave1-Slave2, Slave1-Slave3, Slave2-Slave3 (see the loop below)
(if ping fails, enable the ICMP protocol in the security group)
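A small loop run on each machine covers the pairwise checks (hostnames as defined in /etc/hosts above):
for h in Master Slave1 Slave2 Slave3; do ping -c 1 "$h"; done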
-----------------------------------------------------------
2. Set up SSH
Before this, create a regular user and do not do the following as root (here: the ubuntu user)
-------------------------------------
ssh-keygen -t rsa (keys go to /home/<user>/.ssh by default; just press Enter at the passphrase prompt)
-------------------------------
First append your own key to authorized_keys
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
chmod 700 -R .ssh
----------------------------------------------------------
Edit /etc/ssh/sshd_config and uncomment this line
AuthorizedKeysFile %h/.ssh/authorized_keys
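The change only takes effect after the SSH daemon is restarted (the service name may differ slightly between Ubuntu releases):
sudo service ssh restart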
----------------------------------------
** Send the Master's public key to the slaves
( $ rm -rf id_rsa.pub -- delete the temporary public key afterwards )
Note: if you are using EC2 on AWS with .pem key authentication, you need to upload the pem file first
scp -i "CUHK.pem" path\CUHK.pem <username>@<yourEC2ip>:~ (WIN -> Ubuntu)
The pem is then needed when sending the public key, otherwise you may get a permission denied error, as shown below:
scp -i "CUHK.pem" ~/.ssh/id_rsa.pub ubuntu@Slave1:~
....
scp id_rsa.pub <EC2ip>:~
---------------------------------------
Every public key that is exchanged must be appended to authorized_keys on the receiving machine (a combined loop is sketched below):
scp -i "CUHK.pem" ~/.ssh/id_rsa.pub ubuntu@Slave1:~
scp -i "CUHK.pem" ~/.ssh/id_rsa.pub ubuntu@Slave2:~
scp -i "CUHK.pem" ~/.ssh/id_rsa.pub ubuntu@Slave3:~
cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
Another way to resolve permission denied:
ssh-keygen -f /home/ubuntu/.ssh/known_hosts -R 172.31.90.181
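The per-slave scp/cat steps above can be collapsed into one loop run on the Master; a sketch, assuming the same CUHK.pem and the ubuntu user on every Slave:
for h in Slave1 Slave2 Slave3; do
  scp -i "CUHK.pem" ~/.ssh/id_rsa.pub ubuntu@$h:~
  ssh -i "CUHK.pem" ubuntu@$h \
    'cat ~/id_rsa.pub >> ~/.ssh/authorized_keys && chmod 600 ~/.ssh/authorized_keys && rm ~/id_rsa.pub'
done
ssh ubuntu@Slave1 hostname   # should now log in without the pem or a password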
--------------------------------------------------------------------
3. Edit the configuration files
( refer to https://blog.youkuaiyun.com/weixin_40526756/article/details/80652525 )
core-site.xml (note: create a tmp folder under /usr/hadoop first)
----------------
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://Master:9000</value>
</property>
<!-- disable permission checks; otherwise DFS access can be denied and jobs get stuck -->
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
---------------------------
hdfs-site.xml
--------------------------
<configuration>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>Master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/ubuntu/hadoop-2.9.2/dfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/ubuntu/hadoop-2.9.2/dfs/data</value>
</property>
</configuration>
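dfs.namenode.name.dir and dfs.datanode.data.dir point at directories that may not exist yet; pre-creating them on every node avoids startup errors (the NameNode format also creates the name dir):
mkdir -p /home/ubuntu/hadoop-2.9.2/dfs/name /home/ubuntu/hadoop-2.9.2/dfs/data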
--------------------------------
mapred-site.xml
-----------------------------
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>http://Master:9001</value>
</property>
<!-- needed if YARN is configured -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
------------------------------
yarn-site.xml (optional)
------------------------------------------
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
-----------------------------
slaves file
-----------------
Master
Slave1
Slave2
Slave3
----------------
masters
--------------------
Master
-------------------------------------------------------------
4. Copy the configuration files from Master to each Slave
scp -r ~/hadoop-2.9.2/etc/hadoop/* ubuntu@Slave1:~/hadoop-2.9.2/etc/hadoop
.......
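With several Slaves the copy is easier as a loop (assumes hadoop-2.9.2 is already unpacked at the same path on every Slave):
for h in Slave1 Slave2 Slave3; do
  scp -r ~/hadoop-2.9.2/etc/hadoop/* ubuntu@$h:~/hadoop-2.9.2/etc/hadoop
done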
--------------------------------------
5. On first startup, format the NameNode on the Master node first (do this only once):
bin/hdfs namenode -format
-----------------------------------------------
6. Start
sbin/start-dfs.sh
sbin/start-all.sh
(sbin/start-yarn.sh / sbin/mr-jobhistory-daemon.sh start historyserver)
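To see which daemons actually started, run jps on each node (with the slaves file above, Master also runs a DataNode):
jps                    # on Master: NameNode, SecondaryNameNode, DataNode (plus ResourceManager/NodeManager if YARN was started)
ssh ubuntu@Slave1 jps  # on a Slave: DataNode (plus NodeManager if YARN was started)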
-------------------------------------------------------------------------------------------------------------
# If the Slaves' datanodes do not show up on the web UI, the likely cause is again that AWS has not opened the required ports in the security group; the slave's datanode log keeps showing retries
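The live-datanode count can also be checked from the command line; if it stays at 0, the usual culprit on AWS is the security group blocking the HDFS ports (9000 for the NameNode RPC as configured above, 50010/50020/50075 for the DataNodes, 50070 for the web UI):
bin/hdfs dfsadmin -report | grep -i 'live datanodes'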
----------------------------------------------------------------------------------------------------------------------------------
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar teragen 21474837 terasort/input2G
Troubleshooting: https://blog.youkuaiyun.com/sinat_33769106/article/details/80905363
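Once teragen has written the input, the matching sort step looks roughly like this (terasort/output2G is just an example output path):
$ ./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar terasort terasort/input2G terasort/output2G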