Background
I have been working with OpenStack recently, and my graduation project involved building Hadoop on virtual machines. Tencent Cloud was running a big server promotion, and Tencent COS supports HDFS, so I bought 4 servers to deploy a fully distributed Hadoop cluster, run real MapReduce jobs on it, and later try calling MapReduce from Spring Cloud. I did not deploy OpenStack because its hardware requirements would have pushed the server cost over budget. Both COS and the servers are in the Hong Kong region.
Note
1. The hostname HadoopMaster shown in the screenshots is actually master (the name mismatch caused job execution to time out, so I reconfigured it, but was too lazy to retake the screenshots)
References
https://cloud.tencent.com/document/product/436/10867
https://blog.youkuaiyun.com/linhaiyun_ytdx/article/details/90486946
Preliminary tests
Tencent internal network latency
Under 0.5 ms among the 4 hosts on the internal network
COS copy speed
After mounting the file system, under 100 MB/s nominally, but in practice it feels more like 20 MB/s. Acceptable as a backup target for now; using COS's HDFS feature is shelved for the moment
COS upload and download speeds are both usable (stable at around 6 MB/s)
LAN throughput
Copying with scp measured 72.7 MB/s
Resources
4 machines, 1 core / 2 GB each
OS: Debian 10.2. The username is hadoop on all 4 machines, with passwordless SSH trust configured between them; the password is in the email. The root passwords were all changed to complex ones, also in the email, subject: 20200802Hadoop主机密码
The cluster is built on the internal network; the public IPs are kept for maintenance only and are not published
| Host   | Internal IP |
| ------ | ----------- |
| master | 10.0.4.2 |
| slave1 | 10.0.4.12 |
| slave2 | 10.0.4.13 |
| slave3 | 10.0.4.7 |
Versions
Operating system version
Debian 10.2
Java version
AdoptOpenJDK, version 1.8

Hadoop version (2.10.0 is used throughout; the link below anchors at the 2.7.6 release note on the Apache releases page)
http://hadoop.apache.org/releases.html#16+April%2C+2018%3A+Release+2.7.6+available

Hadoop 2.10.0 study material
https://www.cnblogs.com/hxuhongming/p/12846770.html
Detailed steps and configuration
Basic network configuration (logged in as root)
Change the hostname on all 4 hosts
Edit /etc/hostname with vi and enter the planned name; this renames the host as shown in the shell. On the master host enter master, on slave1 enter slave1, and so on. This keeps PuTTY sessions easy to tell apart and avoids errors later.
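For example, on master (a minimal sketch; run the matching commands on each slave, then log in again so the new name shows in the shell):
echo master > /etc/hostname
hostname master  # apply to the running session without a reboot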
Modify /etc/hosts on all 4 hosts
vi /etc/hosts
then add:
# localhost here can be replaced with your own domain; keep the format consistent with a real domain name
10.0.4.2 localhost.master master
10.0.4.12 localhost.slave1 slave1
10.0.4.13 localhost.slave2 slave2
10.0.4.7 localhost.slave3 slave3
Add the user on all 4 hosts with
adduser hadoop
Set up mutual SSH trust between the 4 hosts (all of this as the hadoop user)
Generate a key pair on each of the 4 hosts
ssh-keygen -t rsa  # generate the key pair
On slave1
cp ~/.ssh/id_rsa.pub ~/.ssh/slave1.id_rsa.pub
scp ~/.ssh/slave1.id_rsa.pub master:~/.ssh
On slave2
cp ~/.ssh/id_rsa.pub ~/.ssh/slave2.id_rsa.pub
scp ~/.ssh/slave2.id_rsa.pub master:~/.ssh
On slave3
cp ~/.ssh/id_rsa.pub ~/.ssh/slave3.id_rsa.pub
scp ~/.ssh/slave3.id_rsa.pub master:~/.ssh
On master
cd ~/.ssh
cat id_rsa.pub >> authorized_keys
cat slave1.id_rsa.pub >>authorized_keys
cat slave2.id_rsa.pub >>authorized_keys
cat slave3.id_rsa.pub >>authorized_keys
scp authorized_keys slave1:~/.ssh
scp authorized_keys slave2:~/.ssh
scp authorized_keys slave3:~/.ssh
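A quick optional check from master: each of these should print the slave's hostname without asking for a password:
for h in slave1 slave2 slave3; do ssh $h hostname; done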
Install Java (as the hadoop user)
Reference: https://cloud.tencent.com/document/product/436/10865
First, following the steps in the reference above, extract the downloaded Java archive on master and edit ~/.profile to set the Java environment variables (keep the paths consistent with Tencent's document), then run source ~/.profile
cd
mkdir java
mv /tmp/Downloads/OpenJDK8U-jdk_x64_linux_hotspot_8u265b01.tar.gz ./java
cd java
tar -zxvf OpenJDK8U-jdk_x64_linux_hotspot_8u265b01.tar.gz
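The ~/.profile additions look roughly like this (a sketch, assuming the tarball unpacks to jdk8u265-b01, the usual AdoptOpenJDK directory name; adjust if yours differs):
export JAVA_HOME=/home/hadoop/java/jdk8u265-b01
export PATH=$JAVA_HOME/bin:$PATH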
Then check the Java version with java -version.
Once that looks OK, run the following on slave1, slave2 and slave3 in turn, and confirm the Java versions match
cd
scp -r master:~/java ./java
scp -r master:~/.profile .profile
source .profile
java -version
Install Hadoop
cd
mkdir hadoop
mv /tmp/Downloads/hadoop-2.10.0.tar.gz ./hadoop
cd hadoop
tar -zxvf hadoop-2.10.0.tar.gz

Modify the Hadoop configuration
Go to the /home/hadoop/hadoop/hadoop-2.10.0/etc/hadoop directory, then make the changes below (adapted from Tencent's document)

hadoop-env.sh
Add source /home/hadoop/.profile before the export JAVA_HOME=${JAVA_HOME} line
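The top of hadoop-env.sh then reads as follows; sourcing the profile lets ${JAVA_HOME} resolve even when the daemons are launched over SSH, where ~/.profile is not read:
source /home/hadoop/.profile
export JAVA_HOME=${JAVA_HOME}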
Modify core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://master:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/home/hadoop/hadoop/hadoop-2.10.0/tmp</value>
    </property>
</configuration>
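A side note: fs.default.name is the deprecated alias of fs.defaultFS; it still works in Hadoop 2.10, but the newer key is preferred in fresh configurations.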
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/hadoop/hadoop/hadoop-2.10.0/hdf/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/hadoop/hadoop/hadoop-2.10.0/hdf/name</value>
        <final>true</final>
    </property>
</configuration>
Copy the template first with cp mapred-site.xml.template mapred-site.xml, then edit mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>master:10020</value>
    </property>
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>master:19888</value>
    </property>
</configuration>
yarn-site.xml
<?xml version="1.0"?>
<configuration>
    <!-- note: the original post had yarn.nodemanager.aux-services.mapreduce.shuffle.class
         with value org.apache.mapred.ShuffleHandler; the correct key for the
         mapreduce_shuffle aux service and the correct class name are used below -->
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
Copy to the slaves
scp -r ~/hadoop/ slave1:~/hadoop/
scp -r ~/hadoop/ slave2:~/hadoop/
scp -r ~/hadoop/ slave3:~/hadoop/
Configure the environment variables on master by adding the following to ~/.profile, then scp it to the 3 slaves
export HADOOP_HOME=/home/hadoop/hadoop/hadoop-2.10.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
export HADOOP_LOG_DIR=/home/hadoop/hadoop/hadoop-2.10.0/logs
export YARN_LOG_DIR=$HADOOP_LOG_DIR
scp ~/.profile slave1:~/.profile
scp ~/.profile slave2:~/.profile
scp ~/.profile slave3:~/.profile
Then run the following on each of the three slaves (or simply reboot them)
source ~/.profile
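To confirm the environment took effect, hadoop version should now print 2.10.0 on every node:
hadoop version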
Start Hadoop
Format HDFS on the master host
hdfs namenode -format
Start the daemons on master

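The start commands themselves were in a screenshot; they are the standard Hadoop 2.x scripts (on PATH via $HADOOP_HOME/sbin), roughly:
start-dfs.sh
start-yarn.sh
# optional: the history server behind the mapreduce.jobhistory.* addresses configured above
mr-jobhistory-daemon.sh start historyserver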
To see the running processes, just type jps
Check the listening ports
netstat -ntpl
Screenshots
Once the firewall was opened and the hosts could talk to each other, all the nodes showed as healthy


Test
hadoop fs -mkdir /input
# browse the filesystem
hadoop fs -ls /
# first write some English words into input.txt yourself
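echo "hello hadoop hello world hello" > input.txt  # example content (hypothetical; any English text works)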
hadoop fs -put input.txt /input
hadoop jar /home/hadoop/hadoop/hadoop-2.10.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.10.0.jar wordcount /input /output/
If the job has not finished after about a minute, the configuration is almost certainly wrong; check the files under the logs directory (/home/hadoop/hadoop/hadoop-2.10.0/logs).
To run it again, point the output at a fresh directory name such as output1 (HDFS will not overwrite an existing output directory).
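To inspect the word counts (the wordcount example writes its result to part-r-00000 under the output directory):
hadoop fs -cat /output/part-r-00000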
Optimization
Memory tuning
Our servers are low-spec (1 core / 2 GB), so yarn-site.xml needs the following adjustments
<?xml version="1.0"?>
<configuration>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1536</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>512</value>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>4096</value>
    </property>
    <!-- shuffle key and class corrected as in the base yarn-site.xml above -->
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>master:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>master:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>master:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>master:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>master:8088</value>
    </property>
</configuration>
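yarn-site.xml must be identical on every node, and YARN has to be restarted for the change to take effect; a minimal sketch from master:
scp ~/hadoop/hadoop-2.10.0/etc/hadoop/yarn-site.xml slave1:~/hadoop/hadoop-2.10.0/etc/hadoop/
scp ~/hadoop/hadoop-2.10.0/etc/hadoop/yarn-site.xml slave2:~/hadoop/hadoop-2.10.0/etc/hadoop/
scp ~/hadoop/hadoop-2.10.0/etc/hadoop/yarn-site.xml slave3:~/hadoop/hadoop-2.10.0/etc/hadoop/
stop-yarn.sh
start-yarn.sh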
Firewall configuration for all 4 hosts
Configure one machine, then simply copy the rules file to the others (do not persist the rules until everything is configured and tested, to avoid locking yourself out after a restart)
iptables -P INPUT ACCEPT
iptables -F
# port 22 must be allowed first
iptables -I INPUT -p tcp --dport 22 -j ACCEPT
# set the default policy
iptables -P INPUT DROP
# allow the internal subnet 10.0.4.0/24
iptables -I INPUT -s 10.0.4.2/24 -p all -j ACCEPT
# allow the internal test machine
iptables -I INPUT -s 10.0.0.6 -p all -j ACCEPT
# allow ports 50070 and 8088 (NameNode and ResourceManager web UIs)
iptables -I INPUT -p tcp --dport 50070 -j ACCEPT
iptables -I INPUT -p tcp --dport 8088 -j ACCEPT
# allow ports 50075 and 8042 (DataNode and NodeManager web UIs)
iptables -I INPUT -p tcp --dport 50075 -j ACCEPT
iptables -I INPUT -p tcp --dport 8042 -j ACCEPT
# allow all local traffic
iptables -I INPUT -s 127.0.0.1 -p all -j ACCEPT
View the firewall rules
# without -n this is much slower (it does reverse DNS lookups)
iptables -L -n
Enable at boot
# save the current iptables rules to /etc/iptables.up.rules
iptables-save > /etc/iptables.up.rules
# create the boot-time load script /etc/network/if-pre-up.d/iptables
vi /etc/network/if-pre-up.d/iptables
with the following content
#!/bin/bash
/sbin/iptables-restore < /etc/iptables.up.rules
# make it executable
chmod +x /etc/network/if-pre-up.d/iptables
# protect it: non-root users must not read the firewall rules file
chmod 700 /etc/iptables.up.rules
Failure scenarios
Exception while uploading
hadoop@HadoopMaster:~$ hadoop fs -put input.txt /input
20/08/02 18:17:21 WARN hdfs.DataStreamer: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /input/input.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1832)
at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:265)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2591)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:880)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:517)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:507)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1034)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:994)
at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:922)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1893)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2833)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1540)
at org.apache.hadoop.ipc.Client.call(Client.java:1486)
at org.apache.hadoop.ipc.Client.call(Client.java:1385)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:448)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1846)
at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1645)
at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:710)
put: File /input/input.txt._COPYING_ could only be replicated to 0 nodes instead of minReplication (=1). There are 0 datanode(s) running and no node(s) are excluded in this operation.
I could not track this down for a long time until I remembered the firewall. From a slave I telnetted to master on the ports the Hadoop daemons were listening on (processes found with jps on master, their ports with netstat -ntpl), and none of them were reachable.
telnet command format: telnet master <port>
Fix: allow all internal-network ports through. If instead you open everything on the public interface, be sure to configure a firewall inside the OS; the CVM does not currently provide one for you. Until that is done, it is safest to keep all nodes powered off.
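For example, to probe the NameNode RPC port configured in core-site.xml:
telnet master 9000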
DataNode fails to start
If you run hdfs namenode -format again after a successful start, the DataNodes will fail to come up, because re-formatting writes a new clusterID into the NameNode's VERSION file that no longer matches the one the DataNodes stored; see https://blog.youkuaiyun.com/yu0_zhang0/article/details/78841623
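A common fix, assuming the HDFS data can simply be discarded (paths are the ones from hdfs-site.xml above):
stop-dfs.sh
# run the rm on every node, then re-format on master only
rm -rf /home/hadoop/hadoop/hadoop-2.10.0/hdf /home/hadoop/hadoop/hadoop-2.10.0/tmp
hdfs namenode -format
start-dfs.sh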
Job failure
In the log below, the ResourceManager cannot launch the ApplicationMaster because it tries to reach a NodeManager at localhost.localdomain; this is most likely the hostname mismatch mentioned in the Note at the top, and it was resolved by fixing the hostname and /etc/hosts entries.
20/08/02 21:02:41 INFO mapreduce.Job: Job job_1596372829050_0001 failed with state FAILED due to: Application application_1596372829050_0001 failed 2 times due to Error launching appattempt_1596372829050_0001_000002. Got exception: java.net.ConnectException: Call From HadoopMaster/10.0.4.2 to localhost.localdomain:45563 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor44.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:824)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:754)
at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1544)
at org.apache.hadoop.ipc.Client.call(Client.java:1486)
at org.apache.hadoop.ipc.Client.call(Client.java:1385)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
at com.sun.proxy.$Proxy83.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:128)
at sun.reflect.GeneratedMethodAccessor17.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:122)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:311)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:714)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:701)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:805)
at org.apache.hadoop.ipc.Client$Connection.access$3700(Client.java:423)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1601)
at org.apache.hadoop.ipc.Client.call(Client.java:1432)
... 19 more
. Failing the application.
20/08/02 21:02:41 INFO mapreduce.Job: Counters: 0