Runtime environment:
Ubuntu 18.04
Hadoop 2.7.1
JDK 1.8
1. Set the hostname on each server
vim /etc/hostname
The four servers are named Master, Slave1, Slave2, and Slave3 respectively.
Save the change, then restart the server: reboot
2. Edit the hosts file on each server
vim /etc/hosts
Add the following lines:
[Master's IP] Master
[Slave1's IP] Slave1
[Slave2's IP] Slave2
[Slave3's IP] Slave3
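Once the entries are in place on every node, it is worth confirming that each hostname actually resolves before moving on. A minimal sketch (`check_host` is a hypothetical helper, shown here against localhost; on the cluster, call it for Master, Slave1, Slave2, and Slave3):

```shell
# Check that a name resolves via /etc/hosts or DNS (hypothetical helper).
check_host() {
    if getent hosts "$1" > /dev/null; then
        echo "$1 resolves"
    else
        echo "$1 UNKNOWN"
    fi
}
check_host localhost
```

If any node prints UNKNOWN, recheck its /etc/hosts entries before continuing.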
3. Set up passwordless SSH by following this tutorial:
https://blog.youkuaiyun.com/qq_39124136/article/details/101313888
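The linked tutorial amounts to generating a key pair on Master and pushing the public key to every node. A minimal sketch (the hadoop user name is an assumption; the key is written to a scratch directory here so an existing ~/.ssh/id_rsa is not overwritten while experimenting):

```shell
# Generate an RSA key pair non-interactively (scratch dir for safety).
keydir=$(mktemp -d)
ssh-keygen -t rsa -N "" -q -f "$keydir/id_rsa"
ls "$keydir"

# On the real cluster, generate into ~/.ssh/id_rsa instead, then push the
# public key to every node (user name "hadoop" is assumed):
#   for host in Master Slave1 Slave2 Slave3; do ssh-copy-id "hadoop@$host"; done
#   ssh Slave1 hostname   # should log in without a password prompt
```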
4. Install JDK 1.8
Download JDK 1.8: https://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html
Extract it:
mkdir /usr/lib/jvm
tar -zxvf jdk-8u211-linux-x64.tar.gz -C /usr/lib/jvm
Edit the shell configuration:
vim ~/.bashrc
Add the following lines:
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_211
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH
Apply the changes:
source ~/.bashrc
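With JAVA_HOME set, the derived variables expand to concrete paths. A quick sanity check of the expansion (values assume the jdk1.8.0_211 directory from the tar step above):

```shell
JAVA_HOME=/usr/lib/jvm/jdk1.8.0_211
JRE_HOME=${JAVA_HOME}/jre
CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
echo "$JRE_HOME"    # /usr/lib/jvm/jdk1.8.0_211/jre
echo "$CLASSPATH"   # .:/usr/lib/jvm/jdk1.8.0_211/lib:/usr/lib/jvm/jdk1.8.0_211/jre/lib
# On the real machine, after `source ~/.bashrc`, `java -version` should
# report 1.8.0_211.
```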
5. Install Hadoop 2.7.1
Download Hadoop 2.7.1: https://hadoop.apache.org/releases.html
Extract it:
tar -zxvf hadoop-2.7.1.tar.gz -C /usr/local
cd /usr/local
mv hadoop-2.7.1 hadoop
Edit the shell configuration:
vim ~/.bashrc
Add the following lines:
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
Apply the changes:
source ~/.bashrc
Edit the Hadoop configuration files
Five files need changes in total, all located under /usr/local/hadoop/etc/hadoop/.
Configure slaves by adding the following lines:
Slave1
Slave2
Slave3
Configure mapred-site.xml (Hadoop ships only the template, so copy it to mapred-site.xml first):
cp mapred-site.xml.template mapred-site.xml
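The five files referred to above are typically slaves, core-site.xml, hdfs-site.xml, mapred-site.xml, and yarn-site.xml, but only slaves and mapred-site.xml are spelled out here. A minimal sketch of the two HDFS-side files for this Master/Slave topology (the port, tmp directory, and replication value are assumptions, not taken from the original tutorial):

```shell
# Write minimal core-site.xml and hdfs-site.xml. On the real cluster run
# this from /usr/local/hadoop/etc/hadoop; here the files land in the
# current directory.
cat > core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://Master:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF

cat > hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF
```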
Configure hadoop-env.sh:
cd /usr/local/hadoop/etc/hadoop
sudo vim hadoop-env.sh
Set JAVA_HOME in it to this machine's JAVA_HOME path.
Configure yarn-env.sh:
Set its JAVA_HOME to this machine's JAVA_HOME path (remove the leading # from that line first).
Copy the configured Hadoop from the master to the slave nodes:
On each slave, create the target directory at the same location as on the master:
sudo mkdir /usr/local/hadoop
Then run on the master (repeat for Slave2 and Slave3):
scp -r /usr/local/hadoop/* hadoop@Slave1:/usr/local/hadoop
(If you run into permission problems, see https://blog.youkuaiyun.com/qq_39124136/article/details/102782284)
6. Start Hadoop:
cd /usr/local/hadoop
hadoop namenode -format # format the NameNode (first start only)
sbin/start-all.sh # start Hadoop
(If a slave node shows no DataNode after running jps, see https://blog.youkuaiyun.com/qq_39124136/article/details/102782284)
7. Testing
7.1 Use the bundled example to check that the cluster can run jobs:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 10
7.2 Hadoop's bundled wordcount example counts word occurrences. It needs an input directory in HDFS first; you can inspect HDFS files and directories at any time with hadoop fs -ls.
(1) Create a word_count_input directory in HDFS:
hadoop fs -mkdir word_count_input
(2) Create two local files, file1.txt and file2.txt:
sudo vim file1.txt
Enter the following into file1.txt (which gives: hello 5, hadoop 4, sunxj 2, win 1):
hello hadoop
hello sunxj
hello hadoop
hello hadoop
hello win
sunxj hadoop
(press ESC, then :wq to save)
(3) Enter the following into file2.txt (which gives: hello 2, linux 2, window 2):
linux window
hello linux
hello window
Combined, the word counts are: hello 7, hadoop 4, sunxj 2, win 1, linux 2, window 2.
(4) Upload file1.txt and file2.txt into the word_count_input directory in HDFS:
hadoop fs -put file*.txt word_count_input
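Before running the job, you can sanity-check the expected counts locally with coreutils, a miniature of the same map/shuffle/reduce idea (file2.txt is recreated here from its stated counts of hello 2, linux 2, window 2):

```shell
# Recreate the two input files and count words locally.
printf '%s\n' 'hello hadoop' 'hello sunxj' 'hello hadoop' 'hello hadoop' 'hello win' 'sunxj hadoop' > file1.txt
printf '%s\n' 'linux window' 'hello linux' 'hello window' > file2.txt
# map: split into one word per line; shuffle: sort; reduce: uniq -c
cat file1.txt file2.txt | tr ' ' '\n' | sort | uniq -c | sort -k2
```

The output should show hadoop 4, hello 7, linux 2, sunxj 2, win 1, window 2, matching the totals computed above.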
(5) Run wordcount:
hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount word_count_input word_count_output
Here wordcount is the example name, word_count_input the input directory, and word_count_output the output directory.
If you get a connection-refused error like the following:
18/12/26 20:51:23 INFO client.RMProxy: Connecting to ResourceManager at sunxj-hdm.myhd.com/192.168.0.109:18040
18/12/26 20:51:24 INFO input.FileInputFormat: Total input paths to process : 2
18/12/26 20:51:24 INFO mapreduce.JobSubmitter: number of splits:2
18/12/26 20:51:25 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1545828647598_0001
18/12/26 20:51:25 INFO impl.YarnClientImpl: Submitted application application_1545828647598_0001
18/12/26 20:51:25 INFO mapreduce.Job: The url to track the job: http://sunxj-hdm.myhd.com:18088/proxy/application_1545828647598_0001/
18/12/26 20:51:25 INFO mapreduce.Job: Running job: job_1545828647598_0001
18/12/26 20:57:28 INFO mapreduce.Job: Job job_1545828647598_0001 running in uber mode : false
18/12/26 20:57:28 INFO mapreduce.Job: map 0% reduce 0%
18/12/26 20:57:28 INFO mapreduce.Job: Job job_1545828647598_0001 failed with state FAILED due to: Application application_1545828647598_0001 failed 2 times due to Error launching appattempt_1545828647598_0001_000002. Got exception: java.net.ConnectException: Call From sunxj-hdm/127.0.0.1 to localhost:37113 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
at sun.reflect.GeneratedConstructorAccessor47.newInstance(Unknown Source)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:732)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy83.startContainers(Unknown Source)
at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy84.startContainers(Unknown Source)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.launch(AMLauncher.java:118)
at org.apache.hadoop.yarn.server.resourcemanager.amlauncher.AMLauncher.run(AMLauncher.java:250)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:615)
at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:713)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
at org.apache.hadoop.ipc.Client.call(Client.java:1452)
… 15 more
. Failing the application.
18/12/26 20:57:28 INFO mapreduce.Job: Counters: 0
If you see this error, comment out every 127.0.0.1 line in /etc/hosts on master, slave1, and slave2.
Then stop Hadoop, start it again, and repeat step 6. Output like the following indicates success:
18/12/26 21:44:36 INFO client.RMProxy: Connecting to ResourceManager at sunxj-hdm.myhd.com/192.168.0.109:18040
18/12/26 21:44:37 INFO input.FileInputFormat: Total input paths to process : 2
18/12/26 21:44:37 INFO mapreduce.JobSubmitter: number of splits:2
18/12/26 21:44:38 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1545831828732_0001
18/12/26 21:44:38 INFO impl.YarnClientImpl: Submitted application application_1545831828732_0001
18/12/26 21:44:38 INFO mapreduce.Job: The url to track the job: http://sunxj-hdm.myhd.com:18088/proxy/application_1545831828732_0001/
18/12/26 21:44:38 INFO mapreduce.Job: Running job: job_1545831828732_0001
18/12/26 21:44:54 INFO mapreduce.Job: Job job_1545831828732_0001 running in uber mode : false
18/12/26 21:44:54 INFO mapreduce.Job: map 0% reduce 0%
18/12/26 21:45:10 INFO mapreduce.Job: map 100% reduce 0%
18/12/26 21:45:18 INFO mapreduce.Job: map 100% reduce 100%
18/12/26 21:45:19 INFO mapreduce.Job: Job job_1545831828732_0001 completed successfully
18/12/26 21:45:20 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=90
FILE: Number of bytes written=369023
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=380
HDFS: Number of bytes written=48
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=26359
Total time spent by all reduces in occupied slots (ms)=6412
Total time spent by all map tasks (ms)=26359
Total time spent by all reduce tasks (ms)=6412
Total vcore-milliseconds taken by all map tasks=26359
Total vcore-milliseconds taken by all reduce tasks=6412
Total megabyte-milliseconds taken by all map tasks=26991616
Total megabyte-milliseconds taken by all reduce tasks=6565888
Map-Reduce Framework
Map input records=9
Map output records=18
Map output bytes=184
Map output materialized bytes=96
Input split bytes=268
Combine input records=18
Combine output records=7
Reduce input groups=6
Reduce shuffle bytes=96
Reduce input records=7
Reduce output records=6
Spilled Records=14
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=407
CPU time spent (ms)=7380
Physical memory (bytes) snapshot=526221312
Virtual memory (bytes) snapshot=5854740480
Total committed heap usage (bytes)=283058176
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=112
File Output Format Counters
Bytes Written=48
Now check the word_count_output directory in HDFS.
(6) Print the result with:
hadoop fs -cat word_count_output/part-r-00000
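Given the counts stated for file1.txt and file2.txt above, the printed result should look like this (wordcount emits tab-separated word/count pairs sorted by key):

```
hadoop	4
hello	7
linux	2
sunxj	2
win	1
window	2
```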
Not fully resolved: if while running a jar on the cluster the job stalls after reporting "running in uber mode : false":
Workaround: edit the yarn-site.xml file.
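The original post does not say which yarn-site.xml properties to change. A commonly used minimal sketch for this topology (the values are assumptions to adapt, not taken from the original tutorial):

```shell
# Write a minimal yarn-site.xml pointing the nodes at the ResourceManager
# on Master and enabling the MapReduce shuffle service. On the real
# cluster run this from /usr/local/hadoop/etc/hadoop on every node.
cat > yarn-site.xml <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>Master</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
EOF
```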
Original article: https://blog.youkuaiyun.com/sunxiaoju/article/details/85222290