Big Data: MapReduce Programming, Maven Deployment, and ResourceManager High Availability (HA)

This article covers MapReduce programming on a Hadoop cluster, including the relevant Hadoop XML configuration files such as core-site.xml and hdfs-site.xml. It uses the classic WordCount example to explain how MapReduce works and provides a complete MapReduce program. It also walks through starting the YARN cluster and configuring ResourceManager high availability (HA), including enabling HA and defining the ResourceManager nodes.


Big Data Course, Day 4

Hadoop configuration files

core    # basic shared settings: 1. the NameNode entry point  2. the temp directory

hdfs    # HDFS settings: 1. permissions  2. replication  3. HA (high availability)

mapred  # MapReduce-related settings

yarn    # YARN-related settings

# The *-default.xml files below hold the default values; override them as needed

core-default.xml

hdfs-default.xml

mapred-default.xml

yarn-default.xml

# HADOOP_HOME/etc/hadoop

core-site.xml

hdfs-site.xml

mapred-site.xml

yarn-site.xml

# In code: poor maintainability, but highest priority

Configuration configuration = new Configuration();

configuration.set("fs.default.name","hdfs://hadoop:8020");

configuration.set("key","value");

.....

FileSystem fileSystem = FileSystem.get(configuration);

# In code: good maintainability, but lower priority

Configuration configuration = new Configuration();

configuration.addResource("core-site.xml");

configuration.addResource("hdfs-site.xml");

configuration.addResource("marpred-site.xml");

configuration.addResource("yarn-site.xml");

FileSystem fileSystem = FileSystem.get(configuration);
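A minimal sketch of how the two styles combine (assuming core-site.xml is on the classpath; the comment only shows the expected result): a value set in code overrides the value loaded from the resource file, which is why the first style has the higher priority.

Configuration configuration = new Configuration();
configuration.addResource("core-site.xml");               // value from the file: lower priority
configuration.set("fs.defaultFS", "hdfs://hadoop:8020");  // value set in code: higher priority, wins
System.out.println(configuration.get("fs.defaultFS"));    // expected output: hdfs://hadoop:8020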

# Hadoop shell commands can also take configuration directly on the command line

# Example

bin/hdfs dfs -Dfs.defaultFS=xxxx -ls /

MapReduce programming

MapReduce is a computation platform and framework that runs on top of HDFS.

How MapReduce runs:

Set up the YARN cluster. The NameNode and the ResourceManager must not run on the same node # keep the ResourceManager and the NameNode on different nodes; configure this in yarn-site.xml

# Start YARN; the start command must be run on the machine that hosts the ResourceManager

sbin/start-yarn.sh

Homework: on top of the HA HDFS cluster, set up an HA YARN cluster.

The five core steps of MapReduce: InputFormat (split and read the input), Map, Shuffle, Reduce, OutputFormat (write the result).

The classic MR example, WordCount: analysis of the approach
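A rough trace of the WordCount data flow (the input lines here are invented purely for illustration):

(k1, v1) into map:        (0, "suns xiaohei"), (13, "suns")
(k2, v2) out of map:      ("suns", 1), ("xiaohei", 1), ("suns", 1)
after shuffle (grouped):  ("suns", [1, 1]), ("xiaohei", [1])
(k3, v3) out of reduce:   ("suns", 2), ("xiaohei", 1)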

The MapReduce program. Maven dependencies for pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-common</artifactId>
        <version>2.5.2</version>
    </dependency>
</dependencies>

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TestMapReduce {

    /**
     * k1 LongWritable  (byte offset of the line)
     * v1 Text          (the line itself)
     *
     * k2 Text          (a single word)
     * v2 IntWritable   (the count 1)
     */
    public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        Text k2 = new Text();
        IntWritable v2 = new IntWritable();

        /**
         * k1 key   e.g. 0
         * v1 value e.g. "suns xiaohei"
         */
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split("\t");
            for (String word : words) {
                k2.set(word);
                v2.set(1);
                context.write(k2, v2);
            }
        }
    }

    public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        Text k3 = new Text();
        IntWritable v3 = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int result = 0;
            for (IntWritable value : values) {
                result += value.get();
            }
            k3.set(key);
            v3.set(result);
            context.write(k3, v3);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(TestMapReduce.class);
        job.setJobName("first");

        // inputFormat
        TextInputFormat.addInputPath(job, new Path("/test"));

        // map
        job.setMapperClass(MyMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // shuffle happens automatically

        // reduce
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // outputFormat
        TextOutputFormat.setOutputPath(job, new Path("/dest1"));

        job.waitForCompletion(true);
    }
}
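One practical note on the driver above: the input path /test and the output path /dest1 are hard-coded, and /dest1 must not already exist in HDFS when the job is submitted, otherwise the output format rejects the job. Delete it first (bin/hdfs dfs -rm -r /dest1) or change the path between runs.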

Deploying the MapReduce job

Note: the yarn command must be run from the bin directory of the Hadoop installation.

① The most direct way

Package the project with Maven and scp the jar to the server, then:

bin/yarn jar hadoop-mapreduce.jar              # run the job

bin/hdfs dfs -text /dest1/part-r-00000         # view the result

        Bytes Written=38
[root@hadoop hadoop-2.5.2]# bin/hdfs dfs -text /dest1/part-r-00000
19/01/24 09:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa     2
bbb     2
jjj     1
kkkk    1
lhc     1
ssss    1
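A hedged aside: bin/yarn jar hadoop-mapreduce.jar works without a class argument only if the jar's manifest names a main class; if it does not, pass the driver class explicitly, for example bin/yarn jar hadoop-mapreduce.jar com.baizhi.TestMapReduce (the class name is the one declared in the pom properties shown below).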

② One-click package and upload with Maven

In IDEA: File -> Settings -> Plugins, search for "Maven Helper", install it, and restart IDEA.

Configure pom.xml as follows:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <baizhi-mainClass>com.baizhi.TestMapReduce</baizhi-mainClass>
    <target-host>192.168.194.147</target-host>
    <target-position>/opt/install/hadoop-2.5.2</target-position>
</properties>

...

<build>
    <extensions>
        <extension>
            <groupId>org.apache.maven.wagon</groupId>
            <artifactId>wagon-ssh</artifactId>
            <version>2.8</version>
        </extension>
    </extensions>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <outputDirectory>${basedir}</outputDirectory>
                <archive>
                    <manifest>
                        <mainClass>${baizhi-mainClass}</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>wagon-maven-plugin</artifactId>
            <version>1.0</version>
            <configuration>
                <fromFile>${project.build.finalName}.jar</fromFile>
                <url>scp://root:123456@${target-host}${target-position}</url>
            </configuration>
        </plugin>
    </plugins>
</build>

Once this is configured, open the Maven tool window: double-click jar:jar to build the package, then double-click wagon:upload to upload it.

But how can both steps be done in a single click?

This is where the Maven Helper plugin installed above comes in. Right-click pom.xml and choose:

Run Maven -> New Goal, enter jar:jar wagon:upload, and click OK. Packaging and uploading now run as a single step.

③ One-click package, upload and run with Maven

Building on ②, add a commands section to the wagon plugin so it also runs the job, as follows:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>wagon-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
        <fromFile>${project.build.finalName}.jar</fromFile>
        <url>scp://root:123456@${target-host}${target-position}</url>
        <commands>
            <command>pkill -f ${project.build.finalName}.jar</command>
            <command>nohup /opt/install/hadoop-2.5.2/bin/yarn jar /opt/install/hadoop-2.5.2/${project.build.finalName}.jar > /root/nohup.out 2>&1 &</command>
        </commands>
        <displayCommandOutputs>true</displayCommandOutputs>
    </configuration>
</plugin>
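For context: the first command kills any previous instance of the same jar that may still be running on the server, and the nohup ... & command launches the new jar in the background so it keeps running after the SSH session opened by wagon closes, with its output written to /root/nohup.out.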

Then add a new goal in Maven Helper:

jar:jar wagon:upload-single wagon:sshexec

Remember to compile first, so that the project's target directory already contains the compiled classes.

Check the nohup.out file on the resourcemanager node to confirm that the job ran successfully.

ResourceManager high availability (HA)

① Add the following to yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>lhc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop:2181,hadoop1:2181,hadoop2:2181</value>
    </property>
</configuration>
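In short, what these properties do: mapreduce_shuffle enables the MapReduce shuffle auxiliary service on each NodeManager, ha.enabled switches on ResourceManager HA, cluster-id names this RM pair, ha.rm-ids lists the two logical ResourceManagers, hostname.rm1/rm2 map those ids to hosts, and zk-address points to the ZooKeeper ensemble used for leader election.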

② On hadoop1 and hadoop2, run sbin/start-yarn.sh from the Hadoop installation directory to start the ResourceManagers.

③ Run jps to check the processes; the ResourceManager has started correctly:

[root@hadoop1 hadoop-2.5.2]# jps

4552 NameNode

4762 DFSZKFailoverController

4610 DataNode

5822 ResourceManager

6251 Jps

4472 JournalNode

4426 QuorumPeerMain

④ Run bin/yarn rmadmin -getServiceState rm1 and bin/yarn rmadmin -getServiceState rm2

to check the state of the ResourceManager on each node: one is active and the other standby.

[root@hadoop1 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm1

19/01/24 11:56:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

active

[root@hadoop1 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm2

19/01/24 11:58:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

standby

⑤ Stop the ResourceManager acting as rm1 and run bin/yarn rmadmin -getServiceState rm2 again:

rm2 is now active, which demonstrates automatic failover of the ResourceManager.

For more details, see this blog post: https://blog.youkuaiyun.com/skywalker_only/article/details/41726189
