Big Data: MapReduce Programming, Maven Deployment, and ResourceManager High Availability (HA)

This article covers MapReduce programming on a Hadoop cluster, including the relevant Hadoop XML configuration files such as core-site.xml and hdfs-site.xml. It uses the classic WordCount example to explain how MapReduce works and provides a complete MapReduce program. It also walks through starting the YARN cluster and configuring ResourceManager high availability (HA), including enabling HA and defining the ResourceManager nodes.


Big Data Course, Day 4

Hadoop configuration files

core    # basic shared settings: 1. the NameNode entry point  2. the temp directory

hdfs    # HDFS settings: 1. permissions  2. replication  3. HA (high availability)

mapred  # MapReduce-related settings

yarn    # YARN-related settings

# The *-default.xml files below hold the default values; override them as needed

core-default.xml

hdfs-default.xml

mapred-default.xml

yarn-default.xml

# HADOOP_HOME/etc/hadoop

core-site.xml

hdfs-site.xml

mapred-site.xml

yarn-site.xml

# In code: poor maintainability, but highest priority

Configuration configuration = new Configuration();

configuration.set("fs.default.name","hdfs://hadoop:8020");

configuration.set("key","value");

.....

FileSystem fileSystem = FileSystem.get(configuration);

# In code: good maintainability, but lower priority

Configuration configuration = new Configuration();

configuration.addResource("core-site.xml");

configuration.addResource("hdfs-site.xml");

configuration.addResource("marpred-site.xml");

configuration.addResource("yarn-site.xml");

FileSystem fileSystem = FileSystem.get(configuration);
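A minimal sketch of how the two styles combine (assuming core-site.xml is on the classpath; the comment only shows the expected result): a value set in code overrides the value loaded from the resource file, which is why the first style has the higher priority.

Configuration configuration = new Configuration();
configuration.addResource("core-site.xml");               // value from the file: lower priority
configuration.set("fs.defaultFS", "hdfs://hadoop:8020");  // value set in code: higher priority, wins
System.out.println(configuration.get("fs.defaultFS"));    // expected output: hdfs://hadoop:8020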

# Hadoop shell commands can also take configuration directly on the command line

# Example

bin/hdfs dfs -Dfs.defaultFS=xxxx -ls /

MapReduce programming

MapReduce is a computation platform and framework that runs on top of HDFS.

How MapReduce runs:

Set up the YARN cluster. The NameNode and the ResourceManager must not run on the same node # keep the ResourceManager and the NameNode on different nodes; configure this in yarn-site.xml

# Start YARN; the start command must be run on the machine that hosts the ResourceManager

sbin/start-yarn.sh

Homework: on top of the HA HDFS cluster, set up an HA YARN cluster.

The five core steps of MapReduce: InputFormat (split and read the input), Map, Shuffle, Reduce, OutputFormat (write the result).

The classic MR example, WordCount: analysis of the approach
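A rough trace of the WordCount data flow (the input lines here are invented purely for illustration):

(k1, v1) into map:        (0, "suns xiaohei"), (13, "suns")
(k2, v2) out of map:      ("suns", 1), ("xiaohei", 1), ("suns", 1)
after shuffle (grouped):  ("suns", [1, 1]), ("xiaohei", [1])
(k3, v3) out of reduce:   ("suns", 2), ("xiaohei", 1)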

The MapReduce program. Maven dependencies for pom.xml:

<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.5.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-yarn-common</artifactId>
        <version>2.5.2</version>
    </dependency>
</dependencies>

import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class TestMapReduce {

    /**
     * k1 LongWritable  (byte offset of the line)
     * v1 Text          (the line itself)
     *
     * k2 Text          (a single word)
     * v2 IntWritable   (the count 1)
     */
    public static class MyMap extends Mapper<LongWritable, Text, Text, IntWritable> {

        Text k2 = new Text();
        IntWritable v2 = new IntWritable();

        /**
         * k1 key   e.g. 0
         * v1 value e.g. "suns xiaohei"
         */
        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split("\t");
            for (String word : words) {
                k2.set(word);
                v2.set(1);
                context.write(k2, v2);
            }
        }
    }

    public static class MyReduce extends Reducer<Text, IntWritable, Text, IntWritable> {

        Text k3 = new Text();
        IntWritable v3 = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int result = 0;
            for (IntWritable value : values) {
                result += value.get();
            }
            k3.set(key);
            v3.set(result);
            context.write(k3, v3);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance();
        job.setJarByClass(TestMapReduce.class);
        job.setJobName("first");

        // inputFormat
        TextInputFormat.addInputPath(job, new Path("/test"));

        // map
        job.setMapperClass(MyMap.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // shuffle happens automatically

        // reduce
        job.setReducerClass(MyReduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // outputFormat
        TextOutputFormat.setOutputPath(job, new Path("/dest1"));

        job.waitForCompletion(true);
    }
}
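One practical note on the driver above: the input path /test and the output path /dest1 are hard-coded, and /dest1 must not already exist in HDFS when the job is submitted, otherwise the output format rejects the job. Delete it first (bin/hdfs dfs -rm -r /dest1) or change the path between runs.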

Deploying the MapReduce job

Note: the yarn command must be run from the bin directory of the Hadoop installation.

① The most direct way

Package the project with Maven and scp the jar to the server, then:

bin/yarn jar hadoop-mapreduce.jar              # run the job

bin/hdfs dfs -text /dest1/part-r-00000         # view the result

        Bytes Written=38
[root@hadoop hadoop-2.5.2]# bin/hdfs dfs -text /dest1/part-r-00000
19/01/24 09:40:19 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
aaa     2
bbb     2
jjj     1
kkkk    1
lhc     1
ssss    1
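A hedged aside: bin/yarn jar hadoop-mapreduce.jar works without a class argument only if the jar's manifest names a main class; if it does not, pass the driver class explicitly, for example bin/yarn jar hadoop-mapreduce.jar com.baizhi.TestMapReduce (the class name is the one declared in the pom properties shown below).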

② One-click package and upload with Maven

In IDEA: File -> Settings -> Plugins, search for "Maven Helper", install it, and restart IDEA.

Configure pom.xml as follows:

<properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <maven.compiler.source>1.7</maven.compiler.source>
    <maven.compiler.target>1.7</maven.compiler.target>
    <baizhi-mainClass>com.baizhi.TestMapReduce</baizhi-mainClass>
    <target-host>192.168.194.147</target-host>
    <target-position>/opt/install/hadoop-2.5.2</target-position>
</properties>

...

<build>
    <extensions>
        <extension>
            <groupId>org.apache.maven.wagon</groupId>
            <artifactId>wagon-ssh</artifactId>
            <version>2.8</version>
        </extension>
    </extensions>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-jar-plugin</artifactId>
            <version>2.3.2</version>
            <configuration>
                <outputDirectory>${basedir}</outputDirectory>
                <archive>
                    <manifest>
                        <mainClass>${baizhi-mainClass}</mainClass>
                    </manifest>
                </archive>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.codehaus.mojo</groupId>
            <artifactId>wagon-maven-plugin</artifactId>
            <version>1.0</version>
            <configuration>
                <fromFile>${project.build.finalName}.jar</fromFile>
                <url>scp://root:123456@${target-host}${target-position}</url>
            </configuration>
        </plugin>
    </plugins>
</build>

Once this is configured, open the Maven tool window: double-click jar:jar to build the package, then double-click wagon:upload to upload it.

But how can both steps be done in a single click?

This is where the Maven Helper plugin installed above comes in. Right-click pom.xml and choose:

Run Maven -> New Goal, enter jar:jar wagon:upload, and click OK. Packaging and uploading now run as a single step.

③ One-click package, upload and run with Maven

Building on ②, add a commands section to the wagon plugin so it also runs the job, as follows:

<plugin>
    <groupId>org.codehaus.mojo</groupId>
    <artifactId>wagon-maven-plugin</artifactId>
    <version>1.0</version>
    <configuration>
        <fromFile>${project.build.finalName}.jar</fromFile>
        <url>scp://root:123456@${target-host}${target-position}</url>
        <commands>
            <command>pkill -f ${project.build.finalName}.jar</command>
            <command>nohup /opt/install/hadoop-2.5.2/bin/yarn jar /opt/install/hadoop-2.5.2/${project.build.finalName}.jar > /root/nohup.out 2>&1 &</command>
        </commands>
        <displayCommandOutputs>true</displayCommandOutputs>
    </configuration>
</plugin>
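For context: the first command kills any previous instance of the same jar that may still be running on the server, and the nohup ... & command launches the new jar in the background so it keeps running after the SSH session opened by wagon closes, with its output written to /root/nohup.out.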

Then add a new goal in Maven Helper:

jar:jar wagon:upload-single wagon:sshexec

Remember to compile first, so that the project's target directory already contains the compiled classes.

Check the nohup.out file on the resourcemanager node to confirm that the job ran successfully.

ResourceManager high availability (HA)

① Add the following to yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>lhc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop:2181,hadoop1:2181,hadoop2:2181</value>
    </property>
</configuration>
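In short, what these properties do: mapreduce_shuffle enables the MapReduce shuffle auxiliary service on each NodeManager, ha.enabled switches on ResourceManager HA, cluster-id names this RM pair, ha.rm-ids lists the two logical ResourceManagers, hostname.rm1/rm2 map those ids to hosts, and zk-address points to the ZooKeeper ensemble used for leader election.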

② On hadoop1 and hadoop2, run sbin/start-yarn.sh from the Hadoop installation directory to start the ResourceManagers.

③ Run jps to check the processes; the ResourceManager has started correctly:

[root@hadoop1 hadoop-2.5.2]# jps

4552 NameNode

4762 DFSZKFailoverController

4610 DataNode

5822 ResourceManager

6251 Jps

4472 JournalNode

4426 QuorumPeerMain

④ Run bin/yarn rmadmin -getServiceState rm1 and bin/yarn rmadmin -getServiceState rm2

to check the state of the ResourceManager on each node: one is active and the other standby.

[root@hadoop1 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm1

19/01/24 11:56:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

active

[root@hadoop1 hadoop-2.5.2]# bin/yarn rmadmin -getServiceState rm2

19/01/24 11:58:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

standby

⑤ Stop the ResourceManager acting as rm1 and run bin/yarn rmadmin -getServiceState rm2 again:

rm2 is now active, which demonstrates automatic failover of the ResourceManager.

For more details, see this blog post: https://blog.youkuaiyun.com/skywalker_only/article/details/41726189
