MapReduce Java API Programming Lab (for classroom demonstration only)
Lab steps:
1) On Windows 7, create a Java Project in Eclipse, then create the WordCount.java source file and write the source code
2) Import the jar packages needed for MapReduce API programming directly from the hadoop-2.6.0-cdh5.7.0.tar.gz distribution
Extract hadoop-2.6.0-cdh5.7.0.tar.gz on Windows 7; the jar directories to import are listed below (since it is not clear exactly which jars are needed, simply import them all):
hadoop-2.6.0-cdh5.7.0\share\hadoop\mapreduce2
hadoop-2.6.0-cdh5.7.0\share\hadoop\mapreduce2\lib
hadoop-2.6.0-cdh5.7.0\share\hadoop\common
hadoop-2.6.0-cdh5.7.0\share\hadoop\common\lib
hadoop-2.6.0-cdh5.7.0\share\hadoop\hdfs
hadoop-2.6.0-cdh5.7.0\share\hadoop\hdfs\lib
3) Run the jar package with the Hadoop command line on Linux
a) First generate the jar package in Eclipse on Windows 7
Right-click the project name --> Export --> JAR file --> Next --> set the export path and jar name --> select the Main Class --> Finish; this generates WordCount.jar
b) Start Hadoop on the pseudo-distributed host: run start-dfs.sh and then start-yarn.sh to start HDFS and YARN
c) Send the WordCount.jar file from Windows 7 to Linux
Use the XFtp feature of the XShell remote terminal tool, which transfers files over the SFTP protocol, to copy WordCount.jar to the Linux system
d) hadoop fs -put word.txt /    Upload the document to be word-counted (word.txt) to HDFS
e) hadoop jar WordCount.jar /word.txt /output    Run the WordCount.jar program, writing the word counts for word.txt to /output; the output directory /output must not already exist
f) hadoop fs -cat /output/part-r-00000    View the word-count results
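The part-r-00000 file printed in step f) is plain text with one word<TAB>count pair per line (the default for TextOutputFormat). As a side illustration, not part of the lab itself, this minimal plain-Java sketch parses that format back into a map, using a hard-coded sample string in place of a real HDFS file:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Parses TextOutputFormat-style "word<TAB>count" lines back into a map.
public class PartFileParser {
    public static Map<String, Integer> parse(String partFileText) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String line : partFileText.split("\n")) {
            if (line.isEmpty()) continue;
            String[] kv = line.split("\t");  // key and value are tab-separated
            counts.put(kv[0], Integer.parseInt(kv[1]));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample lines shaped like /output/part-r-00000
        String sample = "Hadoop\t5\nSpark\t1\na\t7";
        Map<String, Integer> counts = parse(sample);
        System.out.println(counts.get("Hadoop")); // prints 5
    }
}
```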
Contents of word.txt:
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines each offering local computation and storage. Rather than rely on hardware to deliver high-availability the library itself is designed to detect and handle failures at the application layer so delivering a highly-available service on top of a cluster of computers each of which may be prone to failures.A web-based tool for provisioning managing and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS Hadoop MapReduce Hive HCatalog HBase ZooKeeper Oozie Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and ability to view MapReduce Pig and Hive applications visually alongwith features to diagnose their performance characteristics in a user-friendly manner.A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications including ETL machine learning stream processing and graph computation.
Output of the WordCount program:
Ambari 1
Apache 2
ETL 1
HBase 1
HCatalog 1
HDFS 1
Hadoop 5
Hive 2
It 1
MapReduce 2
Oozie 1
Pig 2
Rather 1
Spark 1
Sqoop. 1
The 1
ZooKeeper 1
a 7
ability 1
across 1
allows 1
alongwith 1
also 1
and 9
application 1
applications 2
as 1
at 1
be 1
characteristics 1
cluster 2
clusters 2
computation 1
computation. 1
compute 1
computers 2
dashboard 1
data 1
data. 1
deliver 1
delivering 1
designed 2
detect 1
diagnose 1
distributed 1
each 2
engine 1
expressive 1
failures 1
failures.A 1
fast 1
features 1
for 5
framework 1
from 1
general 1
graph 1
handle 1
hardware 1
health 1
heatmaps 1
high-availability 1
highly-available 1
in 1
includes 1
including 1
is 3
itself 1
large 1
layer 1
learning 1
library 2
local 1
machine 1
machines 1
managing 1
manner.A 1
may 1
model 1
models. 1
monitoring 1
of 7
offering 1
on 2
performance 1
processing 2
programming 2
prone 1
provides 2
provisioning 1
range 1
rely 1
scale 1
servers 1
service 1
sets 1
simple 2
single 1
so 1
software 1
storage. 1
stream 1
such 1
support 1
supports 1
than 1
that 2
the 3
their 1
thousands 1
to 7
tool 1
top 1
up 1
user-friendly 1
using 1
view 1
viewing 1
visually 1
web-based 1
which 2
wide 1
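Note the entries such as "Sqoop. 1", "failures.A 1", and "computation. 1" above: the classic WordCount tokenizes on whitespace only (e.g. via StringTokenizer), so punctuation stays attached to words, and "data" and "data." are counted as different words. The counting logic can be reproduced without a cluster; a minimal plain-Java sketch, using an in-memory map in place of the map/shuffle/reduce pipeline:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Reproduces WordCount's counting logic locally: whitespace-only tokenization,
// then one count per distinct token (TreeMap yields sorted keys, like the job output).
public class LocalWordCount {
    public static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tokens = new StringTokenizer(text); // splits on whitespace only
        while (tokens.hasMoreTokens()) {
            counts.merge(tokens.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = count("a fast engine. a simple engine");
        // Punctuation stays attached: "engine." and "engine" are distinct tokens.
        System.out.println(counts); // prints {a=2, engine=1, engine.=1, fast=1, simple=1}
    }
}
```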
Sample source code 1 (word frequency count)
The WordCount.java file
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;
public class Wor