hadoop_mapreduce_MRUnit


License: CC 4.0 BY-SA


This article shows how to unit test Hadoop MapReduce jobs with the MRUnit library, covering tests for the Mapper, the Reducer, and the complete Job flow.


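1. Import the jar packages

MRUnit is distributed as a jar package.

To import it directly, add mrunit-1.1.0-hadoop2.jar together with all the jars under lib in the MRUnit archive, except mockito-core-1.9.5.jar, which causes a conflict.

For a Maven project, declare MRUnit as a dependency in pom.xml.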
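The dependency block itself is not included here. A declaration that would match the jar named above, assuming version 1.1.0 with the hadoop2 classifier and test scope (these values are an assumption, not taken from the original project), might look like this:

```xml
<dependency>
    <groupId>org.apache.mrunit</groupId>
    <artifactId>mrunit</artifactId>
    <version>1.1.0</version>
    <!-- hadoop2 classifier assumed, to match mrunit-1.1.0-hadoop2.jar -->
    <classifier>hadoop2</classifier>
    <scope>test</scope>
</dependency>
```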

2. Testing the mapper

Add a new class, WordCountUnitTest:

package com.harvetech.wordcount;

import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mrunit.mapreduce.MapDriver;
import org.junit.Test;

public class WordCountUnitTest {

    @Test
    public void testMapper() throws IOException {
        // Set the hadoop.home.dir system property (used on Windows to locate winutils.exe).
        // Hadoop looks for bin\winutils.exe beneath this directory, so it names the installation root.
        System.setProperty("hadoop.home.dir", "D:\\work\\hadoop-2.6.0");

        // Create the object under test
        WordCountMapper mapper = new WordCountMapper();

        // Create a MapDriver to run the test
        MapDriver<LongWritable, Text, Text, LongWritable> driver = new MapDriver<LongWritable, Text, Text, LongWritable>(mapper);

        // Specify the map input: K1, V1
        driver.withInput(new LongWritable(1), new Text("I love Beijing"));

        // Specify the map output K2, V2, i.e. the expected data
        driver.withOutput(new Text("I"), new LongWritable(1))
              .withOutput(new Text("love"), new LongWritable(1))
              .withOutput(new Text("Beijing"), new LongWritable(1));

        // Run the test: compare the actual output with the expected output
        driver.runTest();
    }

    // The reducer and job test methods from the following sections go here.
}

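The map output given to the driver is the expected value: if the actual result of the run matches it, the test passes; otherwise the test fails.

When the test is run, the result bar is green and the test passes.

Now change the value written by the mapper to 4:

// Output: k2, v2
for (String w : words) {
    context.write(new Text(w), new LongWritable(4));
}

Run the test again. It fails, and the result bar is red.

The error is shown below: the expected output is (I, 1) but the actual output is (I, 4), and likewise for the other two words: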

java.lang.AssertionError: 3 Error(s): (Missing expected output (I, 1) at position 0, got (I, 4)., Missing expected output (love, 1) at position 1, got (love, 4)., Missing expected output (Beijing, 1) at position 2, got (Beijing, 4).)
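The message above reads like a position-by-position comparison of expected and actual (key, value) pairs. The plain-Java sketch below imitates that kind of check to show where each of the three errors comes from. The class and method names are hypothetical, and this is only an illustration, not MRUnit's actual implementation (it ignores, for example, extra unexpected outputs).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: mimics an ordered expected-vs-actual check.
class OrderedOutputCheck {

    // Formats a (key, value) pair the way it appears in the error message.
    static String pair(String key, long value) {
        return "(" + key + ", " + value + ")";
    }

    // Returns one message per expected position whose actual pair is absent or different.
    static List<String> compare(List<String> expected, List<String> actual) {
        List<String> errors = new ArrayList<>();
        for (int i = 0; i < expected.size(); i++) {
            String want = expected.get(i);
            if (i >= actual.size()) {
                errors.add("Missing expected output " + want + " at position " + i + ".");
            } else if (!want.equals(actual.get(i))) {
                errors.add("Missing expected output " + want + " at position " + i
                        + ", got " + actual.get(i) + ".");
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        List<String> expected = Arrays.asList(pair("I", 1), pair("love", 1), pair("Beijing", 1));
        // Output of the modified mapper, which writes 4 instead of 1.
        List<String> actual = Arrays.asList(pair("I", 4), pair("love", 4), pair("Beijing", 4));
        for (String error : compare(expected, actual)) {
            System.out.println(error);
        }
    }
}
```

Because the comparison is positional, every word contributes its own error even though the underlying mistake (writing 4 instead of 1) is a single line in the mapper.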

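Now for the line that sets the system property:

System.setProperty("hadoop.home.dir", "D:\\work\\hadoop-2.6.0");

Without this line the run reports an error, although the test result is not affected.

The reason is that running Hadoop MapReduce programs on Windows requires a tool shipped with Hadoop, winutils.exe, which lives in Hadoop's bin directory (D:\work\hadoop-2.6.0\bin here). Hadoop resolves the tool as <hadoop.home.dir>\bin\winutils.exe, so the property should name the installation root rather than the bin directory itself. If the HADOOP_HOME environment variable is already set on the machine, the line above is not needed.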
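A small pre-check can confirm that the tool is where Hadoop will look for it. The sketch below assumes Hadoop resolves the tool as <home>\bin\winutils.exe, taking the home directory from the hadoop.home.dir system property or, failing that, from the HADOOP_HOME environment variable. The class name is hypothetical and this is not Hadoop's own code.

```java
import java.io.File;

// Hypothetical helper: reproduces the assumed winutils.exe lookup so it can be checked up front.
class WinutilsLocator {

    // Returns <home>/bin/winutils.exe, or null when no Hadoop home is configured.
    static File resolve() {
        String home = System.getProperty("hadoop.home.dir");
        if (home == null) {
            home = System.getenv("HADOOP_HOME");
        }
        if (home == null) {
            return null;
        }
        return new File(new File(home, "bin"), "winutils.exe");
    }

    public static void main(String[] args) {
        File tool = resolve();
        if (tool == null) {
            System.out.println("Neither hadoop.home.dir nor HADOOP_HOME is set");
        } else {
            System.out.println(tool.getPath() + (tool.isFile() ? " found" : " not found"));
        }
    }
}
```

Running it before the tests makes a misconfigured path (for example, one that already ends in bin) visible immediately.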

3. Testing the reducer

This is similar to testing the mapper. Add the following method to WordCountUnitTest:

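// Test for WordCountReducer.
// Additional imports needed: java.util.ArrayList, java.util.List,
// org.apache.hadoop.mrunit.mapreduce.ReduceDriver
@Test
public void testReducer() throws Exception {
    // Create the object under test
    WordCountReducer reducer = new WordCountReducer();

    // Create a ReduceDriver
    // ReduceDriver<K3, V3, K4, V4>
    ReduceDriver<Text, LongWritable, Text, LongWritable> driver = new ReduceDriver<Text, LongWritable, Text, LongWritable>(reducer);

    // Specify the reducer input
    // V3 is a collection of V2 values
    List<LongWritable> v3 = new ArrayList<LongWritable>();

    // Add three V2 values to V3 so that they sum to the expected count of 3
    v3.add(new LongWritable(1));
    v3.add(new LongWritable(1));
    v3.add(new LongWritable(1));

    driver.withInput(new Text("Beijing"), v3);

    // Specify the reducer output K4, V4, i.e. the expected data
    driver.withOutput(new Text("Beijing"), new LongWritable(3));

    // Run the unit test
    driver.runTest();
}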

Run it; the test passes.
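The reducer receives one key together with all the values collected for it, so an expected count of 3 requires three 1s in the input list. The plain-Java sketch below shows that summation. It assumes WordCountReducer simply adds up its values (the reducer itself is not shown in this article), and the class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the summing logic assumed inside WordCountReducer.
class ReduceSumSketch {

    // Adds up all counts that were grouped under a single key.
    static long reduce(String key, List<Long> values) {
        long total = 0;
        for (long value : values) {
            total += value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<Long> v3 = Arrays.asList(1L, 1L, 1L);
        System.out.println("Beijing -> " + reduce("Beijing", v3));
    }
}
```

With a single 1 in the list the sum would be 1, and a test expecting 3 would fail.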

4. Testing the job

// Test for the whole job (mapper plus reducer).
// Additional import needed: org.apache.hadoop.mrunit.mapreduce.MapReduceDriver
@Test
public void testJob() throws Exception {
    // Create the objects under test
    WordCountMapper mapper = new WordCountMapper();
    WordCountReducer reducer = new WordCountReducer();

    // Create a driver
    // MapReduceDriver<K1, V1, K2, V2, K4, V4>
    MapReduceDriver<LongWritable, Text, Text, LongWritable, Text, LongWritable> driver =
            new MapReduceDriver<LongWritable, Text, Text, LongWritable, Text, LongWritable>(mapper, reducer);

    // Specify the map input
    driver.withInput(new LongWritable(1), new Text("I love Beijing"))
          .withInput(new LongWritable(4), new Text("I love China"))
          .withInput(new LongWritable(7), new Text("Beijing is the capital of China"));

    // Note: the expected output must be listed in sorted key order
    // (uppercase letters sort before lowercase ones)
    driver.withOutput(new Text("Beijing"), new LongWritable(2))
          .withOutput(new Text("China"), new LongWritable(2))
          .withOutput(new Text("I"), new LongWritable(2))
          .withOutput(new Text("capital"), new LongWritable(1))
          .withOutput(new Text("is"), new LongWritable(1))
          .withOutput(new Text("love"), new LongWritable(2))
          .withOutput(new Text("of"), new LongWritable(1))
          .withOutput(new Text("the"), new LongWritable(1));

    // Run the unit test
    driver.runTest();
}

Run it; the test passes.
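To see why the expected output is listed in that order and with those counts, the whole flow can be simulated in plain Java. The sketch below assumes the mapper splits each line on spaces and emits (word, 1), and that the reducer sums the values per key. It uses a TreeMap to stand in for the sorted shuffle; for ASCII words, String ordering matches the byte ordering of Hadoop's Text. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical simulation of map -> sorted shuffle -> reduce for word count.
class WordCountPipelineSketch {

    static Map<String, Long> run(List<String> lines) {
        // TreeMap keeps keys sorted, so uppercase words come before lowercase ones.
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : line.split(" ")) {
                // "map" emits (word, 1); "reduce" sums the 1s collected for each key.
                counts.merge(word, 1L, Long::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = run(Arrays.asList(
                "I love Beijing",
                "I love China",
                "Beijing is the capital of China"));
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

The keys come out as Beijing, China, I, capital, is, love, of, the, which is the order the withOutput calls must follow for the job test to pass.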