Hadoop MultipleInputs / MultipleOutputs (repost)

This post shows how a MapReduce job can read data from several different kinds of input sources, each processed by its own Mapper, and how a custom MultipleOutputFormat subclass can split the output into separate files according to business rules.

A single job can read data from multiple input sources, homogeneous or heterogeneous, and process each with its own Mapper:

Java code:
    MultipleInputs.addInputPath(conf, ncdcInputPath,
        TextInputFormat.class, MaxTemperatureMapper.class);
    MultipleInputs.addInputPath(conf, metOfficeInputPath,
        TextInputFormat.class, MetOfficeMaxTemperatureMapper.class);
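For context, here is a minimal driver sketch that wires both inputs into one job. It is written against the old org.apache.hadoop.mapred API that the snippet above uses; the two Mapper classes and MaxTemperatureReducer are assumptions, stand-ins for your own implementations, and both mappers must emit the same intermediate key/value types.

Java code:
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    public class MaxTemperatureWithMultipleInputs {

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(MaxTemperatureWithMultipleInputs.class);
        conf.setJobName("Max temperature with multiple inputs");

        // Each input path is paired with its own InputFormat and Mapper.
        MultipleInputs.addInputPath(conf, new Path(args[0]),
            TextInputFormat.class, MaxTemperatureMapper.class);          // NCDC records (assumed)
        MultipleInputs.addInputPath(conf, new Path(args[1]),
            TextInputFormat.class, MetOfficeMaxTemperatureMapper.class); // Met Office records (assumed)

        FileOutputFormat.setOutputPath(conf, new Path(args[2]));

        // Both mappers feed the same reducer, so they must agree on the
        // intermediate key/value types (here Text / IntWritable).
        conf.setReducerClass(MaxTemperatureReducer.class);
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        JobClient.runJob(conf);
      }
    }

Note that with MultipleInputs there is no single job-wide input path, so FileInputFormat.addInputPath is not called; every input arrives through the addInputPath calls above.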



MultipleOutputFormat lets you name, and thereby partition, the reduce output files according to a rule of your own, for example:

Java code:
    ...
    static class StationNameMultipleTextOutputFormat
        extends MultipleTextOutputFormat<NullWritable, Text> {

      private NcdcRecordParser parser = new NcdcRecordParser();

      // Name each output file after the station ID parsed from the record.
      @Override
      protected String generateFileNameForKeyValue(NullWritable key, Text value,
          String name) {
        parser.parse(value);
        return parser.getStationId();
      }
    }
    ...
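To put the class to work, register it as the job's output format. Below is a sketch against the same old mapred API; StationMapper and StationReducer are hypothetical classes that key every record by station ID, so all records for one station reach the same reducer (otherwise two reducers could try to create a file with the same name).

Java code:
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.NullWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class PartitionByStation {

      public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(PartitionByStation.class);

        FileInputFormat.addInputPath(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        conf.setMapperClass(StationMapper.class);   // hypothetical: emits (stationId, record)
        conf.setMapOutputKeyClass(Text.class);
        conf.setReducerClass(StationReducer.class); // hypothetical: emits (NullWritable, record)
        conf.setOutputKeyClass(NullWritable.class);

        // Instead of part-00000, part-00001, ... the reduce output is written
        // to one file per station ID, named by generateFileNameForKeyValue.
        conf.setOutputFormat(StationNameMultipleTextOutputFormat.class);

        JobClient.runJob(conf);
      }
    }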