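Scenario
The input path carries important information that has to be processed during the map phase.
Cases
A single Mapper, where each map task processes exactly one input file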
protected void setup(Context context) throws IOException, InterruptedException {
    // With one file per map task, the split is a FileSplit that exposes the file path
    Path path = ((FileSplit) context.getInputSplit()).getPath();
}
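Once the path is available in setup, the information it carries still has to be pulled out of the string. The following is a minimal sketch of that step in plain Java, assuming a hypothetical directory layout with key=value segments (for example .../source=web/dt=2019-05-01/part-00000). The class name PathFields and the layout are illustrative assumptions, not part of the original job; in a mapper the input string would come from path.toString().

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PathFields {
    // Collects every "key=value" directory segment of a path string,
    // preserving the order in which the segments appear.
    static Map<String, String> parse(String path) {
        Map<String, String> fields = new LinkedHashMap<>();
        for (String segment : path.split("/")) {
            int eq = segment.indexOf('=');
            // Skip segments without '=' or with an empty key or value.
            if (eq > 0 && eq < segment.length() - 1) {
                fields.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return fields;
    }
}
```

Parsing once in setup and storing the result in a field keeps the per-record map call free of string handling.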
Using an input format such as CombineTextInputFormat
protected void setup(Context context) throws IOException, InterruptedException {
    // Only the first path of the combined files is needed; the combine step usually
    // groups files into one split according to some logic
    Path firstPath = ((CombineFileSplit) context.getInputSplit()).getPath(0);
}
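Taking only the first path is safe only if every file in the combined split carries the same information, for example because they all sit in the same directory. A minimal sketch of a check for that assumption, in plain Java, is shown here. The class name CombinedSplitPaths is hypothetical; in a mapper the strings would be built from the Path[] returned by CombineFileSplit.getPaths().

```java
import java.util.List;

class CombinedSplitPaths {
    // Returns the parent directory shared by all paths,
    // or null if the parents differ or the list is empty.
    static String commonParent(List<String> paths) {
        String parent = null;
        for (String p : paths) {
            int slash = p.lastIndexOf('/');
            String current = slash >= 0 ? p.substring(0, slash) : "";
            if (parent == null) {
                parent = current;
            } else if (!parent.equals(current)) {
                return null;
            }
        }
        return parent;
    }
}
```

If the result is null, the split mixes files from different directories and the first path alone should not be trusted to describe all records.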
MultipleInputs
With MultipleInputs, the approaches above fail with the following error
Error: java.lang.ClassCastException: org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit cannot be cast to org.apache.hadoop.mapreduce.lib.input.FileSplit
at com.miaozhen.verify.admVerify.calGivt.GivtMapper.setup(GivtMapper.java:26)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
at org.apache.hadoop.mapreduce.lib.input.DelegatingMapper.run(DelegatingMapper.java:55)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:799)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:347)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1688)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)
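MultipleInputs wraps each split in a TaggedInputSplit. Because the TaggedInputSplit class is not public, the split cannot be cast to it in order to call its accessor. Reflection can be used instead to retrieve the inputSplit held inside the TaggedInputSplit. Source: hadoop MultipleInputs fails with ClassCastException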
// Requires: import java.lang.reflect.Method;
protected void setup(Context context) throws IOException, InterruptedException {
    InputSplit split = context.getInputSplit();
    Class<? extends InputSplit> splitClass = split.getClass();
    FileSplit fileSplit = null;
    if (split instanceof FileSplit) {
        fileSplit = (FileSplit) split;
    } else if (splitClass.getName().equals(
            "org.apache.hadoop.mapreduce.lib.input.TaggedInputSplit")) {
        // TaggedInputSplit is not public, so call its getInputSplit() through reflection
        try {
            Method getInputSplitMethod = splitClass.getDeclaredMethod("getInputSplit");
            getInputSplitMethod.setAccessible(true);
            fileSplit = (FileSplit) getInputSplitMethod.invoke(split);
        } catch (Exception e) {
            // wrap and re-throw error
            throw new IOException(e);
        }
    }
    if (fileSplit == null) {
        // Fail with a clear message instead of a NullPointerException below
        throw new IOException("Unsupported input split type: " + splitClass.getName());
    }
    Path path = fileSplit.getPath();
}
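The reflection step can be tried without a cluster. The sketch below reproduces the same pattern in plain Java with stand-in classes: PlainSplit plays the role of FileSplit and the private Wrapper class plays the role of TaggedInputSplit. All names here are hypothetical and exist only for illustration; this is not Hadoop code.

```java
import java.lang.reflect.Method;

class UnwrapDemo {
    // Stand-in for a split that exposes a path (plays the role of FileSplit).
    static class PlainSplit {
        private final String path;
        PlainSplit(String path) { this.path = path; }
        String getPath() { return path; }
    }

    // Stand-in for a non-public wrapper (plays the role of TaggedInputSplit).
    private static class Wrapper {
        private final Object inner;
        Wrapper(Object inner) { this.inner = inner; }
        private Object getInputSplit() { return inner; }
    }

    // Hands out a wrapped split typed as Object, as a caller would see it.
    static Object wrap(PlainSplit split) { return new Wrapper(split); }

    // Returns the PlainSplit itself, or the one hidden inside a wrapper.
    static PlainSplit unwrap(Object split) {
        if (split instanceof PlainSplit) {
            return (PlainSplit) split;
        }
        try {
            // Look up the accessor by name and bypass its access modifier.
            Method m = split.getClass().getDeclaredMethod("getInputSplit");
            m.setAccessible(true);
            Object inner = m.invoke(split);
            if (inner instanceof PlainSplit) {
                return (PlainSplit) inner;
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalStateException(
                "Unsupported split type: " + split.getClass().getName());
    }
}
```

Because the lookup relies on a method name rather than a public API, it can break if the wrapped class changes, which is why unsupported types are rejected with an explicit exception rather than returning null.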