Error log:
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.JavaMain], main() threw exception, org.apache.hadoop.security.AccessControlException: Permission denied: user=wang.nengjie, access=EXECUTE, inode="/user/yarn":yarn:supergroup:drwx------
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:279)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:260)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:201)
at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:154)
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:152)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3877)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:3860)
at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:3842)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkAncestorAccess(FSNamesystem.java:6730)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:2908)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInt(FSNamesystem.java:2826)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:2711)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:602)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.create(AuthorizationProviderProxyClientProtocol.java:115)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:412)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2226)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2222)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2220)
Error cause:
HFileOutputFormat2.configureIncrementalLoad(job, table, connection.getRegionLocator(tableNames)); // The job runs as the hdfs user, so the temporary directory holding the generated HFiles is also owned by hdfs, but the BulkLoad operation runs as the hbase user, which is a real headache. This mismatch produces the permission failure shown in the log above (AccessControlException: Permission denied, inode="/user/yarn":yarn:supergroup:drwx------).
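Before changing any code, it can help to confirm the ownership mismatch described above by inspecting the owner and permissions of the HFile output directory from Java (or with hdfs dfs -ls). The following is only a minimal diagnostic sketch; the path /tmp/hfile-output is a hypothetical placeholder for the job's actual output directory:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CheckHFileDirOwner {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Hypothetical path: the directory where the MR job wrote its HFiles.
        FileStatus status = fs.getFileStatus(new Path("/tmp/hfile-output"));
        // Compare the owner that generated the HFiles with the user that runs the bulk load.
        System.out.println("owner=" + status.getOwner()
                + " group=" + status.getGroup()
                + " permission=" + status.getPermission());
    }
}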
Solution:
Use one and the same variable for the MR job's Configuration and the HBase Configuration used when creating the HBase connection. For example:
public static void main(String args[]) throws Exception {
    // Permission workaround #1 found online: switch the Hadoop user. It did not help.
    // System.out.println("Setting HADOOP_USER_NAME");
    // System.setProperty("HADOOP_USER_NAME", "hdfs"); // only affects writing the HFiles and loading them into HBase
    // System.setProperty("HADOOP_USER_NAME", "hadoop");
    String tableName = args[0];
    String startYMD = args[1];
    String stopYMD = args[2];
    String functionHDFSPath = args[3];   // HDFS directory of the files to process
    String subCanIDHDFSPath = args[4];
    String jobOutputHDFSPath = args[5];  // output directory

    // HBase configuration
    // 1. Create the configuration object
    Configuration conf = HBaseConfiguration.create();
    conf.set(TableOutputFormat.OUTPUT_TABLE, tableName);
    conf.set("hbase.mapreduce.bulkload.max.hfiles.perRegion.perFamily", "400");
    conf.set("orc.mapred.output.schema", "struct<xxx:string,xxx:int,xxx:int,xxx:int,xxx:int,xxx:string>");
    // Permission workaround #2 found online: toggle the HDFS permission check (dfs.permissions) to true or false; still no effect.
    // mrConf.set("dfs.permissions", "true");
    conf.set("mapreduce.map.memory.mb", "3072");
    conf.set("mapreduce.reduce.memory.mb", "4096");
    conf.set("yarn.nodemanager.resource.memory-mb", "8192");
    conf.set("yarn.nodemanager.vmem-pmem-ratio", "5"); // default is 2.1
    conf.set("mapreduce.input.fileinputformat.input.dir.recursive", "true");
    String sub_CANIDs = getSubCanIDAll(subCanIDHDFSPath, conf);
    System.out.println(sub_CANIDs);
    conf.set("checkSubCanID", sub_CANIDs);

    Job job = Job.getInstance(conf, "FivMinxx_" + startYMD + "_" + stopYMD);
    job.setJarByClass(FiveMinxxxMR.class);
    job.setMapperClass(ORCMapper.class);
    job.setReducerClass(ToHFileReducer.class);
    job.setNumReduceTasks(200);
    job.setInputFormatClass(OrcInputFormat.class);
    // job.setOutputFormatClass(HFileOutputFormat2.class);
    job.setPartitionerClass(xxxpartition.class);
    job.setMapOutputKeyClass(xxxTimeWritable.class);
    job.setMapOutputValueClass(Text.class);
    // job.setOutputKeyClass(ImmutableBytesWritable.class);
    // job.setOutputValueClass(Put.class);

    // Set the input ORC files
    List<String> DateList = new LinkedList<>();
    if (startYMD.equals(stopYMD)) {
        DateList.add(startYMD);
    } else {
        int daysCount = Util.getDaysFromCalendar(startYMD, stopYMD) + 2;
        DateList = Util.getDates(startYMD, daysCount);
    }
    List<Integer> FunctAll = getFunctAll(functionHDFSPath, conf);
    for (Integer temp : FunctAll) {
        System.out.println(temp);
    }
    for (String dataDates : DateList) {
        String year = dataDates.substring(0, 4);   // year
        String month = dataDates.substring(4, 6);  // month
        String day = dataDates.substring(6);       // day
        for (Integer xxx : FunctAll) {
            FileSystem fileSystem = FileSystem.get(conf);
            Path inputPath1 = new Path("/user/xxx/hivetable/xxx/xxx=" + xxx + "/years=" + year + "/months=" + month + "/days=" + day + "/");
            boolean result = fileSystem.isDirectory(inputPath1);
            if (result) {
                FileInputFormat.addInputPath(job, inputPath1);
            }
        }
    }

    // The first, ordinary MR job writes its output to HDFS
    Path outputPath = new Path(jobOutputHDFSPath);
    // FileOutputFormat.setOutputPath(job, outputPath);
    HFileOutputFormat2.setCompressOutput(job, true);
    HFileOutputFormat2.setOutputPath(job, outputPath); // the MR output is stored on HDFS in HFile format
    outputPath.getFileSystem(conf).delete(outputPath, true);

    // 2. Create the connection
    Connection connection = ConnectionFactory.createConnection(conf);
    // 3. Get the admin and table handles
    Admin admin = connection.getAdmin();
    TableName tableNames = TableName.valueOf(tableName);
    Table table = connection.getTable(tableNames);

    // Configure the output format. The job runs as the hdfs user, so the temporary HFile
    // directory is owned by hdfs, while the BulkLoad step runs as the hbase user; this is
    // the mismatch that produced the AccessControlException in the log above.
    HFileOutputFormat2.configureIncrementalLoad(job, table, connection.getRegionLocator(tableNames));
    // HFileOutputFormat.configureIncrementalLoad(job, new HTable(hbConf, tableName));

    job.waitForCompletion(true);
    if (job.isSuccessful()) { // bulk-load the generated HFiles into HBase
        try {
            LoadIncrementalHFiles loadFiles = new LoadIncrementalHFiles(conf);
            loadFiles.doBulkLoad(outputPath, admin, table, connection.getRegionLocator(tableNames));
            System.out.println("Bulk Load Completed..");
        } catch (Exception exception) {
            exception.printStackTrace();
        }
    }
    System.exit(job.isSuccessful() ? 0 : 1);
}
The key code change:
Use the same Configuration object for both the MR job and HBase:
Configuration conf = HBaseConfiguration.create();
Job job = Job.getInstance(conf, "xxx_" + startYMD + "_" + stopYMD);
Connection connection = ConnectionFactory.createConnection(conf);
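For contrast, here is a minimal sketch of the two patterns. The "before" half is only reconstructed from the leftover mrConf/hbConf names in the commented-out lines of the main() above and is illustrative, not the original code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.mapreduce.Job;

public class ConfSharingSketch {

    // Before (reconstructed, illustrative only): the MR job and the HBase connection
    // each get their own Configuration, the setup that ran into the permission error.
    static void separateConfigs() throws Exception {
        Configuration mrConf = new Configuration();
        Job job = Job.getInstance(mrConf, "bulkload-job");
        Configuration hbConf = HBaseConfiguration.create();
        Connection connection = ConnectionFactory.createConnection(hbConf);
        // (connections would be closed in real code)
    }

    // After (the fix used in this post): one shared Configuration for both.
    static void sharedConfig() throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "bulkload-job");
        Connection connection = ConnectionFactory.createConnection(conf);
    }
}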
This main function, a MapReduce job that reads ORC files and writes HFiles, still has problems of its own; it is shown here only to illustrate how the permission issue was resolved, not as reference code for writing HFiles. The remaining problem is that HFileOutputFormat2.configureIncrementalLoad itself fixes the job's output format and output key/value classes.
First, let's look at the source of HFileOutputFormat2.configureIncrementalLoad:
HFileOutputFormat2.configureIncrementalLoad(job, table, connection.getRegionLocator(tableNames));

public static void configureIncrementalLoad(Job job, Table table, RegionLocator regionLocator)
    throws IOException {
  configureIncrementalLoad(job, table.getTableDescriptor(), regionLocator);
}

// (The intermediate overload, which forwards to the method below with HFileOutputFormat2.class
// as the cls argument, is omitted from this excerpt.)
static void configureIncrementalLoad(Job job, HTableDescriptor tableDescriptor,
    RegionLocator regionLocator, Class<? extends OutputFormat<?, ?>> cls) throws IOException,
    UnsupportedEncodingException {
  Configuration conf = job.getConfiguration();
  job.setOutputKeyClass(ImmutableBytesWritable.class);
  job.setOutputValueClass(KeyValue.class);
  job.setOutputFormatClass(cls);

  // Based on the configured map output class, set the correct reducer to properly
  // sort the incoming values.
  // TODO it would be nice to pick one or the other of these formats.
  if (KeyValue.class.equals(job.getMapOutputValueClass())) {
    job.setReducerClass(KeyValueSortReducer.class);
  } else if (Put.class.equals(job.getMapOutputValueClass())) {
    job.setReducerClass(PutSortReducer.class);
  } else if (Text.class.equals(job.getMapOutputValueClass())) {
    job.setReducerClass(TextSortReducer.class);
  } else {
    LOG.warn("Unknown map output value type:" + job.getMapOutputValueClass());
  }

  conf.setStrings("io.serializations", conf.get("io.serializations"),
      MutationSerialization.class.getName(), ResultSerialization.class.getName(),
      KeyValueSerialization.class.getName());

  // Use table's region boundaries for TOP split points.
  LOG.info("Looking up current regions for table " + tableDescriptor.getTableName());
  List<ImmutableBytesWritable> startKeys = getRegionStartKeys(regionLocator);
  LOG.info("Configuring " + startKeys.size() + " reduce partitions " +
      "to match current region count");
  job.setNumReduceTasks(startKeys.size());

  configurePartitioner(job, startKeys);
  // Set compression algorithms based on column families
  configureCompression(conf, tableDescriptor);
  configureBloomType(tableDescriptor, conf);
  configureBlockSize(tableDescriptor, conf);
  configureDataBlockEncoding(tableDescriptor, conf);

  TableMapReduceUtil.addDependencyJars(job);
  TableMapReduceUtil.initCredentials(job);
  LOG.info("Incremental table " + regionLocator.getName() + " output configured.");
}
Here configureIncrementalLoad has already fixed the output key and value classes:
Configuration conf = job.getConfiguration();
job.setOutputKeyClass(ImmutableBytesWritable.class);
job.setOutputValueClass(KeyValue.class);
Note:
In Hadoop there are job.setOutputKeyClass(theClass) and job.setOutputValueClass(theClass), yet some programs additionally call job.setMapOutputKeyClass(theClass) and job.setMapOutputValueClass(theClass). For a long time I did not understand the difference; after looking it up: when the mapper and the reducer emit the same types, configuring only job.setOutputKeyClass(theClass) and job.setOutputValueClass(theClass) is enough, but when the mapper's output types differ from the reducer's, the map output types must be configured separately.
The configureIncrementalLoad source sets job.setOutputKeyClass and job.setOutputValueClass to ImmutableBytesWritable and Put (or KeyValue). That leaves only two valid setups: either the map output and the reduce output are both ImmutableBytesWritable / Put (KeyValue), or the job is map-only with that same output. This is exactly why the code in this post fails at run time.
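To make the rule above concrete, here is a minimal sketch of the two configuration cases (the job names are placeholders and no mapper/reducer classes are wired up; only the output-class calls matter here):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;

public class OutputClassExamples {
    // Case 1: mapper and reducer both emit ImmutableBytesWritable/Put,
    // so the final output classes cover the map output as well.
    static Job sameTypes(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "same-output-types");
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);
        return job;
    }

    // Case 2: mapper emits Text/Text while the reducer emits ImmutableBytesWritable/Put,
    // so the map output classes must be declared separately.
    static Job differentTypes(Configuration conf) throws Exception {
        Job job = Job.getInstance(conf, "different-output-types");
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        job.setOutputKeyClass(ImmutableBytesWritable.class);
        job.setOutputValueClass(Put.class);
        return job;
    }
}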
In our main function, however, we configured the map output with a custom partitioner (to guarantee that each user's data lands in the same reducer) and a custom key for sorting (to guarantee that each user's data within a reducer is ordered by time). Our main function sets:
job.setPartitionerClass(xxxpartition.class);
job.setMapOutputKeyClass(xxxTimeWritable.class);
job.setMapOutputValueClass(Text.class);
As a result, at run time I also hit the following error:
Error: java.lang.IllegalArgumentException: Can't read partitions file
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:116)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
at org.apache.hadoop.mapred.MapTask$NewOutputCollector.<init>(MapTask.java:707)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:776)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
Caused by: java.io.IOException: wrong key class: com.xx.xx.hivetable.xxx.usepartition.xx.xxxTimeWritable is not class org.apache.hadoop.hbase.io.ImmutableBytesWritable
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2339)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2391)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.readPartitions(TotalOrderPartitioner.java:306)
at org.apache.hadoop.mapreduce.lib.partition.TotalOrderPartitioner.setConf(TotalOrderPartitioner.java:88)
... 10 more
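The stack trace matches the configureIncrementalLoad source shown earlier: configurePartitioner installs a TotalOrderPartitioner whose partitions file holds ImmutableBytesWritable region start keys, so a custom map output key such as xxxTimeWritable cannot be read back from it. For reference, here is a minimal sketch of the conventional setup that avoids the conflict; the class name HFileMapper, the column family "cf", the qualifier "col1" and the CSV parsing are all hypothetical, not the code from this post. The mapper emits ImmutableBytesWritable / Put, and configureIncrementalLoad is left to choose the sort reducer, the reduce-task count and the partitioner:

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper: parses a CSV line and emits the row key plus a Put, which is what
// configureIncrementalLoad expects (it then installs PutSortReducer and a
// TotalOrderPartitioner built from the table's region boundaries).
public class HFileMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] fields = value.toString().split(",");
        byte[] rowKey = Bytes.toBytes(fields[0]);
        Put put = new Put(rowKey);
        put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("col1"), Bytes.toBytes(fields[1]));
        context.write(new ImmutableBytesWritable(rowKey), put);
    }
}

With a mapper like this, the driver sets job.setMapOutputKeyClass(ImmutableBytesWritable.class) and job.setMapOutputValueClass(Put.class) and drops the custom partitioner, the custom reducer and the fixed setNumReduceTasks(200), since configureIncrementalLoad (as its source above shows) derives PutSortReducer, the TotalOrderPartitioner and the reducer count from those settings and the table's region boundaries.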