1. Reading data from multiple Hive tables
Use HCatMultiInputFormat and add each table's InputJobInfo to inputJobInfoList, e.g.:
inputJobInfoList.add(InputJobInfo.create(dbA, tableA, null, null));
inputJobInfoList.add(InputJobInfo.create(dbB, tableB, null, null));
HCatMultiInputFormat.setInput(job, inputJobInfoList);
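The snippet above can be sketched out as a full driver. This is a hedged sketch: HCatMultiInputFormat is assumed to be a multi-table extension available on the classpath (it is not part of stock HCatalog), the database/table names are placeholders, and the getTableSchema signature is assumed analogous to HCatInputFormat's.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.InputJobInfo;

public class MultiTableReadDriver {
    public static void run() throws Exception {
        Job job = new Job(new Configuration(), "read-two-hive-tables");

        // One InputJobInfo per table; the two trailing nulls are the
        // partition filter and properties, as in the snippet above.
        List<InputJobInfo> inputJobInfoList = new ArrayList<InputJobInfo>();
        inputJobInfoList.add(InputJobInfo.create("dbA", "tableA", null, null));
        inputJobInfoList.add(InputJobInfo.create("dbB", "tableB", null, null));

        HCatMultiInputFormat.setInput(job, inputJobInfoList);
        job.setInputFormatClass(HCatMultiInputFormat.class);

        // Because setInput went through HCatMultiInputFormat, the schema
        // must also be fetched via HCatMultiInputFormat, not HCatInputFormat
        // (signature assumed; see the class in use).
        HCatSchema schema =
                HCatMultiInputFormat.getTableSchema(job.getConfiguration());

        // ... set mapper/reducer and output format, then submit the job.
    }
}
```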
2. Calling HCatInputFormat.getTableSchema throws: job information not found in JobContext. HCatInputFormat.setInput() not called?
setInput had in fact been called. Reading the source showed the error is raised because conf.get("mapreduce.lib.hcatoutput.info") returns null, while the setInput actually in use calls conf.set("mapreduce.lib.hcatmulti.jobs.info"). A closer look revealed the cause: since two tables are being read, the job uses HCatMultiInputFormat rather than HCatInputFormat, so the table schema must likewise be fetched with HCatMultiInputFormat.getTableSchema.
3. Error: getDelegationToken() can be called only in thrift (non local) mode
Set hive.metastore.local to false
and set hive.metastore.uris (e.g. thrift://10.1.8.42:9083).
Details: http://blog.youkuaiyun.com/lalaguozhe/article/details/9083905
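The two settings above belong in hive-site.xml (or can be set on the job configuration); the URI below is the example address from the text.

```xml
<!-- hive-site.xml: force remote (thrift) metastore access -->
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://10.1.8.42:9083</value>
</property>
```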
4. Error: 2004 HCatOutputFormat not initialized, setOutput has to be called
There are two fixes:
1. Switch to the deprecated overload getTableSchema(JobContext context) instead of getTableSchema(Configuration conf).
2. Keep getTableSchema(Configuration conf), but pass in job.getConfiguration() rather than a freshly constructed Configuration object.
Note: getTableSchema is called here to obtain the table schema so that setSchema can then be called. Skipping setSchema() raises: 9001 : Exception occurred while processing HCat request : It seems that setSchema() is not called on HCatOutputFormat. Please make sure that method is called.
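Put together, fix 2 above amounts to the following call order. A hedged sketch assuming the org.apache.hcatalog.mapreduce API; dbC and tableC are placeholder names.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.OutputJobInfo;

public class OutputSetup {
    static void configureOutput(Job job) throws Exception {
        // 1. setOutput stores the output job info on the job's configuration.
        HCatOutputFormat.setOutput(job,
                OutputJobInfo.create("dbC", "tableC", null)); // placeholder db/table

        // 2. Read the schema back from the SAME configuration object.
        //    A fresh `new Configuration()` would not contain the keys that
        //    setOutput just wrote, which is what triggers error 2004.
        HCatSchema schema =
                HCatOutputFormat.getTableSchema(job.getConfiguration());

        // 3. setSchema must be called, or HCat raises error 9001.
        HCatOutputFormat.setSchema(job, schema);
    }
}
```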
5. Error: org.apache.hcatalog.common.HCatException : 2010 : Invalid partition values specified : Unable to configure dynamic partitioning for storage handler, mismatch between number of partition values obtained[0] and number of partition values required[1]
Official wiki: https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput
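One way to satisfy the partition-count check behind the 2010 error above is static partitioning: pass a map with one value per partition column to OutputJobInfo.create. A hedged sketch; the partition column dt, its value, and the db/table names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.OutputJobInfo;

public class PartitionedOutputSetup {
    static void configureOutput(Job job) throws Exception {
        // The table in the error message has exactly one partition column
        // ("number of partition values required[1]"), so supply one value
        // instead of the empty/null map that produced "obtained[0]".
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("dt", "20130801"); // hypothetical column/value

        HCatOutputFormat.setOutput(job,
                OutputJobInfo.create("dbC", "tableC", partitionValues));
    }
}
```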