1. Reading data from multiple Hive tables
Use HCatMultiInputFormat and add each table's InputJobInfo to inputJobInfoList, e.g.:
inputJobInfoList.add(InputJobInfo.create(dbA, tableA, null, null));
inputJobInfoList.add(InputJobInfo.create(dbB, tableB, null, null));
HCatMultiInputFormat.setInput(job, inputJobInfoList);
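The snippet above can be sketched out as a full driver. This is a hedged sketch: HCatMultiInputFormat is assumed to be a multi-table extension available on the classpath (it is not part of stock HCatalog), the database/table names are placeholders, and the getTableSchema signature is assumed analogous to HCatInputFormat's.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.InputJobInfo;

public class MultiTableReadDriver {
    public static void run() throws Exception {
        Job job = new Job(new Configuration(), "read-two-hive-tables");

        // One InputJobInfo per table; the two trailing nulls are the
        // partition filter and properties, as in the snippet above.
        List<InputJobInfo> inputJobInfoList = new ArrayList<InputJobInfo>();
        inputJobInfoList.add(InputJobInfo.create("dbA", "tableA", null, null));
        inputJobInfoList.add(InputJobInfo.create("dbB", "tableB", null, null));

        HCatMultiInputFormat.setInput(job, inputJobInfoList);
        job.setInputFormatClass(HCatMultiInputFormat.class);

        // Because setInput went through HCatMultiInputFormat, the schema
        // must also be fetched via HCatMultiInputFormat, not HCatInputFormat
        // (signature assumed; see the class in use).
        HCatSchema schema =
                HCatMultiInputFormat.getTableSchema(job.getConfiguration());

        // ... set mapper/reducer and output format, then submit the job.
    }
}
```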
2. Calling HCatInputFormat.getTableSchema throws: job information not found in JobContext. HCatInputFormat.setInput() not called?
setInput had in fact been called. Reading the source showed the error is raised because conf.get("mapreduce.lib.hcatoutput.info") returns null, while the setInput actually in use calls conf.set("mapreduce.lib.hcatmulti.jobs.info"). A closer look revealed the cause: since two tables are being read, the job uses HCatMultiInputFormat rather than HCatInputFormat, so the table schema must likewise be fetched with HCatMultiInputFormat.getTableSchema.
3. Error: getDelegationToken() can be called only in thrift (non local) mode
Set hive.metastore.local to false
and set hive.metastore.uris (e.g. thrift://10.1.8.42:9083).
Details: http://blog.youkuaiyun.com/lalaguozhe/article/details/9083905
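The two settings above belong in hive-site.xml (or can be set on the job configuration); the URI below is the example address from the text.

```xml
<!-- hive-site.xml: force remote (thrift) metastore access -->
<property>
  <name>hive.metastore.local</name>
  <value>false</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://10.1.8.42:9083</value>
</property>
```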
4. Error: 2004 HCatOutputFormat not initialized, setOutput has to be called
There are two fixes:
1. Switch to the deprecated overload getTableSchema(JobContext context) instead of getTableSchema(Configuration conf).
2. Keep getTableSchema(Configuration conf), but pass in job.getConfiguration() rather than a freshly constructed Configuration object.
Note: getTableSchema is called here to obtain the table schema so that setSchema can then be called. Skipping setSchema() raises: 9001 : Exception occurred while processing HCat request : It seems that setSchema() is not called on HCatOutputFormat. Please make sure that method is called.
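Put together, fix 2 above amounts to the following call order. A hedged sketch assuming the org.apache.hcatalog.mapreduce API; dbC and tableC are placeholder names.

```java
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.OutputJobInfo;

public class OutputSetup {
    static void configureOutput(Job job) throws Exception {
        // 1. setOutput stores the output job info on the job's configuration.
        HCatOutputFormat.setOutput(job,
                OutputJobInfo.create("dbC", "tableC", null)); // placeholder db/table

        // 2. Read the schema back from the SAME configuration object.
        //    A fresh `new Configuration()` would not contain the keys that
        //    setOutput just wrote, which is what triggers error 2004.
        HCatSchema schema =
                HCatOutputFormat.getTableSchema(job.getConfiguration());

        // 3. setSchema must be called, or HCat raises error 9001.
        HCatOutputFormat.setSchema(job, schema);
    }
}
```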
5. Error: org.apache.hcatalog.common.HCatException : 2010 : Invalid partition values specified : Unable to configure dynamic partitioning for storage handler, mismatch between number of partition values obtained[0] and number of partition values required[1]
Official wiki: https://cwiki.apache.org/confluence/display/Hive/HCatalog+InputOutput
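One way to satisfy the partition-count check behind the 2010 error above is static partitioning: pass a map with one value per partition column to OutputJobInfo.create. A hedged sketch; the partition column dt, its value, and the db/table names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.OutputJobInfo;

public class PartitionedOutputSetup {
    static void configureOutput(Job job) throws Exception {
        // The table in the error message has exactly one partition column
        // ("number of partition values required[1]"), so supply one value
        // instead of the empty/null map that produced "obtained[0]".
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("dt", "20130801"); // hypothetical column/value

        HCatOutputFormat.setOutput(job,
                OutputJobInfo.create("dbC", "tableC", partitionValues));
    }
}
```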