Sqoop1: Sqoop-HCatalog Integration

This post records the various NoClassDefFoundError errors encountered while importing data into HCatalog with Sqoop, how each one was resolved, and the configuration changes that finally made the import succeed.

I created an HCatalog table with the Hue metastore manager and submitted a Sqoop job that uses HCatalog through Hue. The command is shown below:

import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj 
--password 123456 --table sample_user --split-by user_id -m 2 
--hcatalog-table sample_raw.sample_user

Errors:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hcatalog/mapreduce/HCatOutputFormat
java.lang.NoClassDefFoundError: org/apache/hcatalog/mapreduce/HCatOutputFormat
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hadoop/hive/conf/HiveConf$ConfVars
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf$ConfVars

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, java.lang.NoClassDefFoundError: javax/jdo/JDOException
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoClassDefFoundError: javax/jdo/JDOException
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
....
Caused by: java.lang.ClassNotFoundException: javax.jdo.JDOException

Similar issues:
https://issues.apache.org/jira/browse/HCATALOG-380
https://issues.apache.org/jira/browse/PIG-2666

 

java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1168)
A:

a. Copy all jars in hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.

This solves the first error, but the second error appears once the first one disappears.
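A minimal sketch of that copy, assuming HIVE_HOME points at the unpacked hive-0.12.0-bin and OOZIE_HOME at oozie-4.0.1 (both variable names are assumptions, not from the original setup):

# adjust HIVE_HOME / OOZIE_HOME to your actual layout
cp $HIVE_HOME/hcatalog/share/hcatalog/*.jar $OOZIE_HOME/share/lib/sqoop/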

 

I can think of two ways to fix the second one:

1. Change hadoop-env.sh, adding HADOOP_CLASSPATH entries that point to the HCatalog jars copied to the cluster nodes from the local dir oozie-4.0.1/share/lib/hcatalog. This fails.

2. Copy all jars in share/lib/hcatalog to share/lib/sqoop and upload the result to the HDFS sharelib. This solves the second error but leads to the third one.

3. Copy all jars in share/lib/hive/ to share/lib/sqoop/ and upgrade the sharelib.

// At this point the first three errors are solved, but the fourth one appears.

4. The fourth error is due to a version mismatch.

"datanucleus-api-jdo-3.0.0-release.jar" does NOT contain  
 org.datanucleus.jdo.JDOPersistenceManagerFactory. 
It contains  "org.datanucleus.api.jdo.JDOPersistenceManagerFactory". 
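A quick way to verify which package a DataNucleus jar actually ships is to list the archive (the jar name here is just the one from the Hive lib directory below):

jar tf datanucleus-api-jdo-3.2.1.jar | grep JDOPersistenceManagerFactory
# expect: org/datanucleus/api/jdo/JDOPersistenceManagerFactory.class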

I find that in oozie-4.0.1

./share/lib/sqoop/datanucleus-rdbms-2.0.3.jar
./share/lib/sqoop/datanucleus-connectionpool-2.0.3.jar
./share/lib/sqoop/datanucleus-core-2.0.3.jar
./share/lib/sqoop/datanucleus-enhancer-2.0.3.jar
./share/lib/hive/datanucleus-rdbms-2.0.3.jar
./share/lib/hive/datanucleus-connectionpool-2.0.3.jar
./share/lib/hive/datanucleus-core-2.0.3.jar
./share/lib/hive/datanucleus-enhancer-2.0.3.jar

and  in hive-0.12.0-bin

./lib/datanucleus-core-3.2.2.jar
./lib/datanucleus-rdbms-3.2.1.jar
./lib/datanucleus-api-jdo-3.2.1.jar

 

Copy datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar and datanucleus-api-jdo-3.2.1.jar to share/lib/sqoop

and upgrade the sharelib.

 

Note: Oozie was compiled against a Hive version other than 0.12.0, which is what led to these errors.

http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

******************************************************************************

0. Retain the original jars in share/lib/sqoop.

1. Copy all jars in hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.

2. Copy all jars in hive-0.12.0-bin/lib/ to oozie-4.0.1/share/lib/sqoop.

3. Copy sqoop-1.4.4.bin__hadoop-2.0.4-alpha/sqoop-1.4.4.jar to oozie-4.0.1/share/lib/sqoop.

4. Update the sharelib (see the sketch below).
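A sketch of the "update sharelib" step, assuming the Oozie sharelib lives at /user/oozie/share/lib in HDFS and the workflow runs with oozie.use.system.libpath=true; both the path and user are assumptions, so adjust them to your installation:

# hypothetical HDFS sharelib location; check your Oozie configuration for the real one
hdfs dfs -rm -r /user/oozie/share/lib/sqoop
hdfs dfs -put oozie-4.0.1/share/lib/sqoop /user/oozie/share/lib/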

All of the errors listed above disappear, but a new one appears:

 

 java.io.IOException: NoSuchObjectException(message:inok_datamine.inok_user table not found)

I guess the Sqoop job submitted through Hue couldn't access the Hive metastore, but I couldn't fix it at that point.

 

Good news: I set the following property in hive-site.xml

  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://192.168.122.1:9083</value>
  </property>

uploaded it to HDFS as hive/hive-site.xml, and added it as a file to the Sqoop job in Hue.
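A sketch of that upload, assuming the edited file sits in the local Hive conf dir and hive/hive-site.xml is relative to the submitting user's HDFS home directory (both paths are assumptions):

hdfs dfs -mkdir -p hive
hdfs dfs -put hive-0.12.0-bin/conf/hive-site.xml hive/hive-site.xml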

 

Start the metastore with:

hive --service metastore              # default port is 9083
hive --service metastore -p <port_num>
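To confirm the metastore is actually listening on that port, a simple check (not part of the original steps):

netstat -nlt | grep 9083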

But one last error may still appear:

32064 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog full table schema fields = [user_id, user_name, first_letter, live_city]
33238 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog table partitioning key fields = []
33241 [main] ERROR org.apache.sqoop.Sqoop  - Got exception running Sqoop: java.lang.NullPointerException
Intercepting System.exit(1)
Details:
java.lang.NullPointerException
	at org.apache.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:99)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:344)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:658)
	at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:232)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:600)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
	at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:203)
	at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:172)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
	at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:45)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
 
After two days of struggle, I finally realized that the Hive metastore and HCatalog are unified in Hive 0.12, so the --hive-xxx options can be used instead of the --hcatalog-xxx ones. The following command is correct:
    import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj   
    --password 123456 --table sample_user --split-by user_id -m 2   
    --hive-database sample_raw --hive-table sample_user --hive-import
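To spot-check the imported rows, assuming the hive CLI on the same host talks to the same metastore:

hive -e "SELECT * FROM sample_raw.sample_user LIMIT 5;"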
 

 

**********************************************************************************

The property in hive-site.xml shown below confused me.

<!-- HCatAccessorService -->
   <property>
        <name>oozie.service.HCatAccessorService.jmsconnections</name>
        <value>
        default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory
        </value>
        <description>
        Specify the map  of endpoints to JMS configuration properties. In general, endpoint
        identifies the HCatalog server URL. "default" is used if no endpoint is mentioned
        in the query. If some JMS property is not defined, the system will use the property
        defined jndi.properties. jndi.properties files is retrieved from the application classpath.
        Mapping rules can also be provided for mapping Hcatalog servers to corresponding JMS providers.
        hcat://${1}.${2}.server.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}:61616
        </description>
   </property>
See: http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

A similar problem occurs when using HCatalog from Pig; see: http://ylzhj02.iteye.com/admin/blogs/2043781

NOTE: With the support for HCatalog added to Sqoop, any HCatalog job depends on a set of jar files being available both on the Sqoop client host and where the Map/Reduce tasks run. To run HCatalog jobs, the environment variable HADOOP_CLASSPATH must be set up as shown below before launching the Sqoop HCatalog jobs.

HADOOP_CLASSPATH=$(hcat -classpath)

export HADOOP_CLASSPATH

The necessary HCatalog dependencies will be copied to the distributed cache automatically by the Sqoop job.

I added the above two lines to ~/.bashrc and hive-0.12.0-bin/conf/hive-env.sh, but it did not work.

 -----------------

 NoSuchObjectException(message:default.'inok_datamine.inok_user' table not found)

My Sqoop job's command contained:

--hcatalog-table 'inok_datamine.inok_user'

The option above is missing --hcatalog-database. The correct form is:
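Presumably the database and table should be given as separate options, for example:

--hcatalog-database inok_datamine --hcatalog-table inok_user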

 

Reference

Official documents:

http://gethue.com/hadoop-tutorial-how-to-access-hive-in-pig-with/

https://cwiki.apache.org/confluence/display/Hive/HCatalog

http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

http://www.micmiu.com/bigdata/sqoop/sqoop-setup-and-demo/

When running Sqoop's `list-databases` command, hitting the **"Unrecognized argument: --passward"** error together with **HBase/HCatalog/Accumulo/Zookeeper path-does-not-exist warnings** is usually related to a misspelled argument, missing paths in the Sqoop configuration, or incompatible dependency versions. Detailed fixes follow.

### Argument error: Unrecognized argument: --passward

This error is caused by a mistyped argument name: the correct option is `--password`, not `--passward`.

For example, the following command fails:

```bash
bin/sqoop list-databases --connect jdbc:mysql://rds:3306/ --username root --passward 123456
```

The correct usage is:

```bash
bin/sqoop list-databases --connect jdbc:mysql://rds:3306/ --username root --password 123456
```

Making sure the arguments are spelled correctly avoids this kind of problem.

### HBase/HCatalog/Accumulo/Zookeeper path warnings

These warnings look like:

```
Warning: /path/to/hbase does not exist. HBase imports will fail.
Warning: /path/to/hcatalog does not exist. HCatalog jobs will fail.
Warning: /path/to/accumulo does not exist. Accumulo imports will fail.
Warning: /path/to/zookeeper does not exist. Zookeeper configuration cannot be found.
```

They mean that on startup Sqoop tries to load the configured paths for the HBase, HCatalog, Accumulo and Zookeeper components, but those paths do not exist. Basic functionality (such as extracting data from MySQL) is not affected, but jobs that involve these components will fail.

#### Solutions:

1. **Ignore the warnings (if you do not use these components)**
   If the current job only imports or exports between MySQL and HDFS and does not touch HBase, HCatalog, etc., the warnings can safely be ignored.

2. **Configure the correct paths (if you need to integrate these components)**
   Edit `$SQOOP_HOME/conf/sqoop-env.sh` and set the correct Hadoop, HBase, HCatalog, etc. paths, for example:

   ```bash
   export HADOOP_HOME=/path/to/hadoop
   export HBASE_HOME=/path/to/hbase
   export HCAT_HOME=/path/to/hcatalog
   export ZOOKEEPER_HOME=/path/to/zookeeper
   ```

   Make sure these paths point at the actual installation directories and are readable.

3. **Check that the environment variables take effect**
   Run `echo $HADOOP_HOME`, `echo $HBASE_HOME`, etc. in a terminal to confirm the variables are set correctly.

4. **Use the default Hadoop classpath (optional)**
   If you do not need to integrate the other systems, you can fall back to the Hadoop defaults with:

   ```bash
   export HADOOP_MAPRED_HOME=$HADOOP_HOME
   ```

### Corrected Sqoop command

```bash
bin/sqoop list-databases \
  --connect jdbc:mysql://rds:3306/ \
  --username root \
  --password 123456
```

Also make sure the MySQL JDBC driver (e.g. `mysql-connector-java-8.0.x.jar`) has been placed in `$SQOOP_HOME/lib`, otherwise the driver class will fail to load.