Sqoop1: Sqoop-HCatalog Integration

This post records in detail the various NoClassDefFoundError failures hit while using Sqoop to import data into HCatalog, and how adjusting the configuration finally made the import succeed.


I created an HCat table using the Hue metastore manager and submitted a Sqoop job with HCatalog through Hue. The command is shown below:

import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj 
--password 123456 --table sample_user --split-by user_id -m 2 
--hcatalog-table sample_raw.sample_user

 Error:

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hcatalog/mapreduce/HCatOutputFormat
java.lang.NoClassDefFoundError: org/apache/hcatalog/mapreduce/HCatOutputFormat
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, org/apache/hadoop/hive/conf/HiveConf$ConfVars
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf$ConfVars

 

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], 
main() threw exception, java.lang.NoClassDefFoundError: javax/jdo/JDOException
com.google.common.util.concurrent.ExecutionError: 
java.lang.NoClassDefFoundError: javax/jdo/JDOException
	at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2232)
....
Caused by: java.lang.ClassNotFoundException: javax.jdo.JDOException

Similar issues:
https://issues.apache.org/jira/browse/HCATALOG-380
https://issues.apache.org/jira/browse/PIG-2666

 

java.lang.ClassNotFoundException: org.datanucleus.api.jdo.JDOPersistenceManagerFactory
	at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1168)


A:

a. Copy all jars from hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.
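In shell terms this is roughly the following (a sketch; the paths follow the layout above, so adjust them to your install):

cp hive-0.12.0-bin/hcatalog/share/hcatalog/*.jar oozie-4.0.1/share/lib/sqoop/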

This solves the first error, but the second one appears as soon as the first disappears.

 

I tried a few ways to fix the second one.

1) Change hadoop-env.sh, adding HADOOP_CLASSPATH entries pointing to the HCatalog jars that were copied to the cluster nodes from the local dir oozie-4.0.1/share/lib/hcatalog. This fails.

2) Copy all jars in share/lib/hcatalog to share/lib/sqoop and upload the upgraded sharelib to HDFS. This solves the second error but leads to the third one.

3) Copy all jars in share/lib/hive/ to share/lib/sqoop/ and upgrade the sharelib (a shell sketch of the upload appears after the DataNucleus discussion below).

// By this point the first three errors are solved, but a fourth one appears.

4. The fourth error is a version problem:

"datanucleus-api-jdo-3.0.0-release.jar" does NOT contain org.datanucleus.jdo.JDOPersistenceManagerFactory; it contains "org.datanucleus.api.jdo.JDOPersistenceManagerFactory".

I find the following in oozie-4.0.1:

./share/lib/sqoop/datanucleus-rdbms-2.0.3.jar
./share/lib/sqoop/datanucleus-connectionpool-2.0.3.jar
./share/lib/sqoop/datanucleus-core-2.0.3.jar
./share/lib/sqoop/datanucleus-enhancer-2.0.3.jar
./share/lib/hive/datanucleus-rdbms-2.0.3.jar
./share/lib/hive/datanucleus-connectionpool-2.0.3.jar
./share/lib/hive/datanucleus-core-2.0.3.jar
./share/lib/hive/datanucleus-enhancer-2.0.3.jar

and in hive-0.12.0-bin:

./lib/datanucleus-core-3.2.2.jar
./lib/datanucleus-rdbms-3.2.1.jar
./lib/datanucleus-api-jdo-3.2.1.jar

 

Copy datanucleus-core-3.2.2.jar, datanucleus-rdbms-3.2.1.jar, and datanucleus-api-jdo-3.2.1.jar to share/lib/sqoop, and upgrade the sharelib.
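A minimal sketch of that copy plus the sharelib upload (the HDFS sharelib location /user/oozie/share/lib is an assumption here; check oozie.service.WorkflowAppService.system.libpath for your install):

cp hive-0.12.0-bin/lib/datanucleus-core-3.2.2.jar \
   hive-0.12.0-bin/lib/datanucleus-rdbms-3.2.1.jar \
   hive-0.12.0-bin/lib/datanucleus-api-jdo-3.2.1.jar \
   oozie-4.0.1/share/lib/sqoop/
hdfs dfs -rm -r /user/oozie/share/lib/sqoop
hdfs dfs -put oozie-4.0.1/share/lib/sqoop /user/oozie/share/lib/sqoop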

 

Note: the Hive version used when Oozie was compiled was not 0.12.0, which is what led to these errors.

 

 

http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

******************************************************************************

0. Retain the original jars in share/lib/sqoop.

1. Copy all jars in hive-0.12.0-bin/hcatalog/share/hcatalog to oozie-4.0.1/share/lib/sqoop.

2. Copy all jars in hive-0.12.0-bin/lib/ to oozie-4.0.1/share/lib/sqoop.

3. Copy sqoop-1.4.4.bin__hadoop-2.0.4-alpha/sqoop-1.4.4.jar to oozie-4.0.1/share/lib/sqoop.

4. Update the sharelib.
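The whole sequence in shell form (a sketch; local paths as above, and the HDFS sharelib location /user/oozie/share/lib is again an assumption):

cp hive-0.12.0-bin/hcatalog/share/hcatalog/*.jar oozie-4.0.1/share/lib/sqoop/
cp hive-0.12.0-bin/lib/*.jar oozie-4.0.1/share/lib/sqoop/
cp sqoop-1.4.4.bin__hadoop-2.0.4-alpha/sqoop-1.4.4.jar oozie-4.0.1/share/lib/sqoop/
hdfs dfs -rm -r /user/oozie/share/lib/sqoop
hdfs dfs -put oozie-4.0.1/share/lib/sqoop /user/oozie/share/lib/sqoop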

All the errors listed above disappear, but a new one comes:

 

 java.io.IOException: NoSuchObjectException(message:inok_datamine.inok_user table not found)

I guess the Sqoop job submitted through Hue could not access the Hive metastore, but at this point I could not fix it.

 

Good news: I set the following property in hive-site.xml:

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://192.168.122.1:9083</value>
</property>

and uploaded it to HDFS as hive/hive-site.xml, meanwhile adding it as a file to the Sqoop job in Hue.
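The upload step, roughly (the relative path hive/hive-site.xml resolves against the submitting user's HDFS home directory; that path is taken from the note above):

hdfs dfs -mkdir -p hive
hdfs dfs -put -f hive-site.xml hive/hive-site.xml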

 

Start the metastore with:

hive --service metastore               # the default port is 9083
hive --service metastore -p <port_num>
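To confirm the metastore is listening (a generic check, not from the original post):

netstat -lnt | grep 9083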

But then one last error may appear:

32064 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog full table schema fields = [user_id, user_name, first_letter, live_city]
33238 [main] INFO  org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities  - HCatalog table partitioning key fields = []
33241 [main] ERROR org.apache.sqoop.Sqoop  - Got exception running Sqoop: java.lang.NullPointerException
Intercepting System.exit(1)
Details:
java.lang.NullPointerException
	at org.apache.hcatalog.data.schema.HCatSchema.get(HCatSchema.java:99)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureHCat(SqoopHCatUtilities.java:344)
	at org.apache.sqoop.mapreduce.hcat.SqoopHCatUtilities.configureImportOutputFormat(SqoopHCatUtilities.java:658)
	at org.apache.sqoop.mapreduce.ImportJobBase.configureOutputFormat(ImportJobBase.java:98)
	at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:232)
	at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:600)
	at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:118)
	at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:413)
	at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:502)
	at org.apache.sqoop.Sqoop.run(Sqoop.java:145)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
	at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:181)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:220)
	at org.apache.sqoop.Sqoop.runTool(Sqoop.java:229)
	at org.apache.sqoop.Sqoop.main(Sqoop.java:238)
	at org.apache.oozie.action.hadoop.SqoopMain.runSqoopJob(SqoopMain.java:203)
	at org.apache.oozie.action.hadoop.SqoopMain.run(SqoopMain.java:172)
	at org.apache.oozie.action.hadoop.LauncherMain.run(LauncherMain.java:37)
	at org.apache.oozie.action.hadoop.SqoopMain.main(SqoopMain.java:45)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:226)
	at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
	at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:415)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
Intercepting System.exit(1)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.SqoopMain], exit code [1]
 
After two days of struggle, I finally realized that the Hive metastore is the same as HCatalog in Hive 0.12, so the --hcatalog-xxx options can be replaced with --hive-xxx. The following command is correct:
    import --connect jdbc:mysql://192.168.122.1:3306/sample --username zhj   
    --password 123456 --table sample_user --split-by user_id -m 2   
    --hive-database sample_raw --hive-table sample_user --hive-import
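To verify the import afterwards, a quick count should work (assuming the Hive CLI is available on the client; this check is not from the original post):

hive -e 'SELECT COUNT(*) FROM sample_raw.sample_user;'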
 

 

**********************************************************************************

The property shown below confused me; it is actually an Oozie HCatAccessorService setting, not a Hive one:

<!-- HCatAccessorService -->
   <property>
        <name>oozie.service.HCatAccessorService.jmsconnections</name>
        <value>
        default=java.naming.factory.initial#org.apache.activemq.jndi.ActiveMQInitialContextFactory;java.naming.provider.url#tcp://localhost:61616;connectionFactoryNames#ConnectionFactory
        </value>
        <description>
        Specify the map  of endpoints to JMS configuration properties. In general, endpoint
        identifies the HCatalog server URL. "default" is used if no endpoint is mentioned
        in the query. If some JMS property is not defined, the system will use the property
        defined jndi.properties. jndi.properties files is retrieved from the application classpath.
        Mapping rules can also be provided for mapping Hcatalog servers to corresponding JMS providers.
        hcat://${1}.${2}.server.com:8020=java.naming.factory.initial#Dummy.Factory;java.naming.provider.url#tcp://broker.${2}:61616
        </description>
   </property>


see: http://stackoverflow.com/questions/11494267/class-org-datanucleus-jdo-jdopersistencemanagerfactory-was-not-found

 

A similar problem appears when using HCatalog from Pig; see: http://ylzhj02.iteye.com/admin/blogs/2043781


NOTE: With the support for HCatalog added to Sqoop, any HCatalog job depends on a set of jar files being available both on the Sqoop client host and where the Map/Reduce tasks run. To run HCatalog jobs, the environment variable HADOOP_CLASSPATH must be set up as shown below before launching the Sqoop HCatalog jobs.

HADOOP_CLASSPATH=$(hcat -classpath)

export HADOOP_CLASSPATH

The necessary HCatalog dependencies will be copied to the distributed cache automatically by the Sqoop job.

I added the above two lines to ~/.bashrc and hive-0.12.0-bin/conf/hive-env.sh, but it did not work.

 -----------------

 NoSuchObjectException(message:default.'inok_datamine.inok_user' table not found)

My Sqoop command contained:

--hcatalog-table 'inok_datamine.inok_user'

The script above is missing --hcatalog-database.
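The correct form, reconstructed from the error and the option named above (the database goes in its own option rather than being prefixed to the table name), should be:

--hcatalog-database inok_datamine --hcatalog-table inok_user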

 

Reference

Official documents and related posts:

http://gethue.com/hadoop-tutorial-how-to-access-hive-in-pig-with/

https://cwiki.apache.org/confluence/display/Hive/HCatalog

http://blog.cloudera.com/blog/2011/01/how-to-include-third-party-libraries-in-your-map-reduce-job/

 http://www.micmiu.com/bigdata/sqoop/sqoop-setup-and-demo/
