Using the libjars option with Hadoop

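This article explains in detail how to use Hadoop’s libjars option so that MapReduce tasks can access third-party JAR files. It takes three steps: configure the libjars option correctly, make sure your code uses GenericOptionsParser to parse the options, and set the HADOOP_CLASSPATH environment variable. Together these let you bring external dependencies into Hadoop jobs.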


http://grepalex.com/2013/02/25/hadoop-libjars/

When working with MapReduce, one of the challenges you meet early on is making your third-party JARs available to the map and reduce tasks. One common approach is to create a fat jar, which is a JAR that contains your own classes as well as your third-party classes (see this Cloudera blog post for more details).

A more elegant solution is to take advantage of the libjars option in the hadoop jar command, also mentioned in the Cloudera post at a high level. Here I’ll go into detail on the three steps required to make this work.

Add libjars to the options

It can be confusing to know exactly where to put -libjars when running the hadoop jar command. The following example shows the correct position: after the main class name and before any of your tool’s own options:

$ export LIBJARS=/path/jar1,/path/jar2
$ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value

In the above example, the JARs supplied as the value of the -libjars option are comma-separated. They are not separated by your O.S. path delimiter, which is how a Java classpath is delimited.
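Here is a minimal Java sketch that makes the ordering rule and the comma delimiter concrete. It is a simplified model, not Hadoop’s actual GenericOptionsParser, and the class LibjarsPositionSketch is hypothetical. The sketch makes three assumptions:

- Like GenericOptionsParser, it stops parsing generic options at the first argument it doesn’t recognise.
- It models only -libjars.
- Its input is the arguments that follow the main class name on the hadoop jar command line.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical, simplified model of how generic options are split from a
 * tool's own arguments. Only -libjars is recognised, and parsing stops at
 * the first unrecognised token, so -libjars must come first.
 */
final class LibjarsPositionSketch {

    /** The -libjars entries that were found, plus the arguments left for the tool. */
    static final class Parsed {
        final List<String> libjars = new ArrayList<>();
        final List<String> toolArgs = new ArrayList<>();
    }

    static Parsed parse(String[] args) {
        Parsed parsed = new Parsed();
        int i = 0;
        // Consume leading "-libjars <list>" pairs only.
        while (i + 1 < args.length && args[i].equals("-libjars")) {
            // The list is comma-separated, not path-separator-separated.
            parsed.libjars.addAll(Arrays.asList(args[i + 1].split(",")));
            i += 2;
        }
        // Everything from the first unrecognised token onward is left for the tool.
        parsed.toolArgs.addAll(Arrays.asList(args).subList(i, args.length));
        return parsed;
    }
}
```

With the arguments from the example above (`-libjars /path/jar1,/path/jar2 -mytoolopt value`), both JARs are picked up and `-mytoolopt value` is left for your tool. If -libjars is moved after `-mytoolopt value`, this model passes it through to the tool untouched and records no JARs.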

You may think that you’re done, but this step alone is often not enough; read on for more details!

Make sure your code is using GenericOptionsParser

The Java class that you supply to the hadoop jar command should use the GenericOptionsParser class to parse the options given on the CLI. The easiest way to do that is shown in the following code, which uses the ToolRunner class to parse out the options:

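// Requires org.apache.hadoop.conf.Configuration and org.apache.hadoop.util.ToolRunner
public static void main(final String[] args) throws Exception {
  Configuration conf = new Configuration();
  // ToolRunner hands args to GenericOptionsParser, which consumes generic
  // options such as -libjars and records them in conf before calling the tool
  int res = ToolRunner.run(conf, new com.example.MyTool(), args);
  System.exit(res);
}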

It is crucial that the Configuration object passed into the ToolRunner.run method is the same one you use when setting up your job. To guarantee this, your class should access the configuration through the getConf() method, which is defined in Configurable and implemented in Configured:

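public class SmallFilesMapReduce extends Configured implements Tool {

  @Override
  public final int run(final String[] args) throws Exception {
    // Build the job from the Configuration that ToolRunner populated
    // (including tmpjars from -libjars), never from a new Configuration()
    Job job = Job.getInstance(super.getConf());
    // ... configure the job here ...
    return job.waitForCompletion(true) ? 0 : 1;
  }
}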

If your MapReduce driver code doesn’t use the Configuration object supplied to the ToolRunner.run method, your job won’t be configured correctly. Your third-party JARs then won’t be copied to the Distributed Cache or loaded in the remote task JVMs.
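The following sketch illustrates why the Configuration object must be shared. It uses hypothetical stand-ins (SketchConf, SketchTool and SketchToolRunner) instead of Hadoop’s Configuration, Configured/Tool and ToolRunner, so it runs without Hadoop on the classpath. It also simplifies in three ways:

- It models only -libjars.
- It stores the raw value rather than qualified URIs.
- Each driver returns the tmpjars value its job would see, rather than an exit code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for org.apache.hadoop.conf.Configuration. */
final class SketchConf {
    private final Map<String, String> props = new HashMap<>();
    void set(String key, String value) { props.put(key, value); }
    String get(String key) { return props.get(key); }
}

/** Hypothetical stand-in for Configured + Tool. */
abstract class SketchTool {
    private SketchConf conf;
    void setConf(SketchConf conf) { this.conf = conf; }
    SketchConf getConf() { return conf; }
    abstract String run(String[] args);
}

/** Hypothetical stand-in for ToolRunner.run; models only -libjars. */
final class SketchToolRunner {
    static String run(SketchConf conf, SketchTool tool, String[] args) {
        String[] rest = args;
        if (args.length >= 2 && args[0].equals("-libjars")) {
            // Record the JARs on the caller's configuration object.
            conf.set("tmpjars", args[1]);
            rest = Arrays.copyOfRange(args, 2, args.length);
        }
        tool.setConf(conf); // the tool is handed that same object
        return tool.run(rest);
    }
}

/** Correct driver: reads from getConf(), so it sees tmpjars. */
final class GoodDriver extends SketchTool {
    @Override
    String run(String[] args) { return getConf().get("tmpjars"); }
}

/** Broken driver: builds a fresh configuration and loses tmpjars. */
final class BadDriver extends SketchTool {
    @Override
    String run(String[] args) { return new SketchConf().get("tmpjars"); }
}
```

Run both drivers with `-libjars /path/jar1,/path/jar2 -mytoolopt value`. GoodDriver sees `/path/jar1,/path/jar2`, while BadDriver sees `null`. BadDriver mirrors a driver that calls `new Configuration()` instead of `getConf()`.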

The ToolRunner.run method (which delegates the command parsing to GenericOptionsParser) is what parses out the -libjars argument and adds a tmpjars property to the Configuration object. A quick way to confirm that this step works is to open the job file for your MapReduce job (there’s a link when viewing the job details from the JobTracker). Check that the tmpjars property exists and that its value contains the paths you specified in your command; GenericOptionsParser typically rewrites them as fully qualified file: URIs. You can also search for the tmpjars property in HDFS from the command line:

$ hadoop fs -cat <JOB_OUTPUT_HDFS_DIRECTORY>/_logs/history/*.xml | grep tmpjars
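If you’d rather check the job file programmatically than with grep, a small JDK-only sketch like the one below can pull out a single property. JobConfInspector is a hypothetical helper, not part of Hadoop. The sketch assumes the job file uses the standard Hadoop configuration XML layout: `<property>` elements, each with `<name>` and `<value>` children.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper that reads one property from a configuration-style XML job file. */
final class JobConfInspector {

    /** Returns the value of the named property, or null if it is absent. */
    static String findProperty(String xml, String name) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                NodeList names = prop.getElementsByTagName("name");
                NodeList values = prop.getElementsByTagName("value");
                // Match on the <name> text; return the sibling <value> text.
                if (names.getLength() > 0 && name.equals(names.item(0).getTextContent().trim())) {
                    return values.getLength() > 0 ? values.item(0).getTextContent().trim() : "";
                }
            }
            return null;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse job file XML", e);
        }
    }
}
```

You could feed it the text printed by the `hadoop fs -cat` command above and check that `findProperty(xml, "tmpjars")` is non-null and mentions each of your JARs.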

Use HADOOP_CLASSPATH to make your third-party JARs available on the client side

So far, the first two steps have covered what you need to do to make your third-party JARs available to the remote map and reduce task JVMs. What they haven’t covered is making these same JARs available to the client JVM, which is the JVM created when you run the hadoop jar command.

To do this, set the HADOOP_CLASSPATH environment variable to the O.S. path-delimited list of third-party JARs. Here are the commands from the first step, extended to also set HADOOP_CLASSPATH:

$ export LIBJARS=/path/jar1,/path/jar2
$ export HADOOP_CLASSPATH=/path/jar1:/path/jar2
$ hadoop jar my-example.jar com.example.MyTool -libjars ${LIBJARS} -mytoolopt value

Note that the value of HADOOP_CLASSPATH uses the Unix path delimiter (:), so adjust it for your platform. If you don’t like the copy-paste above, you can derive that line from LIBJARS by replacing the commas with colons:

$ export HADOOP_CLASSPATH=`echo ${LIBJARS} | sed s/,/:/g`
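The same comma-to-delimiter conversion can be sketched in Java. Using File.pathSeparator means it also works on platforms that don’t use `:`. ClientClasspathSketch is a hypothetical helper. Its missingFrom method compares entries literally, so wildcard entries such as `lib/*` or differently spelled paths won’t match.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Pattern;

/** Hypothetical helpers for turning LIBJARS into a client-side classpath. */
final class ClientClasspathSketch {

    /** Java counterpart of the sed one-liner: commas become the platform path separator. */
    static String toClasspath(String libjars) {
        return String.join(File.pathSeparator, libjars.split(","));
    }

    /** Lists the LIBJARS entries that do not appear literally in the given classpath. */
    static List<String> missingFrom(String libjars, String classpath) {
        Set<String> entries = new HashSet<>(
                Arrays.asList(classpath.split(Pattern.quote(File.pathSeparator))));
        List<String> missing = new ArrayList<>();
        for (String jar : libjars.split(",")) {
            if (!entries.contains(jar)) {
                missing.add(jar);
            }
        }
        return missing;
    }
}
```

Assuming LIBJARS is exported as in the commands above, driver code running in the client JVM might call `ClientClasspathSketch.missingFrom(System.getenv("LIBJARS"), System.getProperty("java.class.path"))`. That lists any JARs that HADOOP_CLASSPATH did not put on the client classpath.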