    public interface Configurable {
        void setConf(Configuration conf);
        Configuration getConf();
    }
The Configurable interface defines only two methods: setConf and getConf.
The Configured class implements the Configurable interface:
    public class Configured implements Configurable {
        private Configuration conf;

        public Configured() {
            this(null);
        }

        public Configured(Configuration conf) {
            setConf(conf);
        }

        public void setConf(Configuration conf) {
            this.conf = conf;
        }

        public Configuration getConf() {
            return conf;
        }
    }
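The Tool interface extends the Configurable interface and declares a single method, run() (an interface extending an interface).
The inheritance relationships are therefore: Tool extends Configurable, and Configured implements Configurable.
Now look at part of the ToolRunner class: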
    public class ToolRunner {
        public static int run(Configuration conf, Tool tool, String[] args) throws Exception {
            if (conf == null) {
                conf = new Configuration();
            }
            GenericOptionsParser parser = new GenericOptionsParser(conf, args);
            // set the configuration back, so that Tool can configure itself
            tool.setConf(conf);

            // get the args w/o generic hadoop args
            String[] toolArgs = parser.getRemainingArgs();
            return tool.run(toolArgs);
        }
    }
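The static run() method of ToolRunner shows the flow. It passes the conf and the command-line args it received to GenericOptionsParser, which processes Hadoop's generic command-line options and applies them to conf. The configuration is then set on the tool, and the remaining job-specific arguments (toolArgs = parser.getRemainingArgs();) are handed to the tool, which executes its own run method.
Generic command-line options are options that can be added to any job, for example: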
    -conf <configuration file>                    specify a configuration file
    -D <property=value>                           use value for given property
    -fs <local|namenode:port>                     specify a namenode
    -jt <local|jobtracker:port>                   specify a job tracker
    -files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
    -libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath.
    -archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines.
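The three steps that ToolRunner.run() performs (parse the generic options into the configuration, set the configuration on the tool, pass on the remaining arguments) can be sketched without any Hadoop dependency. Everything in this sketch is a hypothetical stand-in: a plain Map plays the role of Configuration, MiniTool plays the role of Tool, and parseGenericOptions is a heavily simplified substitute for GenericOptionsParser that understands only -D property=value. It does not reproduce the real parser's rules or its other options.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Dependency-free sketch of the ToolRunner flow; none of these types are Hadoop classes. */
final class ToolRunnerSketch {

    /** Hypothetical stand-in for Tool: receives a configuration, then runs with the leftover args. */
    interface MiniTool {
        void setConf(Map<String, String> conf);
        int run(String[] args) throws Exception;
    }

    /**
     * Simplified stand-in for GenericOptionsParser (assumption: only "-D key=value" is handled).
     * Each -D pair is stored in conf; every other argument is returned unchanged, in order.
     */
    static String[] parseGenericOptions(Map<String, String> conf, String[] args) {
        List<String> remaining = new ArrayList<>();
        for (int i = 0; i < args.length; i++) {
            if ("-D".equals(args[i]) && i + 1 < args.length) {
                String[] kv = args[++i].split("=", 2);
                conf.put(kv[0], kv.length > 1 ? kv[1] : "");
            } else {
                remaining.add(args[i]);
            }
        }
        return remaining.toArray(new String[0]);
    }

    /** Mirrors the steps of ToolRunner.run: parse, setConf, run with the remaining args. */
    static int run(Map<String, String> conf, MiniTool tool, String[] args) throws Exception {
        if (conf == null) {
            conf = new HashMap<>();
        }
        String[] toolArgs = parseGenericOptions(conf, args);
        tool.setConf(conf);          // the tool can now configure itself
        return tool.run(toolArgs);   // the tool only sees its own arguments
    }

    public static void main(String[] args) throws Exception {
        MiniTool echo = new MiniTool() {
            private Map<String, String> conf;

            @Override
            public void setConf(Map<String, String> conf) {
                this.conf = conf;
            }

            @Override
            public int run(String[] toolArgs) {
                System.out.println("conf = " + conf);
                System.out.println("args = " + String.join(" ", toolArgs));
                return 0;
            }
        };
        int res = run(null, echo, new String[] {"-D", "my.key=v", "arg1", "arg2"});
        System.exit(res);
    }
}
```

The point of the split is that the tool's run method never needs to know which generic options were supplied: by the time it is called, they are already reflected in the configuration it was given.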
A typical program that implements Tool:
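    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    /**
     * MyApp reads its arguments from the command line. A user runs, for example:
     *   $ bin/hadoop jar MyApp.jar -archives test.tgz arg1 arg2
     * -archives is a Hadoop generic option; arg1 and arg2 are the job's own arguments.
     */
    public class MyApp extends Configured implements Tool {

        // MyMapper and MyReducer are nested classes of MyApp (not shown here).

        // Implement Tool's run()
        public int run(String[] args) throws Exception {
            // Configuration already processed by ToolRunner
            Configuration conf = getConf();

            // Create a JobConf using the processed conf
            JobConf job = new JobConf(conf, MyApp.class);

            // Process custom command-line options. The generic options have
            // already been stripped, so arg1 and arg2 are args[0] and args[1].
            Path in = new Path(args[0]);
            Path out = new Path(args[1]);

            // Specify various job-specific parameters. Paths are set through
            // FileInputFormat/FileOutputFormat, because the older
            // JobConf.setInputPath/setOutputPath are not available in newer releases.
            job.setJobName("my-app");
            FileInputFormat.setInputPaths(job, in);
            FileOutputFormat.setOutputPath(job, out);
            job.setMapperClass(MyApp.MyMapper.class);
            job.setReducerClass(MyApp.MyReducer.class);

            // Submit the job and wait for it to finish
            JobClient.runJob(job);
            return 0;
        }

        public static void main(String[] args) throws Exception {
            // args are processed by ToolRunner
            int res = ToolRunner.run(new Configuration(), new MyApp(), args);
            System.exit(res);
        }
    }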