I'm a Hadoop beginner, and in the spirit of "hello world" I wrote a MapReduce job that copies data from one MySQL table to another. The pits I stepped into along the way... To keep from stepping into the same pits again, I'm writing down the problems and how I solved them.
Here is my code:
```java
package handler;

import mapper.DBInputMapper;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

import bean.User;

@SuppressWarnings("all")
public class Main implements Tool {

    private Configuration conf;

    public static void main(String[] args) throws Exception {
        int run = ToolRunner.run(new Main(), args);
        System.exit(run);
    }

    @Override
    public Configuration getConf() {
        return this.conf;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public int run(String[] args) throws Exception {
        Configuration conf = this.getConf();
        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://***:3306/cbh", "***", "***");
        Job job = Job.getInstance(conf);
        job.setJarByClass(Main.class);
        job.setMapperClass(DBInputMapper.class);
        job.setMapOutputKeyClass(User.class);
        job.setMapOutputValueClass(NullWritable.class);
        job.setOutputKeyClass(User.class);
        job.setOutputValueClass(NullWritable.class);
        job.setInputFormatClass(DBInputFormat.class);
        job.setOutputFormatClass(DBOutputFormat.class);
        // Column names
        String[] fields = { "id", "phone" };
        // The six parameters are:
        // 1. the Job; 2. a Class<? extends DBWritable>;
        // 3. the table name; 4. the WHERE condition;
        // 5. the ORDER BY clause; 6. the column names
        DBInputFormat.setInput(job, User.class, "t_cbh_user", null, "id",
                fields);
        DBOutputFormat.setOutput(job, "t_user_hadoop", fields);
        // FileOutputFormat.setOutputPath(job, new
        // Path("/test/mysql2Mysql"));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}
```

Note: `DBInputFormat` and `DBOutputFormat` must be imported from the new API package `org.apache.hadoop.mapreduce.lib.db`; with the old `org.apache.hadoop.mapred.lib.db` classes, the `job.setInputFormatClass(...)` / `job.setOutputFormatClass(...)` calls won't compile.
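The code above references a `User` bean from package `bean` that isn't shown. In the real job it has to implement `org.apache.hadoop.io.Writable` (for shuffling between map and reduce) and `org.apache.hadoop.mapreduce.lib.db.DBWritable` (for reading/writing the `id` and `phone` columns). Below is a minimal, dependency-free sketch of just the `Writable`-style serialization logic; the field names and types are my assumptions based on the `fields` array above, not confirmed by the post:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Sketch of the User bean's serialization logic. The real class additionally
// declares "implements Writable, DBWritable" and adds the DBWritable methods
// write(PreparedStatement) / readFields(ResultSet) for the database columns.
public class User {
    private long id;
    private String phone;

    // Writable requires a public no-arg constructor so Hadoop can instantiate it.
    public User() {}

    public User(long id, String phone) {
        this.id = id;
        this.phone = phone;
    }

    // Writable.write: serialize the fields in a fixed order.
    public void write(DataOutput out) throws IOException {
        out.writeLong(id);
        out.writeUTF(phone);
    }

    // Writable.readFields: deserialize in exactly the same order.
    public void readFields(DataInput in) throws IOException {
        this.id = in.readLong();
        this.phone = in.readUTF();
    }

    public long getId() { return id; }
    public String getPhone() { return phone; }

    public static void main(String[] args) throws IOException {
        // Round-trip check: whatever write() produces, readFields() must read back.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new User(1L, "13800000000").write(new DataOutputStream(bos));
        User copy = new User();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bos.toByteArray())));
        System.out.println(copy.getId() + "," + copy.getPhone());
    }
}
```

The one rule that matters here is that `readFields` must consume fields in exactly the order `write` emitted them, or deserialization silently corrupts the record.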
1. The code ran fine locally, but when submitted to the cluster it complained that the MySQL driver class could not be found. At that point the job hadn't actually started running; it was still in the resource-submission phase. I suspected Hadoop's own classpath was missing the MySQL driver jar, so I added the following line to hadoop-env.sh:
```shell
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/jars/*
```
/jars holds the third-party jars I use, including the MySQL driver. I ran it again and... damn, same error, except the pit had moved. Looking again, I noticed that this time the error was thrown after the job had started executing. It suddenly dawned on me: the error must be coming from the tasks running on YARN, whose JVMs have their own classpath. So I added the following line:
job.addArchiveToClassPath(new Path("hdfs://es:9000/jars/mysql/driver/mysql-connector-java-5.1.41.jar"));
I ran it again, and it worked!
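Both failures above boil down to the same `ClassNotFoundException`, first in the client JVM at submission time and then in the YARN task JVMs. You can probe for it in isolation with a tiny class (the class name `DriverCheck` is mine; run it with the same classpath the failing JVM uses):

```java
// Minimal classpath probe: Class.forName throws ClassNotFoundException when
// the MySQL driver jar is absent from the current JVM's classpath -- the same
// failure the job reported.
public class DriverCheck {
    public static void main(String[] args) {
        try {
            Class.forName("com.mysql.jdbc.Driver");
            System.out.println("driver found");
        } catch (ClassNotFoundException e) {
            System.out.println("driver missing: " + e.getMessage());
        }
    }
}
```

If this prints "driver missing" on a cluster node, no amount of local configuration will help; the jar has to reach that JVM's classpath.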
In the spirit of "all roads lead to Rome", I then tried replacing
job.addArchiveToClassPath(new Path("hdfs://es:9000/jars/mysql/driver/mysql-connector-java-5.1.41.jar"));
with
String[] args = new GenericOptionsParser(conf, allArgs).getRemainingArgs();
and passing the jar via -libjars instead:
hadoop jar xxxx.jar handler.Main -libjars /jars/mysql-connector-java-5.1.41.jar
The program ran just as well this way.