Recently on a project I ran into a puzzling issue when migrating data with Sqoop: the row order of the data in Hive differed from the row order in DB2 (using --table mode). Unable to figure it out, I went straight to the Sqoop source code.
Reading the source, I found that Sqoop's import code lives under org.apache.sqoop.tool.ImportTool.java.
The code that actually executes importTable is as follows:
protected boolean importTable(SqoopOptions options, String tableName,
    HiveImport hiveImport) throws IOException, ImportException {
  String jarFile = null;

  // Generate the ORM code for the tables.
  jarFile = codeGenerator.generateORM(options, tableName);

  Path outputPath = getOutputPath(options, tableName);

  // Do the actual import.
  ImportJobContext context = new ImportJobContext(tableName, jarFile,
      options, outputPath);

  // If we're doing an incremental import, set up the
  // filtering conditions used to get the latest records.
  if (!initIncrementalConstraints(options, context)) {
    return false;
  }

  if (options.isDeleteMode()) {
    deleteTargetDir(context);
  }

  if (null != tableName) {
    manager.importTable(context);
  } else {
    manager.importQuery(context);
  }

  if (options.isAppendMode()) {
    AppendUtils app = new AppendUtils(context);
    app.append();
  } else if (options.getIncrementalMode()
      == SqoopOptions.IncrementalMode.DateLastModified) {
    lastModifiedMerge(options, context);
  }

  // If the user wants this table to be in Hive, perform that post-load.
  if (options.doHiveImport()) {
    // For Parquet file, the import action will create hive table directly via
    // kite. So there is no need to do hive import as a post step again.
    if (options.getFileLayout() != SqoopOptions.FileLayout.ParquetFile) {
      hiveImport.importTable(tableName, options.getHiveTableName(), false);
    }
  }

  saveIncrementalState(options);

  return true;
}
The logic: 1. Generate the Java class file used for the ORM. This class file is produced by ClassWriter and supplies the read and write methods that map each column's name to its type (I won't dig into it here, as the problem is not in this part). This is also where the data could be preprocessed, e.g. trimming whitespace; see the sketch below.
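To make step 1 concrete, here is a minimal sketch of what such a generated record class roughly looks like, assuming a hypothetical table FOO(ID INT, NAME VARCHAR). The real class produced by ClassWriter extends org.apache.sqoop.lib.SqoopRecord and implements quite a few more methods; only the per-column read/write mapping is shown here.

// Hypothetical sketch of a ClassWriter-generated ORM class for FOO(ID INT, NAME VARCHAR).
// The real generated class also implements Writable/DBWritable and a toString()
// used when writing text records to HDFS.
public class FOO {
  private Integer ID;
  private String NAME;

  // Read side: populate the fields from one JDBC result-set row.
  public void readFields(java.sql.ResultSet rs) throws java.sql.SQLException {
    this.ID = rs.getInt(1);
    this.NAME = rs.getString(2);
    // A pre-processing hook such as trimming could be placed here, e.g.:
    // this.NAME = (this.NAME == null) ? null : this.NAME.trim();
  }

  // Write side: bind the fields to a prepared statement (used on export).
  public void write(java.sql.PreparedStatement stmt) throws java.sql.SQLException {
    stmt.setInt(1, this.ID);
    stmt.setString(2, this.NAME);
  }
}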
2. Enter manager.importTable(context); the purpose of this method is to produce the HDFS files! What actually gets called is org.apache.sqoop.manager.SqlManager.importTable().
Stepping into that method in detail:
/**
 * Default implementation of importTable() is to launch a MapReduce job
 * via DataDrivenImportJob to read the table with DataDrivenDBInputFormat.
 */
public void importTable(com.cloudera.sqoop.manager.ImportJobContext context)
    throws IOException, ImportException {
  String tableName = context.getTableName();
  String jarFile = context.getJarFile();
  SqoopOptions opts = context.getOptions();

  context.setConnManager(this);

  ImportJobBase importer;
  if (opts.getHBaseTable() !=