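Reading guide:
1. Which parameter does Sqoop use to import data with an SQL statement?
2. When importing with the --query parameter, which three things must be added?
3. What does the --split-by parameter do?
4. What is the default number of map tasks when Sqoop runs an import?
5. What is the difference between double quotes and single quotes around the SQL statement after --query, and how should it be handled?
6. Which two data file formats can a Sqoop import write, and which one is the default?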
I. Free-form query import
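Sqoop can also import the result set of an arbitrary query. Instead of --table, --columns and --where, pass an SQL statement through the --query parameter to run a free-form query import. Such an import must name a destination directory with --target-dir and a splitting column with --split-by, which Sqoop uses to partition the result across parallel map tasks. The query must also contain a where clause that includes the token $CONDITIONS; each Sqoop process replaces this token with its own unique condition expression, so that each map task fetches only its share of the rows. The run recorded in this section imports the rows of the users table with id<60 into /output/query/ using a single map task (-m 1).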
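Separately from the run recorded below, the quoting of the --query string can be checked in plain shell without Sqoop. The sketch below is only an illustration: the variable names are hypothetical, and the sed call merely mimics the placeholder substitution visible in the log (where $CONDITIONS appears as (1 = 0) during code generation); it is not how Sqoop performs the replacement internally. It shows that a single-quoted query keeps $CONDITIONS literally, while a double-quoted query needs \$CONDITIONS so that the shell does not expand it as a variable.

```shell
#!/bin/sh
# Single quotes: the shell passes $CONDITIONS through untouched.
single_quoted='select * from users where id<60 and $CONDITIONS'

# Double quotes: escape the dollar sign, otherwise the shell would try to
# expand $CONDITIONS as a (probably empty) shell variable.
double_quoted="select * from users where id<60 and \$CONDITIONS"

# Both forms hand the same literal text to the program that receives them.
printf '%s\n' "$single_quoted"
printf '%s\n' "$double_quoted"

# Illustration only: mimic the token replacement seen in the log output.
probe_query=$(printf '%s\n' "$single_quoted" | sed 's/\$CONDITIONS/(1 = 0)/')
printf '%s\n' "$probe_query"
```

Either quoting style should therefore work with --query, as long as the $CONDITIONS token reaches Sqoop intact.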
[hadoopUser@secondmgt conf]$ sqoop import --connect jdbc:mysql://secondmgt:3306/spice --username hive --password hive --query 'select * from users where id<60 and $CONDITIONS' --split-by id -m 1 --target-dir /output/query/
Warning: /usr/lib/hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
15/01/18 14:30:10 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/01/18 14:30:10 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
15/01/18 14:30:10 INFO tool.CodeGenTool: Beginning code generation
15/01/18 14:30:11 INFO manager.SqlManager: Executing SQL statement: select * from users where id<60 and (1 = 0)
15/01/18 14:30:11 INFO manager.SqlManager: Executing SQL statement: select * from users where id<60 and (1 = 0)
15/01/18 14:30:11 INFO manager.SqlManager: Executing SQL statement: select * from users where id<60 and (1 = 0)
15/01/18 14:30:11 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoopUser/cloud/hadoop/programs/hadoop-2.2.0
Note: /tmp/sqoop-hadoopUser/compile/3488270c7f7b23dd3b556d8d185f6a82/QueryResult.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/01/18 14:30:12 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoopUser/compile/3488270c7f7b23dd3b556d8d185f6a82/QueryResult.jar
15/01/18 14:30:12 INFO mapreduce.ImportJobBase: Beginning query import.
15/01/18 14:30