Sqoop bridges relational databases and Hadoop's storage systems, supporting imports in both directions between a variety of relational sources and HDFS, Hive, and HBase. In a typical setup the relational tables live in a backup of the production database and have to be loaded into Hadoop every day. When the daily data volume is modest, a full-table import is fine; for larger volumes Sqoop also provides an incremental import mechanism.
The most commonly used Sqoop commands, together with their main parameters, are introduced below:
| No. | Command | Class | Description |
| --- | --- | --- | --- |
| 1 | import | ImportTool | Import data from a relational database (a table or the result of a query) into HDFS |
| 2 | export | ExportTool | Export data from HDFS into a relational database |
| 3 | codegen | CodeGenTool | Generate Java classes for a database table and package them into a jar |
| 4 | create-hive-table | CreateHiveTableTool | Create a Hive table |
| 5 | eval | EvalSqlTool | Evaluate a SQL statement and display the results |
| 6 | import-all-tables | ImportAllTablesTool | Import every table of a database into HDFS |
| 7 | job | JobTool | Define and manage saved Sqoop jobs |
| 8 | list-databases | ListDatabasesTool | List all database names |
| 9 | list-tables | ListTablesTool | List all tables in a database |
| 10 | merge | MergeTool | Merge the results of an incremental import with an existing dataset |
| 11 | metastore | MetastoreTool | Run the shared Sqoop metastore |
| 12 | help | HelpTool | Display help |
| 13 | version | VersionTool | Display version information |
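As a quick illustration, the sketch below shows how a couple of these commands are invoked from the shell; the MySQL host and credentials are placeholders and need to be replaced with real values.

```bash
# List the databases visible on a MySQL server (host and credentials are placeholders)
sqoop list-databases \
  --connect jdbc:mysql://localhost:3306/ \
  --username root \
  --password 123456

# Show the built-in help for any of the commands above
sqoop help import
```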
Next come the general-purpose Sqoop arguments, followed by the arguments specific to each of the 13 commands above. The general-purpose arguments fall into the following groups:
- Common arguments
- Incremental import arguments
- Output line formatting arguments
- Input parsing arguments
- Hive arguments
- HBase arguments
- Generic Hadoop command-line arguments
1. Common arguments: general-purpose options, mainly for connecting to the relational database.

| No. | Argument | Description | Example |
| --- | --- | --- | --- |
| 1 | --connect | JDBC URL of the relational database | jdbc:mysql://localhost/sqoop_datas |
| 2 | --connection-manager | Connection manager class; rarely needed | |
| 3 | --driver | JDBC driver class | |
| 4 | --hadoop-home | Hadoop home directory | /home/hadoop |
| 5 | --help | Print usage help | |
| 6 | --password | Password for connecting to the database | |
| 7 | --username | Username for connecting to the database | |
| 8 | --verbose | Print more information (effectively lowers the log level) | Takes no value |
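A minimal sketch of the common connection arguments in use, assuming a local MySQL database named sqoop_datas and placeholder credentials:

```bash
# List the tables in the sqoop_datas database; --driver is usually not needed
# because Sqoop infers the driver from the JDBC URL
sqoop list-tables \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --verbose
```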
Import control arguments:

| Argument | Description |
| --- | --- |
| --append | Append data to an existing dataset in HDFS |
| --as-avrodatafile | Imports data to Avro Data Files |
| --as-sequencefile | Imports data to SequenceFiles |
| --as-textfile | Imports data as plain text (default) |
| --boundary-query <statement> | Boundary query to use for creating splits |
| --columns <col,col,col…> | Columns to import from table |
| --direct | Use direct import fast path |
| --direct-split-size <n> | Split the input stream every n bytes when importing in direct mode |
| --inline-lob-limit <n> | Set the maximum size for an inline LOB |
| -m,--num-mappers <n> | Use n map tasks to import in parallel |
| -e,--query <statement> | Import the results of statement |
| --split-by <column-name> | Column of the table used to split work units |
| --table <table-name> | Table to read |
| --target-dir <dir> | HDFS destination dir |
| --warehouse-dir <dir> | HDFS parent for table destination |
| --where <where clause> | WHERE clause to use during import |
| -z,--compress | Enable compression |
| --compression-codec <c> | Use Hadoop codec (default gzip) |
| --null-string <null-string> | The string to be written for a null value for string columns |
| --null-non-string <null-string> | The string to be written for a null value for non-string columns |
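The following sketch pulls several of these options together into a single full-table import; the connection details, the orders table, its columns, and the HDFS path are all placeholder assumptions.

```bash
# Import selected columns of a hypothetical `orders` table into HDFS,
# filtering by date, splitting the work across 4 map tasks, and compressing
# the plain-text output
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --columns "id,user_id,amount,created_at" \
  --where "created_at >= '2016-01-01'" \
  --split-by id \
  -m 4 \
  --as-textfile \
  -z \
  --null-string '\\N' \
  --null-non-string '\\N' \
  --target-dir /user/hadoop/orders
```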
Incremental import arguments:

| Argument | Description |
| --- | --- |
| --check-column (col) | Specifies the column to be examined when determining which rows to import. |
| --incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified. |
| --last-value (value) | Specifies the maximum value of the check column from the previous import. |
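A sketch of the append-mode incremental import mentioned in the introduction, again with placeholder connection details and table name; 1000 stands in for the last id imported by the previous run.

```bash
# Append only the rows whose id is greater than the value recorded last time
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders \
  --check-column id \
  --incremental append \
  --last-value 1000
```

At the end of the run Sqoop prints the new value to pass as --last-value next time; wrapping the command in a saved job (the job command together with the metastore) lets Sqoop track that value automatically.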
Output line formatting arguments:

| Argument | Description |
| --- | --- |
| --enclosed-by <char> | Sets a required field enclosing character |
| --escaped-by <char> | Sets the escape character |
| --fields-terminated-by <char> | Sets the field separator character |
| --lines-terminated-by <char> | Sets the end-of-line character |
| --mysql-delimiters | Uses MySQL’s default delimiter set: fields: , lines: \n escaped-by: \ optionally-enclosed-by: ' |
| --optionally-enclosed-by <char> | Sets a field enclosing character |
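As a small example, the options below switch the hypothetical orders import to tab-separated output; everything other than the formatting flags is a placeholder carried over from the earlier sketches.

```bash
# Write tab-separated, newline-terminated records instead of the default
# comma-separated text format
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --target-dir /user/hadoop/orders_tsv \
  --fields-terminated-by '\t' \
  --lines-terminated-by '\n' \
  --optionally-enclosed-by '\"'
```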
Hive arguments:

| Argument | Description |
| --- | --- |
| --hive-home <dir> | Override $HIVE_HOME |
| --hive-import | Import tables into Hive (uses Hive’s default delimiters if none are set) |
| --hive-overwrite | Overwrite existing data in the Hive table. |
| --create-hive-table | If set, the job will fail if the target Hive table already exists. By default this property is false. |
| --hive-table <table-name> | Sets the table name to use when importing to Hive. |
| --hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive. |
| --hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive. |
| --hive-partition-key | Name of the Hive field on which the imported data is partitioned |
| --hive-partition-value <v> | String value that serves as the partition key for the data imported into Hive in this job. |
| --map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns. |
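A sketch of a direct-to-Hive import into a daily partition; the Hive database dw, the table and partition names, and the connection details are assumptions.

```bash
# Import straight into a Hive table, overwriting existing data and stripping
# characters that would break Hive's delimited text format
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --hive-import \
  --hive-table dw.orders \
  --hive-overwrite \
  --hive-drop-import-delims \
  --hive-partition-key dt \
  --hive-partition-value '2016-01-01'
```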
HBase arguments:

| Argument | Description |
| --- | --- |
| --column-family <family> | Sets the target column family for the import |
| --hbase-create-table | If specified, create missing HBase tables |
| --hbase-row-key <col> | Specifies which input column to use as the row key |
| --hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS |
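A sketch of importing into HBase rather than HDFS; the HBase table name, column family, and row-key column are placeholders.

```bash
# Import into an HBase table, creating it if it does not exist yet
sqoop import \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
```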
Code generation arguments:

| Argument | Description |
| --- | --- |
| --bindir <dir> | Output directory for compiled objects |
| --class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class. |
| --jar-file <file> | Disable code generation; use specified jar |
| --outdir <dir> | Output directory for generated code |
| --package-name <name> | Put auto-generated classes in this package |
| --map-column-java <m> | Override default mapping from SQL type to Java type for configured columns. |
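Finally, a sketch of the codegen command, which regenerates the record class for a table without running an import; the package name and output directories are placeholders.

```bash
# Generate and compile the record class for the hypothetical `orders` table
sqoop codegen \
  --connect jdbc:mysql://localhost:3306/sqoop_datas \
  --username sqoop_user \
  --password sqoop_pass \
  --table orders \
  --package-name com.example.sqoop \
  --outdir /tmp/sqoop/src \
  --bindir /tmp/sqoop/classes
```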