Sqoop: Data Migration Tool
Sqoop converts each transfer job into a MapReduce program for execution; the job contains Mapper tasks only, with no Reduce phase.
The name comes from SQL-to-Hadoop: Sqoop plugs database-aware InputFormat/OutputFormat implementations into MapReduce to read from and write to the RDBMS.
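Parallelism is controlled with -m/--num-mappers, and each mapper writes one output file, so a 4-mapper text import leaves files part-m-00000 through part-m-00003 in the target directory. A quick check (assuming an earlier import into /sqoop/import/test1):
hadoop fs -ls /sqoop/import/test1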
1. Sqoop Installation
- Download and extract the installation package
- Edit the configuration file
sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/opt/module/hadoop-2.7.1
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/opt/module/hadoop-2.7.1
#set the path to where bin/hbase is available
export HBASE_HOME=/opt/module/hbase-1.3.1
#Set the path to where bin/hive is available
export HIVE_HOME=/opt/module/hive-3.1.2
#Set the path for where zookeeper config dir is
export ZOOCFGDIR=/opt/module/zookeeper-3.5.7
- Copy the JDBC driver JAR into Sqoop's lib directory
- Start and test
bin/sqoop help
# connect to MySQL and list databases
bin/sqoop list-databases --connect jdbc:mysql://hadoop102:3306 --username root --password 000000
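Once connectivity works, list-tables verifies access to a specific database (here the datax database used in the examples below):
bin/sqoop list-tables --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000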
2. Sqoop Usage
2.1 Import
2.1.1 MySQL -> HDFS
- Full import
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --table test1 --target-dir /sqoop/import/test1 --delete-target-dir --as-parquetfile -m 1
- Query import
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --target-dir /sqoop/import/test2 --delete-target-dir -m 1 -e 'select id,name from test1 where id > 1001 and $CONDITIONS' --fields-terminated-by ','
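With more than one mapper, a free-form query import must also supply --split-by so Sqoop can partition the query; a sketch splitting on id:
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --target-dir /sqoop/import/test2 --delete-target-dir -m 4 -e 'select id,name from test1 where $CONDITIONS' --split-by id --fields-terminated-by ','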
- Incremental import: set --where to the timestamp column that marks each day's updates (the example below filters on id for simplicity)
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --table test1 --target-dir /sqoop/import/test3 --delete-target-dir --as-parquetfile -m 1 --where 'id=1001'
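Sqoop also offers a built-in incremental mode that records its own progress; a sketch appending rows whose id exceeds the saved --last-value (append mode cannot be combined with --delete-target-dir):
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --table test1 --target-dir /sqoop/import/test3 -m 1 --incremental append --check-column id --last-value 1001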
2.1.2 MySQL -> Hive
# the Hive target table is created automatically
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --table test1 -m 1 --hive-import --hive-overwrite --create-hive-table --hive-table test.hive_test1
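A quick sanity check of the imported rows, assuming the Hive CLI is on the PATH:
hive -e 'select * from test.hive_test1 limit 5'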
2.1.3 MySQL -> HBase
# Sqoop 1.4.6 is incompatible with HBase 1.3.1, so it cannot create the table automatically (create it in HBase beforehand)
bin/sqoop import --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --table test1 -m 1 --hbase-table hbase_test1 --column-family info --hbase-create-table --hbase-row-key id
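Workaround: create the table in the HBase shell before running the import, e.g.:
hbase shell
create 'hbase_test1', 'info'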
2.2 Export
2.2.1 HDFS -> MySQL
bin/sqoop export --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --table test1 -m 1 --export-dir /sqoop/import/test2 --input-fields-terminated-by ','
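By default export issues plain INSERT statements; to update existing rows by key and insert new ones, a sketch assuming id is the key column and the connector supports upserts:
bin/sqoop export --connect jdbc:mysql://hadoop102:3306/datax --username root --password 000000 --table test1 -m 1 --export-dir /sqoop/import/test2 --input-fields-terminated-by ',' --update-key id --update-mode allowinsert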
3. Sqoop Options File
mysql2hdfs.opt
import
--connect
jdbc:mysql://hadoop102:3306/datax
--username
root
--password
000000
--table
test1
--target-dir
/sqoop/import/test4
--delete-target-dir
--as-parquetfile
-m
1
Run:
bin/sqoop --options-file opt/mysql2hdfs.opt
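Extra flags given on the command line are combined with the options file's contents, e.g. for verbose logging:
bin/sqoop --options-file opt/mysql2hdfs.opt --verbose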