Sqoop Environment Setup
1. Prepare the following files:
sqoop-1.4.6-cdh5.14.2.tar.gz
java-json.jar
mysql-connector-java-5.1.27-bin.jar
2. Extract and install, then configure the environment variables
[root@hadoop151 opt]# vi /etc/profile
Add the following:
export SQOOP_HOME=/opt/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
Make it take effect:
[root@hadoop151 opt]# source /etc/profile
Modify the configuration file under conf:
mv sqoop-env-template.sh sqoop-env.sh
[root@hadoop151 opt]# vi /opt/sqoop/conf/sqoop-env.sh
export HADOOP_COMMON_HOME=/opt/hadoop
export HADOOP_MAPRED_HOME=/opt/hadoop
export HIVE_HOME=/opt/hive
export ZOOKEEPER_HOME=/opt/zookeeper-3.4.6
export ZOOCFGDIR=/opt/zookeeper-3.4.6/conf
export HBASE_HOME=/opt/hbase
3. Copy the prepared jars into Sqoop's lib directory
[root@hadoop151 opt]# cp mysql-connector-java-5.1.27-bin.jar /opt/sqoop/lib/
[root@hadoop151 opt]# cp java-json.jar /opt/sqoop/lib/
3.* Copy the hive-common* and hive-exec* jars from Hive's lib directory into Sqoop's lib directory.
Copy Hive's hive-site.xml file into Sqoop's conf directory (example commands below).
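The exact jar file names depend on the Hive build; a minimal sketch of these copies, assuming HIVE_HOME=/opt/hive as configured above:
cp /opt/hive/lib/hive-common-*.jar /opt/sqoop/lib/
cp /opt/hive/lib/hive-exec-*.jar /opt/sqoop/lib/
cp /opt/hive/conf/hive-site.xml /opt/sqoop/conf/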
4. Verify Sqoop
[root@hadoop151 opt]# sqoop version
Output like the following indicates a correct installation:
20/07/22 18:54:53 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
Sqoop 1.4.6-cdh5.14.2
git commit id
Compiled by jenkins on Tue Mar 27 13:19:49 PDT 2018
Data Migration
MySQL->HDFS
Preparation: create the database and tables in MySQL
mysql> create database retail_db;
mysql> use retail_db;
mysql> source /root/data/sqoop/retail_db.sql
mysql> show tables;
+---------------------+
| Tables_in_retail_db |
+---------------------+
| categories |
| customers |
| departments |
| order_items |
| orders |
| products |
+---------------------+
6 rows in set (0.00 sec)
Use Sqoop to import the customers table into HDFS:
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table customers \
--username root \
--password root \
--target-dir /data/retail_db/customers \
--m 3
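Since --m 3 runs three map tasks, the target directory should contain three part files. An optional check (not part of the original steps):
hdfs dfs -ls /data/retail_db/customers
# expect _SUCCESS plus part-m-00000 through part-m-00002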
Filter with --where:
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table orders \
--where "order_id<100" \
--username root \
--password root \
--delete-target-dir \
--target-dir /data/retail_db/orders \
--m 3
Filter with --columns:
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table customers \
--columns "customer_id,customer_fname,customer_lname" \
--username root \
--password root \
--delete-target-dir \
--target-dir /data/retail_db/customers \
--m 3
Filter with a free-form query (note: the query must end with "and \$CONDITIONS", and --split-by must be specified):
sqoop import \
--connect jdbc:mysql://localhost:3306/sqoop \
--driver com.mysql.jdbc.Driver \
--query "select * from emp where EMPNO>=7900 and \$CONDITIONS" \
--split-by EMPNO \
--username root \
--password ok \
--delete-target-dir \
--target-dir /data/retail_db/customers \
--m 3
Incremental import (--check-column specifies the incrementing column to watch; --last-value specifies the largest value from the previous import):
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '2013-07-24 00:00:00' \
--target-dir /data/retail_db/orders \
--m 3
Create a job:
sqoop job --create mysql2hdfs \
-- import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--incremental append \
--check-column order_date \
--last-value '0' \
--target-dir /data/retail_db/orders \
--m 3
List the saved jobs:
sqoop job --list
Run the job:
sqoop job --exec mysql2hdfs
After each successful run, the job updates --last-value, filling in the largest value seen in that run.
The '0' here has no real meaning; it only guarantees that, on the first import, every row's value is larger than it. The stored value can be inspected as shown below.
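To confirm what the job has stored, its saved definition can be printed (a minimal check; it may prompt for the database password, and the incremental last value appears among the listed parameters):
sqoop job --show mysql2hdfs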
The job can then be scheduled to run daily:
crontab -e
0 2 */1 * * sqoop job --exec mysql2hdfs
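Because cron runs with a minimal PATH, it is common to use the full path to sqoop and redirect output to a log file; the paths below are examples, not part of the original setup:
0 2 */1 * * /opt/sqoop/bin/sqoop job --exec mysql2hdfs >> /tmp/sqoop_mysql2hdfs.log 2>&1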
Importing data into Hive
First create the target database in Hive (the table itself is created by --create-hive-table below):
hive -e "create database if not exists retail_db;"
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--table orders \
--username root \
--password root \
--hive-import \
--create-hive-table \
--hive-database retail_db \
--hive-table orders \
--m 3
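A quick sanity check of the Hive import (optional; assumes the hive CLI is on the PATH):
hive -e "select count(*) from retail_db.orders;"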
Importing data into a Hive partition
Drop the existing Hive table first:
drop table if exists orders;
Import:
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--query "select order_id,order_status from orders where order_date>='2013-11-03' and order_date <'2013-11-04' and \$CONDITIONS" \
--username root \
--password root \
--delete-target-dir \
--target-dir /data/retail_db/orders \
--split-by order_id \
--hive-import \
--hive-database retail_db \
--hive-table orders \
--hive-partition-key "order_date" \
--hive-partition-value "2013-11-03" \
--m 3
Note: the partition column cannot be imported as an ordinary column, which is why order_date is left out of the select list above; a quick way to confirm the partition is shown below.
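An optional check that the partition was created:
hive -e "show partitions retail_db.orders;"
# expect: order_date=2013-11-03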
Importing data into HBase
1. Create the table in HBase
create 'products','data','category'
2. Import with Sqoop
sqoop import \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table products \
--hbase-table products \
--column-family data \
--m 3
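A quick look at a few imported rows (an optional check, run from the shell through the HBase CLI):
echo "scan 'products', {LIMIT => 3}" | hbase shell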
Exporting data from HDFS to MySQL
1. Create the table in MySQL
create table customers_demo as select * from customers where 1=2;
2. Upload the data
hdfs dfs -mkdir /customerinput
hdfs dfs -put customers.csv /customerinput
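The source of customers.csv is not specified above; one assumption is to rebuild it, before the put above, from the comma-delimited output of the earlier customers import (Sqoop's default field delimiter is a comma):
hdfs dfs -cat /data/retail_db/customers/part-m-* > customers.csv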
3. Export the data
sqoop export \
--connect jdbc:mysql://localhost:3306/retail_db \
--driver com.mysql.jdbc.Driver \
--username root \
--password root \
--table customers_demo \
--export-dir /customerinput \
--m 1
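A quick row-count check in MySQL (optional; same credentials as above):
mysql -uroot -proot retail_db -e "select count(*) from customers_demo;"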
Sqoop script (options file)
1. Write the options file job_RDBMS2HDFS.opt
import
--connect
jdbc:mysql://localhost:3306/retail_db
--driver
com.mysql.jdbc.Driver
--table
customers
--username
root
--password
root
--target-dir
/data/retail_db/customers
--delete-target-dir
--m
3
2. Run the script
sqoop --options-file job_RDBMS2HDFS.opt