1.下载
wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.7.0.tar.gz
2.解压
3.配置环境变量
在当前用户生效一下
4.配置conf
5.加一个MySQL的驱动包到lib下面
MySQL的驱动包:mysql-connector-java-5.1.27-bin.jar
6.导数据
(数据库)
sqoop list-databases
–connect jdbc:mysql://localhost:3306
–username root
–password 123456
(表)
sqoop list-tables
–connect jdbc:mysql://localhost:3306/ruoze_d5
–username root
–password 123456
7.架包:java-jason.jar
放到Lib下
MySQL==>HDFS (要跑mr)
1.
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–table emp
2.若名字相同则覆盖原有的表
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–table emp
–delete-target-dir
3.若名字相同则覆盖原有的表,修改名字(在网页刷新中执行任务的名字),指定一个task作输出。
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–table emp
–mapreduce-job-name FromMySQLToHDFS
-m 1
–delete-target-dir
4.若名字相同则覆盖原有的表,修改名字(在网页刷新中执行任务的名字),指定一个task作输出,指定EMPNO,ENAME,JOB,SAL,COMM这几列,输出的名字重新写入(EMP_COLUMN_WHERE)
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–table emp
–mapreduce-job-name FromMySQLToHDFS
-m 1
–delete-target-dir
–columns ‘EMPNO,ENAME,JOB,SAL,COMM’
–target-dir EMP_COLUMN_WHERE
5.若名字相同则覆盖原有的表,修改名字(在网页刷新中执行任务的名字),指定一个task作输出,指定EMPNO,ENAME,JOB,SAL,COMM这几列,输出的名字重新写入( EMP_COLUMN_QUERY),字段之间用空格分开,有null值且不是string类型的用0表示,如果是string,就放空
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–mapreduce-job-name FromMySQLToHDFS
-m 1
–delete-target-dir
–target-dir EMP_COLUMN_QUERY
–fields-terminated-by ‘\t’
–null-non-string ‘0’
–null-string ‘’
6.若名字相同则覆盖原有的表,修改名字(在网页刷新中执行任务的名字),指定一个task作输出,指定EMPNO,ENAME,JOB,SAL,COMM这几列,输出的名字重新写入( EMP_COLUMN_WHERE),字段之间用空格分开,有null值且不是string类型的用0表示,如果是string,就放空,后面跟个where条件。
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–table emp
–mapreduce-job-name FromMySQLToHDFS
-m 1
–delete-target-dir
–columns ‘EMPNO,ENAME,JOB,SAL,COMM’
–target-dir EMP_COLUMN_WHERE
–fields-terminated-by ‘\t’
–null-non-string ‘0’
–null-string ‘’
–where ‘SAL>2000’
7.选择用sql语句来指定(-e参数)
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–mapreduce-job-name FromMySQLToHDFS
-m 1
–delete-target-dir
–target-dir EMP_COLUMN_QUERY
–fields-terminated-by ‘\t’
–null-non-string ‘0’
–null-string ‘’
-e ‘select * from emp where empno>7900 and $CONDITIONS’
8.选择用sql语句来指定(-e参数),如果加双引号,则加\,上面是单引号。
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–mapreduce-job-name FromMySQLToHDFS
-m 1
–delete-target-dir
–target-dir EMP_COLUMN_QUERY
–fields-terminated-by ‘\t’
–null-non-string ‘0’
–null-string ‘’
-e “select * from emp where empno>7900 and $CONDITIONS”
*没有主键的表可用split by来指定几个maptask,或者-m参数指定
sqoop import
–connect jdbc:mysql://localhost:3306/sqoop
–username root
–password 123456
–table salgrage
–split-by GRADE
-m 2
9.写个脚本封装到一个文件里去执行
vi emp.opt 创建一个文件
import
–connect
jdbc:mysql://localhost:3306/sqoop
–username
root
–password
123456
–table
emp
–delete-target-dir
执行:sqoop --options-file emp.opt
HDFS => MySQL