Sqoop
Sqoop is a tool for transferring data between Hadoop and relational databases.
Installation
tar -zxvf sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz -C /apps/
mv sqoop-1.4.6.bin__hadoop-2.0.4-alpha/ sqoop
cd /apps/sqoop/conf
mv sqoop-env-template.sh sqoop-env.sh
which hadoop
/apps/hadoop-2.6.4/bin/hadoop
vi sqoop-env.sh
#Set path to where bin/hadoop is available
export HADOOP_COMMON_HOME=/apps/hadoop-2.6.4/
#Set path to where hadoop-*-core.jar is available
export HADOOP_MAPRED_HOME=/apps/hadoop-2.6.4/
#Set the path to where bin/hive is available
export HIVE_HOME=/apps/hive/
Add the MySQL JDBC driver
cd /apps/hive/lib
cp mysql-connector-java-5.1.28-bin.jar /apps/sqoop/lib/
Usage:
Import data from MySQL into HDFS
bin/sqoop import \
--connect jdbc:mysql://shizhan:3306/test \
--username root \
--password 123456 \
--table test \
-m 1
By default the data is imported to /user/hadoop/test on HDFS.
Use --target-dir /querydir to specify a different directory.
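Each mapper writes one output file under the target directory, so -m 1 produces a single part-m-00000 file. A tiny Python illustration of the naming scheme:

```python
def output_files(num_mappers):
    """File names Sqoop's map tasks write into the target directory."""
    return [f"part-m-{i:05d}" for i in range(num_mappers)]

# -m 1 yields a single file; -m 4 yields part-m-00000 through part-m-00003
print(output_files(1))
print(output_files(4))
```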
Import data from MySQL into Hive
bin/sqoop import \
--connect jdbc:mysql://shizhan:3306/test \
--username root \
--password 123456 \
--table test \
--hive-import \
-m 1
Add a WHERE condition and specify the HDFS target directory
bin/sqoop import \
--connect jdbc:mysql://shizhan:3306/test \
--username root \
--password 123456 \
--table test \
--where "runoob_id = 2" \
--target-dir /wherequery \
-m 1
Import the result of a MySQL query into an HDFS directory
bin/sqoop import \
--connect jdbc:mysql://shizhan:3306/test \
--username root \
--password 123456 \
--target-dir /wherequery2 \
--query 'select runoob_id,runoob_title,runoob_author,submission_date from test where runoob_id = 1 and $CONDITIONS' \
--split-by runoob_id \
--fields-terminated-by '\t' \
-m 1
When using --query, the WHERE clause must end with `and $CONDITIONS`. This is mandatory syntax: Sqoop substitutes a different range condition for $CONDITIONS in each mapper's copy of the query.
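Conceptually, Sqoop uses the --split-by column to divide the query across mappers: it finds the column's min and max, cuts that interval into one slice per mapper, and substitutes each slice's range predicate for $CONDITIONS. A minimal Python sketch of that splitting idea (the function name and even-width strategy are illustrative, not Sqoop's exact implementation):

```python
def split_conditions(column, lo, hi, num_mappers):
    """Divide [lo, hi] into num_mappers contiguous ranges and build
    one WHERE fragment per mapper (illustrative sketch)."""
    step = (hi - lo + 1) / num_mappers
    fragments = []
    for i in range(num_mappers):
        start = lo + round(i * step)
        end = lo + round((i + 1) * step) - 1 if i < num_mappers - 1 else hi
        fragments.append(f"{column} >= {start} AND {column} <= {end}")
    return fragments

# Each mapper runs the --query text with $CONDITIONS replaced by one fragment.
for cond in split_conditions("runoob_id", 1, 100, 4):
    print(cond)
```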
Incremental import
bin/sqoop import \
--connect jdbc:mysql://shizhan:3306/test \
--username root \
--password 123456 \
--table test \
--incremental append \
--check-column runoob_id \
--last-value 1 \
-m 1
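In append mode, Sqoop imports only rows whose --check-column value is greater than --last-value, then reports a new last-value for the next run. A small Python sketch of that selection rule (the row data is invented for illustration):

```python
def incremental_append(rows, check_column, last_value):
    """Select rows with check_column > last_value and return them
    together with the new last_value (illustrative sketch)."""
    new_rows = [r for r in rows if r[check_column] > last_value]
    new_last = max((r[check_column] for r in new_rows), default=last_value)
    return new_rows, new_last

rows = [
    {"runoob_id": 1, "runoob_title": "first"},
    {"runoob_id": 2, "runoob_title": "second"},
    {"runoob_id": 3, "runoob_title": "third"},
]

# With --last-value 1, only ids 2 and 3 are pulled;
# the next run should pass --last-value 3.
picked, new_last = incremental_append(rows, "runoob_id", 1)
print(picked, new_last)
```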
Export data from HDFS back to MySQL
bin/sqoop export \
--connect jdbc:mysql://shizhan:3306/db \
--username root \
--password 123456 \
--table test \
--export-dir /wherequery/
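Export works in the opposite direction: Sqoop reads the delimited text files under --export-dir, splits each line on the field delimiter (comma by default), and turns every record into an insert against the target table. A rough Python sketch of that parsing step (the file contents and column names are invented for illustration; real Sqoop issues batched JDBC inserts, not SQL strings):

```python
def lines_to_inserts(lines, table, columns, delimiter=","):
    """Parse delimited text lines into INSERT statements (illustrative)."""
    stmts = []
    for line in lines:
        values = line.rstrip("\n").split(delimiter)
        placeholders = ", ".join(repr(v) for v in values)
        stmts.append(
            f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
        )
    return stmts

# A made-up part file with one record per line:
part_file = ["2,hello,author,2020-01-01"]
cols = ["runoob_id", "runoob_title", "runoob_author", "submission_date"]
for stmt in lines_to_inserts(part_file, "test", cols):
    print(stmt)
```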
Sqoop jobs
bin/sqoop job --create myimportjob -- import \
--connect jdbc:mysql://shizhan:3306/test \
--username root \
--password 123456 \
--table test \
-m 1
(The bare `--` before import is required; it separates the job options from the tool command being saved.)

bin/sqoop job --list
bin/sqoop job --exec myimportjob