- Extract and install
tar xvfz sqoop-1.99.6-bin-hadoop200.tar.gz
mv sqoop-1.99.6-bin-hadoop200 sqoop/
chmod -R 775 sqoop/
chown -R zhu sqoop/
- Edit the environment variables
zhu@Master:~$ sudo vim /etc/profile
#sqoop
export SQOOP_HOME=$HOME/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
export CATALINA_BASE=$SQOOP_HOME/server
export LOGDIR=$SQOOP_HOME/logs
zhu@Master:~$ source /etc/profile
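To confirm the variables took effect, a quick sanity check (assuming $HOME is /home/zhu as in the steps above):
zhu@Master:~$ echo $SQOOP_HOME
/home/zhu/sqoop
zhu@Master:~$ echo $CATALINA_BASE
/home/zhu/sqoop/server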
- Edit Sqoop's configuration files
zhu@Master:/$ sudo vim /home/zhu/sqoop/server/conf/sqoop.properties
# Point this at the Hadoop configuration directory
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/zhu/hadoop/etc/hadoop
Then edit catalina.properties and append the Hadoop jar directories to common.loader (keeping the existing ${catalina.*} entries):
zhu@Master:/$ sudo vim /home/zhu/sqoop/server/catalina.properties
common.loader=${catalina.base}/lib,${catalina.base}/lib/*.jar,${catalina.home}/lib,${catalina.home}/lib/*.jar,${catalina.home}/../lib/*.jar,/home/zhu/hadoop/share/hadoop/common/*.jar,/home/zhu/hadoop/share/hadoop/common/lib/*.jar,/home/zhu/hadoop/share/hadoop/hdfs/*.jar,/home/zhu/hadoop/share/hadoop/hdfs/lib/*.jar,/home/zhu/hadoop/share/hadoop/mapreduce/*.jar,/home/zhu/hadoop/share/hadoop/mapreduce/lib/*.jar,/home/zhu/hadoop/share/hadoop/tools/*.jar,/home/zhu/hadoop/share/hadoop/tools/lib/*.jar,/home/zhu/hadoop/share/hadoop/yarn/*.jar,/home/zhu/hadoop/share/hadoop/yarn/lib/*.jar,/home/zhu/hadoop/share/hadoop/httpfs/tomcat/lib/*.jar
Download the MySQL JDBC driver, mysql-connector-java-5.1.27.jar, and copy it into $SQOOP_HOME/server/lib:
sudo cp mysql-connector-java-5.1.27.jar $SQOOP_HOME/server/lib
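A quick check that the driver landed where Sqoop expects it (assuming the copy above succeeded):
ls $SQOOP_HOME/server/lib | grep mysql
mysql-connector-java-5.1.27.jar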
Sometimes Sqoop fails to start because it cannot find JAVA_HOME, so to be safe we set JAVA_HOME directly in the startup script.
In /home/zhu/sqoop/bin/sqoop.sh, add:
export JAVA_HOME=/usr/local/java
HADOOP_COMMON_HOME=/home/zhu/hadoop/share/hadoop/common
HADOOP_HDFS_HOME=/home/zhu/hadoop/share/hadoop/hdfs
HADOOP_MAPRED_HOME=/home/zhu/hadoop/share/hadoop/mapreduce
HADOOP_YARN_HOME=/home/zhu/hadoop/share/hadoop/yarn
- Start Sqoop
sqoop.sh server start
- Verify that the server started (a quick curl check is sketched after the command list below)
./sqoop.sh server start    # start the server
./sqoop.sh server stop     # stop the server
./sqoop.sh client          # enter the client shell
set server --host hadoopMaster --port 12000 --webapp sqoop    # point the client at the server; hadoopMaster is the hostname of the node running the Sqoop2 server
show connector --all       # list connector types
create link --cid 1        # create a link, where cid is the connector type id
show link                  # list links
update link -l 1           # modify the link with id 1
delete link -l 1           # delete the link with id 1
create job -f 1 -t 2       # create a job from link 1 to link 2
show job                   # list jobs
update job -jid 1          # modify a job
delete job -jid 1          # delete a job
status job -jid 1          # check a job's status
stop job -jid 1            # stop a job
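A quick way to confirm the server is actually up before entering the client is to hit its REST endpoint; a minimal sketch, assuming the server runs on hadoopMaster with the default port 12000 and webapp name sqoop:
curl http://hadoopMaster:12000/sqoop/version
A JSON response containing the server version indicates the server is running.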
- Using Sqoop2
sqoop.sh client
Sqoop 2 is distributed as two separate packages: a client package (sqoop2-client) and a server package (sqoop2-server). Install the server package on one node in the cluster; because the Sqoop2 server acts as a MapReduce client, Hadoop must already be installed and configured on that node.
Install the client package on every node that will act as a client. The Sqoop2 client always connects to the Sqoop2 server to perform any operation, so Hadoop does not need to be installed on the client nodes.
The following installs via yum, with the client and server on the same node:
# yum install sqoop2-server sqoop2-client
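With the yum packages, the server is typically also registered as a system service, so it can be managed that way as well (a sketch, assuming the standard CDH package layout):
# service sqoop2-server start
# service sqoop2-server status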
7. Using Sqoop2
7.1 Command overview
Start the server:
sqoop2-server start
Start the client in interactive mode with the following command:
# sqoop2
sqoop:000> help
For information about Sqoop, visit: http://sqoop.apache.org/
Available commands:
exit (\x ) Exit the shell
history (\H ) Display, manage and recall edit-line history
help (\h ) Display this help message
set (\st ) Configure various client options and settings
show (\sh ) Display various objects and configuration options
create (\cr ) Create new object in Sqoop repository
delete (\d ) Delete existing object in Sqoop repository
update (\up ) Update objects in Sqoop repository
clone (\cl ) Create new object based on existing one
start (\sta) Start job
stop (\stp) Stop job
status (\stu) Display status of a job
enable (\en ) Enable object in Sqoop repository
disable (\di ) Disable object in Sqoop repository
For help on a specific command type: help command
7.2 Importing data from MySQL into HDFS
Errors may come up along the way; enable verbose output so the error details are shown:
sqoop:000> set option --name verbose --value true
Connect to the Sqoop server:
sqoop:000> set server --host hadoopMaster
Server is set successfully
7.2.1 Creating Link objects (for MySQL, HDFS, etc.)
Check which connectors are registered with the Sqoop server:
sqoop:000> show connector
+----+------------------------+-----------------+------------------------------------------------------+----------------------+
| Id | Name | Version | Class | Supported Directions |
+----+------------------------+-----------------+------------------------------------------------------+----------------------+
| 1 | kite-connector | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.kite.KiteConnector | FROM/TO |
| 2 | kafka-connector | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.kafka.KafkaConnector | TO |
| 3 | hdfs-connector | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.hdfs.HdfsConnector | FROM/TO |
| 4 | generic-jdbc-connector | 1.99.5-cdh5.4.3 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO |
+----+------------------------+-----------------+------------------------------------------------------+----------------------+
Here we use the Generic JDBC Connector's id to create a new link object for that connector; the following creates a JDBC link:
sqoop:000> create link -c 4    # note: 4 is the connector id, i.e. this creates a generic-jdbc-connector link
Creating link for connector with id 4
Please fill following values to create new link object
Name: mysql-link    # note: the Name must be unique
Link configuration
JDBC Driver Class: com.mysql.jdbc.Driver
JDBC Connection String: jdbc:mysql://20.12.20.14:3306/hive    # note: jdbc:mysql://host(ip):port/database
Username: root
Password: ******
JDBC Connection Properties:
There are currently 0 values in the map:
entry# protocol=tcp
There are currently 1 values in the map:
protocol = tcp
New link was successfully created with validation status OK and persistent id 1
sqoop:000> show link
+----+------------+--------------+------------------------+---------+
| Id | Name | Connector Id | Connector Name | Enabled |
+----+------------+--------------+------------------------+---------+
| 1 | mysql-link | 4 | generic-jdbc-connector | true |
+----+------------+--------------+------------------------+---------+
We have now successfully created a JDBC link object with id 1.
Next, create a link for the hdfs connector:
sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: hdfs-link
Link configuration
HDFS URI: hdfs://hadoopMaster:8020
New link was successfully created with validation status OK and persistent id 2
An hdfs link object with id 2 has been created.
7.2.2 Creating a Job object
A connector's From side is used to read data and its To side to write data. The show connector --all output above shows that the Generic JDBC Connector supports both From and To, meaning we can both read data from and write data to a database. To create a job we specify its From and To parts, each identified by a link id.
In other words, the two link ids tie together the job's From and To: where the data is read from (From) and where it is imported to (To).
# Read data from MySQL and import it into HDFS
sqoop:000> create job -f 1 -t 2
Creating job for links with from id 1 and to id 2
Please fill following values to create new job object
Name: mysql-hdfs    # the Name must be unique
From database configuration
Schema name: hive    # required, the database name
Table name: TBLS    # required, the table name
Table SQL statement:    # optional
Table column names:    # optional
Partition column name:    # optional
Null value allowed for the partition column:    # optional
Boundary query:    # optional
ToJob configuration
Override null value:    # optional
Null value:
Output format:
0 : TEXT_FILE
1 : SEQUENCE_FILE
Choose: 0    # required
Compression format:
0 : NONE
1 : DEFAULT
2 : DEFLATE
3 : GZIP
4 : BZIP2
5 : LZO
6 : LZ4
7 : SNAPPY
8 : CUSTOM
Choose: 0    # required
Custom compression format:
Output directory: /tmp/tbls    # required
Throttling resources
Extractors: 2    # optional, corresponds to the number of map tasks in the MapReduce job
Loaders: 1    # optional, corresponds to the number of reduce tasks in the MapReduce job
New job was successfully created with validation status OK and persistent id 1
This creates a new job with id 1.
7.2.3 Starting the job
Run the job with the following command:
# the 1 after -j is the id of the job created above; use show job to list the jobs that have been created
sqoop:000> start job -j 1 -s
7.2.4 Verifying that the data was imported
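The file name under the output directory contains a generated UUID, so list the directory first to find it (a sketch):
# hadoop fs -ls /tmp/tbls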
# hadoop fs -cat /tmp/tbls/33ccf4ab-8531-45cf-bf45-b046afdb9ff6.txt
3,1440743238,1,0,'hue',0,3,'test','MANAGED_TABLE',NULL,NULL,NULL
4,1440743248,1,0,'hue',0,4,'test01','MANAGED_TABLE',NULL,NULL,NULL
5,1440743252,1,0,'hue',0,5,'test02','MANAGED_TABLE',NULL,NULL,NULL
7.3 Exporting data from HDFS to MySQL
The data on HDFS:
1,aaaaaaa,bbbbbb
2,ccccccc,dddddd
3,eeeeeee,ffffff
4,df,df
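Before creating this job, the sample file must exist on HDFS and the target table must exist in MySQL. A minimal preparation sketch, assuming the path /tmp/test.txt and the table test1 used below (the column names match the query result shown later; the column types are an assumption):
printf '1,aaaaaaa,bbbbbb\n2,ccccccc,dddddd\n3,eeeeeee,ffffff\n4,df,df\n' > test.txt
hdfs dfs -put test.txt /tmp/test.txt
mysql -u root -p -e "CREATE TABLE hive.test1 (id INT, va01 VARCHAR(32), va02 VARCHAR(32));"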
Using the two Link objects created above, we can likewise export the data on HDFS into a MySQL table; we only need to create another job:
sqoop:000> create job -f 2 -t 1
Creating job for links with from id 2 and to id 1
Please fill following values to create new job object
Name: hdfs-mysql
From Job configuration
Input directory: /tmp/test.txt
Override null value:
Null value:
To database configuration
Schema name: hive
Table name: test1
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:
Throttling resources
Extractors: 2
Loaders: 1
New job was successfully created with validation status OK and persistent id 2
This creates a new job with id 2.
Start the job:
sqoop:000> start job -j 2 -s
Check the result in MySQL:
mysql> select * from test1;
+------+---------+--------+
| id | va01 | va02 |
+------+---------+--------+
| 1 | aaaaaaa | bbbbbb |
| 2 | ccccccc | dddddd |
| 3 | eeeeeee | ffffff |
| 4 | df | df |
+------+---------+--------+
4 rows in set (0.00 sec)
Common errors when running start job after installing Sqoop2 via CDH
When start job is executed, Hadoop throws java.lang.ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException:
Exception in thread "main" java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
at org.apache.hadoop.mapreduce.Job$1.run(Job.java:563)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
at org.apache.hadoop.mapreduce.Job.connect(Job.java:561)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:549)
at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:580)
at com.pzoom.mapred.WordCount.main(WordCount.java:40)
Caused by: java.lang.ClassNotFoundException: org.codehaus.jackson.map.JsonMappingException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
Copy jackson-core-asl-1.8.8.jar and jackson-mapper-asl-1.8.8.jar into the /var/lib/sqoop2 directory on the server where Sqoop2 is installed, then restart Sqoop2.
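For example (a sketch, assuming the two jars have been downloaded to the current directory and the CDH service script is used):
sudo cp jackson-core-asl-1.8.8.jar jackson-mapper-asl-1.8.8.jar /var/lib/sqoop2/
sudo service sqoop2-server restart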
Fixing HDFS permissions: HDFS here was installed via CDH; from any directory, run
hdfs dfs -ls /    # check the current permissions
drwxrwxrwt - hdfs supergroup 0 2018-04-09 21:38 /tmp
drwxr-xr-x - hdfs supergroup 0 2018-05-29 20:47 /user
su hdfs                     # switch to the hdfs superuser
hdfs dfs -chmod -R 777 /    # recursively open up permissions
Change the owner of an HDFS path:
hdfs dfs -chown -R impala /user/impala