Notes on Configuring and Using Sqoop

Sqoop transfers data between relational databases and HDFS in both directions.

This post uses Sqoop to test two scenarios:
(1) exporting data from HDFS into pg;
(2) importing data from pg into HDFS.

Hadoop version: 2.7.1
Sqoop version: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
PostgreSQL (pg) version: 9.4.1

Installing Sqoop:

(1) Download Sqoop and unpack it.
(2) Configure environment variables in /etc/profile:

export SQOOP_HOME=/home/hadoop/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
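
After editing /etc/profile, reload it and confirm the sqoop binary is on the PATH (sqoop version is a built-in command and should report 1.4.6):

source /etc/profile
sqoop version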

(3) Edit the configuration files:

cd /home/hadoop/sqoop-1.4.6/conf
cp sqoop-env-template.sh  sqoop-env.sh
vim sqoop-env.sh

Add the following lines to sqoop-env.sh:

export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.1
export HIVE_HOME=/home/hadoop/hive-1.2.1
cd /home/hadoop/sqoop-1.4.6/bin
vim configure-sqoop
Since HBase, ZooKeeper, and Accumulo are not installed here, comment out the warning blocks for those components (the sections marked "## Moved to be a runtime check in sqoop.") as well as the dependency sections (marked "## dependency list"), so Sqoop stops warning about them at startup.
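
For reference, the HBase warning block in configure-sqoop looks roughly like the following (exact wording may vary by version); prefix each line with #, and treat the ZooKeeper and Accumulo blocks the same way:

# if [ ! -d "${HBASE_HOME}" ]; then
#   echo "Warning: $HBASE_HOME does not exist! HBase imports will fail."
#   echo 'Please set $HBASE_HOME to the root of your HBase installation.'
# fi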

(4) Copy the PostgreSQL JDBC driver JAR into $SQOOP_HOME/lib.

If Sqoop complains that it cannot find the Driver class, fix the permissions on the JAR (I changed them to 777 here).
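
For example (the driver JAR name below is illustrative; use the version that matches your pg server):

cp ~/postgresql-9.4-1201.jdbc4.jar $SQOOP_HOME/lib/
chmod 777 $SQOOP_HOME/lib/postgresql-9.4-1201.jdbc4.jar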

(5) Test the connection. Prerequisite: pg already has a user named hadoop with password hadoop, and a database named hadoop.
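
If that user and database do not exist yet, a minimal setup sketch (run in psql as the postgres superuser):

CREATE USER hadoop WITH PASSWORD 'hadoop';
CREATE DATABASE hadoop OWNER hadoop;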

List all databases:

sqoop list-databases --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop

Result:

[hadoop@master ~]$ sqoop list-databases --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop
15/10/19 19:07:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/10/19 19:07:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/19 19:07:27 INFO manager.SqlManager: Using default fetchSize of 1000
template1
template0
postgres
test
testdb
hivemeta
metastore
hadoop
[hadoop@master ~]$ 

OK, the connection test succeeded.

For the syntax of each command, consult the built-in help:

sqoop help
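
Passing a tool name to help prints that tool's full option list, e.g.:

sqoop help import
sqoop help export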

1. Exporting data from HDFS into pg

(1) First, upload some test data from the local filesystem to HDFS:

hdfs dfs -put /home/hadoop/data/test_tb_content /data/

The test data:

[hadoop@master ~]$ cat ~/data/test_tb_content 
2015    30
2016    312
1991    124

Result:

[hadoop@master sqoop-1.4.6]$ hdfs dfs -ls /data/
15/10/19 19:23:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
-rw-r--r--   2 hadoop supergroup         23 2015-10-14 19:44 /data/article1.txt
-rw-r--r--   2 hadoop supergroup         23 2015-10-14 19:44 /data/article2.txt
drwxr-xr-x   - hadoop supergroup          0 2015-10-14 19:47 /data/output1
-rw-r--r--   2 hadoop supergroup         26 2015-10-19 19:23 /data/test_tb_content
[hadoop@master sqoop-1.4.6]$ 

(2) Create a table in pg; the data just uploaded to HDFS will be exported into it.

[hadoop@master sqoop-1.4.6]$ psql hadoop
psql (9.4.1)
Type "help" for help.

hadoop=# create table testtb(time int, amount int);
CREATE TABLE
hadoop=# select * from testtb;
 time | amount 
------+--------
(0 rows)

hadoop=# 

(3) HDFS -> pg

The command:

sqoop export --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop --table testtb --fields-terminated-by '\t' --export-dir /data/test_tb_content
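
Here --table names the target pg table, --export-dir is the HDFS source directory, and --fields-terminated-by '\t' must match the delimiter of the uploaded file. As the log below warns, putting --password on the command line is insecure; an equivalent invocation that prompts for the password interactively:

sqoop export --connect jdbc:postgresql://master:5432/ --username hadoop -P --table testtb --fields-terminated-by '\t' --export-dir /data/test_tb_content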

Under the hood, Sqoop turns this into a MapReduce job; the run and its output:

[hadoop@master ~]$ sqoop export --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop --table testtb --fields-terminated-by '\t' --export-dir /data/test_tb_content
15/10/19 19:33:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/10/19 19:33:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/19 19:33:42 INFO manager.SqlManager: Using default fetchSize of 1000
15/10/19 19:33:42 INFO tool.CodeGenTool: Beginning code generation
15/10/19 19:33:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "testtb" AS t LIMIT 1
15/10/19 19:33:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.7.1
Note: /tmp/sqoop-hadoop/compile/a6ec425dd92830872c2181f78e216356/testtb.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/10/19 19:33:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a6ec425dd92830872c2181f78e216356/testtb.jar
15/10/19 19:33:45 INFO mapreduce.ExportJobBase: Beginning export of testtb
15/10/19 19:33:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/19 19:33:45 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/10/19 19:33:46 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/10/19 19:33:46 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/10/19 19:33:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/10/19 19:33:46 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.15.140:8032
15/10/19 19:33:49 INFO input.FileInputFormat: Total input paths to process : 1
15/10/19 19:33:49 INFO input.FileInputFormat: Total input paths to process : 1
15/10/19 19:33:49 INFO mapreduce.JobSubmitter: number of splits:4
15/10/19 19:33:49 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/10/19 19:33:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445305382399_0001
15/10/19 19:33:50 INFO impl.YarnClientImpl: Submitted application application_1445305382399_0001
15/10/19 19:33:50 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1445305382399_0001/
15/10/19 19:33:50 INFO mapreduce.Job: Running job: job_1445305382399_0001
15/10/19 19:34:01 INFO mapreduce.Job: Job job_1445305382399_0001 running in uber mode : false
15/10/19 19:34:01 INFO mapreduce.Job:  map 0% reduce 0%
15/10/19 19:34:22 INFO mapreduce.Job:  map 100% reduce 0%
15/10/19 19:34:23 INFO mapreduce.Job: Job job_1445305382399_0001 completed successfully
15/10/19 19:34:23 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=533216
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=623
        HDFS: Number of bytes written=0
        HDFS: Number of read operations=19
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=0
    Job Counters 
        Launched map tasks=4
        Data-local map tasks=4
        Total time spent by all maps in occupied slots (ms)=76393
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=76393
        Total vcore-seconds taken by all map tasks=76393
        Total megabyte-seconds taken by all map tasks=78226432
    Map-Reduce Framework
        Map input records=3
        Map output records=3
        Input split bytes=536
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=237
        CPU time spent (ms)=1990
        Physical memory (bytes) snapshot=367874048
        Virtual memory (bytes) snapshot=3355787264
        Total committed heap usage (bytes)=62390272
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=0
15/10/19 19:34:23 INFO mapreduce.ExportJobBase: Transferred 623 bytes in 36.6731 seconds (16.9879 bytes/sec)
15/10/19 19:34:23 INFO mapreduce.ExportJobBase: Exported 3 records.

On the pg side: before the export, testtb contained no data:

hadoop=# select * from testtb;
 time | amount 
------+--------
(0 rows)

After the export, testtb contains the rows moved over from HDFS:

hadoop=# select * from testtb;
 time | amount 
------+--------
 2015 |     30
 2016 |    312
 1991 |    124
(3 rows)

By the same approach, data from Hive or HBase can also be moved into pg; a sketch for the Hive case follows.
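
A Hive-managed table is ultimately just files under its HDFS warehouse directory, so it can be exported the same way. A minimal sketch, assuming a Hive table stored under /hive/warehouse/testtb (the path is illustrative) with Hive's default field delimiter \001 (Ctrl-A):

sqoop export --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop --table testtb --export-dir /hive/warehouse/testtb --input-fields-terminated-by '\001'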

2. Importing data from pg into HDFS

Insert one more row into pg's testtb table:

hadoop=# insert into testtb values(1993, 226);
INSERT 0 1
hadoop=# select * from testtb;
 time | amount 
------+--------
 2015 |     30
 2016 |    312
 1991 |    124
 1993 |    226
(4 rows)

hadoop=# 

Import the table into HDFS:

sqoop import --connect jdbc:postgresql://master:5432/hadoop --username hadoop --password hadoop --table testtb --target-dir /testsqoopimport -m 1
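
-m 1 runs the import with a single map task, which is why a single output file appears below; with more than one mapper, Sqoop needs a --split-by column (or a primary key on the table) to partition the rows. A --where clause can restrict the import to a subset; a sketch (the target directory name is illustrative):

sqoop import --connect jdbc:postgresql://master:5432/hadoop --username hadoop --password hadoop --table testtb --where "time >= 2000" --target-dir /testsqoopimport_where -m 1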

Again, the Sqoop command is turned into a MapReduce job; the run and its output:

[hadoop@master ~]$ sqoop import --connect jdbc:postgresql://master:5432/hadoop --username hadoop --password hadoop --table testtb --target-dir /testsqoopimport -m 1
15/10/19 19:46:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/10/19 19:46:36 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/19 19:46:36 INFO manager.SqlManager: Using default fetchSize of 1000
15/10/19 19:46:36 INFO tool.CodeGenTool: Beginning code generation
15/10/19 19:46:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "testtb" AS t LIMIT 1
15/10/19 19:46:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.7.1
Note: /tmp/sqoop-hadoop/compile/a355cffa79b861bd5a3de566d7f0a84f/testtb.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/10/19 19:46:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a355cffa79b861bd5a3de566d7f0a84f/testtb.jar
15/10/19 19:46:38 WARN manager.PostgresqlManager: It looks like you are importing from postgresql.
15/10/19 19:46:38 WARN manager.PostgresqlManager: This transfer can be faster! Use the --direct
15/10/19 19:46:38 WARN manager.PostgresqlManager: option to exercise a postgresql-specific fast path.
15/10/19 19:46:38 INFO mapreduce.ImportJobBase: Beginning import of testtb
15/10/19 19:46:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/19 19:46:39 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/10/19 19:46:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/10/19 19:46:40 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.15.140:8032
15/10/19 19:46:42 INFO db.DBInputFormat: Using read commited transaction isolation
15/10/19 19:46:42 INFO mapreduce.JobSubmitter: number of splits:1
15/10/19 19:46:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445305382399_0002
15/10/19 19:46:43 INFO impl.YarnClientImpl: Submitted application application_1445305382399_0002
15/10/19 19:46:43 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1445305382399_0002/
15/10/19 19:46:43 INFO mapreduce.Job: Running job: job_1445305382399_0002
15/10/19 19:46:51 INFO mapreduce.Job: Job job_1445305382399_0002 running in uber mode : false
15/10/19 19:46:51 INFO mapreduce.Job:  map 0% reduce 0%
15/10/19 19:46:58 INFO mapreduce.Job:  map 100% reduce 0%
15/10/19 19:46:58 INFO mapreduce.Job: Job job_1445305382399_0002 completed successfully
15/10/19 19:46:58 INFO mapreduce.Job: Counters: 30
    File System Counters
        FILE: Number of bytes read=0
        FILE: Number of bytes written=133432
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=87
        HDFS: Number of bytes written=35
        HDFS: Number of read operations=4
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters 
        Launched map tasks=1
        Other local map tasks=1
        Total time spent by all maps in occupied slots (ms)=4293
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=4293
        Total vcore-seconds taken by all map tasks=4293
        Total megabyte-seconds taken by all map tasks=4396032
    Map-Reduce Framework
        Map input records=4
        Map output records=4
        Input split bytes=87
        Spilled Records=0
        Failed Shuffles=0
        Merged Map outputs=0
        GC time elapsed (ms)=83
        CPU time spent (ms)=590
        Physical memory (bytes) snapshot=98398208
        Virtual memory (bytes) snapshot=841052160
        Total committed heap usage (bytes)=15663104
    File Input Format Counters 
        Bytes Read=0
    File Output Format Counters 
        Bytes Written=35
15/10/19 19:46:58 INFO mapreduce.ImportJobBase: Transferred 35 bytes in 18.583 seconds (1.8834 bytes/sec)
15/10/19 19:46:58 INFO mapreduce.ImportJobBase: Retrieved 4 records.

The result on HDFS:

[hadoop@master ~]$ hdfs dfs -ls /
15/10/19 19:47:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxr-xr-x   - hadoop supergroup          0 2015-10-19 19:23 /data
drwxr-xr-x   - hadoop supergroup          0 2015-10-19 01:53 /hive
drwxr-xr-x   - hadoop supergroup          0 2015-10-19 19:46 /testsqoopimport
drwx------   - hadoop supergroup          0 2015-10-19 05:04 /tmp
[hadoop@master ~]$ hdfs dfs -ls /testsqoopimport
15/10/19 19:47:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   2 hadoop supergroup          0 2015-10-19 19:46 /testsqoopimport/_SUCCESS
-rw-r--r--   2 hadoop supergroup         35 2015-10-19 19:46 /testsqoopimport/part-m-00000
[hadoop@master ~]$ hdfs dfs -cat /testsqoopimport/part-m-00000
15/10/19 19:49:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015,30
2016,312
1991,124
1993,226
[hadoop@master ~]$ 
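
Note that imports write comma-delimited output by default. To produce tab-separated files instead (matching the original upload format), add --fields-terminated-by to the import command, e.g. (target directory illustrative):

sqoop import --connect jdbc:postgresql://master:5432/hadoop --username hadoop --password hadoop --table testtb --target-dir /testsqoopimport_tsv --fields-terminated-by '\t' -m 1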

That completes the basic Sqoop operations.
