Sqoop transfers data between a relational database and HDFS. This post uses Sqoop to test two things:
(1) moving data from HDFS into pg (a Sqoop export);
(2) moving data from pg onto HDFS (a Sqoop import).
Hadoop version: 2.7.1
Sqoop version: sqoop-1.4.6.bin__hadoop-2.0.4-alpha.tar.gz
PostgreSQL (pg) version: 9.4.1
Installing Sqoop:
(1) Download Sqoop and unpack it.
(2) Set the environment variables in /etc/profile:
export SQOOP_HOME=/home/hadoop/sqoop-1.4.6
export PATH=$PATH:$SQOOP_HOME/bin
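After editing /etc/profile, reload it so the new variables take effect in the current shell. A quick sanity check, with the paths as configured above:

```shell
# Reload the profile and verify that sqoop is now on the PATH.
source /etc/profile
echo "$SQOOP_HOME"      # expect /home/hadoop/sqoop-1.4.6
command -v sqoop        # expect /home/hadoop/sqoop-1.4.6/bin/sqoop
```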
(3) Edit the configuration files:
cd /home/hadoop/sqoop-1.4.6/conf
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh
export HADOOP_COMMON_HOME=/home/hadoop/hadoop-2.7.1
export HIVE_HOME=/home/hadoop/hive-1.2.1
cd /home/hadoop/sqoop-1.4.6/bin
vim configure-sqoop
In configure-sqoop, comment out the warning checks for the components that are not installed (HBASE, ZOOKEEPER, ACCUMULO, etc.): both the blocks under "## Moved to be a runtime check in sqoop." and the "## dependency list" sections.
(4) Copy the PostgreSQL JDBC driver JAR into $SQOOP_HOME/lib.
If you hit a "Driver not found" error, fix the JAR's permissions (I set them to 777 here).
(5) Test the connection. This assumes pg already has a user hadoop with password hadoop, and a database named hadoop.
List all databases:
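A sketch of step (4); the driver filename below is an assumption, so substitute whichever postgresql-*.jar matches your pg and JDK:

```shell
# Copy the JDBC driver into Sqoop's lib directory and make it readable.
# The jar name is hypothetical; 644 (world-readable) is enough, though
# 777 as used above also works.
cp postgresql-9.4-1201.jdbc41.jar /home/hadoop/sqoop-1.4.6/lib/
chmod 644 /home/hadoop/sqoop-1.4.6/lib/postgresql-*.jar
```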
sqoop list-databases --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop
Result:
[hadoop@master ~]$ sqoop list-databases --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop
15/10/19 19:07:27 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/10/19 19:07:27 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/19 19:07:27 INFO manager.SqlManager: Using default fetchSize of 1000
template1
template0
postgres
test
testdb
hivemeta
metastore
hadoop
[hadoop@master ~]$
OK, the test succeeded.
The detailed command syntax is documented by:
sqoop help
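Each subcommand also documents its own flags:

```shell
# Per-tool help pages list every option for that tool.
sqoop help import
sqoop help export
```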
I. Importing HDFS data into pg:
(1) First, upload some test data from the local filesystem to HDFS:
hdfs dfs -put /home/hadoop/data/test_tb_content /data/
The test data:
[hadoop@master ~]$ cat ~/data/test_tb_content
2015 30
2016 312
1991 124
Result:
[hadoop@master sqoop-1.4.6]$ hdfs dfs -ls /data/
15/10/19 19:23:28 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
-rw-r--r-- 2 hadoop supergroup 23 2015-10-14 19:44 /data/article1.txt
-rw-r--r-- 2 hadoop supergroup 23 2015-10-14 19:44 /data/article2.txt
drwxr-xr-x - hadoop supergroup 0 2015-10-14 19:47 /data/output1
-rw-r--r-- 2 hadoop supergroup 26 2015-10-19 19:23 /data/test_tb_content
[hadoop@master sqoop-1.4.6]$
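Note that the file is tab-delimited, which is what the export step below declares via --fields-terminated-by '\t'. A quick local check, recreating the sample rows under an assumed /tmp path:

```shell
# Recreate the three sample rows with explicit tabs and confirm that
# every line splits into exactly two tab-separated fields.
printf '2015\t30\n2016\t312\n1991\t124\n' > /tmp/test_tb_content
awk -F'\t' 'NF != 2 { bad = 1 } END { exit bad }' /tmp/test_tb_content \
  && echo "OK: 2 tab-separated fields per line"
```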
(2) Create a table in pg; the data just uploaded to HDFS will then be moved into this table.
[hadoop@master sqoop-1.4.6]$ psql hadoop
psql (9.4.1)
Type "help" for help.
hadoop=# create table testtb(time int, amount int);
CREATE TABLE
hadoop=# select * from testtb;
time | amount
------+--------
(0 rows)
hadoop=#
(3) HDFS -> pg
The command is:
sqoop export --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop --table testtb --fields-terminated-by '\t' --export-dir /data/test_tb_content
In fact, Sqoop generates a MapReduce job to do the work; the run and its result:
[hadoop@master ~]$ sqoop export --connect jdbc:postgresql://master:5432/ --username hadoop --password hadoop --table testtb --fields-terminated-by '\t' --export-dir /data/test_tb_content
15/10/19 19:33:42 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/10/19 19:33:42 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/19 19:33:42 INFO manager.SqlManager: Using default fetchSize of 1000
15/10/19 19:33:42 INFO tool.CodeGenTool: Beginning code generation
15/10/19 19:33:42 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "testtb" AS t LIMIT 1
15/10/19 19:33:42 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.7.1
Note: /tmp/sqoop-hadoop/compile/a6ec425dd92830872c2181f78e216356/testtb.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/10/19 19:33:45 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a6ec425dd92830872c2181f78e216356/testtb.jar
15/10/19 19:33:45 INFO mapreduce.ExportJobBase: Beginning export of testtb
15/10/19 19:33:45 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/19 19:33:45 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/10/19 19:33:46 INFO Configuration.deprecation: mapred.reduce.tasks.speculative.execution is deprecated. Instead, use mapreduce.reduce.speculative
15/10/19 19:33:46 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/10/19 19:33:46 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/10/19 19:33:46 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.15.140:8032
15/10/19 19:33:49 INFO input.FileInputFormat: Total input paths to process : 1
15/10/19 19:33:49 INFO input.FileInputFormat: Total input paths to process : 1
15/10/19 19:33:49 INFO mapreduce.JobSubmitter: number of splits:4
15/10/19 19:33:49 INFO Configuration.deprecation: mapred.map.tasks.speculative.execution is deprecated. Instead, use mapreduce.map.speculative
15/10/19 19:33:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445305382399_0001
15/10/19 19:33:50 INFO impl.YarnClientImpl: Submitted application application_1445305382399_0001
15/10/19 19:33:50 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1445305382399_0001/
15/10/19 19:33:50 INFO mapreduce.Job: Running job: job_1445305382399_0001
15/10/19 19:34:01 INFO mapreduce.Job: Job job_1445305382399_0001 running in uber mode : false
15/10/19 19:34:01 INFO mapreduce.Job: map 0% reduce 0%
15/10/19 19:34:22 INFO mapreduce.Job: map 100% reduce 0%
15/10/19 19:34:23 INFO mapreduce.Job: Job job_1445305382399_0001 completed successfully
15/10/19 19:34:23 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=533216
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=623
HDFS: Number of bytes written=0
HDFS: Number of read operations=19
HDFS: Number of large read operations=0
HDFS: Number of write operations=0
Job Counters
Launched map tasks=4
Data-local map tasks=4
Total time spent by all maps in occupied slots (ms)=76393
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=76393
Total vcore-seconds taken by all map tasks=76393
Total megabyte-seconds taken by all map tasks=78226432
Map-Reduce Framework
Map input records=3
Map output records=3
Input split bytes=536
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=237
CPU time spent (ms)=1990
Physical memory (bytes) snapshot=367874048
Virtual memory (bytes) snapshot=3355787264
Total committed heap usage (bytes)=62390272
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=0
15/10/19 19:34:23 INFO mapreduce.ExportJobBase: Transferred 623 bytes in 36.6731 seconds (16.9879 bytes/sec)
15/10/19 19:34:23 INFO mapreduce.ExportJobBase: Exported 3 records.
On the pg side: before the transfer, testtb holds no data:
hadoop=# select * from testtb;
time | amount
------+--------
(0 rows)
After the transfer, testtb contains the rows moved over from HDFS:
hadoop=# select * from testtb;
time | amount
------+--------
2015 | 30
2016 | 312
1991 | 124
(3 rows)
By the same token, data in Hive or HBase can be moved into pg as well.
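For a Hive-managed table, for instance, exporting to pg means exporting the table's underlying HDFS files. A sketch, where the warehouse path is an assumption and \001 (Ctrl-A) is Hive's default field delimiter:

```shell
# Export a Hive table's HDFS files into pg. The --export-dir path below is
# hypothetical; note Hive's default delimiter is \001, not \t.
sqoop export --connect jdbc:postgresql://master:5432/hadoop \
  --username hadoop -P \
  --table testtb \
  --export-dir /hive/warehouse/testtb \
  --input-fields-terminated-by '\001'
```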
II. Moving pg data onto HDFS
Insert one more row into pg's testtb table:
hadoop=# insert into testtb values(1993, 226);
INSERT 0 1
hadoop=# select * from testtb;
time | amount
------+--------
2015 | 30
2016 | 312
1991 | 124
1993 | 226
(4 rows)
hadoop=#
Import the table above into HDFS:
sqoop import --connect jdbc:postgresql://master:5432/hadoop --username hadoop --password hadoop --table testtb --target-dir /testsqoopimport -m 1
Likewise, the Sqoop command is turned into a MapReduce job; the run and its result:
[hadoop@master ~]$ sqoop import --connect jdbc:postgresql://master:5432/hadoop --username hadoop --password hadoop --table testtb --target-dir /testsqoopimport -m 1
15/10/19 19:46:36 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6
15/10/19 19:46:36 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
15/10/19 19:46:36 INFO manager.SqlManager: Using default fetchSize of 1000
15/10/19 19:46:36 INFO tool.CodeGenTool: Beginning code generation
15/10/19 19:46:36 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM "testtb" AS t LIMIT 1
15/10/19 19:46:36 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/hadoop-2.7.1
Note: /tmp/sqoop-hadoop/compile/a355cffa79b861bd5a3de566d7f0a84f/testtb.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
15/10/19 19:46:38 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/a355cffa79b861bd5a3de566d7f0a84f/testtb.jar
15/10/19 19:46:38 WARN manager.PostgresqlManager: It looks like you are importing from postgresql.
15/10/19 19:46:38 WARN manager.PostgresqlManager: This transfer can be faster! Use the --direct
15/10/19 19:46:38 WARN manager.PostgresqlManager: option to exercise a postgresql-specific fast path.
15/10/19 19:46:38 INFO mapreduce.ImportJobBase: Beginning import of testtb
15/10/19 19:46:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/10/19 19:46:39 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
15/10/19 19:46:40 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/10/19 19:46:40 INFO client.RMProxy: Connecting to ResourceManager at master/172.16.15.140:8032
15/10/19 19:46:42 INFO db.DBInputFormat: Using read commited transaction isolation
15/10/19 19:46:42 INFO mapreduce.JobSubmitter: number of splits:1
15/10/19 19:46:42 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1445305382399_0002
15/10/19 19:46:43 INFO impl.YarnClientImpl: Submitted application application_1445305382399_0002
15/10/19 19:46:43 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1445305382399_0002/
15/10/19 19:46:43 INFO mapreduce.Job: Running job: job_1445305382399_0002
15/10/19 19:46:51 INFO mapreduce.Job: Job job_1445305382399_0002 running in uber mode : false
15/10/19 19:46:51 INFO mapreduce.Job: map 0% reduce 0%
15/10/19 19:46:58 INFO mapreduce.Job: map 100% reduce 0%
15/10/19 19:46:58 INFO mapreduce.Job: Job job_1445305382399_0002 completed successfully
15/10/19 19:46:58 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=133432
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=87
HDFS: Number of bytes written=35
HDFS: Number of read operations=4
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Other local map tasks=1
Total time spent by all maps in occupied slots (ms)=4293
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=4293
Total vcore-seconds taken by all map tasks=4293
Total megabyte-seconds taken by all map tasks=4396032
Map-Reduce Framework
Map input records=4
Map output records=4
Input split bytes=87
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=83
CPU time spent (ms)=590
Physical memory (bytes) snapshot=98398208
Virtual memory (bytes) snapshot=841052160
Total committed heap usage (bytes)=15663104
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=35
15/10/19 19:46:58 INFO mapreduce.ImportJobBase: Transferred 35 bytes in 18.583 seconds (1.8834 bytes/sec)
15/10/19 19:46:58 INFO mapreduce.ImportJobBase: Retrieved 4 records.
The result on HDFS:
[hadoop@master ~]$ hdfs dfs -ls /
15/10/19 19:47:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 4 items
drwxr-xr-x - hadoop supergroup 0 2015-10-19 19:23 /data
drwxr-xr-x - hadoop supergroup 0 2015-10-19 01:53 /hive
drwxr-xr-x - hadoop supergroup 0 2015-10-19 19:46 /testsqoopimport
drwx------ - hadoop supergroup 0 2015-10-19 05:04 /tmp
[hadoop@master ~]$ hdfs dfs -ls /testsqoopimport
15/10/19 19:47:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r-- 2 hadoop supergroup 0 2015-10-19 19:46 /testsqoopimport/_SUCCESS
-rw-r--r-- 2 hadoop supergroup 35 2015-10-19 19:46 /testsqoopimport/part-m-00000
[hadoop@master ~]$ hdfs dfs -cat /testsqoopimport/part-m-00000
15/10/19 19:49:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2015,30
2016,312
1991,124
1993,226
[hadoop@master ~]$
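The import log above even hints at a faster path: with --direct, Sqoop uses the postgresql-specific fast path (based on COPY) instead of generic JDBC fetches. A sketch, with a new, hypothetical target directory:

```shell
# The same import via the fast path suggested by the WARN lines in the log.
sqoop import --connect jdbc:postgresql://master:5432/hadoop \
  --username hadoop -P \
  --table testtb --direct \
  --target-dir /testsqoopimport_direct -m 1
```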
That completes the basic Sqoop operations.