Sqoop Learning Notes

This post records the process of importing data from MySQL into HDFS and Hive with Sqoop, including the list-databases and list-tables commands, as well as the errors encountered along the way (no primary key found, Hive configuration problems, column-name case issues) and their solutions. The examples show different import parameters such as --split-by, --hive-import, and --hive-table. Finally, it notes that after the import you need to run MSCK REPAIR TABLE manually to repair the Hive partitions.


sqoop list-databases \
--connect jdbc:mysql://hadoop5:3306 \
--username root \
--password root

sqoop list-tables \
--connect jdbc:mysql://hadoop5:3306/ruoze_ha_d5 \
--username root \
--password root
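
As the job log later warns, putting --password on the command line is insecure. A safer variant (a sketch, same connection details) is to let Sqoop prompt for the password with -P:

sqoop list-tables \
--connect jdbc:mysql://hadoop5:3306/ruoze_ha_d5 \
--username root \
-P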

Output of sqoop list-databases:

information_schema
hivedb
mysql
performance_schema
ruoze_ha_d5
ruozedata
ruozedb
test

sqoop import \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
--table emp \
--split-by 'empno' \
--delete-target-dir

Read the error log carefully and analyze it in detail:
Error during import: No primary key could be found for table emp. Please specify one with --split-by or perform a sequential import with '-m 1'.
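
Sqoop parallelizes an import by splitting the table on a column, so without a primary key it cannot compute split boundaries on its own. Besides --split-by or -m 1, a third option is to add a primary key on the MySQL side. A sketch, valid only if empno is unique and NOT NULL in this table:

# run against the source database; assumes empno can serve as the primary key
mysql -uroot -proot ruozedata -e "ALTER TABLE emp ADD PRIMARY KEY (empno);"

After that, Sqoop can pick the split column automatically.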

sqoop import \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
--table emp \
--mapreduce-job-name FromMySQLToHDFS \
-m 1 \
--delete-target-dir \
--columns 'EMPNO,ENAME,JOB,SAL,COMM' \
--target-dir EMP_COLUMN_WHERE \
--fields-terminated-by '\t' \
--null-non-string '0' \
--null-string '' \
--where 'SAL>2000'
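
--null-string and --null-non-string control what is written to HDFS when a column is NULL (here an empty string for string columns and 0 for the others). To inspect the result, a quick check assuming the relative --target-dir resolves under the user's HDFS home directory:

hdfs dfs -ls EMP_COLUMN_WHERE
hdfs dfs -cat EMP_COLUMN_WHERE/part-m-*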

sqoop import \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
--mapreduce-job-name FromMySQLToHDFS \
-m 6 \
--delete-target-dir \
--target-dir EMP_COLUMN_QUERY \
--fields-terminated-by '\t' \
--null-non-string '0' \
--null-string '' \
--split-by 'empno' \
-e 'select * from emp where empno>7900 and $CONDITIONS'
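
With a free-form query (-e / --query), the WHERE clause must contain the literal token $CONDITIONS, which Sqoop replaces with the split predicate of each map task. Single quotes pass the token through unchanged; if the query is wrapped in double quotes it has to be escaped as \$CONDITIONS. A sketch of the same import using double quotes and a single mapper:

sqoop import \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
-m 1 \
--delete-target-dir \
--target-dir EMP_COLUMN_QUERY \
-e "select * from emp where empno>7900 and \$CONDITIONS"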

The imported files contain the contents of the selected columns.

sqoop import \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
--table emp \
--delete-target-dir \
--hive-import \
--hive-table d5_emp_test_p \
--fields-terminated-by '\t' \
--columns 'empno,ename,job,sal,comm' \
--split-by 'empno' \
--hive-overwrite \
--hive-partition-key 'pt' \
--hive-partition-value 'ruoze'

sqoop import \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
--table emp \
--delete-target-dir \
--hive-import \
--hive-table d5_emp_test_p \
--fields-terminated-by '\t' \
--columns 'empno,ename,job,sal,comm' \
--hive-overwrite \
--hive-partition-key 'pt' \
--hive-partition-value 'ruoze'

CREATE TABLE d5_emp_test_p (
empno int,
ename string,
job string,
sal double,
comm double
) PARTITIONED BY (pt string)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
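
Once the import has run, the table definition and its partitions can be checked from the Hive CLI. A quick check, assuming the table lives in the default database:

hive -e "DESCRIBE FORMATTED d5_emp_test_p;"
hive -e "SHOW PARTITIONS d5_emp_test_p;"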

Problem:
18/12/05 01:47:34 ERROR hive.HiveConfig: Could not load org.apache.hadoop.hive.conf.HiveConf. Make sure HIVE_CONF_DIR is set correctly.
18/12/05 01:47:34 ERROR tool.ImportTool: Encountered IOException running import job: java.io.IOException: java.lang.ClassNotFoundException: org.apache.hadoop.hive.conf.HiveConf
Solution:
Add the Hive libraries to Sqoop's classpath, e.g. via vi ~/.bash_profile:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/*
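
After editing ~/.bash_profile, reload it in the current shell and confirm the variable expands as expected (assumes HIVE_HOME is already set):

source ~/.bash_profile
echo $HADOOP_CLASSPATH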

Also watch out for column-name case: the case used in --columns must match the table's column names consistently, otherwise the import fails with the error below (a tip for checking the actual column names follows the full log).

18/12/05 01:38:43 ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
18/12/05 01:38:43 ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
18/12/05 01:38:43 ERROR util.SqlTypeMap: correct column names in Sqoop options.
18/12/05 01:38:43 ERROR tool.ImportTool: Imported Failed: column not found: EMPNO

[hadoop@hadoop001 conf]$ sqoop import \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
--table emp \
--delete-target-dir \
--hive-import \
--hive-table d5_emp_test \
--fields-terminated-by '\t' \
--columns 'EMPNO,ENAME,JOB,SAL,COMM' \
--split-by 'empno' \
--hive-overwrite \
--hive-partition-key 'pt' \
--hive-partition-value 'ruoze'
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/hadoop/app/sqoop-1.4.6-cdh5.7.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
18/12/05 01:38:20 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.7.0
18/12/05 01:38:20 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
18/12/05 01:38:20 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
18/12/05 01:38:20 INFO tool.CodeGenTool: Beginning code generation
18/12/05 01:38:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
18/12/05 01:38:20 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
18/12/05 01:38:20 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/hadoop/app/hadoop-2.6.0-cdh5.7.0
Note: /tmp/sqoop-hadoop/compile/b1488c8443ac661a53a189cf56de2382/emp.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
18/12/05 01:38:21 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-hadoop/compile/b1488c8443ac661a53a189cf56de2382/emp.jar
18/12/05 01:38:22 INFO tool.ImportTool: Destination directory emp is not present, hence not deleting.
18/12/05 01:38:22 WARN manager.MySQLManager: It looks like you are importing from mysql.
18/12/05 01:38:22 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
18/12/05 01:38:22 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
18/12/05 01:38:22 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
18/12/05 01:38:22 INFO mapreduce.ImportJobBase: Beginning import of emp
18/12/05 01:38:22 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
18/12/05 01:38:22 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
18/12/05 01:38:29 INFO db.DBInputFormat: Using read commited transaction isolation
18/12/05 01:38:29 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(empno), MAX(empno) FROM emp
18/12/05 01:38:30 INFO mapreduce.JobSubmitter: number of splits:4
18/12/05 01:38:30 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1543937863898_0012
18/12/05 01:38:30 INFO impl.YarnClientImpl: Submitted application application_1543937863898_0012
18/12/05 01:38:30 INFO mapreduce.Job: The url to track the job: http://hadoop001:8088/proxy/application_1543937863898_0012/
18/12/05 01:38:30 INFO mapreduce.Job: Running job: job_1543937863898_0012
18/12/05 01:38:36 INFO mapreduce.Job: Job job_1543937863898_0012 running in uber mode : false
18/12/05 01:38:36 INFO mapreduce.Job: map 0% reduce 0%
18/12/05 01:38:41 INFO mapreduce.Job: map 50% reduce 0%
18/12/05 01:38:42 INFO mapreduce.Job: map 100% reduce 0%
18/12/05 01:38:43 INFO mapreduce.Job: Job job_1543937863898_0012 completed successfully
18/12/05 01:38:43 INFO mapreduce.Job: Counters: 30
File System Counters
FILE: Number of bytes read=0
FILE: Number of bytes written=579672
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=462
HDFS: Number of bytes written=472
HDFS: Number of read operations=16
HDFS: Number of large read operations=0
HDFS: Number of write operations=8
Job Counters
Launched map tasks=4
Other local map tasks=4
Total time spent by all maps in occupied slots (ms)=12193
Total time spent by all reduces in occupied slots (ms)=0
Total time spent by all map tasks (ms)=12193
Total vcore-seconds taken by all map tasks=12193
Total megabyte-seconds taken by all map tasks=12485632
Map-Reduce Framework
Map input records=14
Map output records=14
Input split bytes=462
Spilled Records=0
Failed Shuffles=0
Merged Map outputs=0
GC time elapsed (ms)=220
CPU time spent (ms)=3970
Physical memory (bytes) snapshot=778035200
Virtual memory (bytes) snapshot=11221729280
Total committed heap usage (bytes)=676855808
File Input Format Counters
Bytes Read=0
File Output Format Counters
Bytes Written=472
18/12/05 01:38:43 INFO mapreduce.ImportJobBase: Transferred 472 bytes in 21.1684 seconds (22.2974 bytes/sec)
18/12/05 01:38:43 INFO mapreduce.ImportJobBase: Retrieved 14 records.
18/12/05 01:38:43 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM emp AS t LIMIT 1
18/12/05 01:38:43 ERROR util.SqlTypeMap: It seems like you are looking up a column that does not
18/12/05 01:38:43 ERROR util.SqlTypeMap: exist in the table. Please ensure that you've specified
18/12/05 01:38:43 ERROR util.SqlTypeMap: correct column names in Sqoop options.
18/12/05 01:38:43 ERROR tool.ImportTool: Imported Failed: column not found: EMPNO
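
To see how the columns are actually named before filling in --columns, the table metadata can be queried through Sqoop itself. A sketch using sqoop eval with the same connection settings:

sqoop eval \
--connect jdbc:mysql://hadoop5:3306/ruozedata \
--username root \
--password root \
--query "SHOW COLUMNS FROM emp"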

No data shows up in the table after the import; the partition has to be repaired manually with MSCK REPAIR TABLE d5_emp_test_p;
Time taken: 0.224 seconds
hive (default)> select * from d5_emp_test_p;
OK
Time taken: 0.267 seconds
hive (default)> select * from d5_emp_test_p;
OK
Time taken: 0.258 seconds
hive (default)> MSCK REPAIR TABLE d5_emp_test_p;
OK
Partitions not in metastore: d5_emp_test_p:pt=ruoze
Repair: Added partition to metastore d5_emp_test_p:pt=ruoze
Time taken: 0.217 seconds, Fetched: 2 row(s)
hive (default)> select * from d5_emp_test_p;
OK
7369 SMITH CLERK 800.0 NULL ruoze
7499 ALLEN SALESMAN 1600.0 300.0 ruoze
7521 WARD SALESMAN 1250.0 500.0 ruoze
7566 JONES MANAGER 2975.0 NULL ruoze
7654 MARTIN SALESMAN 1250.0 1400.0 ruoze
7698 BLAKE MANAGER 2850.0 NULL ruoze
7782 CLARK MANAGER 2450.0 NULL ruoze
7788 SCOTT ANALYST 3000.0 NULL ruoze
7839 KING PRESIDENT 5000.0 NULL ruoze
7844 TURNER SALESMAN 1500.0 0.0 ruoze
7876 ADAMS CLERK 1100.0 NULL ruoze
7900 JAMES CLERK 950.0 NULL ruoze
7902 FORD ANALYST 3000.0 NULL ruoze
7934 MILLER CLERK 1300.0 NULL ruoze
Time taken: 0.311 seconds, Fetched: 14 row(s)
hive (default)>
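
MSCK REPAIR TABLE scans the table's HDFS location and registers any partition directories that are missing from the metastore. When the partition is known in advance, an alternative (a sketch for the pt='ruoze' partition used above) is to add it explicitly:

hive -e "ALTER TABLE d5_emp_test_p ADD IF NOT EXISTS PARTITION (pt='ruoze');"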
