Importing a MySQL table into HDFS with Sqoop
Configure the Sqoop environment variables; if you skip this step you have to run every command from the sqoop/bin directory instead.
export SQOOP_HOME=/opt/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
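To make the variables permanent, a minimal sketch (assuming they go into /etc/profile; adjust to your own shell setup):
echo 'export SQOOP_HOME=/opt/sqoop' >> /etc/profile
echo 'export PATH=$PATH:$SQOOP_HOME/bin' >> /etc/profile
source /etc/profile
sqoop version    # should print the Sqoop version if the PATH is set correctly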
Start HDFS and YARN.
start-dfs.sh
start-yarn.sh
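A quick check that the daemons came up (the exact process list depends on your cluster layout):
jps    # expect NameNode/DataNode and ResourceManager/NodeManager among the output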
sqoop help
codegen Generate code to interact with database records
create-hive-table Import a table definition into Hive
eval Evaluate a SQL statement and display the results
export Export an HDFS directory to a database table
help List available commands
import Import a table from a database to HDFS
import-all-tables Import tables from a database to HDFS
import-mainframe Import datasets from a mainframe server to HDFS
job Work with saved jobs
list-databases List available databases on a server
list-tables List available tables in a database
merge Merge results of incremental imports
metastore Run a standalone Sqoop metastore
version Display version information
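Before running the import it is worth verifying the JDBC connection; a quick sketch using the same connection parameters as the import command below:
sqoop list-databases --connect jdbc:mysql://cdh3:3306 --username root --password 123456
sqoop list-tables --connect jdbc:mysql://cdh3:3306/gmall --username root --password 123456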
sqoop import --connect jdbc:mysql://cdh3:3306/gmall --username root --password 123456 --table user_info \
--columns id,name --where "id >=10 and id<=30" --target-dir /test --delete-target-dir --num-mappers 2 \
--fields-terminated-by '\t' --split-by id
1 --connect jdbc:mysql://cdh3:3306/gmall JDBC connection string for the source database
2 --username root --password 123456 database username and password
3 --table user_info the table to import
4 --columns id,name which columns of the table to import
5 --where "id >=10 and id<=30" filter condition; only rows with 10 <= id <= 30 are imported
Options 4 and 5 can also be replaced by --query "select id,name from user_info where id>=10 and id<=30 and \$CONDITIONS". $CONDITIONS is a placeholder that Sqoop replaces with its own split conditions; if the query has no other where condition, just write where $CONDITIONS by itself. See the sketch after this list.
6 --target-dir /test target directory in HDFS
7 --delete-target-dir delete the target directory first if it already exists
The following options are for tuning the import:
8 --num-mappers number of map tasks used for the import (default is 4)
9 --fields-terminated-by '\t' field delimiter in the output files
10 --split-by id column used to split the rows across mappers
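A sketch of the equivalent free-form query import mentioned in item 5. When the query is wrapped in double quotes, $CONDITIONS must be escaped as \$CONDITIONS, and --target-dir plus --split-by are required with --query:
sqoop import --connect jdbc:mysql://cdh3:3306/gmall --username root --password 123456 \
--query "select id,name from user_info where id>=10 and id<=30 and \$CONDITIONS" \
--target-dir /test --delete-target-dir --num-mappers 2 \
--fields-terminated-by '\t' --split-by id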
[root@cdh3 db_log]# hdfs dfs -ls /test
Found 3 items
-rw-r--r-- 2 root supergroup 0 2021-06-06 16:15 /test/_SUCCESS
-rw-r--r-- 2 root supergroup 125 2021-06-06 16:15 /test/part-m-00000
-rw-r--r-- 2 root supergroup 134 2021-06-06 16:15 /test/part-m-00001
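A few imported rows can be inspected directly as a sanity check:
hdfs dfs -cat /test/part-m-00000 | head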
Once the data is in HDFS it can be loaded from HDFS into Hive or other components for analysis.
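For example, a minimal sketch of loading the files into a Hive table via the hive CLI (the table name test_user_info is an assumption; the delimiter must match the --fields-terminated-by used above):
hive -e "CREATE TABLE test_user_info (id INT, name STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
LOAD DATA INPATH '/test' INTO TABLE test_user_info;"
Alternatively, Sqoop can do the MySQL-to-Hive import in a single step with the --hive-import option.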