1. Download and install
Download URL:
http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.9.3.tar.gz
Download sqoop-1.4.6-cdh5.9.3.tar.gz, extract it, and install it under /home/hadoop/tools/.
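A minimal sketch of this step, assuming the tools directory already exists and the tarball extracts to sqoop-1.4.6-cdh5.9.3:
cd /home/hadoop/tools
wget http://archive.cloudera.com/cdh5/cdh/5/sqoop-1.4.6-cdh5.9.3.tar.gz
tar -zxvf sqoop-1.4.6-cdh5.9.3.tar.gz
mv sqoop-1.4.6-cdh5.9.3 sqoop    # rename so it matches the SQOOP_HOME set in step 2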
2. Set environment variables
sudo vim /etc/profile
#
export PYTHONPATH=/home/hadoop/tools/spark2/python
export PYSPARK_PYTHON=python3
# pyspark
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS='notebook'
# SQOOP
export SQOOP_HOME=/home/hadoop/tools/sqoop
export PATH=$PATH:$SQOOP_HOME/bin
export HIVE_CONF_DIR=/home/hadoop/tools/hive/conf
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$HIVE_HOME/lib/
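After saving, reload the profile so the new variables take effect in the current shell, e.g.:
source /etc/profile
echo $SQOOP_HOME    # should print /home/hadoop/tools/sqoop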
3. Edit conf/sqoop-env.sh:
#Set the path for where zookeper config dir is
#export ZOOCFGDIR=
export HADOOP_COMMON_HOME=/home/hadoop/tools/hadoop3
export HADOOP_MAPRED_HOME=/home/hadoop/tools/hadoop3
export HIVE_HOME=/home/hadoop/tools/hive
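sqoop-env.sh is normally created from the bundled template first; a short sketch, assuming the default conf layout:
cd $SQOOP_HOME/conf
cp sqoop-env-template.sh sqoop-env.sh
vim sqoop-env.sh    # add the three exports above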
4. Edit bin/configure-sqoop: comment out the checks for HCAT_HOME, ACCUMULO_HOME, and ZOOKEEPER_HOME
## Moved to be a runtime check in sqoop.
#if [ ! -d "${HCAT_HOME}" ]; then
# echo "Warning: $HCAT_HOME does not exist! HCatalog jobs will fail."
# echo 'Please set $HCAT_HOME to the root of your HCatalog installation.'
#fi
#if [ ! -d "${ACCUMULO_HOME}" ]; then
# echo "Warning: $ACCUMULO_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ACCUMULO_HOME to the root of your Accumulo installation.'
#fi
#if [ ! -d "${ZOOKEEPER_HOME}" ]; then
# echo "Warning: $ZOOKEEPER_HOME does not exist! Accumulo imports will fail."
# echo 'Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.'
#fi
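With those checks commented out, running any Sqoop command should no longer print the HCatalog/Accumulo/Zookeeper warnings; a quick check:
sqoop version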
5. Extract mysql-connector-java-8.0.11.zip and copy the mysql-connector-java-8.0.11.jar inside it into Sqoop's lib directory
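A sketch of that step, assuming the archive was downloaded to the home directory and unpacks to a mysql-connector-java-8.0.11 folder containing the JAR:
cd ~
unzip mysql-connector-java-8.0.11.zip
cp mysql-connector-java-8.0.11/mysql-connector-java-8.0.11.jar $SQOOP_HOME/lib/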
6. Install MySQL 8
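The examples below expect an agydata database with student and wordcount tables. The database, table, and column names come from the commands below, but the column definitions here are assumptions, so treat this as a hypothetical sketch (the root password hadoop matches the one passed to Sqoop):
mysql -u root -phadoop <<'SQL'
CREATE DATABASE IF NOT EXISTS agydata;
USE agydata;
CREATE TABLE IF NOT EXISTS student (id INT PRIMARY KEY, name VARCHAR(50));
CREATE TABLE IF NOT EXISTS wordcount (wordcountcol VARCHAR(255));
SQL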
7. Import a MySQL table into HDFS
sqoop import --connect "jdbc:mysql://192.168.1.15:3306/agydata?zeroDateTimeBehavior=round" --username root --password hadoop --query 'select * from student where $CONDITIONS' --target-dir /Hadoop/Input/student -m 3 --fields-terminated-by '\t' --split-by 'id'
-m specifies the number of map tasks used to import the data in parallel; the default is 4. It is best not to set it higher than the number of nodes in the cluster.
If no --target-dir is given, the output lands under /user/<username>/ by default.
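To check the imported files, for example:
hadoop fs -ls /Hadoop/Input/student
hadoop fs -cat /Hadoop/Input/student/part-m-00000 | head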
8. Word count test
wordcount.py:
# -*- coding:utf-8 -*-
from pyspark import SparkContext, SparkConf

inputFile = 'hdfs://master:9000/Hadoop/Input/wordcount/part-m*'   # input files imported by Sqoop
outputFile = 'hdfs://master:9000/Hadoop/Output/wordcount'         # directory for the results
appName = "wordcount"            # also shown in the web UI for monitoring
master = "spark://master:7077"   # the host name can be replaced by its IP
conf = SparkConf().setAppName(appName).setMaster(master)
sc = SparkContext(conf=conf)

text_file = sc.textFile(inputFile)
counts = text_file.flatMap(lambda line: line.split(',')) \
                  .map(lambda word: (word, 1)) \
                  .reduceByKey(lambda a, b: a + b)
counts.saveAsTextFile(outputFile)
sc.stop()
Driver script:
#!/bin/bash
echo -e "\033[31m ========Running SQOOP to HDFS !!!======== \033[0m"
sqoop import --connect "jdbc:mysql://192.168.1.15:3306/agydata?zeroDateTimeBehavior=round" --username root --password hadoop --columns "wordcountcol" --delete-target-dir --target-dir /Hadoop/Input/wordcount --mapreduce-job-name mysql2hdfs --table wordcount -m 3
echo -e "\033[31m ========Reading HDFS Data and Running wordcount.py Now !!!======== \033[0m"
hadoop fs -test -e /Hadoop/Output/wordcount
if [ $? -eq 0 ];then
    echo -e "\033[31m ========Deleting Output Directory !!!======== \033[0m"
    hadoop fs -rm -r /Hadoop/Output/wordcount
fi
echo -e "\033[31m ========Running Wordcount !!!======== \033[0m"
export CURRENT=/home/hadoop/work
$SPARK_HOME/bin/spark-submit $CURRENT/wordcount.py
echo -e "\033[31m ========Result Output !!!======== \033[0m"
hadoop fs -cat /Hadoop/Output/wordcount/*
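Assuming wordcount.py and the driver script (saved here as run.sh, a name chosen only for this example) both live under /home/hadoop/work, the whole pipeline can be run with:
cd /home/hadoop/work
chmod +x run.sh
./run.sh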
This article describes how to download and install Sqoop 1.4.6 and walks through the configuration in detail, including environment variable setup, editing sqoop-env.sh, and running tests. It also gives a concrete example of importing data from MySQL into HDFS.