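Still on Ubuntu 12, as before.
Step 1: Install Sqoop
1. Download the tarball.
2. Extract it to /usr/local/Sqoop-x.x.x and rename the directory to sqoop.
3. Set SQOOP_HOME and add its bin directory to PATH: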
hduser@localhost:/usr/local/sqoop$ export SQOOP_HOME="/usr/local/sqoop"
hduser@localhost:/usr/local/sqoop$ export PATH=$PATH:$SQOOP_HOME/bin
4. Check that Sqoop runs:
hduser@localhost:/usr/local/sqoop$ sqoop help
This fails because Hadoop cannot be found; the Hadoop environment variables have not been set:
Error: /usr/lib/hadoop does not exist!
hduser@localhost:/usr/local/hadoop$ export HADOOP_HOME="/usr/local/hadoop"
hduser@localhost:/usr/local/hadoop$ export PATH=$PATH:$HADOOP_HOME/bin
The same error still appears:
Error: /usr/lib/hadoop does not exist!
5. In the /etc/profile.d directory, add hadoop1.sh and sqoop.sh with the following contents, respectively:
export HADOOP_HOME="/usr/local/hadoop"
export PATH=${HADOOP_HOME}/bin:$PATH
and
export SQOOP_HOME="/usr/local/sqoop"
export PATH=${SQOOP_HOME}/bin:$PATH
Then make both files executable:
sudo chmod a+x hadoop1.sh
sudo chmod a+x sqoop.sh
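The two profile files from step 5 can also be generated by a short script. This is a minimal sketch rather than the procedure used above: PROFILE_DIR is a hypothetical variable that defaults to a scratch directory so the script can be tried without root, and pointing it at /etc/profile.d (with sudo) is assumed to reproduce the manual steps.

```shell
#!/bin/sh
# Sketch: generate hadoop1.sh and sqoop.sh as described in step 5.
# PROFILE_DIR is an assumption for illustration; it defaults to a scratch
# directory so nothing under /etc is touched.
PROFILE_DIR="${PROFILE_DIR:-$(mktemp -d)}"

# The quoted heredoc delimiter ('EOF') keeps ${HADOOP_HOME} and $PATH
# literal in the generated file instead of expanding them now.
cat > "$PROFILE_DIR/hadoop1.sh" <<'EOF'
export HADOOP_HOME="/usr/local/hadoop"
export PATH=${HADOOP_HOME}/bin:$PATH
EOF

cat > "$PROFILE_DIR/sqoop.sh" <<'EOF'
export SQOOP_HOME="/usr/local/sqoop"
export PATH=${SQOOP_HOME}/bin:$PATH
EOF

chmod a+x "$PROFILE_DIR/hadoop1.sh" "$PROFILE_DIR/sqoop.sh"

# Load both files into the current shell, as a new login shell would.
. "$PROFILE_DIR/hadoop1.sh"
. "$PROFILE_DIR/sqoop.sh"

echo "HADOOP_HOME=$HADOOP_HOME"
echo "SQOOP_HOME=$SQOOP_HOME"
```

Sourcing the files in the current shell avoids the situation where a variable exported in one terminal is missing in another.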
Step 2: Download and install the SQL Server Connector and the JDBC 4 driver
1. Download sqljdbc_3.0.1301.101_enu.tar.gz from http://www.microsoft.com/en-us/download/details.aspx?displaylang=en&id=21599
2. Extract it and copy sqljdbc4.jar to the ${SQOOP_HOME}/lib directory.
3. Download sqoop-sqlserver-1.0.tar.gz from http://www.microsoft.com/en-us/download/details.aspx?id=27584, extract it under /usr/local, and rename the directory to mssql.
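In the /etc/profile.d directory, add mssql.sh containing:
export MSSQL_CONNECTOR_HOME="/usr/local/mssql"
Make it executable:
sudo chmod a+x mssql.sh
The setting takes effect after a restart, or it can be loaded right away with the source command:
$ source /etc/profile.d/mssql.sh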
hduser@localhost:~$ echo $MSSQL_CONNECTOR_HOME
/usr/local/mssql
4. Install: the connector directory contains install.sh; run it directly.
$ sh install.sh
'SQL Server - Hadoop' Connector Installation completed successfully.
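Before running install.sh, it can help to confirm that the pieces from the steps above are in place. The following is a minimal sketch, not part of Sqoop or the connector: check_prereqs is a hypothetical helper, and it only checks the locations named in this post.

```shell
#!/bin/sh
# check_prereqs (hypothetical helper): verify what the steps above set up.
# Returns 0 if everything is found, 1 otherwise.
check_prereqs() {
    status=0
    if [ -z "$SQOOP_HOME" ] || [ ! -d "$SQOOP_HOME" ]; then
        echo "SQOOP_HOME is unset or not a directory" >&2
        status=1
    elif [ ! -f "$SQOOP_HOME/lib/sqljdbc4.jar" ]; then
        echo "sqljdbc4.jar not found in $SQOOP_HOME/lib" >&2
        status=1
    fi
    if [ -z "$MSSQL_CONNECTOR_HOME" ] || [ ! -d "$MSSQL_CONNECTOR_HOME" ]; then
        echo "MSSQL_CONNECTOR_HOME is unset or not a directory" >&2
        status=1
    elif [ ! -f "$MSSQL_CONNECTOR_HOME/install.sh" ]; then
        echo "install.sh not found in $MSSQL_CONNECTOR_HOME" >&2
        status=1
    fi
    return "$status"
}
```

One possible use is to call check_prereqs first and run sh install.sh from $MSSQL_CONNECTOR_HOME only if it returns 0.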
Step 3: Read and write data from SQL Server
sqoop import --connect 'jdbc:sqlserver://xxx.xx.xx.xxx;username=sa;password=abcdef;database=dbName' --table tableName -m 3
This successfully imported the data of table tableName into HDFS at /user/hduser/result.
Import data into HDFS:
sqoop import --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' --table <table> --target-dir /path/to/hdfs/dir --split-by <KEY> -m 3
Export data to a table:
# sqoop export --connect 'jdbc:sqlserver://<IP>;username=dbuser;password=dbpasswd;database=<DB>' --table=<table> --direct --export-dir /path/from/hdfs/dir
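The import template above can be filled in from variables, which keeps the connection details in one place. This is only a sketch: every variable name and value below is a placeholder chosen for illustration, and the command is printed rather than executed.

```shell
#!/bin/sh
# Sketch: assemble the import command from variables and print it.
# All values below are placeholders chosen for illustration.
DB_HOST="192.0.2.10"      # documentation-only example address
DB_NAME="dbName"
DB_USER="dbuser"
DB_PASS="dbpasswd"
TABLE="tableName"
SPLIT_KEY="LogDate"
TARGET_DIR="/user/hduser/import_demo"

# Same connection string layout as in the commands above.
CONNECT="jdbc:sqlserver://${DB_HOST};username=${DB_USER};password=${DB_PASS};database=${DB_NAME}"

# The semicolons in CONNECT would end the shell command if left unquoted,
# so the string is wrapped in single quotes in the printed command.
IMPORT_CMD="sqoop import --connect '${CONNECT}' --table ${TABLE} --target-dir ${TARGET_DIR} --split-by ${SPLIT_KEY} -m 3"

echo "$IMPORT_CMD"
```

The printed line can be reviewed and then pasted into a shell where Sqoop is on the PATH.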
Examples:
sqoop import --connect 'jdbc:sqlserver://xxx.xx.xx.xxx;username=userName;password=abcdef;database=dbName' --table tableName -m 3
sqoop import --connect 'jdbc:sqlserver://xxx.xx.xx.xxx;username=userName;password=abcdef;database=dbName' --table tableName --columns "LogDate,UBCount" -m 3
sqoop import --connect 'jdbc:sqlserver://xxx.xx.xx.xxx;username=userName;password=abcdef;database=dbName' --table tableName --columns "LogDate,UBCount" --where "LogDate=20130301" -m 3
sqoop import --connect 'jdbc:sqlserver://xxx.xx.xx.xxx;username=userName;password=abcdef;database=dbName' --table tableName --columns "LogDate,UBCount" --where "LogDate=20130301" --target-dir /user/hduser/ARC -m 3
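A free-form query import (--query) requires the $CONDITIONS token in the query and a --split-by column; a --target-dir must also be given: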
sqoop import --connect 'jdbc:sqlserver://xxx.xx.xx.xxx;username=userName;password=abcdef;database=dbName' --query 'SELECT * FROM tableName WHERE LogDate=20130220 and $CONDITIONS' --split-by "LogDate" --target-dir /user/hduser/freeform
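The single quotes around the query above matter: inside double quotes the shell would expand $CONDITIONS itself before Sqoop sees it. The sketch below only prints the three variants to show the difference and makes no Sqoop call; the variable names are chosen for illustration.

```shell
#!/bin/sh
# Sketch: how shell quoting affects the $CONDITIONS token.
CONDITIONS=""   # empty in a normal shell session; set explicitly for the demo

Q_SINGLE='SELECT * FROM tableName WHERE LogDate=20130220 and $CONDITIONS'
Q_DOUBLE="SELECT * FROM tableName WHERE LogDate=20130220 and $CONDITIONS"
Q_ESCAPED="SELECT * FROM tableName WHERE LogDate=20130220 and \$CONDITIONS"

echo "single : $Q_SINGLE"    # token preserved
echo "double : $Q_DOUBLE"    # token lost: expanded to an empty string
echo "escaped: $Q_ESCAPED"   # token preserved
```

When double quotes are needed, for example to interpolate a date variable into the query, escaping the token as \$CONDITIONS keeps it intact.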
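Export example:
The target table must already exist before exporting. The default mode is insert; if --update-key is specified, existing rows are updated instead.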
sqoop export --connect 'jdbc:sqlserver://xxx.xx.xx.xxx;username=userName;password=abcdef;database=dbName' --table=tableName --export-dir /user/hduser/freeform --direct
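The switch between insert and update mode can be sketched as a small helper that adds --update-key only when a key column is supplied. build_export_cmd is a hypothetical function that prints the command instead of running it. The --direct flag is left out here because its interaction with --update-key on this connector has not been verified, and LogDate as the key column is an assumption for illustration.

```shell
#!/bin/sh
# build_export_cmd (hypothetical helper): print an export command.
# $1 = table, $2 = HDFS export dir, $3 = optional update key column.
# CONNECT is a placeholder connection string like the ones above.
CONNECT="jdbc:sqlserver://xxx.xx.xx.xxx;username=userName;password=abcdef;database=dbName"

build_export_cmd() {
    cmd="sqoop export --connect '${CONNECT}' --table $1 --export-dir $2"
    if [ -n "${3:-}" ]; then
        # With --update-key, rows are matched on this column and updated.
        cmd="$cmd --update-key $3"
    fi
    echo "$cmd"
}

build_export_cmd tableName /user/hduser/freeform           # insert mode
build_export_cmd tableName /user/hduser/freeform LogDate   # update mode
```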