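1. Environment preparation
maven (download and install it, set its environment variables, and add the Aliyun mirror to settings.xml)
gcc-c++
lzo-devel
zlib-devel
autoconf
automake
libtool
Install the packages with the following command: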
sudo yum -y install gcc-c++ lzo-devel zlib-devel autoconf automake libtool
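Before moving on, it can help to confirm that the build tools are actually on PATH. The following is a minimal sketch; the helper name missing_tools and the command names passed to it (mvn, g++, and so on) are assumptions about what the packages above provide, not something the original steps define.

```shell
# Print every command from the argument list that is not found on PATH.
# Prints nothing when all of them are available.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Example call (command names are assumptions):
# missing_tools mvn g++ autoconf automake libtool
```

Any name printed by the call points at a package that still needs to be installed or a PATH entry that is missing.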
2. Download, build, and install LZO
wget http://www.oberhumer.com/opensource/lzo/download/lzo-2.10.tar.gz # if the download fails, fetch the tarball manually from the web
tar -zxvf lzo-2.10.tar.gz
cd lzo-2.10
export CFLAGS=-m64
./configure --prefix=/soft/home/hadoop/lzo/ # install prefix for the build output; create the lzo directory under the hadoop root directory beforehand
make
make install
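A quick sanity check after make install is to look for the LZO header and library under the chosen prefix. This is only a sketch: check_lzo_prefix is a hypothetical helper, and it assumes the usual layout in which the header lands in include/lzo/lzo1x.h and the library files are named liblzo2.* under lib.

```shell
# Return 0 if the prefix contains the LZO header and at least one liblzo2 file.
check_lzo_prefix() {
  local prefix="$1"
  [ -f "$prefix/include/lzo/lzo1x.h" ] || {
    echo "missing header under $prefix/include/lzo" >&2
    return 1
  }
  ls "$prefix"/lib/liblzo2.* >/dev/null 2>&1 || {
    echo "missing liblzo2 under $prefix/lib" >&2
    return 1
  }
  echo "LZO looks installed under $prefix"
}

# Example call with the prefix used above:
# check_lzo_prefix /soft/home/hadoop/lzo
```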
3. Build the hadoop-lzo source
Download the hadoop-lzo source:
Download URL: https://github.com/twitter/hadoop-lzo/archive/master.zip
After unpacking, set the Hadoop version in pom.xml (2.7.6 here):
<hadoop.current.version>2.7.6</hadoop.current.version>
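If you prefer not to edit pom.xml by hand, the property can be rewritten with sed. The sketch below assumes the property appears on a single line, as shown above; set_hadoop_version is a hypothetical helper name, and a backup copy is kept as pom.xml.bak.

```shell
# Replace the value of <hadoop.current.version> in the given pom.xml.
# A backup of the original file is written next to it with a .bak suffix.
set_hadoop_version() {
  local pom="$1" version="$2"
  sed -i.bak \
    "s|<hadoop.current.version>[^<]*</hadoop.current.version>|<hadoop.current.version>${version}</hadoop.current.version>|" \
    "$pom"
}

# Example call from inside hadoop-lzo-master:
# set_hadoop_version pom.xml 2.7.6
```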
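Set two temporary environment variables that point at the LZO install prefix from step 2:
export C_INCLUDE_PATH=/soft/home/hadoop/lzo/include
export LIBRARY_PATH=/soft/home/hadoop/lzo/lib
Enter the hadoop-lzo-master directory and run the Maven build: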
mvn package -Dmaven.test.skip=true
Go into target and copy hadoop-lzo-0.4.20-SNAPSHOT.jar onto Hadoop's classpath,
for example ${HADOOP_HOME}/share/hadoop/common
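The copy step can be wrapped in a small function that fails loudly when the jar or the target directory is missing. This is a sketch under the assumption that share/hadoop/common is the directory you chose; install_lzo_jar is a hypothetical helper name.

```shell
# Copy the built jar into <hadoop_home>/share/hadoop/common.
# Fails with a message if the jar or the destination directory does not exist.
install_lzo_jar() {
  local jar="$1" hadoop_home="$2"
  local dest="$hadoop_home/share/hadoop/common"
  [ -f "$jar" ] || { echo "jar not found: $jar" >&2; return 1; }
  [ -d "$dest" ] || { echo "no such directory: $dest" >&2; return 1; }
  cp "$jar" "$dest/" && echo "copied $(basename "$jar") to $dest"
}

# Example call from inside hadoop-lzo-master:
# install_lzo_jar target/hadoop-lzo-0.4.20-SNAPSHOT.jar "$HADOOP_HOME"
```

On a multi-node cluster the jar presumably has to be present on every node, so the same call would be repeated there.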
For convenience, a prebuilt jar is available; placing it directly in the common directory lets you skip the hadoop-lzo source build described above:
https://download.youkuaiyun.com/download/a377987399/11448978
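4. Configure Hadoop
Edit core-site.xml and add the following configuration:
<property> <!-- if this property already exists, just add the first two entries below -->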
<name>io.compression.codecs</name>
<value>
com.hadoop.compression.lzo.LzoCodec,
com.hadoop.compression.lzo.LzopCodec,
org.apache.hadoop.io.compress.GzipCodec,
org.apache.hadoop.io.compress.DefaultCodec,
org.apache.hadoop.io.compress.BZip2Codec
</value>
</property>
<property>
<name>io.compression.codec.lzo.class</name>
<value>com.hadoop.compression.lzo.LzoCodec</value>
</property>
Edit mapred-site.xml and add the following configuration:
<property>
<name>mapred.child.env</name>
<value>LD_LIBRARY_PATH=/soft/home/hadoop/lzo/lib</value>
</property>
<property>
<name>mapreduce.admin.user.env</name>
<value>LD_LIBRARY_PATH=/soft/home/hadoop/lib/native:/soft/home/hadoop/lzo/lib</value>
</property>
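To confirm that Hadoop actually picks up the codec list, the effective value can be read back with hdfs getconf and checked for the two LZO entries. The sketch below assumes hdfs is on PATH; has_lzo_codecs is a hypothetical helper that only inspects the string it is given.

```shell
# Return 0 if the comma-separated codec list contains both LZO codecs.
# Whitespace (including newlines from a multi-line <value>) is ignored.
has_lzo_codecs() {
  local codecs
  codecs=$(printf '%s' "$1" | tr -d '[:space:]')
  case ",$codecs," in
    *,com.hadoop.compression.lzo.LzoCodec,*) ;;
    *) return 1 ;;
  esac
  case ",$codecs," in
    *,com.hadoop.compression.lzo.LzopCodec,*) ;;
    *) return 1 ;;
  esac
}

# Example call against the live configuration:
# has_lzo_codecs "$(hdfs getconf -confKey io.compression.codecs)" && echo "LZO codecs configured"
```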
5. Verify LZO with Hive
Create an LZO table:
CREATE TABLE lzo (
ip STRING,
time STRING,
request STRING,
status STRING,
size STRING,
rt STRING,
referer STRING,
agent STRING,
forwarded STRING
)
row format delimited
fields terminated by '\t'
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat";
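Install lzop to prepare the data:
sudo yum install lzop
Create a data file aa.log containing several lines like the following, with fields separated by tabs to match the table definition: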
xxx.xxx.xx.xxx [23/Dec/2015:23:22:38 +0800] "GET /ClientGetResourceDetail.action?id=318880&token=Ocm HTTP/1.1" 200 199 0.008 "xxx.com" "Android4.1.2/LENOVO/Lenovo A706/ch_lenovo/80" "-"
Running lzop aa.log produces aa.log.lzo in the same directory.
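Because the table is declared with fields terminated by '\t', the nine columns in aa.log need real tab characters between them. One way to produce such a file before compressing it is sketched below; make_sample_log is a hypothetical helper, and the field values simply repeat the sample line shown above.

```shell
# Write <count> identical tab-separated sample lines (9 fields each) to <out>.
make_sample_log() {
  local out="$1" count="$2" i=0
  : > "$out"
  while [ "$i" -lt "$count" ]; do
    printf '%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n' \
      'xxx.xxx.xx.xxx' \
      '[23/Dec/2015:23:22:38 +0800]' \
      '"GET /ClientGetResourceDetail.action?id=318880&token=Ocm HTTP/1.1"' \
      '200' '199' '0.008' \
      '"xxx.com"' \
      '"Android4.1.2/LENOVO/Lenovo A706/ch_lenovo/80"' \
      '"-"' >> "$out"
    i=$((i + 1))
  done
}

# Example: create five lines, then compress as described above.
# make_sample_log aa.log 5 && lzop aa.log
```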
Verification:
Load the data into the table: start the Hive client and run load data local inpath '/home/supdev/aa.log.lzo' into table lzo;
Then query the table.
References:
https://www.cnblogs.com/allthewayforward/p/11131218.html
https://blog.youkuaiyun.com/joseph_happy/article/details/50374057