1. Hive is a data warehouse built on top of HDFS, so it needs a working Hadoop environment. For how to set up Hadoop, see my other post: http://blog.youkuaiyun.com/jthink_/article/details/38622297
2. Download Hive (hive-0.11.0.tar.gz) and put it in a suitable location (note: here it goes on the bg01 host):
For example, mine is under the /usr/local/bg folder.
3. Edit the configuration:
Copy hive-default.xml.template to hive-default.xml; this file holds Hive's default settings.
Create a new file, hive-site.xml. Settings in it override those in hive-default.xml, so any tuning should be done in this file.
Copy hive-env.sh.template to hive-env.sh, with the following content:
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE
# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
# if [ -z "$DEBUG" ]; then
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
# else
# export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
# fi
# fi
# The heap size of the jvm started by hive shell script can be controlled via:
#
export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions.
# By default hive shell scripts use a heap size of 256 (MB). Larger heap size would also be
# appropriate for hive server (hwi etc).
# Set HADOOP_HOME to point to a specific hadoop install directory
HADOOP_HOME=/usr/local/bg/hadoop-1.2.1
# Hive Configuration Directory can be controlled by:
export HIVE_CONF_DIR=/usr/local/bg/hive-0.11.0/conf
# Folder containing extra libraries required for hive compilation/execution can be controlled by:
export HIVE_AUX_JARS_PATH=/usr/local/bg/hive-0.11.0/lib
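The template-copy part of step 3 can be sketched as a shell session. The sketch below runs against a scratch directory created with mktemp (with empty stand-ins for the template files), since the real conf dir on bg01 is /usr/local/bg/hive-0.11.0/conf:

```shell
# Sketch of the copy steps in step 3, using a scratch dir in place of
# /usr/local/bg/hive-0.11.0/conf (the real location on bg01).
conf_dir=$(mktemp -d)/hive-0.11.0/conf
mkdir -p "$conf_dir"
# stand-ins for the template files shipped in the Hive tarball
touch "$conf_dir/hive-default.xml.template" "$conf_dir/hive-env.sh.template"
cd "$conf_dir"
cp hive-default.xml.template hive-default.xml   # Hive's defaults, kept for reference
: > hive-site.xml                               # overrides go in this file
cp hive-env.sh.template hive-env.sh             # then edit as shown above
```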
4. Hive's metadata needs to live in a traditional RDBMS; here I chose MySQL, so install MySQL first:
sudo apt-get install mysql-server
Edit the my.cnf file and comment out the line bind-address=127.0.0.1 (remember to comment it with a # sign).
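The bind-address edit can be done with sed. A sketch against a scratch copy of my.cnf (on a real Ubuntu host the file is typically /etc/mysql/my.cnf, and you would run sed with sudo):

```shell
# Comment out bind-address so MySQL accepts non-local connections.
# Demonstrated on a scratch file; point cnf at the real my.cnf on your host.
cnf=$(mktemp)
printf 'bind-address\t\t= 127.0.0.1\n' > "$cnf"
sed -i 's/^bind-address/# bind-address/' "$cnf"
grep bind-address "$cnf"
```

Restart MySQL afterwards so the change takes effect.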
So our hive-site.xml configuration is:
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.querylog.location</name>
    <value>/usr/local/bg/hive-0.11.0/log</value>
    <description>Location of Hive run time structured log file</description>
  </property>
  <property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>Scratch space for Hive jobs</description>
  </property>
  <property>
    <name>hive.metastore.local</name>
    <value>true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>root</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>root</value>
  </property>
</configuration>
The /usr/local/bg/hive-0.11.0/log folder must be created by hand.
Oh, and remember to download the MySQL JDBC driver jar into Hive's lib folder.
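These two chores can be scripted. A sketch against a scratch HIVE_HOME (the driver jar name and version below are an assumption; use whichever mysql-connector-java jar you actually downloaded):

```shell
# Create the querylog dir from hive-site.xml and drop the JDBC driver into lib/.
# Scratch paths here; on bg01 HIVE_HOME is /usr/local/bg/hive-0.11.0.
HIVE_HOME=$(mktemp -d)/hive-0.11.0
mkdir -p "$HIVE_HOME/lib" "$HIVE_HOME/log"
jar=$(mktemp -d)/mysql-connector-java-5.1.30-bin.jar
touch "$jar"                      # stand-in for the downloaded driver jar
cp "$jar" "$HIVE_HOME/lib/"
```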
5. Start Hive
First configure the environment variables:
sudo vim /etc/profile
# set hive environment
export HIVE_HOME=/usr/local/bg/hive-0.11.0
export PATH=$PATH:$HIVE_HOME/bin
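After sourcing /etc/profile you can quickly confirm that Hive's bin directory landed on the PATH:

```shell
# Simulate the two profile lines above and check the PATH.
export HIVE_HOME=/usr/local/bg/hive-0.11.0
export PATH=$PATH:$HIVE_HOME/bin
echo "$PATH" | tr ':' '\n' | tail -1   # → /usr/local/bg/hive-0.11.0/bin
```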
Then the command is simply: hive
show tables;
If this runs without errors, the setup succeeded.
One more note: Hadoop must be started first.