0 Hadoop Cluster Hardware Configuration
Production cluster machines: 32 GB RAM, 24 TB storage, plus one 1 TB local disk.
The NameNode and secondary NameNode use a Xeon E5-2640 (24 threads).
All other nodes use a Xeon E5-2609 (8 threads).
CDH 5u3 is installed.
1 Hadoop Cluster Installation
1.1 Hadoop Environment Configuration
1. JDK installation and base OS setup
1) Disable the firewall: chkconfig iptables off
2) vi /etc/selinux/config and change SELINUX=enforcing to SELINUX=disabled
3) Set the hostname: vi /etc/sysconfig/network
NETWORKING=yes (enable networking)
HOSTNAME=xxxx (xxxx is the new hostname)
4) Make sure the ssh service is running; start it with: service sshd start
5) Download the JDK package from the official site and extract it (tar -xzvf jdk)
6) Configure the Java environment
Do this as the root user:
vi /etc/profile
Add the following to the file:
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_71
export HADOOP_HOME=/software/hadoop
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
After saving the file, run source /etc/profile for the settings to take effect.
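A quick check that the variables took effect can save debugging later. A minimal sketch (the OK/failure wording here is ours, not part of the original setup):

```shell
# Succeeds only if JAVA_HOME is set and points at a real java binary.
check_java_env() {
    if [ -n "$JAVA_HOME" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        echo "JAVA_HOME OK: $JAVA_HOME"
    else
        echo "JAVA_HOME missing or wrong: '$JAVA_HOME'"
    fi
}
check_java_env
```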
1.2 Hadoop Configuration Files
1. Set up passwordless SSH login (including passwordless login to the local machine)
1) Generate a key pair: ssh-keygen -t rsa
2) Edit the hosts file (vi /etc/hosts) and append the entries for all hosts
3) cat id_rsa.pub >> authorized_keys
scp id_rsa.pub hd0001:/home/hdfs/.ssh/id_rsahd0002.pub (copy this node's public key to hd0001)
cat id_rsahd0002.pub >> authorized_keys (run on hd0001)
4) Run ssh hd0001 to verify that passwordless login works
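The key distribution above can also be scripted. A minimal sketch, assuming the hdfs user and hostnames hd0001..hd0006 (replace with your own node list); it only prints the commands as a dry run, so pipe the output to sh to actually run them:

```shell
# Print (dry run) one ssh-copy-id command per node; ssh-copy-id appends the
# local public key to the remote user's authorized_keys.
copy_key_cmds() {
    for host in "$@"; do
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub hdfs@$host"
    done
}
copy_key_cmds hd0001 hd0002 hd0003 hd0004 hd0005 hd0006
```

Each generated command prompts for the remote password once; after that, logins to that host are passwordless.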
2. Edit the Hadoop configuration files (see the official documentation for details)
1)hadoop-env.sh
2)core-site.xml
3)hdfs-site.xml
4)mapred-site.xml
5)slaves
1.3 Monitoring URLs
http://jobtracker:8088/cluster/apps
http://namenode:50070/dfshealth.html
2 Adding New Nodes
2.1 Format the new disks
Run on each machine:
i=1
for dev in sdb sdc sdd sde sdf sdg sdh sdi sdj sdk sdl sdm; do
    mkfs -t ext4 "/dev/$dev"
    mkdir "/opt/data$(printf '%02d' "$i")"
    i=$((i + 1))
done
2.2 Mount the new disks
vi /etc/fstab
Append the following and save:
/dev/sdb /opt/data01 ext4 defaults 0 0
/dev/sdc /opt/data02 ext4 defaults 0 0
/dev/sdd /opt/data03 ext4 defaults 0 0
/dev/sde /opt/data04 ext4 defaults 0 0
/dev/sdf /opt/data05 ext4 defaults 0 0
/dev/sdg /opt/data06 ext4 defaults 0 0
/dev/sdh /opt/data07 ext4 defaults 0 0
/dev/sdi /opt/data08 ext4 defaults 0 0
/dev/sdj /opt/data09 ext4 defaults 0 0
/dev/sdk /opt/data10 ext4 defaults 0 0
/dev/sdl /opt/data11 ext4 defaults 0 0
/dev/sdm /opt/data12 ext4 defaults 0 0
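The twelve entries above follow a fixed pattern (sdb..sdm map to data01..data12), so they can be generated rather than typed by hand. A small sketch:

```shell
# Print the twelve fstab lines; letters b..m map to /opt/data01../opt/data12.
# Redirect with `>> /etc/fstab` once the output looks right.
gen_fstab() {
    i=1
    for letter in b c d e f g h i j k l m; do
        printf '/dev/sd%s /opt/data%02d ext4 defaults 0 0\n' "$letter" "$i"
        i=$((i + 1))
    done
}
gen_fstab
```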
reboot (note: this restarts the machine; alternatively, run mount -a to mount the new fstab entries without a reboot)
2.3 Create the required users and groups
useradd -d /home/hdfs -m hdfs
Create any other users you need as well.
2.4 Create the data directories
for n in 01 02 03 04 05 06 07 08 09 10 11 12; do
    chown hdfs "/opt/data$n"
    mkdir -m 750 "/opt/data$n/hdfs"
    chown hdfs:hdfs "/opt/data$n/hdfs"
    mkdir -m 775 "/opt/data$n/mapred"
    chown hdfs:hdfs "/opt/data$n/mapred"
done
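After creating the directories, it is worth verifying that every one exists with the expected owner and mode. A sketch; it prints one line per directory (owner:group, mode, path) and flags anything missing:

```shell
# Report owner, group, and mode for each hdfs/mapred directory, or flag it as
# missing. Expected: hdfs:hdfs 750 for hdfs dirs, hdfs:hdfs 775 for mapred dirs.
check_data_dirs() {
    for n in 01 02 03 04 05 06 07 08 09 10 11 12; do
        for sub in hdfs mapred; do
            d="/opt/data$n/$sub"
            stat -c '%U:%G %a %n' "$d" 2>/dev/null || echo "MISSING $d"
        done
    done
}
check_data_dirs
```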
2.5 Set the hostname
vi /etc/sysconfig/network
2.6 Install Java (JDK)
yum install -y java-1.7.0-openjdk-devel
2.7 scp the following files/directories from an old node to the new machine
/software/hadoop-versions/hadoop-2.5.0-cdh5.3.0/
/software/hadoop-versions/hive-0.13.1-cdh5.3.0/
2.8 Create the remaining empty directories and symlinks
Empty directories (all owned hdfs:hdfs, mode 775):
/software/hadoop-data/
/software/hadoop-dfs/
/software/hadoop-tmp
Symlinks:
ln -s /software/hadoop-versions/hadoop-2.5.0-cdh5.3.0 /software/hadoop
ln -s /software/hadoop-versions/hive-0.13.1-cdh5.3.0 /software/hive
2.9 Edit /etc/profile
Add:
source /software/hadoop_alias
Then run:
source /etc/profile
2.10 On every node (new and old DataNodes, the NameNode, and the JobTracker), update the hosts file
vi /etc/hosts
………………
2.11 Configure passwordless SSH
See section 1.2.
3 Common Hadoop Cluster Commands
3.1 HDFS commands
See:
http://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/FileSystemShell.html
The most commonly used are -ls, -cp, -mv, -rm, -chown, and -text.
Do not use the "hadoop dfs" form any more; it is deprecated and will not be supported in future versions.
3.2 YARN commands
See:
http://hadoop.apache.org/docs/r2.5.2/hadoop-yarn/hadoop-yarn-site/YarnCommands.html
The most commonly used are yarn jar, yarn application, etc.
Kill a job: yarn application -kill APPID
Move a job to another queue: yarn application -movetoqueue APPID -queue QUEUENAME
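Killing by application name rather than ID can be handy when IDs are not at hand. A sketch built on `yarn application -list`, whose output has the application ID in column 1 and the name in column 2; the pattern `badjob` is a placeholder. It prints the kill commands (dry run), so pipe the output to sh to execute:

```shell
# Turn each matching line of `yarn application -list` into a kill command.
kill_cmds() {
    awk -v pat="$1" '$2 ~ pat { print "yarn application -kill " $1 }'
}
yarn application -list 2>/dev/null | kill_cmds badjob
```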
3.3 Other common operations
Refresh the node lists after changing them:
hadoop dfsadmin -refreshNodes
yarn rmadmin -refreshNodes
Start/stop a single DataNode:
cd /software/hadoop/sbin
./yarn-daemon.sh start nodemanager (use stop to stop it)
./hadoop-daemon.sh start datanode (use stop to stop it)
4 Hive Installation
4.1 Create the Hive database and user
1) Create the hive database
mysql> create database hive;
2) Create the hive user
mysql> create user hive IDENTIFIED BY 'hive';
mysql> use hive;
mysql> grant all on hive.* to 'hive'@'hd0006' identified by 'hive';
mysql> flush privileges;
4.2 Hive Configuration
1) Add environment variables to /etc/profile
export HIVE_HOME=/software/hive
export PATH=$HIVE_HOME/bin:$PATH:.
2) Configure Hive
Edit hive-env.sh and hive-site.xml; see the official documentation.
3) Start Hive
Run on the command line:
bin/hive --service metastore &
bin/hive --service hiveserver2 &
Then test it in the hive shell:
> CREATE DATABASE jyfx WITH DBPROPERTIES ('creator' = 'jyfx', 'date' = '2015-04-12');
> grant all on database jyfx to user jyfx;
> create table dual(a string);
> dfs -put abc.txt /group/jyfx/hive/jyfx.db/dual;
> select * from dual;
Ok
helloworld
5 Sqoop Installation
5.1 Sqoop environment variables
1) vi /etc/profile
export SQOOP_HOME=/software/sqoop
export PATH=$SQOOP_HOME/bin:$PATH
:wq
2) source /etc/profile
5.2 Sqoop configuration files
1) sqoop-env.sh
2) sqoop-site.xml
3) oraoop-site.xml [Oracle databases only]
4) Test Sqoop
5.3 Sqoop upgrade and rollback
1) Switch the symlink [run on every machine in the cluster]
Upgrade (to sqoop-1.4.5-cdh5.3.0):
rm /software/sqoop
ln -s /software/hadoop-versions/sqoop-1.4.5-cdh5.3.0 /software/sqoop
Rollback (to sqoop-1.3.0-cdh3u6):
rm /software/sqoop
ln -s /software/hadoop-versions/sqoop-1.3.0-cdh3u6 /software/sqoop
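Repeating the switch by hand on every machine is error-prone, so it can be scripted. A sketch (dry run, printing one ssh command per host; the host names are placeholders), pipe the output to sh to execute:

```shell
# Print the ssh command that relinks /software/sqoop on each host.
switch_sqoop_cmds() {
    version="$1"; shift
    for host in "$@"; do
        echo "ssh $host 'rm /software/sqoop && ln -s /software/hadoop-versions/$version /software/sqoop'"
    done
}
switch_sqoop_cmds sqoop-1.4.5-cdh5.3.0 hd0001 hd0002 hd0003
```

Passing sqoop-1.3.0-cdh3u6 as the first argument generates the rollback instead.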
6 Hadoop Upgrade
We previously upgraded straight from 3u6 to 5u3 (fairly painful; the official recommendation is 3 to 4 to 5, but we did it in one step). Details omitted.
7 Hive Upgrade
Also omitted.
8 Hue Installation
Version: hue-3.7.0-cdh5.3.0
8.1 Hue Installation
1) Extract the tarball: tar xzvf hue-3.7.0-cdh5.3.0.tar.gz
2) Build with make:
cd /software/hue/
make apps
8.2 hue.ini Configuration
8.3 Starting Hue
1) Start the Hive service processes
nohup /software/hive/bin/hive --service metastore > metastore.log &
nohup /software/hive/bin/hive --service hiveserver2 > hiveserver2.log &
2) (Re)start the Hue process
ps -ef | grep '[h]ue' | awk '{print $2}' | xargs -r kill -9
nohup /software/hue/build/env/bin/supervisor > supervisor.log &
3) Open the web UI
http://10.100.10.2:8888/accounts/login/?next=/ and add a superuser
9 Mahout Installation
Version: mahout-0.9-cdh5.3.0
9.1 Mahout Installation
tar xzvf mahout-0.9-cdh5.3.0.tar.gz
9.2 Mahout environment variables
vi /etc/profile
export MAHOUT_HOME=/software/mahout
export MAHOUT_CONF_DIR=$MAHOUT_HOME/conf
export PATH=$HADOOP_HOME/bin:$HIVE_HOME/bin:$MAHOUT_HOME/conf:$MAHOUT_HOME/bin:$PATH
source /etc/profile
9.3 Starting Mahout
Installation successful.
10 Impala Installation
See the separate document, "Impala Installation Manual".