Base environment
CentOS 6.8
hadoop-3.1.1.tar.gz
spark-2.3.0-bin-hadoop2.7.tgz
zookeeper-3.4.9.tar.gz
pip-18.0.tar.gz
setuptools-40.2.0.zip
Three servers
10.0.0.11 s11
10.0.0.12 s12
10.0.0.13 s13
Preparation
Perform all of the following steps as the root user
Install JDK 1.8 and configure its environment variables
Edit /etc/profile
export JAVA_HOME=/usr/java/jdk1.8.0_121
export JRE_HOME=/usr/java/jdk1.8.0_121/jre/
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$JAVA_HOME/bin:$PATH
Configure the Scala environment
Extract scala-2.12.6.tgz
Rename the extracted directory to scala
Edit /etc/profile
export SCALA_HOME=/app/appuser/apps/scala
export PATH=$SCALA_HOME/bin:$PATH
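Configure hosts
In /etc/hosts, add the IP-to-hostname mappings for the three servers listed above
Configure the open-file and thread limits
Edit /etc/security/limits.conf and append the following at the end of the file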
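The hosts step can be scripted so that it is safe to repeat on each server. The following is a minimal sketch, assuming bash and the three address/hostname pairs from the server list above; the function name add_cluster_hosts is made up for illustration.

```shell
#!/bin/bash
# Append the cluster's IP/hostname pairs to a hosts file, skipping lines already present.
# The target defaults to /etc/hosts; pass another path to try it on a copy first.
add_cluster_hosts() {
    local hosts_file="${1:-/etc/hosts}"
    local ip name
    while read -r ip name; do
        # -x: whole-line match, -F: fixed string, so the dots in the IP are literal
        grep -qxF "$ip $name" "$hosts_file" || printf '%s %s\n' "$ip" "$name" >> "$hosts_file"
    done <<'EOF'
10.0.0.11 s11
10.0.0.12 s12
10.0.0.13 s13
EOF
}
```

Running add_cluster_hosts as root on each server updates /etc/hosts; running it a second time adds nothing, because only exact "ip hostname" lines that are missing get appended.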
* soft nofile 65536
* hard nofile 131072
* soft nproc 2048
* hard nproc 4096
Edit /etc/security/limits.d/90-nproc.conf
* soft nproc 2048
* hard nproc 4096
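Limits from limits.conf are applied at login, so they only show up in a new session. A small check such as the one below can confirm that they took effect. This is a sketch assuming bash (ulimit -u is a bash option); the helper names check_limit and report_limits are illustrative, and the thresholds are the values configured above.

```shell
#!/bin/bash
# Compare one limit against the minimum expected from limits.conf.
# Arguments: label, actual value (a number or "unlimited"), required minimum.
check_limit() {
    local label="$1" actual="$2" minimum="$3"
    if [ "$actual" = "unlimited" ] || [ "$actual" -ge "$minimum" ]; then
        echo "OK   $label: $actual (need >= $minimum)"
    else
        echo "LOW  $label: $actual (need >= $minimum)"
        return 1
    fi
}

# Report the soft/hard limits of the current shell against the configured values.
report_limits() {
    local rc=0
    check_limit "nofile soft" "$(ulimit -Sn)" 65536  || rc=1
    check_limit "nofile hard" "$(ulimit -Hn)" 131072 || rc=1
    check_limit "nproc soft"  "$(ulimit -Su)" 2048   || rc=1
    check_limit "nproc hard"  "$(ulimit -Hu)" 4096   || rc=1
    return $rc
}
```

Call report_limits from a fresh login shell of the account that will run the services (for example appuser); a non-zero exit status means at least one limit is below the configured value.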
Upgrade Python to 2.7
Install the dependencies that Python may need
yum install -y zlib-devel bzip2-devel openssl-devel xz-libs wget gcc
Compile and install Python (run in the extracted Python 2.7 source directory)
./configure
make && make install
Resolve yum's dependency on Python 2.6.6
mv /usr/bin/python /usr/bin/python2.6.6
ln -s /usr/local/bin/python2.7 /usr/bin/python
Edit /usr/bin/yum, changing the first line from
#!/usr/bin/python
to
#!/usr/bin/python2.6.6
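The shebang change can also be made non-interactively. The sketch below assumes bash and that the first line of the script is exactly #!/usr/bin/python; the function name pin_shebang_to_python26 is hypothetical. It rewrites only line 1 and leaves the file untouched if the line has already been changed.

```shell
#!/bin/bash
# Point a script's shebang at the original Python 2.6.6 interpreter.
# Only line 1 is rewritten, and only if it is exactly "#!/usr/bin/python".
pin_shebang_to_python26() {
    local script="${1:-/usr/bin/yum}"
    local tmp rc
    tmp="$(mktemp)" || return 1
    sed '1s|^#!/usr/bin/python[[:space:]]*$|#!/usr/bin/python2.6.6|' "$script" > "$tmp" &&
        cat "$tmp" > "$script"   # overwrite the contents so mode and ownership are kept
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Running pin_shebang_to_python26 as root with no argument targets /usr/bin/yum. Afterwards, yum --version should start without a Python error, since yum is again run by the interpreter that was renamed to /usr/bin/python2.6.6.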
Install setuptools (run in the extracted setuptools-40.2.0 directory)
python setup.py install
Install pip (run in the extracted pip-18.0 directory)
python setup.py install
Install the Python modules for connecting to HDFS and Spark
pip install hdfs
pip install pyspark
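Make the following changes for the appuser user:
Set up passwordless SSH login for appuser between every pair of the three servers, and also from each server to itself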
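One common way to set up the passwordless logins is with ssh-keygen and ssh-copy-id. The sketch below assumes bash, an RSA key at the default path, and that appuser already exists on all three servers. The function names and the COPY_CMD variable are illustrative only; COPY_CMD is not an SSH setting and exists so the loop can be dry-run.

```shell
#!/bin/bash
# Create an RSA key pair without a passphrase if this user has none yet.
ensure_ssh_key() {
    local key="$HOME/.ssh/id_rsa"
    mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh" || return 1
    [ -f "$key" ] || ssh-keygen -t rsa -N "" -f "$key"
}

# Copy the public key to every host given as an argument, including the local one.
# COPY_CMD is an illustrative override: set COPY_CMD=echo to print the targets only.
distribute_key() {
    local user="$1"; shift
    local host
    for host in "$@"; do
        ${COPY_CMD:-ssh-copy-id} "$user@$host" || return 1
    done
}
```

On each of the three servers, logged in as appuser, run ensure_ssh_key followed by distribute_key appuser s11 s12 s13. ssh-copy-id asks for the appuser password once per host. Afterwards, ssh s12 hostname (and the same for the other hosts, including the local one) should return without a password prompt.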
Install the ZooKeeper cluster
Unless otherwise noted, run the following steps as the appuser user
Install HDFS
Role assignment
s11 runs NameNode, JournalNode, DataNode, ResourceManager, NodeManager
s12 runs NameNode, JournalNode, DataNode, NodeManager
s13 runs JournalNode, DataNode, NodeManager
Edit the configuration files
Edit core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or