Java environment configuration
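A minimal sketch of the JDK setup in ~/.bashrc, assuming the jdk1.7.0_45 install path that the Hadoop and Spark configs below point at:
# ~/.bashrc -- JDK paths (install location assumed from the configs below)
export JAVA_HOME=/usr/local/src/jdk1.7.0_45
export PATH=$PATH:$JAVA_HOME/bin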
Hadoop2
- Package: hadoop-2.6.1.tar.gz. After unpacking, go to /usr/local/src/hadoop-2.6.1/etc/hadoop
- In hadoop-env.sh, set JAVA_HOME:
export JAVA_HOME=/usr/local/src/jdk1.7.0_45
- In yarn-env.sh, set JAVA_HOME:
export JAVA_HOME=/usr/local/src/jdk1.7.0_45
- In slaves, list the slave (worker) nodes:
slave1
slave2
- In core-site.xml, add the following properties:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000/</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/local/src/hadoop-2.6.1/tmp</value>
  </property>
</configuration>
- In HADOOP_HOME, create the working directories:
mkdir tmp
mkdir -p dfs/name
mkdir -p dfs/data
- Edit hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/usr/local/src/hadoop-2.6.1/dfs/name</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/usr/local/src/hadoop-2.6.1/dfs/data</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
(Note: with only two DataNodes, blocks stay under-replicated at a replication factor of 3; 2 is the most this cluster can actually satisfy.)
- cp mapred-site.xml.template mapred-site.xml
- In mapred-site.xml, set MapReduce to run on YARN:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
- In yarn-site.xml, add the following:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8035</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
</configuration>
- Distribute the configuration to the slave nodes, then run the following from HADOOP_HOME to start the cluster:
./bin/hdfs namenode -format   # format the NameNode (first start only)
./sbin/start-dfs.sh           # these two together are equivalent to start-all.sh
./sbin/start-yarn.sh
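As a sanity check (commands assumed here, not part of the original steps): jps on the master should show NameNode, SecondaryNameNode, and ResourceManager, and each slave should show DataNode and NodeManager; the YARN web UI from the config above answers at http://master:8088.
jps                      # lists the running Hadoop daemons on this node
./bin/hdfs dfs -ls /     # HDFS should respond once the NameNode is up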
Spark setup (on top of the YARN cluster)
- Package: spark-1.6.0-bin-hadoop2.6.tgz; unpack it and enter the resulting directory
- Edit conf/spark-env.sh:
export SCALA_HOME=/usr/local/src/scala-2.11.4
export JAVA_HOME=/usr/local/src/jdk1.7.0_45
export HADOOP_HOME=/usr/local/src/hadoop-2.6.1
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
SPARK_MASTER_IP=master
SPARK_LOCAL_DIRS=/usr/local/src/spark-1.6.0-bin-hadoop2.6
SPARK_DRIVER_MEMORY=1G
- cp slaves.template slaves, then set its contents to:
slave1
slave2
- Distribute the configuration to the slave nodes
- Start Spark: ./sbin/start-all.sh
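To confirm the installation end to end (a check assumed here, not from the original notes), jps should now additionally show Master on the master node and Worker on the slaves, and the bundled Pi example should run to completion:
./bin/run-example SparkPi 10     # prints an estimate of Pi when the cluster works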
Spark development environment setup
Package: sbt-0.13.15.tgz; unpack it and enter the resulting directory
- Edit ~/.bashrc:
# sbt config
export SBT_HOME=/usr/local/src/sbt
export PATH=$PATH:$SBT_HOME/bin
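A quick way to confirm the sbt install (assuming the paths above):
source ~/.bashrc
sbt sbtVersion     # first run bootstraps sbt; should report 0.13.15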
- Inside the development directory, create the project skeleton:
[root@master spark_test]# mkdir -p spark_wordcount/lib
[root@master spark_test]# mkdir -p spark_wordcount/project
[root@master spark_test]# mkdir -p spark_wordcount/target
[root@master spark_test]# mkdir -p spark_wordcount/src/main/scala
- Copy spark-assembly-1.6.0-hadoop2.6.0.jar from the lib directory of the Spark distribution into spark_wordcount/lib (sbt treats jars in lib/ as unmanaged dependencies)
- In the spark_wordcount directory, create a build.sbt file containing:
name := "WordCount"
version := "1.6.0"
scalaVersion := "2.11.4"
- Compile from the spark_wordcount directory:
sbt compile
(the first run downloads many jars and can take a long time)
- When development is finished, package the application:
sbt package
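sbt writes the artifact to target/scala-2.11/. A submission sketch against the YARN cluster, run from the spark_wordcount directory (the jar name follows from the build.sbt settings above; input and output paths are placeholders):
/usr/local/src/spark-1.6.0-bin-hadoop2.6/bin/spark-submit \
  --class WordCount --master yarn \
  target/scala-2.11/wordcount_2.11-1.6.0.jar \
  hdfs://master:9000/input.txt hdfs://master:9000/output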