Setting Up a Personal Hadoop Development Environment on Fedora 18

This article describes how to set up a personal Hadoop development environment on Fedora 18, covering passwordless SSH configuration, dependency installation, Java environment setup, building the Hadoop source, creating a sandbox environment, and adjusting the configuration files.


1. Background

This article describes a "personal hadoop" setup analogous to a "personal condor" configuration. The main goal is to keep configuration files and log files in a single location, symlinked to the binaries produced by your development builds, so that data and configuration are preserved as the built binaries are updated.

2. Use Cases

1. Test in a local sandbox environment without changing the software installed on the existing system

2. A single source for configuration files and log files

3. References

Web pages:

http://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment

http://vichargrave.com/create-a-hadoop-build-and-development-environment-for-hadoop/

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

http://wiki.apache.org/hadoop/

http://docs.hortonworks.com/CURRENT/index.htm#Appendix/Configuring_Ports/HDFS_Ports.htm

Books:

Hadoop: The Definitive Guide

4. Disclaimers

1. These are non-native development steps that rely on Maven dependencies; for details on the native Fedora packaging, see: https://fedoraproject.org/wiki/Features/Hadoop

2. The single-node setup steps are listed below

5. Prerequisites

1. Configure passwordless SSH

yum install openssh openssh-clients openssh-server

# generate a public/private key, if you don't already have one
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/*

# testing ssh:
ps -ef | grep sshd     # verify sshd is running
ssh localhost          # accept the host key when prompted
sudo passwd root       # make sure root has a password
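
To confirm passwordless login works non-interactively, a quick check (my addition, not part of the original steps):

ssh -o BatchMode=yes localhost true && echo "passwordless ssh OK"   # BatchMode disables password prompts, so this fails fast if key auth is broken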

2. Install other dependencies

yum install cmake git subversion dh-make ant autoconf automake sharutils libtool asciidoc xmlto curl protobuf-compiler gcc-c++ 

3. Install Java and the development environment

yum install java-1.7.0-openjdk java-1.7.0-openjdk-devel java-1.7.0-openjdk-javadoc *maven*

Add the following to your .bashrc:

 export JVM_ARGS="-Xmx1024m -XX:MaxPermSize=512m"
 export MAVEN_OPTS="-Xmx1024m -XX:MaxPermSize=512m"

Note: the settings above are for OpenJDK 7 on F18. You can test whether the environment is configured correctly by running the following command (from a Maven project, e.g. the Hadoop source tree checked out in the next section):

mvn install -Dmaven.test.failure.ignore=true
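
As an additional sanity check (not from the original article), confirm the expected JDK and Maven are on your PATH:

java -version    # should report OpenJDK 1.7.x
mvn -version     # should report Maven running on the JDK above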

6. Set up "personal-hadoop"

1. Download and build Hadoop

git clone git://git.apache.org/hadoop-common.git
cd hadoop-common
git checkout -b branch-2.0.4-alpha origin/branch-2.0.4-alpha
mvn clean package -Pdist -DskipTests
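
If the build succeeds, the distribution tree lands under hadoop-dist/target; a quick check (my addition):

ls hadoop-dist/target/hadoop-2.0.4-alpha/bin/hadoop   # the built launcher script should exist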

2. Create the sandbox environment

In this configuration we default to /home/tstclair

cd ~
mkdir personal-hadoop
cd personal-hadoop
mkdir -p conf data name logs/yarn
ln -sf <your-git-loc>/hadoop-dist/target/hadoop-2.0.4-alpha home
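
An optional check (not in the original) that the "home" symlink resolves to the freshly built distribution:

ls -l ~/personal-hadoop/home/bin/hadoop   # resolves through the symlink created above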

3. Override your environment variables

Append the following to the .bashrc file in your home directory:

# Hadoop env override:
export HADOOP_BASE_DIR=${HOME}/personal-hadoop
export HADOOP_LOG_DIR=${HOME}/personal-hadoop/logs
export HADOOP_PID_DIR=${HADOOP_BASE_DIR}
export HADOOP_CONF_DIR=${HOME}/personal-hadoop/conf
export HADOOP_COMMON_HOME=${HOME}/personal-hadoop/home
export HADOOP_HDFS_HOME=${HADOOP_COMMON_HOME}
export HADOOP_MAPRED_HOME=${HADOOP_COMMON_HOME}

# Yarn env override:
export HADOOP_YARN_HOME=${HADOOP_COMMON_HOME}
export YARN_LOG_DIR=${HADOOP_LOG_DIR}/yarn

# Classpath override to search the hadoop location
export CLASSPATH=/usr/share/java/:${HADOOP_COMMON_HOME}/share

# Finally, update your PATH
export PATH=${HADOOP_COMMON_HOME}/bin:${HADOOP_COMMON_HOME}/sbin:${HADOOP_COMMON_HOME}/libexec:${PATH}

4. Verify the steps above

source ~/.bashrc
which hadoop    # should resolve to ${HOME}/personal-hadoop/home/bin/hadoop
hadoop -help    # verify the classpath is correct

5. Create the initial single-source configuration files

Copy the default configuration files:

cp ${HADOOP_COMMON_HOME}/etc/hadoop/* ${HADOOP_BASE_DIR}/conf

Update your hdfs-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Override tstclair with your home directory -->

<configuration>
    <property>
        <name>fs.default.name</name>
        <value>hdfs://localhost/</value>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>file:///home/tstclair/personal-hadoop/name</value>
    </property>
    <property>
        <name>dfs.http.address</name>
        <value>0.0.0.0:50070</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>file:///home/tstclair/personal-hadoop/data</value>
    </property>
    <property>
        <name>dfs.datanode.address</name>
        <value>0.0.0.0:50010</value>
    </property>
    <property>
        <name>dfs.datanode.http.address</name>
        <value>0.0.0.0:50075</value>
    </property>
    <property>
        <name>dfs.datanode.ipc.address</name>
        <value>0.0.0.0:50020</value>
    </property>
</configuration>
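
Since the values above hard-code tstclair's home directory, a convenience one-liner (my addition, assuming GNU sed) to substitute your own:

sed -i "s|/home/tstclair|${HOME}|g" ${HADOOP_BASE_DIR}/conf/hdfs-site.xml   # rewrite the hard-coded paths in place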

Update mapred-site.xml:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<!-- Update or append these vars -->

<configuration>
    <property>
        <name>mapreduce.cluster.temp.dir</name>
        <value></value>
        <description>No description</description>
        <final>true</final>
    </property>
    <property>
        <name>mapreduce.cluster.local.dir</name>
        <value></value>
        <description>No description</description>
        <final>true</final>
    </property>
</configuration>

Finally, update yarn-site.xml:

<?xml version="1.0"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>localhost:8031</value>
        <description>host is the hostname of the resource manager and
        port is the port on which the NodeManagers contact the Resource Manager.
        </description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>localhost:8030</value>
        <description>host is the hostname of the resourcemanager and port is the port
        on which the Applications in the cluster talk to the Resource Manager.
        </description>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
        <description>In case you do not want to use the default scheduler</description>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>localhost:8032</value>
        <description>the host is the hostname of the ResourceManager and the port is the port on
        which the clients can talk to the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value></value>
        <description>the local directories used by the nodemanager</description>
    </property>
    <property>
        <name>yarn.nodemanager.address</name>
        <value>localhost:8034</value>
        <description>the nodemanagers bind to this port</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>10240</value>
        <description>the amount of memory on the NodeManager in MB</description>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce.shuffle</value>
        <description>shuffle service that needs to be set for Map Reduce to run</description>
    </property>
</configuration>
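
Before starting any daemons, it is worth validating the edited files; xmllint (from libxml2, assumed to be installed) catches malformed XML such as an unterminated closing tag:

xmllint --noout ${HADOOP_BASE_DIR}/conf/hdfs-site.xml \
                ${HADOOP_BASE_DIR}/conf/mapred-site.xml \
                ${HADOOP_BASE_DIR}/conf/yarn-site.xml    # silent on success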

7. Start the single-node Hadoop cluster

Format the namenode:

hadoop namenode -format
# verify the output is correct
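
One way to confirm the format took effect (my addition; the path comes from dfs.name.dir above):

ls ${HADOOP_BASE_DIR}/name/current   # should contain a fresh fsimage and VERSION file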

Start HDFS:

start-dfs.sh

Open http://localhost:50070 in a browser and verify that one live node is reported.
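
If you prefer the shell, a command-line alternative to the web UI (my addition):

hdfs dfsadmin -report   # should report one live datanode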

Next, start YARN:

start-yarn.sh

Check the log files to verify that the daemons started correctly.
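
A minimal health-check sketch (my addition): jps, which ships with the JDK, should list the HDFS and YARN daemons, and the logs should be free of fatal errors.

jps   # expect NameNode, DataNode, SecondaryNameNode, ResourceManager, NodeManager
grep -il fatal ${HADOOP_LOG_DIR}/*.log ${YARN_LOG_DIR}/*.log   # ideally prints nothing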

Finally, run a MapReduce job to check that Hadoop is working properly:

cd ${HADOOP_COMMON_HOME}/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.0.4-alpha.jar randomwriter out
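
Optionally, confirm the job wrote its output into HDFS (my addition):

hadoop fs -ls out   # randomwriter's output directory under your HDFS home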

 

 

Original article: http://timothysc.github.io/blog/2013/04/22/personalhadoop/

