hadoop(2) The version hadoop 0.23.0 on Ubuntu

This post walks through deploying a single-node Hadoop 0.23.0 MapReduce cluster on Ubuntu, including working around a build error, configuring the core parameters, creating symlinks, and starting the ResourceManager and NodeManager.


1. Single Node
MapReduce Tarball
>mvn clean install -DskipTests
>cd hadoop-mapreduce-project
>mvn clean install assembly:assembly -Pnative -DskipTests=true

error message:
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-assembly-plugin:2.2.1:assembly (default-cli) on project hadoop-mapreduce: Error reading assemblies: No assembly descriptors found. -> [Help 1]

solution:
>mvn package -Pdist -DskipTests=true -Dtar
>vi conf/yarn-env.sh
export HADOOP_MAPRED_HOME=/usr/local/hadoop-0.23.0
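
Besides HADOOP_MAPRED_HOME, a few related variables are usually exported in the same file so the yarn scripts can find everything. A minimal sketch, assuming the same /usr/local/hadoop-0.23.0 install path used above (the JDK path is a placeholder, adjust it to the local machine):

export JAVA_HOME=/usr/lib/jvm/java-6-sun   # placeholder JDK path
export HADOOP_COMMON_HOME=/usr/local/hadoop-0.23.0
export HADOOP_HDFS_HOME=/usr/local/hadoop-0.23.0
export YARN_HOME=/usr/local/hadoop-0.23.0
export HADOOP_CONF_DIR=/usr/local/hadoop-0.23.0/conf
export YARN_CONF_DIR=$HADOOP_CONF_DIR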

>vi conf/core-site.xml
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>No description</description>
<final>true</final>
</property>
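
Each of the property blocks in this post goes inside the <configuration> element of its file; the same applies to the mapred-site.xml and yarn-site.xml snippets below. For example, the complete core-site.xml would be:

<?xml version="1.0"?>
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/tmp/hadoop-${user.name}</value>
<description>No description</description>
<final>true</final>
</property>
</configuration>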

>vi conf/mapred-site.xml
<property>
<name>mapreduce.cluster.temp.dir</name>
<value>${hadoop.tmp.dir}/mapred/temp</value>
<description>No description</description>
<final>true</final>
</property>
<property>
<name>mapreduce.cluster.local.dir</name>
<value>${hadoop.tmp.dir}/mapred/local</value>
<description>No description</description>
<final>true</final>
</property>

>vi conf/yarn-site.xml
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>0.0.0.0:8025</value>
<description>host is the hostname of the resource manager and
port is the port on which the NodeManagers contact the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>0.0.0.0:8030</value>
<description>host is the hostname of the resourcemanager and port is the port
on which the Applications in the cluster talk to the Resource Manager.
</description>
</property>
<property>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
<description>In case you do not want to use the default scheduler</description>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>0.0.0.0:8040</value>
<description>the host is the hostname of the ResourceManager and the port is the port on
which the clients can talk to the Resource Manager. </description>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/tmp/nm-local-dir</value>
<description>the local directories used by the nodemanager</description>
</property>
<property>
<name>yarn.nodemanager.address</name>
<value>0.0.0.0:0</value>
<description>the nodemanagers bind to this port</description>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>8192</value>
<description>the amount of memory available on the NodeManager, in MB</description>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
<description>directory on hdfs where the application logs are moved to </description>
</property>
<property>
<name>yarn.nodemanager.log-dirs</name>
<value>/tmp/logs</value>
<description>the directories used by Nodemanagers as log directories</description>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce.shuffle</value>
<description>shuffle service that needs to be set for Map Reduce to run </description>
</property>
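
One property that is not part of the notes above: depending on the build, the class backing the mapreduce.shuffle aux-service may also need to be declared explicitly. This is an assumption to check against the build's yarn-default.xml, not something from the original setup:

<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
<description>class implementing the mapreduce.shuffle aux service</description>
</property>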

Some default configuration:
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-common/core-default.xml
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/yarn-default.xml
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

Create Symlinks
>cd $HADOOP_COMMON_HOME/share/hadoop/common/lib/
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-app-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-jobclient-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-common-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-shuffle-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-mapreduce-client-core-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-common-*-SNAPSHOT.jar .
>ln -s $HADOOP_MAPRED_HOME/modules/hadoop-yarn-api-*-SNAPSHOT.jar .
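
A quick sanity check that the links were created and resolve (the exact jar version suffixes depend on the build output):

>ls -l $HADOOP_COMMON_HOME/share/hadoop/common/lib/ | grep mapreduce-client
>ls -l $HADOOP_COMMON_HOME/share/hadoop/common/lib/ | grep yarn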

Running daemons
Run the ResourceManager and NodeManager
>cd $HADOOP_MAPRED_HOME
>bin/yarn-daemon.sh start resourcemanager
>bin/yarn-daemon.sh start nodemanager
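
To verify that both daemons came up, jps from the JDK should list a ResourceManager and a NodeManager process; if either is missing, check the daemon logs under the logs/ directory of the install.

>jps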

Run the example
>$HADOOP_COMMON_HOME/bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar randomwriter out
>bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar grep input output 'YARN[a-zA-Z.]+'
>cat output/*
1 YARNtestforfun
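
The grep job assumes an input directory already exists in the working directory (this single-node setup runs against the local filesystem). One way to create it, with a file chosen to match the output above:

>mkdir input
>echo "YARNtestforfun" > input/test.txt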

http://192.168.56.101:8088/cluster
http://192.168.56.101:9999/node

2. Cluster
>cd conf
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-common/core-default.xml
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-common/yarn-default.xml
>wget http://hadoop.apache.org/common/docs/r0.23.0/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml

HADOOP_PREFIX_HOME=/usr/local/hadoop-0.23.0
YARN_HOME=/usr/local/hadoop-0.23.0
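
These are easiest to export in the shell profile (or hadoop-env.sh) so the commands below pick them up; a sketch, assuming the same install path, with HADOOP_CONF_DIR added because the commands below reference it:

export HADOOP_PREFIX_HOME=/usr/local/hadoop-0.23.0
export YARN_HOME=/usr/local/hadoop-0.23.0
export HADOOP_CONF_DIR=/usr/local/hadoop-0.23.0/conf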

>bin/start-all.sh
>$HADOOP_PREFIX_HOME/bin/hdfs --config $HADOOP_CONF_DIR namenode

>$YARN_HOME/bin/yarn --config $HADOOP_CONF_DIR historyserver
http://192.168.56.101:19888/jobhistory

>$YARN_HOME/bin/yarn --config $HADOOP_CONF_DIR resourcemanager
>$YARN_HOME/bin/yarn --config $HADOOP_CONF_DIR nodemanager
>$YARN_HOME/bin/yarn --config $HADOOP_CONF_DIR proxyserver

>sbin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
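
For a brand-new HDFS, the NameNode has to be formatted once before its first start:

>$HADOOP_PREFIX_HOME/bin/hdfs namenode -format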

It seems to work, but the web UI is not available yet.

I will prepare some slave machines and try some examples.

references:
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/SingleCluster.html
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/ClusterSetup.html
http://hadoop.apache.org/common/docs/r0.19.2/cn/cluster_setup.html
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/Federation.html