Single Node Setup
Purpose
Prerequisites
Supported Platforms
- GNU/Linux is supported as a development and production platform. Hadoop has been demonstrated on GNU/Linux clusters with 2000 nodes.
- Win32 is supported as a development platform. Distributed operation has not been well tested on Win32, so it is not supported as a production platform.
Required Software
Required software for Linux and Windows includes:
- Java™ 1.6.x, preferably from Sun, must be installed.
- ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons.
Additional requirements for Windows include:
- Cygwin - Required for shell support in addition to the required software above.
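On GNU/Linux you can quickly sanity-check both requirements before continuing; the exact commands vary slightly by distribution, but something like the following works on most systems:
$ java -version        # should report a 1.6.x JVM
$ ssh -V               # the ssh client is installed
$ ps -e | grep sshd    # sshd is running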
Installing Software
For example, on Ubuntu Linux:
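On Ubuntu, ssh (client and server) can be installed with apt-get; rsync, which some of the Hadoop scripts can use, is typically installed the same way. Installing Sun Java 1.6 is release-specific and not shown here:
$ sudo apt-get install ssh
$ sudo apt-get install rsync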
Download
Prepare to Start the Hadoop Cluster
Unpack the downloaded Hadoop distribution. In the unpacked directory, edit the file conf/hadoop-env.sh and set JAVA_HOME to the root of your Java JDK installation.
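For instance, if the Sun JDK is installed under /usr/lib/jvm/java-6-sun (an example path; substitute the location on your machine), the line in conf/hadoop-env.sh would read:
export JAVA_HOME=/usr/lib/jvm/java-6-sun
Afterwards, running bin/hadoop with no arguments should print the usage message for the hadoop command script.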
Now you are ready to start your Hadoop cluster in one of the three supported modes:
Standalone Operation
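By default Hadoop is configured to run in non-distributed mode, as a single Java process, which is useful for debugging. A minimal standalone run, reusing the unpacked conf directory as input and the same examples jar used later in this document, looks like this:
$ mkdir input
$ cp conf/*.xml input
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
$ cat output/*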
Pseudo-Distributed Operation
Hadoop can also be run on a single node in pseudo-distributed mode, where each Hadoop daemon runs in a separate Java process.
Configuration
Use the following:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
Setup passphraseless ssh
Now check that you can ssh to the localhost without a passphrase:
$ ssh localhost
If you cannot ssh to localhost without a passphrase, execute the following commands:
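For example, by generating a passphraseless dsa key and adding it to your authorized keys (an rsa key works equally well):
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys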
Execution
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
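To confirm that the daemons actually started, jps from the JDK should list NameNode, SecondaryNameNode, DataNode, JobTracker and TaskTracker processes:
$ jps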
The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
Browse the web interface for the NameNode and the JobTracker; by default they are available at:
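- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/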
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-examples-*.jar grep input output 'dfs[a-z.]+'
Copy the output files from the distributed filesystem to the local filesystem and examine them:
$ bin/hadoop fs -get output output
$ cat output/*
View the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
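When you are done, stop the daemons with the matching stop script:
$ bin/stop-all.sh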