Hadoop Single-Node Installation

Setting Up a Single-Node Hadoop Cluster
This article describes in detail how to set up a single-node, pseudo-distributed Hadoop cluster on Ubuntu Linux and configure the Hadoop Distributed File System (HDFS). The tutorial covers every step from installing Java to configuring Hadoop.
As usual, the original English write-up is the reliable kind: you can just follow it step by step without having to think twice.

The original content is reproduced below.

In this tutorial I will describe the required steps for setting up a pseudo-distributed, single-node Hadoop cluster backed by the Hadoop Distributed File System (HDFS), running on Ubuntu Linux.

Are you looking for the multi-node cluster tutorial? Just head over there.

Hadoop is a framework written in Java for running applications on large clusters of commodity hardware and incorporates features similar to those of the Google File System (GFS) and of the MapReduce computing paradigm. Hadoop's HDFS is a highly fault-tolerant distributed file system and, like Hadoop in general, designed to be deployed on low-cost hardware. It provides high-throughput access to application data and is suitable for applications that have large data sets.

The main goal of this tutorial is to get a simple Hadoop installation up and running so that you can play around with the software and learn more about it.

This tutorial has been tested with the following software versions:

  • Ubuntu Linux 10.04 LTS (deprecated: 8.10 LTS, 8.04, 7.10, 7.04)
  • Hadoop 1.0.3, released May 2012


Figure 1: Cluster of machines running Hadoop at Yahoo! (Source: Yahoo!)

Prerequisites

Sun Java 6

Hadoop requires a working Java 1.5+ (aka Java 5) installation. However, using Java 1.6 (aka Java 6) is recommended for running Hadoop. For the sake of this tutorial, I will therefore describe the installation of Java 1.6.

Important Note: The apt instructions below are taken from this SuperUser.com thread. I was notified that the instructions I previously provided no longer work. Please be aware that adding a third-party repository to your Ubuntu configuration is considered a security risk. If you do not want to proceed with the apt instructions below, feel free to install Sun JDK 6 via alternative means (e.g. by downloading the binary package from Oracle) and then continue with the next section in the tutorial.
# Add Ferramosca Roberto's repository to your apt repositories
# See https://launchpad.net/~ferramroberto/
#
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:ferramroberto/java

# Update the source list
$ sudo apt-get update

# Install Sun Java 6 JDK
$ sudo apt-get install sun-java6-jdk

# Select Sun's Java as the default on your machine.
# See 'sudo update-alternatives --config java' for more information.
#
$ sudo update-java-alternatives -s java-6-sun

The full JDK will be placed in /usr/lib/jvm/java-6-sun (well, this directory is actually a symlink on Ubuntu).

After installation, make a quick check whether Sun's JDK is correctly set up:

user@ubuntu:~$ java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) Client VM (build 16.3-b01, mixed mode, sharing)

Adding a dedicated Hadoop system user

We will use a dedicated Hadoop user account for running Hadoop. While that's not required, it is recommended because it helps to separate the Hadoop installation from other software applications and user accounts running on the same machine (think: security, permissions, backups, etc.).

$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser

This will add the user hduser and the group hadoop to your local machine.
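A quick way to verify that the account and group were created:

# hduser should be reported as a member of the hadoop group
$ id hduser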

Configuring SSH

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it (which is what we want to do in this short tutorial). For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost for the hduser user we created in the previous section.

I assume that you have SSH up and running on your machine and configured it to allow SSH public key authentication. If not, there are several online guides available.

First, we have to generate an SSH key for the hduser user.

user@ubuntu:~$ su - hduser
hduser@ubuntu:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
9b:82:ea:58:b4:e0:35:d7:ff:19:66:a6:ef:ae:0e:d2 hduser@ubuntu
The key's randomart image is:
[...snipp...]
hduser@ubuntu:~$

The second line will create an RSA key pair with an empty password. Generally, using an empty password is not recommended, but in this case it is needed to unlock the key without your interaction (you don't want to enter the passphrase every time Hadoop interacts with its nodes).

Second, you have to enable SSH access to your local machine withthis newly created key.

hduser@ubuntu:~$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys

The final step is to test the SSH setup by connecting to your local machine with the hduser user. This step is also needed to save your local machine's host key fingerprint to the hduser user's known_hosts file. If you have any special SSH configuration for your local machine, like a non-standard SSH port, you can define host-specific SSH options in $HOME/.ssh/config (see man ssh_config for more information).
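For example, a minimal sketch of such a host-specific entry, assuming a hypothetical SSH server listening on port 2222 (replace the port with whatever your server actually uses):

$HOME/.ssh/config
# Host-specific SSH options for connecting to this machine.
# The port number 2222 is only an assumption for illustration.
Host localhost
  Port 2222
  User hduser
  IdentityFile ~/.ssh/id_rsa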

hduser@ubuntu:~$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
RSA key fingerprint is d7:87:25:47:ae:02:00:eb:1d:75:4f:bb:44:f9:36:26.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (RSA) to the list of known hosts.
Linux ubuntu 2.6.32-22-generic #33-Ubuntu SMP Wed Apr 28 13:27:30 UTC 2010 i686 GNU/Linux
Ubuntu 10.04 LTS
[...snipp...]
hduser@ubuntu:~$

If the SSH connection fails, these general tips might help:

  • Enable debugging with ssh -vvv localhost and investigate the error in detail.
  • Check the SSH server configuration in /etc/ssh/sshd_config, in particular the options PubkeyAuthentication (which should be set to yes) and AllowUsers (if this option is active, add the hduser user to it). If you made any changes to the SSH server configuration file, you can force a configuration reload with sudo /etc/init.d/ssh reload; a quick check of these options is sketched below.
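For the second tip, a minimal sketch of how you might inspect those two options and reload the server, assuming the stock OpenSSH layout on Ubuntu:

# Show the relevant options from the SSH server configuration
$ grep -E 'PubkeyAuthentication|AllowUsers' /etc/ssh/sshd_config

# Force a configuration reload after any changes
$ sudo /etc/init.d/ssh reload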

Disabling IPv6

One problem with IPv6 on Ubuntu is that using 0.0.0.0 for the various networking-related Hadoop configuration options will result in Hadoop binding to the IPv6 addresses of my Ubuntu box. In my case, I realized that there's no practical point in enabling IPv6 on a box when you are not connected to any IPv6 network. Hence, I simply disabled IPv6 on my Ubuntu machine. Your mileage may vary.

To disable IPv6 on Ubuntu 10.04 LTS, open /etc/sysctl.conf in the editor of your choice and add the following lines to the end of the file:

/etc/sysctl.conf
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1

You have to reboot your machine in order to make the changes take effect.

You can check whether IPv6 is enabled on your machine with the following command:

$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6

A return value of 0 means IPv6 is enabled; a value of 1 means it is disabled (and that's what we want).
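As a side note, sysctl can re-read /etc/sysctl.conf without a reboot; whether that fully disables IPv6 for services that are already running is not guaranteed, so the reboot above remains the safe option:

# Re-read and apply the settings from /etc/sysctl.conf
$ sudo sysctl -p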

Alternative

You can also disable IPv6 only for Hadoop, as documented in HADOOP-3437. You can do so by adding the following line to conf/hadoop-env.sh:

conf/hadoop-env.sh
export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

Hadoop

Installation

Download Hadoop from the Apache Download Mirrors and extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop. Make sure to change the owner of all the files to the hduser user and hadoop group, for example:

$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
$ sudo mv hadoop-1.0.3 hadoop
$ sudo chown -R hduser:hadoop hadoop

(Just to give you the idea, YMMV – personally, I create a symlink from hadoop-1.0.3 to hadoop.)
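A minimal sketch of that symlink variant, using the same paths and hduser:hadoop ownership as above:

$ cd /usr/local
$ sudo tar xzf hadoop-1.0.3.tar.gz
# Keep the versioned directory and point a stable name at it
$ sudo ln -s /usr/local/hadoop-1.0.3 /usr/local/hadoop
$ sudo chown -R hduser:hadoop /usr/local/hadoop-1.0.3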

Update $HOME/.bashrc

Add the following lines to the end of the $HOME/.bashrc file of user hduser. If you use a shell other than bash, you should of course update its appropriate configuration files instead of .bashrc.

$HOME/.bashrc
# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-6-sun

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

You can repeat this exercise also for other users who want to use Hadoop.
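Once HDFS is formatted and the cluster is running (covered later in the original tutorial), the aliases above allow quick checks like the following; the /user/hduser path is just the conventional HDFS home directory and may differ on your setup:

# List hduser's HDFS home directory via the 'fs' alias
$ fs -ls /user/hduser

# The same listing via the shorter 'hls' alias
$ hls /user/hduser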

Excursus: Hadoop Distributed File System (HDFS)

Before we continue, let us briefly learn a bit more about Hadoop's distributed file system.

The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. HDFS provides high-throughput access to application data and is suitable for applications that have large data sets. HDFS relaxes a few POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the Apache Nutch web search engine project. HDFS is part of the Apache Hadoop project, which is part of the Apache Lucene project.

The Hadoop Distributed File System: Architecture and Design, hadoop.apache.org/hdfs/docs/…

The following picture gives an overview of the most important HDFS components.

Figure 2: Overview of the most important HDFS components (image from the original post)

Configuration

Our goal in this tutorial is a single-node setup of Hadoop. More information about what we do in this section is available on the Hadoop Wiki.

hadoop-env.sh

The only required environment variable we have to configure for Hadoop in this tutorial is JAVA_HOME. Open conf/hadoop-env.sh in the editor of your choice (if you used the installation path in this tutorial, the full path is /usr/local/hadoop/conf/hadoop-env.sh) and set the JAVA_HOME environment variable to the Sun JDK/JRE 6 directory.

Change

conf/hadoop-env.sh
# The java implementation to use.  Required.
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun

to

conf/hadoop-env.sh
# The java implementation to use.  Required.
export JAVA_HOME=/usr/lib/jvm/java-6-sun

Note: If you are on a Mac with OS X 10.7, you can use the following line to set up JAVA_HOME in conf/hadoop-env.sh.

conf/hadoop-env.sh (on Mac systems)
# for our Mac users
export JAVA_HOME=`/usr/libexec/java_home`

conf/*-site.xml

In this section, we will configure the directory where Hadoop will store its data files, the network ports it listens on, etc. Our setup will use Hadoop's Distributed File System, HDFS, even though our little "cluster" only contains our single local machine.

You can leave the settings below "as is" with the exception of the hadoop.tmp.dir parameter – this parameter you must change to a directory of your choice. We will use the directory /app/hadoop/tmp in this tutorial. Hadoop's default configurations use hadoop.tmp.dir as the base temporary directory both for the local file system and HDFS, so don't be surprised if you see Hadoop creating the specified directory automatically on HDFS at some later point.

Now we create the directory and set the required ownerships andpermissions:

$ sudo mkdir -p /app/hadoop/tmp
$ sudo chown hduser:hadoop /app/hadoop/tmp
# ...and if you want to tighten up security, chmod from 755 to 750...
$ sudo chmod 750 /app/hadoop/tmp

If you forget to set the required ownerships and permissions, you will see a java.io.IOException when you try to format the name node in the next section.
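For reference, formatting the name node is done with the hadoop command itself; a minimal sketch, assuming the /usr/local/hadoop installation path used in this tutorial:

# Format the HDFS name node (run once as hduser, before the first start)
hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format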

Add the following snippets between the <configuration> ... </configuration> tags in the respective configuration XML file.

In file conf/core-site.xml:

conf/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system.  A URI whose
  scheme and authority determine the FileSystem implementation.  The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class.  The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>

In file conf/mapred-site.xml:

conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

In file conf/hdfs-site.xml:

conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
</property>

See Getting Started with Hadoop and the documentation in Hadoop's API Overview if you have any questions about Hadoop's configuration options.
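With the configuration in place, the next steps in the original tutorial are formatting the name node (as sketched earlier), starting the daemons, and verifying them; a minimal sketch, assuming the paths used above:

# Start the single-node cluster: NameNode, DataNode, SecondaryNameNode,
# JobTracker and TaskTracker
hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

# Verify that the Hadoop Java processes are running
hduser@ubuntu:~$ jps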

