Hadoop Single-Node Installation (Continued)

Formatting the HDFS filesystem via the NameNode

The first step to starting up your Hadoop installation is formatting the Hadoop filesystem, which is implemented on top of the local filesystem of your “cluster” (which includes only your local machine if you followed this tutorial). You need to do this the first time you set up a Hadoop cluster.

Do not format a running Hadoop filesystem, as you will lose all the data currently in the cluster (in HDFS)!

To format the filesystem (which simply initializes the directory specified by the dfs.name.dir variable), run the command

hduser@ubuntu:~$ /usr/local/hadoop/bin/hadoop namenode -format

The output will look like this:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop namenode -format
10/05/08 16:59:56 INFO namenode.NameNode: STARTUP_MSG:
10/05/08 16:59:56 INFO namenode.FSNamesystem: fsOwner=hduser,hadoop
10/05/08 16:59:56 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/08 16:59:56 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/08 16:59:56 INFO common.Storage: Image file of size 96 saved in 0 seconds.
10/05/08 16:59:57 INFO common.Storage: Storage directory .../hadoop-hduser/dfs/name has been successfully formatted.
10/05/08 16:59:57 INFO namenode.NameNode: SHUTDOWN_MSG:
hduser@ubuntu:/usr/local/hadoop$
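If you want to see what the format step actually created on disk, you can list the name directory on the local filesystem. This is a minimal check, assuming the default hadoop.tmp.dir location of /tmp/hadoop-hduser; adjust the path to whatever your configuration uses:

# path assumed from the default hadoop.tmp.dir; adjust to your configuration
hduser@ubuntu:~$ ls -lR /tmp/hadoop-hduser/dfs/name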

Starting your single-node cluster

Run the command:

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

This will start up a NameNode, a DataNode, a JobTracker and a TaskTracker on your machine.

The output will look like this:

hduser@ubuntu:/usr/local/hadoop$ bin/start-all.sh
starting namenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-namenode-ubuntu.out
localhost: starting datanode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-datanode-ubuntu.out
localhost: starting secondarynamenode, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-secondarynamenode-ubuntu.out
starting jobtracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-jobtracker-ubuntu.out
localhost: starting tasktracker, logging to /usr/local/hadoop/bin/../logs/hadoop-hduser-tasktracker-ubuntu.out
hduser@ubuntu:/usr/local/hadoop$

A nifty tool for checking whether the expected Hadoop processes are running is jps (part of Sun’s Java since v1.5.0). See also How to debug MapReduce programs.

hduser@ubuntu:/usr/local/hadoop$ jps
2287 TaskTracker
2149 JobTracker
1938 DataNode
2085 SecondaryNameNode
2349 Jps
1788 NameNode

You can also check with netstat if Hadoop is listening on the configured ports.

hduser@ubuntu:~$ sudo netstat -plten | grep java
tcp   0   0 0.0.0.0:50070     0.0.0.0:*   LISTEN   1001   9236   2471/java
tcp   0   0 0.0.0.0:50010     0.0.0.0:*   LISTEN   1001   9998   2628/java
tcp   0   0 0.0.0.0:48159     0.0.0.0:*   LISTEN   1001   8496   2628/java
tcp   0   0 0.0.0.0:53121     0.0.0.0:*   LISTEN   1001   9228   2857/java
tcp   0   0 127.0.0.1:54310   0.0.0.0:*   LISTEN   1001   8143   2471/java
tcp   0   0 127.0.0.1:54311   0.0.0.0:*   LISTEN   1001   9230   2857/java
tcp   0   0 0.0.0.0:59305     0.0.0.0:*   LISTEN   1001   8141   2471/java
tcp   0   0 0.0.0.0:50060     0.0.0.0:*   LISTEN   1001   9857   3005/java
tcp   0   0 0.0.0.0:49900     0.0.0.0:*   LISTEN   1001   9037   2785/java
tcp   0   0 0.0.0.0:50030     0.0.0.0:*   LISTEN   1001   9773   2857/java
hduser@ubuntu:~$

If there are any errors, examine the log files in the /logs/ directory.
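For example, to look at the most recent NameNode log entries, you could tail its log file; the file name below is assumed from the start-all.sh output shown above and includes the local hostname:

# log file name assumed from the start-all.sh output above; substitute your own hostname
hduser@ubuntu:/usr/local/hadoop$ tail -n 50 logs/hadoop-hduser-namenode-ubuntu.log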

Stopping your single-node cluster

Run the command

hduser@ubuntu:~$ /usr/local/hadoop/bin/stop-all.sh

to stop all the daemons running on your machine.

Example output:

hduser@ubuntu:/usr/local/hadoop$ bin/stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
hduser@ubuntu:/usr/local/hadoop$

Running a MapReduce job

We will now run your first Hadoop MapReduce job. We will use the WordCount example job, which reads text files and counts how often words occur. The input is text files and the output is text files, each line of which contains a word and the count of how often it occurred, separated by a tab. More information on what happens behind the scenes is available at the Hadoop Wiki.

Download example input data

We will use three ebooks from Project Gutenberg for this example:

• The Outline of Science, Vol. 1 (of 4) by J. Arthur Thomson
• The Notebooks of Leonardo Da Vinci
• Ulysses by James Joyce

Download each ebook as a plain-text file in UTF-8 encoding and store the files in a local temporary directory of choice, for example /tmp/gutenberg.
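If you prefer to fetch the files from the command line, something like the following should work; this is only a sketch, since the Project Gutenberg download URLs are assumptions that may have changed, and you should verify that you end up with the UTF-8 plain-text versions:

# URLs assume Project Gutenberg's plain-text cache layout; verify before relying on them
hduser@ubuntu:~$ mkdir -p /tmp/gutenberg
hduser@ubuntu:~$ wget -O /tmp/gutenberg/pg20417.txt http://www.gutenberg.org/cache/epub/20417/pg20417.txt
hduser@ubuntu:~$ wget -O /tmp/gutenberg/pg4300.txt http://www.gutenberg.org/cache/epub/4300/pg4300.txt
hduser@ubuntu:~$ wget -O /tmp/gutenberg/pg5000.txt http://www.gutenberg.org/cache/epub/5000/pg5000.txt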

hduser@ubuntu:~$ ls -l /tmp/gutenberg/
total 3604
-rw-r--r-- 1 hduser hadoop  674566 Feb  3 10:17 pg20417.txt
-rw-r--r-- 1 hduser hadoop 1573112 Feb  3 10:18 pg4300.txt
-rw-r--r-- 1 hduser hadoop 1423801 Feb  3 10:18 pg5000.txt
hduser@ubuntu:~$

Restart the Hadoop cluster

Restart your Hadoop cluster if it’s not running already.

hduser@ubuntu:~$ /usr/local/hadoop/bin/start-all.sh

Copy local example data to HDFS

Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop’s HDFS.

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -copyFromLocal /tmp/gutenberg /user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser
Found 1 items
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:40 /user/hduser/gutenberg
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/gutenberg
Found 3 items
-rw-r--r--   3 hduser supergroup     674566 2011-03-10 11:38 /user/hduser/gutenberg/pg20417.txt
-rw-r--r--   3 hduser supergroup    1573112 2011-03-10 11:38 /user/hduser/gutenberg/pg4300.txt
-rw-r--r--   3 hduser supergroup    1423801 2011-03-10 11:38 /user/hduser/gutenberg/pg5000.txt
hduser@ubuntu:/usr/local/hadoop$

Run the MapReduce job

Now, we actually run the WordCount example job.

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output

This command will read all the files in the HDFS directory /user/hduser/gutenberg, process them, and store the result in the HDFS directory /user/hduser/gutenberg-output.

Note: Some people run the command above and get the following error message:

Exception in thread "main" java.io.IOException: Error opening job jar: hadoop*examples*.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file

In this case, re-run the command with the full name of the Hadoop examples JAR file, for example:
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop-examples-1.0.3.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
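The exact JAR file name depends on the Hadoop version you installed; you can simply list it in the installation directory to find it (a quick check, assuming a Hadoop 1.x style examples JAR in the install root):

# file name pattern assumed for a Hadoop 1.x installation
hduser@ubuntu:/usr/local/hadoop$ ls hadoop-examples-*.jar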

Example console output of the WordCount job:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount /user/hduser/gutenberg /user/hduser/gutenberg-output
10/05/08 17:43:00 INFO input.FileInputFormat: Total input paths to process : 3
10/05/08 17:43:01 INFO mapred.JobClient: Running job: job_201005081732_0001
10/05/08 17:43:02 INFO mapred.JobClient:  map 0% reduce 0%
10/05/08 17:43:14 INFO mapred.JobClient:  map 66% reduce 0%
10/05/08 17:43:17 INFO mapred.JobClient:  map 100% reduce 0%
10/05/08 17:43:26 INFO mapred.JobClient:  map 100% reduce 100%
10/05/08 17:43:28 INFO mapred.JobClient: Job complete: job_201005081732_0001
10/05/08 17:43:28 INFO mapred.JobClient: Counters: 17
10/05/08 17:43:28 INFO mapred.JobClient:   Job Counters
10/05/08 17:43:28 INFO mapred.JobClient:     Launched reduce tasks=1
10/05/08 17:43:28 INFO mapred.JobClient:     Launched map tasks=3
10/05/08 17:43:28 INFO mapred.JobClient:     Data-local map tasks=3
10/05/08 17:43:28 INFO mapred.JobClient:   FileSystemCounters
10/05/08 17:43:28 INFO mapred.JobClient:     FILE_BYTES_READ=2214026
10/05/08 17:43:28 INFO mapred.JobClient:     HDFS_BYTES_READ=3639512
10/05/08 17:43:28 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=3687918
10/05/08 17:43:28 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=880330
10/05/08 17:43:28 INFO mapred.JobClient:   Map-Reduce Framework
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce input groups=82290
10/05/08 17:43:28 INFO mapred.JobClient:     Combine output records=102286
10/05/08 17:43:28 INFO mapred.JobClient:     Map input records=77934
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce shuffle bytes=1473796
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce output records=82290
10/05/08 17:43:28 INFO mapred.JobClient:     Spilled Records=255874
10/05/08 17:43:28 INFO mapred.JobClient:     Map output bytes=6076267
10/05/08 17:43:28 INFO mapred.JobClient:     Combine input records=629187
10/05/08 17:43:28 INFO mapred.JobClient:     Map output records=629187
10/05/08 17:43:28 INFO mapred.JobClient:     Reduce input records=102286

Check if the result is successfully stored in the HDFS directory /user/hduser/gutenberg-output:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser
Found 2 items
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:40 /user/hduser/gutenberg
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:43 /user/hduser/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/gutenberg-output
Found 2 items
drwxr-xr-x   - hduser supergroup          0 2010-05-08 17:43 /user/hduser/gutenberg-output/_logs
-rw-r--r--   1 hduser supergroup     880802 2010-05-08 17:43 /user/hduser/gutenberg-output/part-r-00000
hduser@ubuntu:/usr/local/hadoop$

If you want to modify some Hadoop settings on the fly, like increasing the number of reduce tasks, you can use the "-D" option:

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop jar hadoop*examples*.jar wordcount -D mapred.reduce.tasks=16 /user/hduser/gutenberg /user/hduser/gutenberg-output
An important note about mapred.map.tasks: Hadoop does not honor mapred.map.tasks beyond considering it a hint. But it accepts the user-specified mapred.reduce.tasks and doesn’t manipulate that. You cannot force mapred.map.tasks, but you can specify mapred.reduce.tasks.
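If you do raise the reducer count, the job writes one part file per reduce task; a quick way to count them afterwards (a sketch, assuming the same output directory as in the example above) is:

# counts the part-r-* files produced by the reducers; output path assumed from the example above
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -ls /user/hduser/gutenberg-output | grep -c part-r-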

Retrieve the job result from HDFS

To inspect the file, you can copy it from HDFS to the local filesystem. Alternatively, you can use the command

hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000

to read the file directly from HDFS without copying it to the local file system. In this tutorial, we will copy the results to the local file system though.
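One way to copy a single part file out of HDFS is dfs -copyToLocal (a sketch, using the part file name from the listing above); the commands below use dfs -getmerge instead, which merges all part files into one local file:

# copies just the single reducer output file; the file name comes from the dfs -ls output above
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -copyToLocal /user/hduser/gutenberg-output/part-r-00000 /tmp/part-r-00000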

hduser@ubuntu:/usr/local/hadoop$ mkdir /tmp/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -getmerge /user/hduser/gutenberg-output /tmp/gutenberg-output
hduser@ubuntu:/usr/local/hadoop$ head /tmp/gutenberg-output/gutenberg-output
"(Lo)cra"   1
"1490       1
"1498,"     1
"35"        1
"40,"       1
"A          2
"AS-IS".    1
"A_         1
"Absoluti   1
"Alack!     1
hduser@ubuntu:/usr/local/hadoop$

Note that in this specific output the quote signs (“) enclosing the words in the head output above have not been inserted by Hadoop. They are the result of the word tokenizer used in the WordCount example, and in this case they matched the beginning of a quote in the ebook texts. Just inspect the part-r-00000 file further to see it for yourself.
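For example, you could pull a few of those quote-prefixed tokens straight out of HDFS (a sketch; the part file name comes from the listing above):

# prints the first few output lines whose word starts with a double quote
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfs -cat /user/hduser/gutenberg-output/part-r-00000 | grep '^"' | head -n 5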

The dfs -getmerge command will simply concatenate any files it finds in the directory you specify. This means that the merged file might (and most likely will) not be sorted.
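If you want a sorted view, you can sort the merged file locally; for example, to show the most frequent words (a sketch, assuming the merged file produced by the getmerge step above):

# sort by the count column (field 2), numerically and in descending order
hduser@ubuntu:/usr/local/hadoop$ sort -k2,2nr /tmp/gutenberg-output/gutenberg-output | head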

Hadoop Web Interfaces

Hadoop comes with several web interfaces which are by default (see conf/hadoop-default.xml) available at these locations:

• http://localhost:50070/ – web UI of the NameNode daemon
• http://localhost:50030/ – web UI of the JobTracker daemon
• http://localhost:50060/ – web UI of the TaskTracker daemon

These web interfaces provide concise information about what’s happening in your Hadoop cluster. You might want to give them a try.
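A quick way to check from the shell that all three web UIs answer HTTP requests (a sketch; it only verifies that something responds on each port):

# ports taken from the list above; an HTTP status of 200 means the UI answered
hduser@ubuntu:~$ for port in 50070 50030 50060; do curl -s -o /dev/null -w "port $port: %{http_code}\n" "http://localhost:$port/"; done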

NameNode Web Interface (HDFS layer)

The NameNode web UI shows you a cluster summary including information about total/remaining capacity, and live and dead nodes. Additionally, it allows you to browse the HDFS namespace and view the contents of its files in the web browser. It also gives access to the local machine’s Hadoop log files.

By default, it’s available at http://localhost:50070/.
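Much of the same HDFS summary information is also available from the command line via the dfsadmin tool:

# prints capacity, used space and per-datanode status
hduser@ubuntu:/usr/local/hadoop$ bin/hadoop dfsadmin -report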


JobTracker Web Interface (MapReduce layer)

The JobTracker web UI provides information about general job statistics of the Hadoop cluster, running/completed/failed jobs, and a job history log file. It also gives access to the local machine’s Hadoop log files (i.e. the machine on which the web UI is running).

By default, it’s available at http://localhost:50030/.


TaskTracker Web Interface (MapReduce layer)

The TaskTracker web UI shows you running and non-running tasks. It also gives access to the local machine’s Hadoop log files.

By default, it’s available at http://localhost:50060/.


What’s next?

If you’re feeling comfortable, you can continue your Hadoop experience with my follow-up tutorial Running Hadoop On Ubuntu Linux (Multi-Node Cluster), where I describe how to build a Hadoop multi-node cluster with two Ubuntu boxes (this will increase your current cluster size by 100%, heh).

In addition, I wrote a tutorial on how to code a simple MapReduce job in the Python programming language, which can serve as the basis for writing your own MapReduce programs.


Change Log

Only important changes to this article are listed here:

• 2011-07-17: Renamed the Hadoop user from hadoop to hduser based on readers’ feedback. This should make the distinction between the local Hadoop user (now hduser), the local Hadoop group (hadoop), and the Hadoop CLI tool (hadoop) more clear.