Downloading and installing the JDK and Hadoop are not covered here; there are plenty of guides online.
If start-all.sh fails with a "JAVA_HOME not set" error: first check that the JAVA environment variables in /etc/profile are set correctly.
Then open libexec/hadoop-config.sh under the Hadoop directory and locate this block:
# Attempt to set JAVA_HOME if it is not set
if [[ -z $JAVA_HOME ]]; then
  # On OSX use java_home (or /Library for older versions)
  if [ "Darwin" == "$(uname -s)" ]; then
    if [ -x /usr/libexec/java_home ]; then
      export JAVA_HOME=($(/usr/libexec/java_home))
    else
      export JAVA_HOME=(/Library/Java/Home)
    fi
  fi
fi
Add the following line directly below that block:
export JAVA_HOME=/opt/jdk7
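That edit can also be scripted. Below is a sketch using GNU sed's append command; it is demonstrated on a local copy of the relevant snippet, so point CONF at the real libexec/hadoop-config.sh on your machine (the /opt/jdk7 path is this article's JDK location, not a standard one):

```shell
# Insert "export JAVA_HOME=/opt/jdk7" right after the
# "Attempt to set JAVA_HOME" comment. Demonstrated on a local copy of the
# snippet; set CONF to the real libexec/hadoop-config.sh instead.
CONF=hadoop-config-snippet.sh
cat > "$CONF" <<'EOF'
# Attempt to set JAVA_HOME if it is not set
if [[ -z $JAVA_HOME ]]; then
  :
fi
EOF
sed -i '/# Attempt to set JAVA_HOME if it is not set/a export JAVA_HOME=/opt/jdk7' "$CONF"
grep 'export JAVA_HOME' "$CONF"   # confirm the line was inserted
```

Editing the file by hand, as described above, works just as well; the script form is only convenient when repeating the setup on several machines.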
Now for configuring the three XML files under $HADOOP_HOME/etc/hadoop:
core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/opt/hadoop-2.4.1/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://sk-Vostro-3400:9000</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop-2.4.1/name</value>
  </property>
</configuration>
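Note: fs.default.name is deprecated in Hadoop 2.x; fs.defaultFS is the current key (the old one still works, but logs a deprecation warning). The modern equivalent of that property would be:

```xml
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://sk-Vostro-3400:9000</value>
</property>
```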
hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop-2.4.1/hdfs/data</value>
  </property>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>sk-Vostro-3400:9001</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
mapred-site.xml (Hadoop 2.x ships only mapred-site.xml.template; copy it to mapred-site.xml before editing)
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>sk-Vostro-3400:9001</value>
  </property>
  <!--property>
    <name>mapred.local.dir</name>
    <value>/usr/local/hadoop/mapred/local</value>
  </property>
  <property>
    <name>mapred.system.dir</name>
    <value>/tmp/hadoop/mapred/system</value>
  </property-->
</configuration>
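Note: mapred.job.tracker is a Hadoop 1.x property. In 2.x, unless mapreduce.framework.name is set to yarn, jobs fall back to the LocalJobRunner (which is exactly what the run log further down shows). To submit jobs to YARN instead, this property would be needed:

```xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
```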
Note: since 2.x there is no masters file; the secondary namenode is configured in hdfs-site.xml instead.
The hadoop.tmp.dir path must already exist, or startup will fail.
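Since hadoop.tmp.dir must exist before startup, the local directories referenced by the configs above can be created in one step. A sketch (the demo uses a relative prefix so it runs anywhere; substitute /opt/hadoop-2.4.1 on the real box):

```shell
# Create the local directories the XML configs point at. mkdir -p creates
# missing parents and is a no-op if a directory already exists, so this is
# safe to re-run.
prefix=hadoop-2.4.1-demo   # use /opt/hadoop-2.4.1 for the article's layout
mkdir -p "$prefix/tmp" "$prefix/name" "$prefix/hdfs/data"
ls -R "$prefix"
```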
The first start of Hadoop requires formatting the NameNode (this spelling is deprecated in 2.x; hdfs namenode -format is the current form):
hadoop namenode -format
Then run start-all.sh from $HADOOP_HOME/sbin (start-all.sh is deprecated in 2.x; start-dfs.sh followed by start-yarn.sh is the preferred equivalent):
./start-all.sh
Check the result with jps; with HDFS and YARN up you should typically see NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager listed.
Use hadoop fs -ls / to browse the HDFS filesystem.
If all of this runs without errors, the installation succeeded.
Next, run the wordcount example from the bundled examples jar (hadoop-mapreduce-examples-2.4.1.jar).
First create an input folder in HDFS:
hadoop fs -mkdir -p /user/hadoop/input
Then upload the test file you prepared to the input folder in HDFS:
hadoop fs -put yourFilePath /user/hadoop/input/
Finally, run the job:
hadoop jar /opt/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input/file1.txt /user/hadoop/output
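For intuition, what the wordcount job computes can be sketched as a plain local shell pipeline: split the text into words, sort so duplicates become adjacent, count each run (the sample text here is made up for illustration, not the contents of file1.txt):

```shell
# Local sketch of the wordcount computation: tokenize on whitespace,
# sort to group duplicates, count each group, print word<TAB>count.
printf 'hello world\nhello hadoop\n' |
  tr -s ' \t' '\n' |
  sort |
  uniq -c |
  awk '{print $2 "\t" $1}'
# hadoop  1
# hello   2
# world   1
```

The MapReduce version does the same thing, only with the tokenizing in the map phase and the counting in the reduce phase, distributed across the cluster.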
Run output:
sk@sk-Vostro-3400:/opt/hadoop-2.4.1$ hadoop jar /opt/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input/file1.txt /user/hadoop/output
14/09/05 22:33:35 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/09/05 22:33:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/09/05 22:33:37 INFO input.FileInputFormat: Total input paths to process : 1
14/09/05 22:33:37 INFO mapreduce.JobSubmitter: number of splits:1
14/09/05 22:33:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local347020784_0001
14/09/05 22:33:37 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/staging/sk347020784/.staging/job_local347020784_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/09/05 22:33:37 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/staging/sk347020784/.staging/job_local347020784_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/09/05 22:33:38 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/local/localRunner/sk/job_local347020784_0001/job_local347020784_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval; Ignoring.
14/09/05 22:33:38 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/local/localRunner/sk/job_local347020784_0001/job_local347020784_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts; Ignoring.
14/09/05 22:33:38 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/09/05 22:33:38 INFO mapreduce.Job: Running job: job_local347020784_0001
14/09/05 22:33:38 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/09/05 22:33:38 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/09/05 22:33:38 INFO mapred.LocalJobRunner: Waiting for map tasks
14/09/05 22:33:38 INFO mapred.LocalJobRunner: Starting task: attempt_local347020784_0001_m_000000_0
14/09/05 22:33:38 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/09/05 22:33:38 INFO mapred.MapTask: Processing split: hdfs://sk-Vostro-3400:9000/user/hadoop/input/file1.txt:0+62
14/09/05 22:33:39 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/09/05 22:33:39 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/09/05 22:33:39 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/09/05 22:33:39 INFO mapred.MapTask: soft limit at 83886080
14/09/05 22:33:39 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/09/05 22:33:39 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/09/05 22:33:39 INFO mapreduce.Job: Job job_local347020784_0001 running in uber mode : false
14/09/05 22:33:39 INFO mapreduce.Job: map 0% reduce 0%
14/09/05 22:33:39 INFO mapred.LocalJobRunner:
14/09/05 22:33:39 INFO mapred.MapTask: Starting flush of map output
14/09/05 22:33:39 INFO mapred.MapTask: Spilling map output
14/09/05 22:33:39 INFO mapred.MapTask: bufstart = 0; bufend = 114; bufvoid = 104857600
14/09/05 22:33:39 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214348(104857392); length = 49/6553600
14/09/05 22:33:39 INFO mapred.MapTask: Finished spill 0
14/09/05 22:33:39 INFO mapred.Task: Task:attempt_local347020784_0001_m_000000_0 is done. And is in the process of committing
14/09/05 22:33:39 INFO mapred.LocalJobRunner: map
14/09/05 22:33:39 INFO mapred.Task: Task 'attempt_local347020784_0001_m_000000_0' done.
14/09/05 22:33:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local347020784_0001_m_000000_0
14/09/05 22:33:39 INFO mapred.LocalJobRunner: map task executor complete.
14/09/05 22:33:39 INFO mapred.LocalJobRunner: Waiting for reduce tasks
14/09/05 22:33:39 INFO mapred.LocalJobRunner: Starting task: attempt_local347020784_0001_r_000000_0
14/09/05 22:33:39 INFO mapred.Task: Using ResourceCalculatorProcessTree : [ ]
14/09/05 22:33:39 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@a9d8e9
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334154944, maxSingleShuffleLimit=83538736, mergeThreshold=220542272, ioSortFactor=10, memToMemMergeOutputsThreshold=10
14/09/05 22:33:39 INFO reduce.EventFetcher: attempt_local347020784_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
14/09/05 22:33:39 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local347020784_0001_m_000000_0 decomp: 77 len: 81 to MEMORY
14/09/05 22:33:39 INFO reduce.InMemoryMapOutput: Read 77 bytes from map-output for attempt_local347020784_0001_m_000000_0
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 77, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->77
14/09/05 22:33:39 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
14/09/05 22:33:39 INFO mapred.LocalJobRunner: 1 / 1 copied.
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
14/09/05 22:33:39 INFO mapred.Merger: Merging 1 sorted segments
14/09/05 22:33:39 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 71 bytes
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: Merged 1 segments, 77 bytes to disk to satisfy reduce memory limit
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: Merging 1 files, 81 bytes from disk
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
14/09/05 22:33:39 INFO mapred.Merger: Merging 1 sorted segments
14/09/05 22:33:39 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 71 bytes
14/09/05 22:33:39 INFO mapred.LocalJobRunner: 1 / 1 copied.
14/09/05 22:33:39 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
14/09/05 22:33:40 INFO mapred.Task: Task:attempt_local347020784_0001_r_000000_0 is done. And is in the process of committing
14/09/05 22:33:40 INFO mapred.LocalJobRunner: 1 / 1 copied.
14/09/05 22:33:40 INFO mapred.Task: Task attempt_local347020784_0001_r_000000_0 is allowed to commit now
14/09/05 22:33:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local347020784_0001_r_000000_0' to hdfs://sk-Vostro-3400:9000/user/hadoop/output/_temporary/0/task_local347020784_0001_r_000000
14/09/05 22:33:40 INFO mapred.LocalJobRunner: reduce > reduce
14/09/05 22:33:40 INFO mapred.Task: Task 'attempt_local347020784_0001_r_000000_0' done.
14/09/05 22:33:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local347020784_0001_r_000000_0
14/09/05 22:33:40 INFO mapred.LocalJobRunner: reduce task executor complete.
14/09/05 22:33:40 INFO mapreduce.Job: map 100% reduce 100%
14/09/05 22:33:40 INFO mapreduce.Job: Job job_local347020784_0001 completed successfully
14/09/05 22:33:40 INFO mapreduce.Job: Counters: 38
File System Counters
FILE: Number of bytes read=541080
FILE: Number of bytes written=987423
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=124
HDFS: Number of bytes written=47
HDFS: Number of read operations=13
HDFS: Number of large read operations=0
HDFS: Number of write operations=4
Map-Reduce Framework
Map input records=4
Map output records=13
Map output bytes=114
Map output materialized bytes=81
Input split bytes=119
Combine input records=13
Combine output records=7
Reduce input groups=7
Reduce shuffle bytes=81
Reduce input records=7
Reduce output records=7
Spilled Records=14
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=0
CPU time spent (ms)=0
Physical memory (bytes) snapshot=0
Virtual memory (bytes) snapshot=0
Total committed heap usage (bytes)=424673280
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=62
File Output Format Counters
Bytes Written=47
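Once the job completes, the counted words can be read back from the HDFS output directory; with a single reducer (dfs.replication aside, this setup runs one reduce task), wordcount writes its results to part-r-00000:

```shell
# Print the wordcount results stored in HDFS (single-reducer output file).
hadoop fs -cat /user/hadoop/output/part-r-00000
```

Note that the output directory must not exist before the job runs; delete it with hadoop fs -rm -r /user/hadoop/output before re-running the example.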