Hadoop 2.4 Pseudo-Distributed Environment Setup

This article walks through installing and configuring Hadoop, then uses the WordCount example to demonstrate a simple data-processing job on the cluster.


Downloading and installing the JDK and Hadoop is not covered here; plenty of guides are available online.

If start-all.sh fails with a "JAVA_HOME not set" error, first check that the JAVA environment variables in /etc/profile are set correctly.

Then open libexec/hadoop-config.sh under the Hadoop directory and locate this block:

# Attempt to set JAVA_HOME if it is not set
if [[ -z $JAVA_HOME ]]; then
  # On OSX use java_home (or /Library for older versions)
  if [ "Darwin" == "$(uname -s)" ]; then
    if [ -x /usr/libexec/java_home ]; then
      export JAVA_HOME=($(/usr/libexec/java_home))
    else
      export JAVA_HOME=(/Library/Java/Home)
    fi
  fi
  # ... (snippet truncated; the outer "if" closes further down in the file)



Add the export directly below this block (adjust the path to your JDK):
export JAVA_HOME=/opt/jdk7
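The fallback logic above can be sketched as a minimal, runnable simulation; `JH` stands in for `JAVA_HOME` here so the real environment is not touched, and `/opt/jdk7` is just the example path used in this article:

```shell
# Minimal simulation of hadoop-config.sh's JAVA_HOME fallback
JH=""                 # pretend JAVA_HOME is unset
if [[ -z $JH ]]; then
  JH=/opt/jdk7        # the hard-coded value added to the script above
fi
echo "$JH"
```

Because the test `[[ -z $JH ]]` only fires when the variable is empty, a value already exported in /etc/profile wins over the hard-coded fallback.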



Now for the three XML config files under the $HADOOP_HOME/etc/hadoop directory:

core-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>


<!-- Put site-specific property overrides in this file. -->

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/hadoop-2.4.1/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>

    <property>
        <name>fs.default.name</name>
        <value>hdfs://sk-Vostro-3400:9000</value>
    </property>

    <property>
        <name>dfs.name.dir</name>
        <value>/opt/hadoop-2.4.1/name</value>
    </property>

</configuration>



hdfs-site.xml

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>  
    <name>dfs.data.dir</name>  
    <value>/opt/hadoop-2.4.1/hdfs/data</value>  
  </property>  
  <property>  
    <name>dfs.namenode.secondary.http-address</name>  
    <value>sk-Vostro-3400:9001</value>  
  </property>  

  <property>  
    <name>dfs.replication</name>  
    <value>1</value>  
  </property>
</configuration>



mapred-site.xml (Hadoop 2.x ships this file as mapred-site.xml.template; copy it to mapred-site.xml first)

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>
  <property>  
    <name>mapred.job.tracker</name>  
    <value>sk-Vostro-3400:9001</value>  
  </property>  

  <!--property>  
   <name>mapred.local.dir</name>  
   <value>/usr/local/hadoop/mapred/local</value>  
  </property>  

  <property>  
   <name>mapred.system.dir</name>  
   <value>/tmp/hadoop/mapred/system</value>  
  </property-->  
</configuration>
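In Hadoop 2.x the MapReduce config ships only as mapred-site.xml.template and is ignored until it is copied to mapred-site.xml. A runnable sketch of that step, simulated in a temp dir so it works anywhere; on a real install the directory is $HADOOP_HOME/etc/hadoop:

```shell
# Simulate copying the template into place
CONF="$(mktemp -d)"   # stands in for $HADOOP_HOME/etc/hadoop
printf '<configuration>\n</configuration>\n' > "$CONF/mapred-site.xml.template"
cp "$CONF/mapred-site.xml.template" "$CONF/mapred-site.xml"
ls "$CONF"
```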




Note: from 2.x onward there is no masters file; the secondary namenode is configured in hdfs-site.xml instead.

        The hadoop.tmp.dir path must already exist, otherwise startup fails.

        fs.default.name, dfs.name.dir, and dfs.data.dir above are the old 1.x property names; Hadoop 2.4 still accepts them but logs deprecation warnings (the current names are fs.defaultFS, dfs.namenode.name.dir, and dfs.datanode.data.dir).
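Since hadoop.tmp.dir (and the other local paths in the configs) must exist before startup, it is worth creating them up front. A runnable sketch, using a temp base so it works without root; substitute /opt/hadoop-2.4.1 on a real install:

```shell
# Create the directory layout the configs above reference
BASE="$(mktemp -d)"   # stands in for /opt/hadoop-2.4.1
mkdir -p "$BASE/tmp" "$BASE/name" "$BASE/hdfs/data"
find "$BASE" -mindepth 1 -type d | sort
```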

The first time Hadoop is started, the NameNode must be formatted:

hadoop namenode -format

(This 1.x-style command still works in 2.x but prints a deprecation notice; hdfs namenode -format is the current form.)

Then, from the $HADOOP_HOME/sbin directory, run start-all.sh:

./start-all.sh


Check the running daemons with the jps command.

Inspect the HDFS filesystem with hadoop fs -ls /

If both run without errors, the installation succeeded.


Now we can run the wordcount example program from the Hadoop examples jar.

First create an input folder in HDFS:

hadoop fs -mkdir -p /user/hadoop/input

Then upload a prepared test file into the input folder on HDFS:

hadoop fs -put yourFilePath /user/hadoop/input/

Finally, run the job:
hadoop jar /opt/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input/file1.txt /user/hadoop/output
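For intuition, what wordcount computes can be approximated locally with coreutils. This is just a sketch on a throwaway file, not the Hadoop job itself:

```shell
# Split a file into words, then count occurrences of each word
f="$(mktemp)"
printf 'hello world\nhello hadoop\n' > "$f"
tr -s '[:space:]' '\n' < "$f" | sed '/^$/d' | sort | uniq -c | sort -rn
```

The sort-before-uniq pipeline mirrors the MapReduce shuffle: identical keys are grouped together before the counts (the reduce step) are taken.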



Run output:
sk@sk-Vostro-3400:/opt/hadoop-2.4.1$ hadoop jar /opt/hadoop-2.4.1/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.4.1.jar wordcount /user/hadoop/input/file1.txt /user/hadoop/output
14/09/05 22:33:35 INFO Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
14/09/05 22:33:35 INFO jvm.JvmMetrics: Initializing JVM Metrics with processName=JobTracker, sessionId=
14/09/05 22:33:37 INFO input.FileInputFormat: Total input paths to process : 1
14/09/05 22:33:37 INFO mapreduce.JobSubmitter: number of splits:1
14/09/05 22:33:37 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local347020784_0001
14/09/05 22:33:37 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/staging/sk347020784/.staging/job_local347020784_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/09/05 22:33:37 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/staging/sk347020784/.staging/job_local347020784_0001/job.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/09/05 22:33:38 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/local/localRunner/sk/job_local347020784_0001/job_local347020784_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.retry.interval;  Ignoring.
14/09/05 22:33:38 WARN conf.Configuration: file:/opt/hadoop-2.4.1/tmp/mapred/local/localRunner/sk/job_local347020784_0001/job_local347020784_0001.xml:an attempt to override final parameter: mapreduce.job.end-notification.max.attempts;  Ignoring.
14/09/05 22:33:38 INFO mapreduce.Job: The url to track the job: http://localhost:8080/
14/09/05 22:33:38 INFO mapreduce.Job: Running job: job_local347020784_0001
14/09/05 22:33:38 INFO mapred.LocalJobRunner: OutputCommitter set in config null
14/09/05 22:33:38 INFO mapred.LocalJobRunner: OutputCommitter is org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter
14/09/05 22:33:38 INFO mapred.LocalJobRunner: Waiting for map tasks
14/09/05 22:33:38 INFO mapred.LocalJobRunner: Starting task: attempt_local347020784_0001_m_000000_0
14/09/05 22:33:38 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
14/09/05 22:33:38 INFO mapred.MapTask: Processing split: hdfs://sk-Vostro-3400:9000/user/hadoop/input/file1.txt:0+62
14/09/05 22:33:39 INFO mapred.MapTask: Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
14/09/05 22:33:39 INFO mapred.MapTask: (EQUATOR) 0 kvi 26214396(104857584)
14/09/05 22:33:39 INFO mapred.MapTask: mapreduce.task.io.sort.mb: 100
14/09/05 22:33:39 INFO mapred.MapTask: soft limit at 83886080
14/09/05 22:33:39 INFO mapred.MapTask: bufstart = 0; bufvoid = 104857600
14/09/05 22:33:39 INFO mapred.MapTask: kvstart = 26214396; length = 6553600
14/09/05 22:33:39 INFO mapreduce.Job: Job job_local347020784_0001 running in uber mode : false
14/09/05 22:33:39 INFO mapreduce.Job:  map 0% reduce 0%
14/09/05 22:33:39 INFO mapred.LocalJobRunner: 
14/09/05 22:33:39 INFO mapred.MapTask: Starting flush of map output
14/09/05 22:33:39 INFO mapred.MapTask: Spilling map output
14/09/05 22:33:39 INFO mapred.MapTask: bufstart = 0; bufend = 114; bufvoid = 104857600
14/09/05 22:33:39 INFO mapred.MapTask: kvstart = 26214396(104857584); kvend = 26214348(104857392); length = 49/6553600
14/09/05 22:33:39 INFO mapred.MapTask: Finished spill 0
14/09/05 22:33:39 INFO mapred.Task: Task:attempt_local347020784_0001_m_000000_0 is done. And is in the process of committing
14/09/05 22:33:39 INFO mapred.LocalJobRunner: map
14/09/05 22:33:39 INFO mapred.Task: Task 'attempt_local347020784_0001_m_000000_0' done.
14/09/05 22:33:39 INFO mapred.LocalJobRunner: Finishing task: attempt_local347020784_0001_m_000000_0
14/09/05 22:33:39 INFO mapred.LocalJobRunner: map task executor complete.
14/09/05 22:33:39 INFO mapred.LocalJobRunner: Waiting for reduce tasks
14/09/05 22:33:39 INFO mapred.LocalJobRunner: Starting task: attempt_local347020784_0001_r_000000_0
14/09/05 22:33:39 INFO mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
14/09/05 22:33:39 INFO mapred.ReduceTask: Using ShuffleConsumerPlugin: org.apache.hadoop.mapreduce.task.reduce.Shuffle@a9d8e9
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: MergerManager: memoryLimit=334154944, maxSingleShuffleLimit=83538736, mergeThreshold=220542272, ioSortFactor=10, memToMemMergeOutputsThreshold=10
14/09/05 22:33:39 INFO reduce.EventFetcher: attempt_local347020784_0001_r_000000_0 Thread started: EventFetcher for fetching Map Completion Events
14/09/05 22:33:39 INFO reduce.LocalFetcher: localfetcher#1 about to shuffle output of map attempt_local347020784_0001_m_000000_0 decomp: 77 len: 81 to MEMORY
14/09/05 22:33:39 INFO reduce.InMemoryMapOutput: Read 77 bytes from map-output for attempt_local347020784_0001_m_000000_0
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: closeInMemoryFile -> map-output of size: 77, inMemoryMapOutputs.size() -> 1, commitMemory -> 0, usedMemory ->77
14/09/05 22:33:39 INFO reduce.EventFetcher: EventFetcher is interrupted.. Returning
14/09/05 22:33:39 INFO mapred.LocalJobRunner: 1 / 1 copied.
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: finalMerge called with 1 in-memory map-outputs and 0 on-disk map-outputs
14/09/05 22:33:39 INFO mapred.Merger: Merging 1 sorted segments
14/09/05 22:33:39 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 71 bytes
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: Merged 1 segments, 77 bytes to disk to satisfy reduce memory limit
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: Merging 1 files, 81 bytes from disk
14/09/05 22:33:39 INFO reduce.MergeManagerImpl: Merging 0 segments, 0 bytes from memory into reduce
14/09/05 22:33:39 INFO mapred.Merger: Merging 1 sorted segments
14/09/05 22:33:39 INFO mapred.Merger: Down to the last merge-pass, with 1 segments left of total size: 71 bytes
14/09/05 22:33:39 INFO mapred.LocalJobRunner: 1 / 1 copied.
14/09/05 22:33:39 INFO Configuration.deprecation: mapred.skip.on is deprecated. Instead, use mapreduce.job.skiprecords
14/09/05 22:33:40 INFO mapred.Task: Task:attempt_local347020784_0001_r_000000_0 is done. And is in the process of committing
14/09/05 22:33:40 INFO mapred.LocalJobRunner: 1 / 1 copied.
14/09/05 22:33:40 INFO mapred.Task: Task attempt_local347020784_0001_r_000000_0 is allowed to commit now
14/09/05 22:33:40 INFO output.FileOutputCommitter: Saved output of task 'attempt_local347020784_0001_r_000000_0' to hdfs://sk-Vostro-3400:9000/user/hadoop/output/_temporary/0/task_local347020784_0001_r_000000
14/09/05 22:33:40 INFO mapred.LocalJobRunner: reduce > reduce
14/09/05 22:33:40 INFO mapred.Task: Task 'attempt_local347020784_0001_r_000000_0' done.
14/09/05 22:33:40 INFO mapred.LocalJobRunner: Finishing task: attempt_local347020784_0001_r_000000_0
14/09/05 22:33:40 INFO mapred.LocalJobRunner: reduce task executor complete.
14/09/05 22:33:40 INFO mapreduce.Job:  map 100% reduce 100%
14/09/05 22:33:40 INFO mapreduce.Job: Job job_local347020784_0001 completed successfully
14/09/05 22:33:40 INFO mapreduce.Job: Counters: 38
    File System Counters
        FILE: Number of bytes read=541080
        FILE: Number of bytes written=987423
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=124
        HDFS: Number of bytes written=47
        HDFS: Number of read operations=13
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=4
    Map-Reduce Framework
        Map input records=4
        Map output records=13
        Map output bytes=114
        Map output materialized bytes=81
        Input split bytes=119
        Combine input records=13
        Combine output records=7
        Reduce input groups=7
        Reduce shuffle bytes=81
        Reduce input records=7
        Reduce output records=7
        Spilled Records=14
        Shuffled Maps =1
        Failed Shuffles=0
        Merged Map outputs=1
        GC time elapsed (ms)=0
        CPU time spent (ms)=0
        Physical memory (bytes) snapshot=0
        Virtual memory (bytes) snapshot=0
        Total committed heap usage (bytes)=424673280
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters 
        Bytes Read=62
    File Output Format Counters 
        Bytes Written=47


