Ubuntu 16.04: Operating a Local HDFS from IntelliJ IDEA

Copyright: CC 4.0 BY-SA

Original article: https://my.oschina.net/xiaopei/blog/1590920

I recently needed a local Hadoop environment for a talk, to learn with and to run demos on, so I'm writing the setup down here.

Setting up and starting Hadoop

I'll assume you're broadly familiar with it; the rough steps are:

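1. First, the Java environment. Download the JDK, extract it, set JAVA_HOME and put $JAVA_HOME/bin on the PATH, for example by adding the following to ~/.bashrc: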

export JAVA_HOME="/usr/local/java1.8_xxxx"
export PATH=$JAVA_HOME/bin:$PATH

source ~/.bashrc

Then run java -version; if it prints the Java version, you're set.
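java -version only tells you which JDK the shell finds on its PATH; IDEA, or any process started without your ~/.bashrc, may be running a different one. As a quick cross-check, here is a small hypothetical JdkCheck class that prints what the running JVM itself reports. It is a sketch built only on standard JDK calls, nothing Hadoop-specific.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: reports which JDK the current JVM is actually running on.
public class JdkCheck {

    // Collects the JVM facts worth comparing between a terminal run and an IDEA run.
    static Map<String, String> jdkInfo() {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("java.version", System.getProperty("java.version"));
        // java.home is the JDK/JRE directory of the running JVM, whatever PATH says.
        info.put("java.home", System.getProperty("java.home"));
        // JAVA_HOME is only an environment variable and may be unset inside the IDE.
        String javaHome = System.getenv("JAVA_HOME");
        info.put("JAVA_HOME", javaHome == null ? "(not set)" : javaHome);
        return info;
    }

    public static void main(String[] args) {
        jdkInfo().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Run it once from the terminal and once from IDEA: if java.home differs, the two are using different JDKs.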

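2. Set up passwordless SSH login to localhost. There are plenty of guides online, so I won't repeat the steps. One problem I hit: my Ubuntu login user was of type Administrator, and ssh localhost was refused for lack of permission. Adjusting sshd_config didn't help either, so I created a separate hadoop account and set up Hadoop under it.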

3. Download the Hadoop package. Normally you download the precompiled binary tarball, which can be used as soon as it's extracted.

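Two things to watch out for here. First, when configuring core-site.xml, hdfs-site.xml and the other XML files, don't rely only on other people's write-ups: go to the official Hadoop documentation, which has a Getting Started guide, e.g. http://hadoop.apache.org/docs/r2.8.2/hadoop-project-dist/hadoop-common/SingleCluster.html . Second, make sure the article you follow uses the same Hadoop version you downloaded; otherwise, when something breaks, you end up on a detour that wastes time.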

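The Hadoop configuration isn't complicated: follow the official "Setting up a Single Node Cluster" guide. Since this is only for trying things out, I deployed everything on one machine (the guide's pseudo-distributed mode, with HDFS running on localhost). To configure other properties, refer to the official default-value files such as core-default.xml and hdfs-default.xml.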

  $ bin/hdfs namenode -format
  $ sbin/start-dfs.sh
  $ bin/hdfs dfs -mkdir /user
  $ bin/hdfs dfs -mkdir /user/<username>
  $ bin/hdfs dfs -put etc/hadoop input
  $ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.2.jar grep input output 'dfs[a-z.]+'

If all goes well, running the commands above in order completes the Hadoop installation.
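The last command runs the grep job from the bundled examples jar over the copied configuration files. To make concrete what it computes, here is a plain-Java sketch. It is my own approximation, not Hadoop's code: one pass counts every substring matching dfs[a-z.]+, and a second pass orders the matches by count, highest first. The class GrepSketch and its sample lines are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical, Hadoop-free approximation of the examples jar's "grep" job.
public class GrepSketch {

    // Pass 1: count every substring that matches the regex, across all lines.
    static Map<String, Long> countMatches(List<String> lines, String regex) {
        Pattern pattern = Pattern.compile(regex);
        Map<String, Long> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = pattern.matcher(line);
            while (m.find()) {
                counts.merge(m.group(), 1L, Long::sum);
            }
        }
        return counts;
    }

    // Pass 2: order by count, highest first, and format each record as "count<TAB>match".
    static List<String> sortByCountDesc(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(e -> e.getValue() + "\t" + e.getKey())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<name>dfs.replication</name>",
                "<name>dfs.namenode.name.dir</name>",
                "<name>dfs.replication</name>");
        sortByCountDesc(countMatches(lines, "dfs[a-z.]+")).forEach(System.out::println);
    }
}
```

On the real cluster, the job's result can be read with bin/hdfs dfs -cat output/* .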

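One error you may hit along the way is the DataNode not starting; a quick search usually turns up the fix (a common cause is re-running namenode -format, which leaves the DataNode with a clusterID that no longer matches the NameNode's). Whenever a command fails, look at the logs under hadoop2.xxx/logs; running tail -f * there makes it easier to see what went wrong.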
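Before digging through the logs, it can help to confirm that the NameNode is listening at all. The hypothetical PortCheck class below only attempts a TCP connection to localhost:9000, the port used for fs.defaultFS in this setup. It says nothing about the DataNode, whose status is better checked with jps or its log file.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper: checks whether anything accepts TCP connections on a host/port.
public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs milliseconds.
    static boolean isListening(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable: treat all of them as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // 9000 is the NameNode RPC port from fs.defaultFS (hdfs://localhost:9000) in this article.
        System.out.println("NameNode RPC on localhost:9000 reachable: "
                + isListening("localhost", 9000, 1000));
    }
}
```

If this prints false after start-dfs.sh, the NameNode log under the logs directory is the place to look first.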

WordCount test

1. To debug WordCount in IDEA, create a Maven project; there is no need to select an archetype.

2. Configure pom.xml as follows:

Repository: the Apache Maven repository.

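Dependencies: add Apache hadoop-common. The other Hadoop jars must be loaded from the share directory of the Hadoop installation; even Maven artifacts of the very same version caused baffling problems for me. So pom.xml declares only hadoop-common, and etc/hadoop/core-site.xml is copied into the resources directory. While sorting out the dependencies, using the Maven jars once produced this error: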

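Exception in thread "main" org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4. After switching to the jars from Hadoop's share directory, it worked. IPC version 9 is the Hadoop 2.x protocol and version 4 the Hadoop 1.x one, so this error generally means a Hadoop 1.x client jar (for example hadoop-core) is on the classpath.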
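To find out which Hadoop client jars a program actually loads, a hypothetical ClasspathCheck class can ask the JVM where a class came from. org.apache.hadoop.util.VersionInfo.getVersion() is a real static method in hadoop-common; it is called through reflection here only so that the sketch compiles even without Hadoop on the classpath.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

// Hypothetical diagnostic: which jar does each Hadoop class come from, and which client version is it?
public class ClasspathCheck {

    // Returns the jar or directory a class was loaded from, without initializing the class.
    static String whereIs(String className) {
        try {
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource source = cls.getProtectionDomain().getCodeSource();
            // Classes loaded by the bootstrap loader (e.g. java.lang.String) have no CodeSource.
            if (source == null || source.getLocation() == null) {
                return "(JDK/bootstrap)";
            }
            return source.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not on classpath)";
        }
    }

    // Calls VersionInfo.getVersion() reflectively, if a Hadoop client jar is present.
    static String hadoopClientVersion() {
        try {
            Method getVersion = Class.forName("org.apache.hadoop.util.VersionInfo").getMethod("getVersion");
            return (String) getVersion.invoke(null);
        } catch (ReflectiveOperationException e) {
            return "(hadoop-common not on classpath)";
        }
    }

    public static void main(String[] args) {
        System.out.println("Hadoop client version: " + hadoopClientVersion());
        System.out.println("Configuration from:    " + whereIs("org.apache.hadoop.conf.Configuration"));
        System.out.println("JobClient from:        " + whereIs("org.apache.hadoop.mapred.JobClient"));
    }
}
```

Run from IDEA with the same module setup as WordCount, a 1.x version or a jar outside the installation's share directory would point to the source of the IPC mismatch.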

    <repositories>
        <repository>
            <id>apache</id>
            <url>https://repo.maven.apache.org/maven2</url>
        </repository>
    </repositories>

    <dependencies>
        <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-common -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.8.2</version>
        </dependency>

        <!-- https://mvnrepository.com/artifact/commons-logging/commons-logging -->
        <dependency>
            <groupId>commons-logging</groupId>
            <artifactId>commons-logging</artifactId>
            <version>1.1.2</version>
        </dependency>

        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>3.8.1</version>
            <scope>test</scope>
        </dependency>
    </dependencies>

    <build>
        <finalName>5_mybatis_maven</finalName>
        <plugins>
            <plugin>
                <artifactId>maven-compiler-plugin</artifactId>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                </configuration>
            </plugin>
        </plugins>
    </build>

3. The WordCount code is as follows.

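Note that this code uses the old org.apache.hadoop.mapred API: the mapper and reducer extend MapReduceBase and implement the Mapper/Reducer interfaces, and the job is run through JobConf and JobClient. The newer org.apache.hadoop.mapreduce API instead extends the Mapper and Reducer classes and submits a Job.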

package com.demo.hadoop;


import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

    // Mapper: for every input line, emit (word, 1) for each whitespace-separated token
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as the combiner): sum the counts collected for each word
    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

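    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("Usage: WordCount <input path> <output path>");
            System.exit(2);
        }

        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // Remove the previous output directory: the job fails if the output path already exists.
        // delete(Path) is deprecated, so pass recursive = true explicitly.
        Path outpath = new Path(args[1]);
        outpath.getFileSystem(conf).delete(outpath, true);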

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
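To check the map/combine/reduce logic without a cluster, here is a plain-Java sketch of what the job above computes; WordCountSketch is hypothetical and uses no Hadoop classes. The map step splits each line with StringTokenizer exactly as WordCount.Map does, the shuffle groups the pairs by key, and the reduce step sums them. The combiner is the same Reduce applied to each map task's output first; since addition is associative it doesn't change the final counts, so the sketch leaves it out. Keys are kept sorted with a TreeMap: Hadoop sorts Text keys by their bytes, which matches String order for ASCII words. Output lines are formatted the way TextOutputFormat writes them, key, a tab, then value.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, in-memory imitation of the WordCount job (no Hadoop involved).
public class WordCountSketch {

    // Map phase: like WordCount.Map, emit (token, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Shuffle + reduce: group values by key, in sorted key order, and sum them like WordCount.Reduce.
    static SortedMap<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return sums;
    }

    // Whole job: map every line, reduce, and format each record as "word<TAB>count".
    static List<String> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        List<String> output = new ArrayList<>();
        reduce(pairs).forEach((word, count) -> output.add(word + "\t" + count));
        return output;
    }

    public static void main(String[] args) {
        run(Arrays.asList("hello hadoop", "hello hdfs")).forEach(System.out::println);
    }
}
```

For the two sample lines this prints hadoop 1, hdfs 1 and hello 2 (tab-separated), the same shape of records the real job writes to its output directory.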

Running it failed with: Exception in thread "main" ExitCodeException exitCode=1: chmod: cannot access '/home/hadoop/tmp/mapred/staging/work334649498/.staging/job_local334649498_0001': No such file or directory

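Check the hadoop.tmp.dir value in the core-site.xml you copied into the resources directory. Changing it to /tmp fixed the problem. The staging path in the error sits under hadoop.tmp.dir, and the job was launched by user work (hence work334649498), who presumably could not create directories under the hadoop account's home.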
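The path in that error follows from the configuration: in Hadoop 2.x's mapred-default.xml the staging root, mapreduce.jobtracker.staging.root.dir, defaults to ${hadoop.tmp.dir}/mapred/staging, and core-default.xml sets hadoop.tmp.dir to /tmp/hadoop-${user.name}. Changing hadoop.tmp.dir therefore moves the staging directory. The hypothetical TmpDirExpansion class below is a simplified imitation of how Configuration resolves ${...} (as I understand it, system properties first, then the configuration's own keys, repeated so nested variables resolve); it is not Hadoop's implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, simplified imitation of Hadoop's ${variable} expansion in configuration values.
public class TmpDirExpansion {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}$ ]+)\\}");

    // Expands ${name} using system properties first, then props; unknown names are left as-is.
    static String expand(String value, Map<String, String> props) {
        String current = value;
        // Repeat so nested variables resolve; the cap stops self-referencing definitions.
        for (int depth = 0; depth < 20; depth++) {
            Matcher m = VAR.matcher(current);
            StringBuffer sb = new StringBuffer();
            boolean changed = false;
            while (m.find()) {
                String name = m.group(1);
                String replacement = System.getProperty(name);
                if (replacement == null) {
                    replacement = props.get(name);
                }
                if (replacement == null) {
                    replacement = m.group(); // keep the unresolved ${name} text
                } else {
                    changed = true;
                }
                m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
            }
            m.appendTail(sb);
            current = sb.toString();
            if (!changed) {
                break;
            }
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, String> props = new HashMap<>();
        props.put("mapreduce.jobtracker.staging.root.dir", "${hadoop.tmp.dir}/mapred/staging");
        props.put("hadoop.tmp.dir", "/home/hadoop/tmp"); // the value behind the failing path
        System.out.println(expand(props.get("mapreduce.jobtracker.staging.root.dir"), props));
        props.put("hadoop.tmp.dir", "/tmp"); // the fix described above
        System.out.println(expand(props.get("mapreduce.jobtracker.staging.root.dir"), props));
    }
}
```

The first line printed is /home/hadoop/tmp/mapred/staging, the prefix seen in the error; the second is /tmp/mapred/staging.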

Instead of the XML file, the same settings can also be made in code through the Configuration class:

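// Needs org.apache.hadoop.conf.Configuration and org.apache.hadoop.fs.FileSystem.
// For WordCount, the same set(...) calls work on the JobConf, which extends Configuration.
Configuration conf = new Configuration();
conf.set("hadoop.tmp.dir", "/tmp");
conf.set("fs.defaultFS", "hdfs://127.0.0.1:9000");

// Get the FileSystem for fs.defaultFS (here, the local HDFS)
FileSystem fileSystem = FileSystem.get(conf);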

4. core-site.xml looks like this:

<configuration>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/tmp</value>
    </property>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
    </property>
</configuration>

Program arguments in IDEA's Run/Debug Configuration:

hdfs://localhost:9000/user/hadoop/input
hdfs://localhost:9000/user/hadoop/output

Once it has run, list the generated files with hadoop fs -ls /user/hadoop/output ; with the old mapred API the counts end up in part-00000.

5. After packaging the code as a jar, I ran it from the command line.

It failed with the following error:

Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Permission denied: user=work, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------

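According to posts online, this is a permission problem on the corresponding HDFS directory, and changing that directory's permissions with hadoop fs -chmod 777 (here on /tmp, run as the hadoop user, which owns it) fixes it.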

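Thinking about it afterward: could it be because of the core-site.xml configured in IDEA? I'm not sure why running from IDEA didn't hit this permission error; it's a bit puzzling.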
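One plausible explanation, which I have not verified: HDFS checks permissions the same POSIX-like way in both cases, but the two runs touched different directories. In IDEA the local job runner staged the job on the local file system (the earlier chmod error was about the local path /home/hadoop/tmp), while the command-line run presumably picked up the installation's configuration and staged the job under /tmp in HDFS, which is owned by hadoop with mode drwx------. The check itself can be sketched in plain Java. HdfsPermissionSketch is hypothetical: it assumes a plain 10-character mode string (no sticky bit) and ignores the superuser-group setting, but it does model the rule that the user who started the NameNode, hadoop here, is the HDFS superuser and skips the check.

```java
import java.util.Collections;
import java.util.Set;

// Hypothetical sketch of a POSIX-style permission check like the one behind AccessControlException.
public class HdfsPermissionSketch {

    enum Access { READ, WRITE, EXECUTE }

    // Picks exactly one class (owner, group, or other) for the user, then tests that class's bit.
    static boolean permits(String user, Set<String> userGroups, String superuser,
                           String owner, String group, String mode, Access access) {
        if (user.equals(superuser)) {
            return true; // the NameNode's own user bypasses permission checks
        }
        int offset;
        if (user.equals(owner)) {
            offset = 1; // mode characters 1-3: rwx for the owner
        } else if (userGroups.contains(group)) {
            offset = 4; // mode characters 4-6: rwx for the group
        } else {
            offset = 7; // mode characters 7-9: rwx for everyone else
        }
        char expected = "rwx".charAt(access.ordinal());
        return mode.charAt(offset + access.ordinal()) == expected;
    }

    public static void main(String[] args) {
        // Values from the error: user=work, access=EXECUTE, inode="/tmp":hadoop:supergroup:drwx------
        Set<String> workGroups = Collections.singleton("work"); // hypothetical group list for work
        System.out.println(permits("work", workGroups, "hadoop", "hadoop", "supergroup",
                "drwx------", Access.EXECUTE));   // false: work falls into "other", which has no x
        System.out.println(permits("work", workGroups, "hadoop", "hadoop", "supergroup",
                "drwxrwxrwx", Access.EXECUTE));   // true: after hadoop fs -chmod 777
        System.out.println(permits("hadoop", Collections.<String>emptySet(), "hadoop", "hadoop",
                "supergroup", "drwx------", Access.EXECUTE)); // true: hadoop is the superuser
    }
}
```

This also shows why the chmod 777 workaround works: it gives the "other" class, which work falls into, the execute bit on /tmp.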

HDFS API test: creating a directory

This is only a simple example of creating a directory through the HDFS API; for more, see the corresponding documentation.

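package com.demo.hadoop;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsTest {
    public static void main(String[] args) {
        Configuration conf = new Configuration();
        conf.set("hadoop.tmp.dir", "/tmp");
        // Point the client at the local NameNode started by start-dfs.sh
        conf.set("fs.defaultFS", "hdfs://127.0.0.1:9000");
        // try-with-resources closes the FileSystem even if mkdirs throws, and avoids
        // calling close() on null when FileSystem.get itself fails
        try (FileSystem fileSystem = FileSystem.get(conf)) {
            // mkdirs also creates any missing parent directories, like mkdir -p
            boolean created = fileSystem.mkdirs(new Path("/user/hadoop/test01"));
            System.out.println(created);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}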

After running it, if the test01 directory appears in HDFS (for example in the output of hadoop fs -ls /user/hadoop), it worked.
