Hadoop Source Code Reading - HDFS - Day 1

This post walks through the declaration of the Hdfs class and its constructor, covering the initialization process, parameter configuration, and exception handling, to help developers understand how an HDFS file system is correctly created and used.


The Hdfs class declaration and constructor

@InterfaceAudience.Private
@InterfaceStability.Evolving
public class Hdfs extends AbstractFileSystem {

  DFSClient dfs;
  final CryptoCodec factory;
  private boolean verifyChecksum = true;

  static {
    HdfsConfiguration.init();
  }

  /**
   * This constructor has the signature needed by
   * {@link AbstractFileSystem#createFileSystem(URI, Configuration)}
   * 
   * @param theUri which must be that of Hdfs
   * @param conf configuration
   * @throws IOException
   */
  Hdfs(final URI theUri, final Configuration conf) throws IOException, URISyntaxException {
    super(theUri, HdfsConstants.HDFS_URI_SCHEME, true, NameNode.DEFAULT_PORT);

    if (!theUri.getScheme().equalsIgnoreCase(HdfsConstants.HDFS_URI_SCHEME)) {
      throw new IllegalArgumentException("Passed URI's scheme is not for Hdfs");
    }
    String host = theUri.getHost();
    if (host == null) {
      throw new IOException("Incomplete HDFS URI, no host: " + theUri);
    }

    this.dfs = new DFSClient(theUri, conf, getStatistics());
    this.factory = CryptoCodec.getInstance(conf);
  }
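
The Javadoc notes that this constructor has exactly the signature required by AbstractFileSystem#createFileSystem(URI, Configuration). Application code normally never calls it directly: it goes through FileContext, which resolves the "hdfs" scheme to this implementation and invokes the constructor reflectively. A minimal sketch of that usage path (the NameNode address is a made-up example, and a Hadoop client classpath plus a reachable cluster are assumed):

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;

public class CreateHdfsViaFileContext {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // FileContext maps the hdfs:// scheme to the Hdfs AbstractFileSystem and,
    // through AbstractFileSystem.createFileSystem(URI, Configuration), ends up
    // calling the Hdfs(theUri, conf) constructor shown above.
    FileContext fc = FileContext.getFileContext(
        URI.create("hdfs://namenode.example.com:8020"), conf);
    System.out.println("working directory: " + fc.getWorkingDirectory());
  }
}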

Hdfs extends the abstract class AbstractFileSystem. It contains a static block that runs HdfsConfiguration's init method, so let's look at that method first:

/**
   * This method is here so that when invoked, HdfsConfiguration is class-loaded if
   * it hasn't already been previously loaded.  Upon loading the class, the static 
   * initializer block above will be executed to add the deprecated keys and to add
   * the default resources.   It is safe for this method to be called multiple times 
   * as the static initializer block will only get invoked once.
   * 
   * This replaces the previously, dangerous practice of other classes calling
   * Configuration.addDefaultResource("hdfs-default.xml") directly without loading 
   * HdfsConfiguration class first, thereby skipping the key deprecation
   */
  public static void init() {
  }

init is an empty method; its only purpose is to trigger class loading. When HdfsConfiguration is loaded, its static initializer block runs, registering the deprecated keys and adding the default resources. Because init itself does nothing, it is safe to call it any number of times: the static initializer block executes only once. This replaces the previously dangerous practice of other classes calling Configuration.addDefaultResource("hdfs-default.xml") directly without loading HdfsConfiguration first, thereby skipping the key deprecation. So the static block should contain exactly that call; let's check whether that is the case:

static {
    addDeprecatedKeys();

    // adds the default resources
    Configuration.addDefaultResource("hdfs-default.xml");
    Configuration.addDefaultResource("hdfs-site.xml");

  }
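
To see the idiom in isolation, here is a small self-contained sketch (the class name and messages are invented for illustration, not taken from Hadoop): calling the empty init() any number of times forces the class to load, and the static initializer runs exactly once.

public class LazyDefaults {

  // Runs exactly once, when the JVM loads this class.
  static {
    System.out.println("static initializer: registering default resources");
  }

  /** Intentionally empty: its only job is to trigger class loading. */
  public static void init() {
  }

  public static void main(String[] args) {
    LazyDefaults.init(); // first call loads the class; the static block prints once
    LazyDefaults.init(); // later calls are harmless no-ops
    LazyDefaults.init();
  }
}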

 

With that covered, let's return to the parent class of Hdfs, AbstractFileSystem:

/**
 * This class provides an interface for implementors of a Hadoop file system
 * (analogous to the VFS of Unix). Applications do not access this class;
 * instead they access files across all file systems using {@link FileContext}.
 * 
 * Pathnames passed to AbstractFileSystem can be fully qualified URI that
 * matches the "this" file system (ie same scheme and authority) 
 * or a Slash-relative name that is assumed to be relative
 * to the root of the "this" file system .
 */
@InterfaceAudience.Public
@InterfaceStability.Evolving /*Evolving for a release,to be changed to Stable */
public abstract class AbstractFileSystem {

AbstractFileSystem provides the interface that implementers of a Hadoop file system write against (analogous to Unix's VFS). Applications do not use this class directly; they access files across all file systems through FileContext. Pathnames passed to AbstractFileSystem may be either fully qualified URIs that match "this" file system (same scheme and authority) or slash-relative names, which are interpreted relative to the root of "this" file system.
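
A short illustration of those two path forms (the cluster address and file name are assumptions for the example, and a reachable cluster is needed to actually run it): once a default file system is set on the FileContext, a fully qualified hdfs:// URI and a slash-relative name refer to files in the same way.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileContext;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.Path;

public class PathForms {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileContext fc = FileContext.getFileContext(
        URI.create("hdfs://namenode.example.com:8020"), conf);

    // Fully qualified URI: scheme and authority match "this" file system.
    FileStatus a = fc.getFileStatus(
        new Path("hdfs://namenode.example.com:8020/user/demo/data.txt"));

    // Slash-relative name: resolved against the root of the same file system.
    FileStatus b = fc.getFileStatus(new Path("/user/demo/data.txt"));

    System.out.println(a.getPath() + " and " + b.getPath() + " point to the same file");
  }
}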

Now let's look at its constructor:

/**
   * Constructor to be called by subclasses.
   * 
   * @param uri for this file system.
   * @param supportedScheme the scheme supported by the implementor
   * @param authorityNeeded if true then theURI must have authority, if false
   *          then the URI must have null authority.
   *
   * @throws URISyntaxException <code>uri</code> has syntax error
   */
  public AbstractFileSystem(final URI uri, final String supportedScheme,
      final boolean authorityNeeded, final int defaultPort)
      throws URISyntaxException {
    myUri = getUri(uri, supportedScheme, authorityNeeded, defaultPort);
    statistics = getStatistics(uri); 
  }

/**
   * Get the URI for the file system based on the given URI. The path, query
   * part of the given URI is stripped out and default file system port is used
   * to form the URI.
   * 
   * @param uri FileSystem URI.
   * @param authorityNeeded if true authority cannot be null in the URI. If
   *          false authority must be null.
   * @param defaultPort default port to use if port is not specified in the URI.
   * 
   * @return URI of the file system
   * 
   * @throws URISyntaxException <code>uri</code> has syntax error
   */
  private URI getUri(URI uri, String supportedScheme,
      boolean authorityNeeded, int defaultPort) throws URISyntaxException {
    checkScheme(uri, supportedScheme);
    // A file system implementation that requires authority must always
    // specify default port
    if (defaultPort < 0 && authorityNeeded) {
      throw new HadoopIllegalArgumentException(
          "FileSystem implementation error -  default port " + defaultPort
              + " is not valid");
    }
    String authority = uri.getAuthority();
    if (authority == null) {
       if (authorityNeeded) {
         throw new HadoopIllegalArgumentException("Uri without authority: " + uri);
       } else {
         return new URI(supportedScheme + ":///");
       }   
    }
    // authority is non null  - AuthorityNeeded may be true or false.
    int port = uri.getPort();
    port = (port == -1 ? defaultPort : port);
    if (port == -1) { // no port supplied and default port is not specified
      return new URI(supportedScheme, authority, "/", null);
    }
    return new URI(supportedScheme + "://" + uri.getHost() + ":" + port);
  }

getUri builds the file system's URI from the URI that was passed in: it checks the scheme, strips the path and query components, and falls back to the default port when the URI does not specify one.
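
To make the branches concrete, here is a standalone sketch that mirrors the getUri logic in simplified form (the class name, sample hosts, and ports are made up for illustration; it is not the Hadoop code itself) and prints the normalized file system URI for a couple of inputs:

import java.net.URI;
import java.net.URISyntaxException;

public class GetUriSketch {

  // Simplified copy of the normalization rules: keep scheme and authority,
  // drop the path/query parts, and fall back to the default port when the
  // URI does not carry one.
  static URI fsUri(URI uri, String scheme, boolean authorityNeeded, int defaultPort)
      throws URISyntaxException {
    String authority = uri.getAuthority();
    if (authority == null) {
      if (authorityNeeded) {
        throw new IllegalArgumentException("Uri without authority: " + uri);
      }
      return new URI(scheme + ":///");
    }
    int port = uri.getPort() == -1 ? defaultPort : uri.getPort();
    if (port == -1) { // no port supplied and no default port either
      return new URI(scheme, authority, "/", null);
    }
    return new URI(scheme + "://" + uri.getHost() + ":" + port);
  }

  public static void main(String[] args) throws Exception {
    // Path and query are stripped; the explicit port is kept.
    System.out.println(fsUri(URI.create("hdfs://nn:9000/user/demo?x=1"),
        "hdfs", true, 8020)); // prints hdfs://nn:9000
    // No port in the URI, so the default port (8020 here) is used.
    System.out.println(fsUri(URI.create("hdfs://nn/user/demo"),
        "hdfs", true, 8020)); // prints hdfs://nn:8020
  }
}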

Reposted from: https://www.cnblogs.com/nashiyue/p/5327302.html

"C:\Program Files\Java\jdk-17\bin\java.exe" -Didea.launcher.port=54222 "-Didea.launcher.bin.path=D:\hadoop\IntelliJ IDEA Community Edition 2018.3.6\bin" -Dfile.encoding=UTF-8 -classpath "D:\hadoop\Hadoop\target\classes;D:\hadoop\hadoop-3.1.4\share\hadoop\client\hadoop-client-api-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\client\hadoop-client-runtime-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\client\hadoop-client-minicluster-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\common\hadoop-kms-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\common\hadoop-nfs-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\common\hadoop-common-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\common\hadoop-common-3.1.4-tests.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-nfs-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-rbf-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-3.1.4-tests.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-client-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-httpfs-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-rbf-3.1.4-tests.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-client-3.1.4-tests.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-native-client-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\hdfs\hadoop-hdfs-native-client-3.1.4-tests.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\mapreduce\hadoop-mapreduce-client-hs-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\mapreduce\hadoop-mapreduce-client-app-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\mapreduce\hadoop-mapreduce-client-core-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\mapreduce\hadoop-mapreduce-client-common-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\mapreduce\hadoop-mapreduce-client-shuffle-3.1.4.jar;D:\hadoop\hadoop-3.1.4\share\hadoop\mapreduce\hadoop-mapreduce-c
05-07
<think>好的,我现在需要帮助用户解决Hadoop 3.1.4的启动命令配置问题,特别是关于JDK 17和IDEA Community Edition 2018.3.6的classpath路径分析。首先,我得回顾一下用户提供的引用内容和之前的问题。 用户提到了几个关键点:Hadoop运行配置问题、启动命令的构成,以及classpath路径分析。引用中提到了ClassNotFoundException,JAVA_HOME未设置的问题,还有修改hadoop-env.sh和yarn.application.classpath的配置。此外,用户还提到了完全分布式环境的搭建步骤,比如移动和解压安装包,配置workers文件和环境变量。 首先,ClassNotFoundException通常和类路径有关,可能是驱动类没找到,或者打包的jar文件没有正确包含所需的类。用户使用JDK 17,需要注意Hadoop版本是否兼容,因为Hadoop 3.1.4可能对JDK 17的支持有限,官方文档可能推荐的是JDK 8或11。这时候可能需要确认兼容性,或者调整JDK版本。 接下来是启动命令的配置。Hadoop启动通常通过start-dfs.sh和start-yarn.sh,这些脚本会读取Hadoop的配置文件,比如hadoop-env.sh,里面需要正确设置JAVA_HOME。用户提到使用JDK 17,所以必须确保hadoop-env.sh中的JAVA_HOME指向正确的JDK路径,例如export JAVA_HOME=/usr/lib/jvm/jdk-17。如果路径错误,会导致JAVA_HOME找不到,从而引发各种错误。 关于classpath,用户需要运行hadoop classpath命令获取路径,并将结果添加到yarn.application.classpath的配置中。这确保YARN能够找到所有的依赖库。同时,如果用户使用IDEA进行开发,需要确保项目中的classpath配置正确,包括Hadoop的库和JDK 17的版本。IDEA 2018.3.6可能较旧,可能需要检查是否支持JDK 17,或者是否需要升级插件。 另外,用户的环境变量配置在/etc/profile.d/my_env.sh中,需要确认Hadoop和JDK的路径是否正确设置,并且通过source命令使其生效。同时,workers文件中的节点名称是否正确,比如master、slave1、slave2是否与实际的主机名对应,SSH免密登录是否配置,这些都是启动分布式环境的关键步骤。 可能用户遇到的错误包括:ClassNotFoundException(类路径问题)、JAVA_HOME未设置(环境变量或配置文件错误)、或者节点通信问题(workers配置或SSH问题)。需要一步步检查这些方面,确认每个配置项的正确性。 最后,用户可能需要了解如何在IDEA中配置Hadoop项目,添加正确的依赖库,并设置运行参数,确保打包后的jar包含所有需要的类,并且Hadoop的classpath正确指向这些依赖。同时,考虑到JDK 17的新特性,比如模块化系统,可能需要调整Hadoop的启动脚本或依赖项,避免不兼容的问题。</think>### Hadoop 3.1.4 启动命令配置详解(JDK-17 + IDEA 2018.3.6) #### 一、核心配置项分析 1. **JAVA_HOME配置** 修改 `$HADOOP_HOME/etc/hadoop/hadoop-env.sh` 文件,明确指定JDK-17路径: ```bash export JAVA_HOME=/usr/lib/jvm/jdk-17 ``` 若未配置,会触发 `Error: JAVA_HOME is not set` 错误[^2]。 2. **Classpath动态获取** 运行 `hadoop classpath` 获取当前Hadoop类路径,输出结果需添加到 `yarn-site.xml` 的 `yarn.application.classpath` 配置项中: ```xml <property> <name>yarn.application.classpath</name> <value>/opt/module/hadoop-3.1.4/etc/hadoop:/opt/module/hadoop-3.1.4/share/hadoop/common/lib/*:...</value> </property> ``` 该路径包含Hadoop核心库和第三方依赖[^3]。 3. **IDEA项目配置要点** - 确保 `pom.xml` 中Hadoop依赖版本与集群版本一致(3.1.4) - 设置编译兼容性为JDK-17: ```xml <properties> <maven.compiler.source>17</maven.compiler.source> <maven.compiler.target>17</maven.compiler.target> </properties> ``` #### 二、典型启动命令结构 完整作业提交命令示例: ```bash hadoop jar your-job.jar com.example.MainClass \ -D mapreduce.job.queuename=default \ /input/path /output/path ``` - `hadoop jar`:Hadoop执行入口 - `-D` 参数:动态配置MapReduce参数 - 末尾参数:程序自定义输入输出路径 #### 三、JDK-17兼容性问题 需注意: 1. Hadoop 3.1.4 官方支持最高JDK11,使用JDK-17需重新编译Hadoop源码: ```bash mvn clean package -Pdist,native -DskipTests -Dmaven.javadoc.skip=true -Djava.version=17 ``` 2. 若出现 `UnsupportedClassVersionError`,需检查IDEA的编译输出目标版本设置 #### 四、调试技巧 1. **环境验证命令**: ```bash # 检查JAVA_HOME echo $JAVA_HOME # 验证Hadoop类路径 hadoop classpath ``` 2. **日志定位**: - NameNode日志:`$HADOOP_HOME/logs/hadoop-*-namenode-*.log` - YARN日志:通过 `yarn logs -applicationId <app_id>` 获取
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值