HDFS Data Download Source Code Analysis - FileSystem.get(conf)_block01

This article analyzes in detail the process of downloading data from the Hadoop Distributed File System (HDFS), covering how a FileSystem instance is obtained from the configuration and the concrete steps by which a DistributedFileSystem instance is created through its chain of constructors.

First, take a look at FileSystem (org.apache.hadoop.fs.FileSystem). It is an abstract class and the parent of all file system implementations.

To download data from HDFS (Hadoop Distributed File System), we need an instance of DistributedFileSystem. So how do we obtain one?

FileSystem fs = FileSystem.get(new Configuration());

FileSystem has three overloaded get() methods:

// 1. Get a FileSystem instance from the configuration
public static FileSystem get(Configuration conf)
// 2. Get a FileSystem instance from an explicit FileSystem URI and the configuration
public static FileSystem get(URI uri, Configuration conf)
// 3. Get a FileSystem instance from an explicit FileSystem URI, the configuration, and a FileSystem user name
public static FileSystem get(final URI uri, final Configuration conf, final String user)
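
Before diving into the call chain, here is a minimal usage sketch of the three overloads. It reuses the hdfs://node1:9000 URI and the root user that appear later in this walkthrough; both are placeholders for your own cluster settings.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class GetFileSystemDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // 1. Rely on fs.defaultFS from core-site.xml (assumed to be hdfs://node1:9000)
        FileSystem fs1 = FileSystem.get(conf);

        // 2. Pass the FileSystem URI explicitly
        FileSystem fs2 = FileSystem.get(URI.create("hdfs://node1:9000"), conf);

        // 3. Pass the URI, the configuration and the user name
        //    (this overload also declares InterruptedException)
        FileSystem fs3 = FileSystem.get(URI.create("hdfs://node1:9000"), conf, "root");

        System.out.println(fs1.getUri());
        fs3.close();
    }
}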

FileSystem.get(Configuration conf) is called first, and it in turn calls the overload FileSystem.get(URI uri, Configuration conf):

public static FileSystem get(URI uri, Configuration conf) throws IOException {
    // scheme is the concrete URI scheme of the FileSystem, e.g. file, hdfs, webhdfs, har, etc.
    String scheme = uri.getScheme();    // scheme = hdfs
    // authority is the NameNode's host name and port
    String authority = uri.getAuthority();    // authority = node1:9000
    ...
    // disableCacheName = fs.hdfs.impl.disable.cache
    String disableCacheName = String.format("fs.%s.impl.disable.cache", scheme);    
    // Read the configuration to decide whether the cache is disabled
    if (conf.getBoolean(disableCacheName, false)) {    // cache disabled
      return createFileSystem(uri, conf);    // call the method that creates a FileSystem instance directly
    }
    // Cache not disabled: first try to get a FileSystem instance from FileSystem's static member CACHE
    return CACHE.get(uri, conf);
}
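
A quick way to observe the disableCacheName branch from client code: set fs.hdfs.impl.disable.cache to true and every get() call takes the createFileSystem() path instead of the cache. A minimal sketch, assuming the same hdfs://node1:9000 cluster as above.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CacheDisabledDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // exactly the key computed above for scheme "hdfs": fs.%s.impl.disable.cache
        conf.setBoolean("fs.hdfs.impl.disable.cache", true);

        URI uri = URI.create("hdfs://node1:9000");
        FileSystem a = FileSystem.get(uri, conf);
        FileSystem b = FileSystem.get(uri, conf);

        // With the cache disabled, each call creates a fresh instance
        System.out.println(a == b);    // expected: false

        a.close();
        b.close();
    }
}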

Next, FileSystem$Cache.get(URI uri, Configuration conf) is called (Cache is a static inner class of FileSystem):

FileSystem get(URI uri, Configuration conf) throws IOException{
      Key key = new Key(uri, conf);    // key = (root (auth:SIMPLE))@hdfs://node1:9000
      return getInternal(uri, conf, key);
}

Then FileSystem$Cache.getInternal(URI uri, Configuration conf, FileSystem$Cache$Key key) is called (Key, in turn, is a static inner class of Cache):

private FileSystem getInternal(URI uri, Configuration conf, Key key) throws IOException{
      FileSystem fs;
      synchronized (this) {
        // map is the Cache member used to cache FileSystem instances; its type is HashMap<Key, FileSystem>
        fs = map.get(key);
      }
      if (fs != null) {    // if a matching FileSystem instance was found in the cache map
        return fs;    // return that instance
      }
      // otherwise call FileSystem.createFileSystem(URI uri, Configuration conf) to create a FileSystem instance
      fs = createFileSystem(uri, conf);
      /* Checkpoint 1: waiting for createFileSystem() to return */
      synchronized (this) { // refetch the lock again
        /*
         * In a multi-threaded environment, another client (another thread) may have created a
         * DistributedFileSystem instance and cached it in the map while the lock was released.
         * In that case the instance just created by the current client is discarded.
         * This is effectively a special form of the singleton pattern: one key maps to one
         * DistributedFileSystem instance.
         */
        FileSystem oldfs = map.get(key);
        if (oldfs != null) { // a file system is created while lock is releasing
          fs.close(); // close the new file system
          return oldfs;  // return the old file system
        }
        /*
         * now insert the new file system into the map
         * (cache the newly created DistributedFileSystem instance in the map)
         */
        fs.key = key;
        map.put(key, fs);
        ...
        return fs;
      }
}
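
The "one key, one instance" behavior described in the comments above is visible from client code: with the cache enabled (the default), repeated get() calls for the same URI (scheme and authority) and user return the same object. A small sketch under the same cluster assumption; note that because the instance is shared, close() affects every other caller that obtained it from the cache.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class CacheSharingDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();    // cache enabled by default
        URI uri = URI.create("hdfs://node1:9000");

        FileSystem a = FileSystem.get(uri, conf);
        FileSystem b = FileSystem.get(uri, conf);

        // Same Cache.Key, so the same cached instance is returned
        System.out.println(a == b);    // expected: true

        // Shared instance: this close() closes it for b as well
        a.close();
    }
}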

Continuing from Checkpoint 1, FileSystem.createFileSystem(URI uri, Configuration conf) is called:

private static FileSystem createFileSystem(URI uri, Configuration conf
      ) throws IOException {
    // Read the configuration to get the Class object of the FileSystem implementation for the URI scheme hdfs
    Class<?> clazz = getFileSystemClass(uri.getScheme(), conf); // clazz = org.apache.hadoop.hdfs.DistributedFileSystem
    ...
    // Instantiate a DistributedFileSystem via reflection
    FileSystem fs = (FileSystem)ReflectionUtils.newInstance(clazz, conf);
    // Initialize the DistributedFileSystem instance
    fs.initialize(uri, conf);
    return fs;
}
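
For reference, the scheme-to-class lookup performed by getFileSystemClass() can be pinned explicitly through the fs.<scheme>.impl configuration key (the counterpart of the fs.<scheme>.impl.disable.cache key seen earlier). A small sketch, assuming the Hadoop client and hadoop-hdfs jars are on the classpath; when they are, the explicit setting is optional because the implementation is discovered automatically.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class SchemeLookupDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Pin the implementation for the hdfs scheme explicitly
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

        Class<? extends FileSystem> clazz = FileSystem.getFileSystemClass("hdfs", conf);
        System.out.println(clazz.getName());    // org.apache.hadoop.hdfs.DistributedFileSystem
    }
}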

Before we get to DistributedFileSystem.initialize(URI uri, Configuration conf), let's first take a look at the DistributedFileSystem class.

DistributedFileSystem is a concrete subclass of the abstract class FileSystem:

public class DistributedFileSystem extends FileSystem {
  ...
  DFSClient dfs;    // DistributedFileSystem holds a member variable dfs of type DFSClient, its most important member!
  ...
}

DistributedFileSystem.initialize(URI uri, Configuration conf) is then called:

public void initialize(URI uri, Configuration conf) throws IOException {
    ...
    // Create a new DFSClient instance; the member variable dfs references it
    this.dfs = new DFSClient(uri, conf, statistics );
    /* Checkpoint 2: waiting for new DFSClient() to return */
    ...
}

Before the DFSClient instance is constructed, let's look at the DFSClient class to see which member variables actually need to be assigned:

public class DFSClient implements java.io.Closeable, RemotePeerFactory {
  ...
  final ClientProtocol namenode;    // DFSClient holds a member variable namenode of type ClientProtocol, an RPC proxy object
  /* The service used for delegation tokens */
  private Text dtService;
  ...
}

Continuing from Checkpoint 2, the constructor DFSClient(URI nameNodeUri, Configuration conf, FileSystem$Statistics statistics) is called, which in turn calls the overloaded constructor DFSClient(URI nameNodeUri, ClientProtocol rpcNamenode, Configuration conf, FileSystem$Statistics statistics):

public DFSClient(URI nameNodeUri, ClientProtocol rpcNamenode, Configuration conf, 
    FileSystem.Statistics stats) throws IOException {
    ...
    NameNodeProxies.ProxyAndInfo<ClientProtocol> proxyInfo = null;
    if (numResponseToDrop > 0) {    // numResponseToDrop = 0
      // This case is used for testing.
      LOG.warn(DFSConfigKeys.DFS_CLIENT_TEST_DROP_NAMENODE_RESPONSE_NUM_KEY
          + " is set to " + numResponseToDrop
          + ", this hacked client will proactively drop responses");
      proxyInfo = NameNodeProxies.createProxyWithLossyRetryHandler(conf,
          nameNodeUri, ClientProtocol.class, numResponseToDrop);
    }
    
    if (proxyInfo != null) { // proxyInfo = null
      this.dtService = proxyInfo.getDelegationTokenService();
      this.namenode = proxyInfo.getProxy();
    } else if (rpcNamenode != null) { // rpcNamenode = null
      // This case is used for testing.
      Preconditions.checkArgument(nameNodeUri == null);
      this.namenode = rpcNamenode;
      dtService = null;
    } else {    // the two if branches above only apply to tests; this else block is the important path
      ...
      /*
       * Create a NameNodeProxies.ProxyAndInfo<ClientProtocol> object and let proxyInfo reference it.
       * Doesn't createProxy(conf, nameNodeUri, ClientProtocol.class) look a lot like
       * RPC.getProxy(Class<T> protocol, long clientVersion, InetSocketAddress addr, Configuration conf)?
       * It does, which suggests that createProxy() must call into the RPC machinery internally:
       *   conf                 - the same Configuration object
       *   nameNodeUri = hdfs://node1:9000 - supplies the hostName and port of the InetSocketAddress addr
       *   ClientProtocol.class - the Class object of the RPC protocol interface
       * ClientProtocol is used by user code via the DistributedFileSystem class to communicate
       * with the NameNode.
       */
      proxyInfo = NameNodeProxies.createProxy(conf, nameNodeUri, ClientProtocol.class);
      /* Checkpoint 3: waiting for createProxy() to return */
      this.dtService = proxyInfo.getDelegationTokenService();
      this.namenode = proxyInfo.getProxy();
    }
    ...
}
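
To make the analogy with RPC.getProxy() drawn in the comments concrete, here is a schematic client-side sketch of that generic pattern. MyProtocol is a hypothetical interface used only for illustration; running the sketch would require a matching Hadoop RPC server listening on node1:9000, and the real ClientProtocol proxy is built through NameNodeProxies rather than by calling RPC.getProxy() directly.

import java.net.InetSocketAddress;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;

public class RpcProxyShapeDemo {

    // Hypothetical RPC protocol interface, only to illustrate the shape of the call
    public interface MyProtocol {
        long versionID = 1L;
        String echo(String msg);
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        InetSocketAddress addr = new InetSocketAddress("node1", 9000);

        // Same ingredients as NameNodeProxies.createProxy(conf, nameNodeUri, ClientProtocol.class):
        // a protocol interface, an address (host and port taken from the URI), and a Configuration
        MyProtocol proxy = RPC.getProxy(MyProtocol.class, MyProtocol.versionID, addr, conf);
        try {
            System.out.println(proxy.echo("ping"));
        } finally {
            RPC.stopProxy(proxy);
        }
    }
}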

Continuing from Checkpoint 3, NameNodeProxies.createProxy(Configuration conf, URI nameNodeUri, Class<T> xface) is called:

/**
   * Creates the namenode proxy with the passed protocol. This will handle
   * creation of either HA- or non-HA-enabled proxy objects, depending upon
   * if the provided URI is a configured logical URI.
   * In other words, whether an HA or a non-HA namenode proxy object is created
   * depends on how the Hadoop cluster is actually configured.
   **/
public static <T> ProxyAndInfo<T> createProxy(Configuration conf, URI nameNodeUri, Class<T> xface)
    throws IOException {
    // Read the HA-related settings of the actual Hadoop environment
    Class<FailoverProxyProvider<T>> failoverProxyProviderClass =
        getFailoverProxyProviderClass(conf, nameNodeUri, xface);

    if (failoverProxyProviderClass == null) {    // non-HA; here the cluster is a pseudo-distributed setup
      // Non-HA case: create a non-HA namenode proxy object
      return createNonHAProxy(conf, NameNode.getAddress(nameNodeUri), xface,
          UserGroupInformation.getCurrentUser(), true);
    } else {    // HA
      // HA case
      FailoverProxyProvider<T> failoverProxyProvider = NameNodeProxies
          .createFailoverProxyProvider(conf, failoverProxyProviderClass, xface,
              nameNodeUri);
      Conf config = new Conf(conf);
      T proxy = (T) RetryProxy.create(xface, failoverProxyProvider,
          RetryPolicies.failoverOnNetworkException(
              RetryPolicies.TRY_ONCE_THEN_FAIL, config.maxFailoverAttempts,
              config.maxRetryAttempts, config.failoverSleepBaseMillis,
              config.failoverSleepMaxMillis));
      
      Text dtService = HAUtil.buildTokenServiceForLogicalUri(nameNodeUri);
      // Return proxyInfo, an object wrapping proxy and dtService
      return new ProxyAndInfo<T>(proxy, dtService);
    }
}
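
The HA branch above is only taken when the URI authority is a logical nameservice for which a failover proxy provider is configured. Below is a sketch of the client-side settings that trigger it, normally placed in core-site.xml/hdfs-site.xml rather than set in code; the nameservice mycluster, the NameNode ids nn1/nn2 and the host names are placeholders.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class HaClientConfigDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        conf.set("fs.defaultFS", "hdfs://mycluster");
        conf.set("dfs.nameservices", "mycluster");
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "node1:9000");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "node2:9000");
        // The key getFailoverProxyProviderClass() looks for; its presence selects the HA branch
        conf.set("dfs.client.failover.proxy.provider.mycluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider");

        FileSystem fs = FileSystem.get(URI.create("hdfs://mycluster"), conf);
        System.out.println(fs.getUri());    // hdfs://mycluster
        fs.close();
    }
}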

Next, NameNodeProxies.createNonHAProxy(Configuration conf, InetSocketAddress nnAddr, Class<T> xface, UserGroupInformation ugi, boolean withRetries) is called:

public static <T> ProxyAndInfo<T> createNonHAProxy(Configuration conf, InetSocketAddress nnAddr,
    Class<T> xface, UserGroupInformation ugi, boolean withRetries) throws IOException {
    Text dtService = SecurityUtil.buildTokenService(nnAddr);    //dtService = 192.168.8.101:9000
    T proxy;
    if (xface == ClientProtocol.class) {    // xface = ClientProtocol.class
      // Create a namenode proxy object
      proxy = (T) createNNProxyWithClientProtocol(nnAddr, conf, ugi, withRetries);
      /* Checkpoint 4: waiting for createNNProxyWithClientProtocol() to return */
    } else if (...) {    // other protocol interfaces (branches elided)
      ...
    }
    // Wrap proxy and dtService into a ProxyAndInfo object and return it
    return new ProxyAndInfo<T>(proxy, dtService);
  }

Continue to block02.



Reposted from: https://my.oschina.net/u/2503731/blog/663705
