How Spark Reads HDFS Files

qq_43193797

Original article: https://blog.youkuaiyun.com/qq_43193797/article/details/111687457

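```scala
import org.apache.spark.{SparkConf, SparkContext}
val sconf = new SparkConf().setAppName(this.getClass.getName).setMaster("yarn")
val sc = new SparkContext(sconf)
// textFile is lazy: it only records the RDD lineage; nothing is read from HDFS until an action (e.g. count()) runs
val lines = sc.textFile("hdfs://m2:9820/README.md")
```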

The textFile method in SparkContext is implemented as follows:

```scala
  /**
   * Read a text file from HDFS, a local file system (available on all nodes), or any
   * Hadoop-supported file system URI, and return it as an RDD of Strings.
   */
  def textFile(
      path: String,
      minPartitions: Int = defaultMinPartitions): RDD[String] = withScope {
    assertNotStopped()
    hadoopFile(path, classOf[TextInputFormat], classOf[LongWritable], classOf[Text],
      minPartitions).map(pair => pair._2.toString).setName(path)
  }
```
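So textFile delegates the actual reading to hadoopFile with Hadoop's old-API (org.apache.hadoop.mapred) TextInputFormat. Each record it produces is a (LongWritable, Text) pair: the key is the byte offset at which a line starts in the file, and the value is the line itself without its terminator. The trailing `.map(pair => pair._2.toString)` throws the offsets away, which is why the result is an RDD[String]. The Spark-free sketch below models that record layout for a single split covering the whole file. It is an illustration under stated assumptions, not Hadoop's implementation: the object `TextInputFormatModel` and its methods are hypothetical names, and it only recognizes `\n` and `\r\n` line endings (Hadoop's LineRecordReader also handles a lone `\r` and splits that begin in the middle of a line).

```scala
import java.nio.charset.StandardCharsets

// Hypothetical, Spark-free model of what hadoopFile + TextInputFormat yields for
// ONE input split covering the whole file: (byteOffset, lineText) pairs.
// Assumption: only '\n' and "\r\n" terminators are recognized (a lone '\r' stays in the line).
object TextInputFormatModel {

  /** Returns (offset of the first byte of each line, line content without its terminator). */
  def readRecords(data: Array[Byte]): Vector[(Long, String)] = {
    val records = Vector.newBuilder[(Long, String)]
    var lineStart = 0
    var i = 0
    while (i < data.length) {
      if (data(i) == '\n') {
        // Drop a preceding '\r' so "\r\n" endings behave like '\n'.
        val end = if (i > lineStart && data(i - 1) == '\r') i - 1 else i
        records += ((lineStart.toLong, new String(data, lineStart, end - lineStart, StandardCharsets.UTF_8)))
        lineStart = i + 1
      }
      i += 1
    }
    // A final line without a trailing newline is still emitted as a record.
    if (lineStart < data.length)
      records += ((lineStart.toLong, new String(data, lineStart, data.length - lineStart, StandardCharsets.UTF_8)))
    records.result()
  }

  /** Mirrors `.map(pair => pair._2.toString)` in textFile: keep the line, discard the offset. */
  def textFileModel(data: Array[Byte]): Vector[String] =
    readRecords(data).map(pair => pair._2)

  def main(args: Array[String]): Unit = {
    val bytes = "# Spark\n\nApache Spark\r\nlast line".getBytes(StandardCharsets.UTF_8)
    readRecords(bytes).foreach(println)
    println(textFileModel(bytes))
  }
}
```

Running `main` prints the pairs `(0,# Spark)`, `(8,)`, `(9,Apache Spark)` and `(23,last line)`: the empty line still becomes a record, and each key is a byte position rather than a line number, which is why textFile keeps only the values.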