[Hadoop Source Code Study] HDFS (Part 1)

Article link: https://blog.youkuaiyun.com/messiran10/article/details/50890898

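This article shows how to use Hadoop's FileSystem API to operate on data in HDFS. It walks through a sample program that reads a file and then digs into the source code, touching on key classes such as Configuration, FileSystem, and FileStatus. Tracing the open call through DistributedFileSystem and DFSInputStream shows where the HDFS client starts talking to the NameNode.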

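Preface: I am already reasonably familiar with the common machine learning algorithms, so I am now starting on distributed frameworks such as Hadoop and Spark. Spark currently has strong momentum, but it is commonly deployed on top of Hadoop's distributed file system (HDFS), so I plan to study Hadoop properly first and then move on to Spark.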

For an overview of the HDFS architecture, see http://www.cnblogs.com/laov/p/3434917.html

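HDFS exposes a set of Java APIs to users, so I will start from how those APIs are implemented and use them as the entry point into the HDFS part of the Hadoop source.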

Operating on data with the FileSystem API:

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class FileSystemCat {

    // Create a new file
    public static void createFile(String dst , byte[] contents) throws IOException{
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path dstPath = new Path(dst); // destination path
        // Open an output stream
        FSDataOutputStream outputStream = fs.create(dstPath);
        outputStream.write(contents);
        outputStream.close();
        fs.close();
        System.out.println("File created successfully!");
    }

    // Upload a local file
    public static void uploadFile(String src,String dst) throws IOException{
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
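        Path srcPath = new Path(src); // source path
        Path dstPath = new Path(dst); // destination path
        // Copy the local file into the file system; the first argument says whether to delete the source (true deletes it, false keeps it)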
        fs.copyFromLocalFile(false,srcPath, dstPath);

        // Print the file paths
        System.out.println("Upload to "+conf.get("fs.default.name"));
        System.out.println("------------list files------------"+"\n");
        FileStatus [] fileStatus = fs.listStatus(dstPath);
        for (FileStatus file : fileStatus) 
        {
            System.out.println(file.getPath());
        }
        fs.close();
    }

    // Rename a file
    public static void rename(String oldName,String newName) throws IOException{
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path oldPath = new Path(oldName);
        Path newPath = new Path(newName);
        boolean isok = fs.rename(oldPath, newPath);
        if(isok){
            System.out.println("rename ok!");
        }else{
            System.out.println("rename failure");
        }
        fs.close();
    }
    // Delete a file
    public static void delete(String filePath) throws IOException{
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path path = new Path(filePath);
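        // Delete immediately (recursively for directories); deleteOnExit only marks the path for deletion when fs is closed
        boolean isok = fs.delete(path, true);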
        if(isok){
            System.out.println("delete ok!");
        }else{
            System.out.println("delete failure");
        }
        fs.close();
    }

    // Create a directory
    public static void mkdir(String path) throws IOException{
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path srcPath = new Path(path);
        boolean isok = fs.mkdirs(srcPath);
        if(isok){
            System.out.println("create dir ok!");
        }else{
            System.out.println("create dir failure");
        }
        fs.close();
    }

    // Read the contents of a file
    public static void readFile(String filePath) throws IOException{
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path srcPath = new Path(filePath);
        InputStream in = null;
        try {
            in = fs.open(srcPath);
            IOUtils.copyBytes(in, System.out, 4096, false); // copy to standard output; false leaves the streams open
        } finally {
            IOUtils.closeStream(in);
        }
    }


    public static void main(String[] args) throws IOException {
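        // Test uploading a file
        //uploadFile("/root/hadoop/release-0.20.2/input/test1.txt", "/user/root/in");
        // Test creating a file (encode explicitly as UTF-8 so the Chinese text does not depend on the platform charset)
        byte[] contents =  "hello world 世界你好\n".getBytes("UTF-8");
        createFile("/user/root/in/d.txt",contents);
        // Test renaming
        //rename("/user/hadoop/test/d.txt", "/user/hadoop/test/dd.txt");
        // Test deleting a file
        //delete("test/dd.txt"); // uses a relative path
        //delete("test1");    // deletes a directory
        // Test creating a directory
        //mkdir("test1");
        // Test reading a file
        //readFile("hdfs://localhost:9000/user/root/in/test1.txt");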
    }

}

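The program above is a sample that operates on data through the FileSystem API. As it shows, the FileSystem class offers a fairly complete set of methods. With them you can work with files on HDFS (or, so to speak, in the cloud) in much the same way as with local files.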

The API calls above rely mainly on the following classes from the Hadoop source:

Configuration
FileSystem
FileStatus

// Read the contents of a file
public static void readFile(String filePath) throws IOException{
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path srcPath = new Path(filePath);
    InputStream in = null;
    try {
        in = fs.open(srcPath);
        IOUtils.copyBytes(in, System.out, 4096, false); // copy to standard output
    } finally {
        IOUtils.closeStream(in);
    }
}
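The method above is the sample that reads a file from HDFS through the FileSystem API. The rest of this post goes through it line by line.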
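To make the `IOUtils.copyBytes(in, System.out, 4096, false)` line less of a black box, here is a minimal plain-Java sketch of a buffered copy loop. It is my own stand-in, not Hadoop's implementation: the class name `CopyLoopSketch` is made up, and unlike the real method it returns the number of bytes copied so the result is easy to check. The assumption it illustrates is that the third argument is the buffer size and that `false` means the streams are left open for the caller to close.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Hadoop.
final class CopyLoopSketch {

    // Copies everything from in to out through a fixed-size buffer and
    // returns the number of bytes copied. Neither stream is closed here,
    // which mirrors passing `false` as the last argument in readFile.
    static long copyBytes(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        out.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello world\n".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long copied = copyBytes(in, out, 4096);
        System.out.print(out.toString("UTF-8"));
        System.out.println("bytes copied: " + copied);
    }
}
```

Because the loop does not close anything, the caller still owns both streams, which is why readFile closes `in` itself in the `finally` block.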
Configuration conf = new Configuration();
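-- Loads the configuration of the client or server. By default it reads configuration files such as conf/core-site.xml and conf/hdfs-site.xml, which need to be on the classpath.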
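The idea behind those configuration files is layering: built-in defaults are loaded first, and site files loaded later override them. The sketch below models only that idea with `java.util.Properties`. It is not Hadoop's Configuration class; `LayeredConfigSketch` and its methods are hypothetical names, and the two `Properties` objects merely stand in for a defaults file and a site file.

```java
import java.util.Properties;

// Toy model of layered configuration; NOT Hadoop's Configuration class.
final class LayeredConfigSketch {
    private final Properties merged = new Properties();

    // Resources added later override keys set by earlier ones.
    void addResource(Properties resource) {
        merged.putAll(resource);
    }

    // Returns the merged value, or defaultValue when the key is absent.
    String get(String key, String defaultValue) {
        return merged.getProperty(key, defaultValue);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties(); // stands in for a defaults file
        defaults.setProperty("fs.default.name", "file:///");

        Properties site = new Properties();     // stands in for a site file
        site.setProperty("fs.default.name", "hdfs://localhost:9000");

        LayeredConfigSketch conf = new LayeredConfigSketch();
        conf.addResource(defaults);
        conf.addResource(site);
        System.out.println(conf.get("fs.default.name", "unset")); // prints hdfs://localhost:9000
    }
}
```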
FileSystem fs = FileSystem.get(conf);
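-- Returns a reference to a FileSystem. FileSystem is an abstract class and cannot be instantiated directly, so fs here must refer to an object of a concrete subclass (for example DistributedFileSystem). This is ordinary Java polymorphism: the calling code is written against the abstract type, and the subclass's implementation is what actually runs.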
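The pattern is easier to see in a stripped-down form. The following sketch uses toy classes of my own (`ToyFileSystem`, `ToyLocalFileSystem`, `ToyDistributedFileSystem`), not Hadoop's, and it hard-codes the choice by URI scheme. How the real FileSystem.get picks and caches the concrete class is not covered here, so treat the selection logic as an assumption made for illustration.

```java
import java.net.URI;

// Toy classes for illustration only; these are NOT Hadoop's classes.
abstract class ToyFileSystem {
    abstract String describe();

    // Static factory: picks a concrete subclass from the URI scheme.
    static ToyFileSystem get(URI defaultUri) {
        if ("hdfs".equals(defaultUri.getScheme())) {
            return new ToyDistributedFileSystem(defaultUri.getAuthority());
        }
        return new ToyLocalFileSystem();
    }
}

final class ToyLocalFileSystem extends ToyFileSystem {
    @Override
    String describe() {
        return "local";
    }
}

final class ToyDistributedFileSystem extends ToyFileSystem {
    private final String nameNodeAddress;

    ToyDistributedFileSystem(String nameNodeAddress) {
        this.nameNodeAddress = nameNodeAddress;
    }

    @Override
    String describe() {
        return "distributed@" + nameNodeAddress;
    }
}

final class FactoryDemo {
    public static void main(String[] args) {
        // The variable has the abstract type; the object is a concrete subclass.
        ToyFileSystem fs = ToyFileSystem.get(URI.create("hdfs://localhost:9000"));
        System.out.println(fs.describe()); // prints distributed@localhost:9000
    }
}
```

The caller never names the subclass, which is why the same readFile code can run against a local or a distributed file system depending only on configuration.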
in = fs.open(srcPath);
-- Returns an input stream object. This method is the main thing we want to analyze.
-- The open method of the DistributedFileSystem object:
public FSDataInputStream open(Path f, int bufferSize) throws IOException {
    statistics.incrementReadOps(1);
    return new DFSClient.DFSDataInputStream(
        dfs.open(getPathName(f), bufferSize, verifyChecksum, statistics));
}
-- The open method of the dfs object returns a DFSInputStream. The constructor of that object is where the client interacts with the NameNode.

public DFSInputStream open(String src, int buffersize, boolean verifyChecksum,
                           FileSystem.Statistics stats
    ) throws IOException {
    checkOpen();
    // Get block info from namenode
    return new DFSInputStream(src, buffersize, verifyChecksum);
}
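
To picture what "the constructor talks to the NameNode" means, here is a toy model rather than the real code. Everything in it is hypothetical: the `ToyNameNode` interface stands in for the NameNode RPC, `ToyDfsInputStream` stands in for DFSInputStream, and the block strings are made-up placeholders. The only point it demonstrates is that the metadata lookup happens once, at construction time, before any bytes are read.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the NameNode RPC interface.
interface ToyNameNode {
    List<String> getBlockLocations(String src);
}

// Toy input stream: the constructor asks the name node for block metadata up front.
final class ToyDfsInputStream {
    private final List<String> blocks;

    ToyDfsInputStream(String src, ToyNameNode nameNode) {
        // One metadata round trip at construction time; no file data is read yet.
        this.blocks = new ArrayList<>(nameNode.getBlockLocations(src));
    }

    int blockCount() {
        return blocks.size();
    }

    String firstBlock() {
        return blocks.isEmpty() ? null : blocks.get(0);
    }
}

final class OpenSketch {
    public static void main(String[] args) {
        // Fake name node that counts how often it is asked; block names are invented.
        final int[] calls = {0};
        ToyNameNode nameNode = src -> {
            calls[0]++;
            return Arrays.asList("blk_1@datanodeA", "blk_2@datanodeB");
        };
        ToyDfsInputStream in = new ToyDfsInputStream("/user/root/in/test1.txt", nameNode);
        System.out.println("name node calls: " + calls[0] + ", blocks: " + in.blockCount());
    }
}
```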