Hadoop API Programming

Latest recommended article published on 2022-07-01 14:38:21

License: CC 4.0 BY-SA

1. Preparing IDEA

IDEA + Maven

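    Without Maven: pick out the jars by hand, put them under lib, then add them to the classpath
    This easily leads to jar conflicts
    Eclipse can work with this setup directly
    Maven: downloads jars from the central repository for you; requires network access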

Apache top-level projects
    Hadoop Hive Spark Flink Maven

    Project site, e.g. maven.apache.org

    Source code, e.g. github.com/apache/spark

Installing Maven

    JDK: for development on a Windows machine, download the Windows JDK installer (exe)
    Download: apache-maven-3.6.3-bin.tar.gz
            http://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
        apache-maven-3.6.3-bin.zip
            http://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.zip

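     Unpack:
        Note:
            Do not unpack it under a path that contains Chinese characters
            Do not unpack it under a path that contains spaces
     Edit the configuration file
        conf/settings.xml
            Default:
                  Default: ${user.home}/.m2/repository
                  <localRepository>/path/to/local/repo</localRepository>
                  ${user.home}: usually on the C: drive
                  Changing it is recommended because:
                    1) space on the C: drive is limited
                    2) the repository is lost when the system is reinstalled
                  ==>
                  Windows: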
                    <localRepository>D:\\software\\maven_repository</localRepository>
                  Linux:
                    <localRepository>/home/hadoop/maven_repos</localRepository>
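
To check where Maven keeps jars when `<localRepository>` is left unset, the default `${user.home}/.m2/repository` can be resolved from Java. This is a minimal sketch; the class name `LocalRepoDefault` is made up for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class LocalRepoDefault {
    /** Resolves ${user.home}/.m2/repository for the given home directory. */
    static Path defaultLocalRepository(String userHome) {
        return Paths.get(userHome, ".m2", "repository");
    }

    public static void main(String[] args) {
        // user.home is a standard JVM system property
        System.out.println(defaultLocalRepository(System.getProperty("user.home")));
    }
}
```

If the printed directory sits on a small system drive, that is the case where overriding `<localRepository>` as shown above pays off.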

IDEA: install it from the exe installer

Integrating IDEA with Maven
    File ==> Settings, search for maven ==> select the local Maven installation unpacked above and its conf/settings.xml

If you used Eclipse before, how do you give IDEA the same shortcuts as Eclipse?
    File ==> Settings ==> Keymap, select Eclipse

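How to create a Java project with IDEA + Maven
File ==> New ==> Project
Fill in the GAV (groupId, artifactId, version): the coordinates Maven uses to find and download the dependencies we need

src
    main        where application code lives
        java    Java code
        scala   Scala code

    test        where test code (test cases) lives

pom.xml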
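
To make the GAV coordinates concrete: for a release version, Maven stores a jar in the repository under a path derived from the three values. The sketch below builds that relative path; `GavPath` and `jarPath` are illustrative names, and snapshot versions in remote repositories follow a different naming scheme that is not covered here.

```java
class GavPath {
    /** Builds the repository-relative path of a release jar from its GAV coordinates. */
    static String jarPath(String groupId, String artifactId, String version) {
        // dots in the groupId become directory separators
        return groupId.replace('.', '/') + "/" + artifactId + "/" + version
                + "/" + artifactId + "-" + version + ".jar";
    }

    public static void main(String[] args) {
        System.out.println(jarPath("junit", "junit", "4.12"));
        System.out.println(jarPath("org.apache.hadoop", "hadoop-client", "2.6.0-cdh5.16.2"));
    }
}
```

Looking for that path under the local repository is a quick way to confirm that a dependency was actually downloaded.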

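Maven dependency lookup: https://mvnrepository.com/
Notes on configuration
Default configuration shipped with Hadoop:
core-default.xml
hdfs-default.xml
mapred-default.xml
yarn-default.xml

Configuration on the server
    core-site.xml
    hdfs-site.xml
    mapred-site.xml
    yarn-site.xml

The client can override settings as well
    set them on the Configuration object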
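
The three layers above can be pictured as a lookup chain: a value set in client code wins over the site file, which wins over the shipped default. The sketch below imitates that chain with `java.util.Properties`; it is an analogy, not Hadoop's `Configuration` class. The site value `2` is a made-up example, and the client value `1` mirrors the `dfs.replication` setting used in the test class later in these notes.

```java
import java.util.Properties;

class LayeredConfig {
    /** Builds a default -> site -> client chain and returns the client view. */
    static Properties layered() {
        Properties defaults = new Properties();      // stands in for hdfs-default.xml
        defaults.setProperty("dfs.replication", "3");
        defaults.setProperty("dfs.blocksize", "134217728");

        Properties site = new Properties(defaults);  // stands in for hdfs-site.xml (hypothetical value)
        site.setProperty("dfs.replication", "2");

        Properties client = new Properties(site);    // stands in for values set in client code
        client.setProperty("dfs.replication", "1");
        return client;
    }

    public static void main(String[] args) {
        Properties conf = layered();
        // overridden at the client layer
        System.out.println(conf.getProperty("dfs.replication"));
        // not overridden anywhere, so the default shows through
        System.out.println(conf.getProperty("dfs.blocksize"));
    }
}
```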

copy vs mv
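
The difference is whether the source survives: a copy leaves it in place, a move removes it. The local-filesystem sketch below shows this with `java.nio.file.Files`; the temp directory and file names are made up. On the HDFS side the same split exists: `copyFromLocalFile` keeps the local file, while `moveFromLocalFile` and `rename` behave like mv.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class CopyVsMove {
    /** Returns {source exists after copy, source exists after move}. */
    static boolean[] demo() {
        try {
            Path dir = Files.createTempDirectory("copy-vs-mv");
            Path src = Files.write(dir.resolve("1.txt"), "hello".getBytes(StandardCharsets.UTF_8));

            Files.copy(src, dir.resolve("1-copy.txt"));   // source stays
            boolean afterCopy = Files.exists(src);

            Files.move(src, dir.resolve("1-moved.txt"));  // source is gone
            boolean afterMove = Files.exists(src);

            return new boolean[] {afterCopy, afterMove};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("source exists after copy: " + result[0]);
        System.out.println("source exists after move: " + result[1]);
    }
}
```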

IO
    Byte streams
        InputStream
        OutputStream
    Character streams
        Reader
        Writer
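
A short way to see the difference between the two families: the same data read through an `InputStream` is counted in bytes, while a `Reader` decodes it and counts characters. The sketch below uses a four-character string containing two Chinese characters (written as Unicode escapes so the source file encoding does not matter); the class name is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class ByteVsChar {
    /** Counts units read through a byte stream. */
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Counts units read through a character stream that decodes UTF-8. */
    static int countChars(byte[] data) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8)) {
            int n = 0;
            while (reader.read() != -1) n++;
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // two CJK characters (3 bytes each in UTF-8) followed by "ab"
        byte[] data = "\u4e2d\u6587ab".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data)); // 8
        System.out.println(countChars(data)); // 4
    }
}
```

Byte streams are the right tool for binary data such as the zip file downloaded later in these notes; character streams are for text.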

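Unit testing framework: JUnit

Application code
    Cal
Test code
    CalTest
    Each method you want to run as a test needs the @Test annotation
    Assert.assertEquals(5, result);   // expected value first, then the actual result

    @Before and @After run before and after each test method
        they run once per test method
    @BeforeClass and @AfterClass run before and after all test methods
        they run only once
        methods annotated with @BeforeClass and @AfterClass must be static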
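
The call order described above can be traced without JUnit on the classpath. The sketch below is a hand-written stand-in, not JUnit itself: plain methods play the role of the annotated ones, and `runAll` calls them in the order JUnit 4 would. The body of `Cal` and the test names are assumptions, since the notes only name the classes. JUnit 4 does not guarantee the order of the test methods themselves; the fixed order here is only for the trace.

```java
import java.util.ArrayList;
import java.util.List;

/** Code under test (hypothetical body). */
class Cal {
    static int add(int a, int b) {
        return a + b;
    }
}

/** Plain-Java stand-in for a JUnit 4 test class plus runner. */
class CalTestSketch {
    static final List<String> LOG = new ArrayList<>();

    static void beforeClass() { LOG.add("beforeClass"); } // role of @BeforeClass: static, runs once
    static void afterClass()  { LOG.add("afterClass"); }  // role of @AfterClass: static, runs once
    void before() { LOG.add("before"); }                  // role of @Before: runs per test
    void after()  { LOG.add("after"); }                   // role of @After: runs per test

    void testAdd() {                                      // role of @Test
        LOG.add("testAdd");
        if (Cal.add(2, 3) != 5) throw new AssertionError("expected 5");
    }

    void testAddNegative() {                              // role of @Test
        LOG.add("testAddNegative");
        if (Cal.add(-2, 2) != 0) throw new AssertionError("expected 0");
    }

    /** Calls the methods in JUnit 4's lifecycle order and returns the trace. */
    static List<String> runAll() {
        LOG.clear();
        beforeClass();
        CalTestSketch first = new CalTestSketch();   // JUnit 4 uses a fresh instance per test method
        first.before();
        first.testAdd();
        first.after();
        CalTestSketch second = new CalTestSketch();
        second.before();
        second.testAddNegative();
        second.after();
        afterClass();
        return new ArrayList<>(LOG);
    }

    public static void main(String[] args) {
        System.out.println(runAll());
    }
}
```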

2. Hadoop API Programming

pom.xml

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.ccj.pxj.api</groupId>
    <artifactId>HadoopApi</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
        <hadoop.version>2.6.0-cdh5.16.2</hadoop.version>
    </properties>
    <repositories>
        <repository>
            <id>cloudera</id>
            <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
        </repository>
    </repositories>
<dependencies>
    <!-- Hadoop dependency -->
    <!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>${hadoop.version}</version>
    </dependency>
    <!-- https://mvnrepository.com/artifact/junit/junit -->
    <!-- declared once; test scope assumes test classes such as HDFSAPITest live under src/test/java -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
</dependencies>
    <build>
        <pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
            <plugins>
                <!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
                <plugin>
                    <artifactId>maven-clean-plugin</artifactId>
                    <version>3.1.0</version>
                </plugin>
                <!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
                <plugin>
                    <artifactId>maven-resources-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-compiler-plugin</artifactId>
                    <version>3.8.0</version>
                </plugin>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <version>2.22.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-jar-plugin</artifactId>
                    <version>3.0.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-install-plugin</artifactId>
                    <version>2.5.2</version>
                </plugin>
                <plugin>
                    <artifactId>maven-deploy-plugin</artifactId>
                    <version>2.8.2</version>
                </plugin>
                <!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
                <plugin>
                    <artifactId>maven-site-plugin</artifactId>
                    <version>3.7.1</version>
                </plugin>
                <plugin>
                    <artifactId>maven-project-info-reports-plugin</artifactId>
                    <version>3.0.0</version>
                </plugin>
            </plugins>
        </pluginManagement>
    </build>
</project>

API programming

package com.pxj.ccj.api;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.io.IOUtils;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.*;
import java.net.URI;
/**
 * Author: pxj
 */
public class HDFSAPITest {
    FileSystem fileSystem;
    URI uri;
    /**
     *  Initialize the FileSystem in the @Before method
     */
    @Before
    public void setUp(){
        try{
            uri=new URI("hdfs://pxj:9000");
            Configuration configuration = new Configuration();
            configuration.set("dfs.client.use.datanode.hostname","true");
            configuration.set("dfs.replication","1");
            fileSystem = FileSystem.get(uri, configuration,"pxj");
        }catch (Exception e){
            e.printStackTrace();
        }
    }
    /**
     * Close the FileSystem in the @After method
     */
    @After
    public void tearDown()throws Exception {
        if(null != fileSystem) {
            fileSystem.close();
        }
    }
    /**
     * Create a directory with a client built inside the test itself
     */
    @Test
    public  void mkdir() {
        Configuration configuration = new Configuration();
        // Obtain a separate client; try-with-resources closes it,
        // while the field initialized in setUp() is closed by tearDown()
        try (FileSystem fs = FileSystem.get(uri, configuration, "pxj")) {
            // create the directory
            fs.mkdirs(new Path("/hdfsapiv1"));
        }catch (Exception e){
            e.printStackTrace();
        }
    }
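    /**
     * Create a directory with the client initialized in setUp()
     */
    @Test
    public  void mkdir2(){
        try {
            fileSystem.mkdirs(new Path("/test3"));
        }catch (Exception e){
            // report the failure instead of swallowing it
            e.printStackTrace();
        }
        // no explicit close here: tearDown() closes fileSystem after every test
    }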
    /**
     * Upload a file
     */
    @Test
    public  void copyFromLocalFile(){
        try {
            Path src=new Path("data/1.txt");
            Path dst=new Path("/test3");
            fileSystem.copyFromLocalFile(src,dst);
        }catch (Exception e){
            e.printStackTrace();
        }finally {
            clone(fileSystem);
        }
    }
    /**
     * Download a file
     */
    @Test
    public  void copyToLocalFile(){
        try{
            Path src=new Path("/test3/1.txt");
            Path dst=new Path("output/1.txt");
            fileSystem.copyToLocalFile(src,dst);
        }catch (Exception e){
            // report the failure instead of swallowing it
            e.printStackTrace();
        }finally {
            clone(fileSystem);
        }
    }
    /**
     * Rename a file
     */
    @Test
    public void rename(){
        try {
            Path src=new Path("/test3/1.txt");
            Path dst=new Path("/test3/1-rename.txt");
            fileSystem.rename(src,dst);
        }catch (Exception e){
            e.printStackTrace();
        }
    }
    /**
     * List files recursively and print the hosts that hold each block
     */
    @Test
    public void listFile(){
        try{
            RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(new Path("/test3"), true);
            while (files.hasNext()){
                LocatedFileStatus fileStatus = files.next();
                String path = fileStatus.getPath().toString();
                long len = fileStatus.getLen();
                FsPermission permission = fileStatus.getPermission();
                short replication = fileStatus.getReplication();
                String isDir= fileStatus.isDirectory() ? "directory":"file";
                System.out.println(path + "\t" + len + "\t" + replication + "\t" + permission + "\t" + isDir);
                BlockLocation[] blockLocations = fileStatus.getBlockLocations();
                for (BlockLocation location : blockLocations) {
                    String[] hosts = location.getHosts();
                    for (String host : hosts) {
                        System.out.println(host + "................");
                    }
                }
            }
        }catch (Exception e){
            e.printStackTrace();
        }finally {
            clone(fileSystem);
        }
    }
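    /**
     * Delete the directory created by mkdir()
     */
    @Test
    public  void dele(){
        try {
            // absolute path, matching mkdir(); true = delete recursively
            fileSystem.delete(new Path("/hdfsapiv1"), true);
        }catch (Exception e){
            e.printStackTrace();
        }finally {
            clone(fileSystem);
        }
    }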
    /**
     * Upload a file through IO streams
     */
    @Test
    public  void copyFromLocalFileIo(){
        // try-with-resources closes both streams even if the copy fails
        try (BufferedInputStream in = new BufferedInputStream(new FileInputStream(new File("data/2.txt")));
             FSDataOutputStream out = fileSystem.create(new Path("/test3/pxj.txt"))) {
            // in => out, using a 1024-byte buffer
            IOUtils.copyBytes(in,out,1024);
        }catch (Exception e){
            e.printStackTrace();
        }finally {
            clone(fileSystem);
        }
    }
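    /**
     * Goal: download each block of a multi-block file separately.
     * This test fetches the first block.
     */
    @Test
    public void copyToLocalFileIO01() {
        FileOutputStream out=null;
        FSDataInputStream in=null;
        try {
            in = fileSystem.open(new Path("/jdk1.8.0_121.zip"));
            out = new FileOutputStream(new File("output/1.zip"));
            byte[] buffer = new byte[1024];
            // bytes 0 - 128M
            long remaining = 1024L * 1024 * 128;
            while (remaining > 0) {
                // read() may return fewer bytes than requested, so write only what was read
                int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
                if (n < 0) {
                    break; // end of file
                }
                out.write(buffer, 0, n);
                remaining -= n;
            }
        }catch (Exception e){
           e.printStackTrace();
        }finally {
            IOUtils.closeStream(in);
            IOUtils.closeStream(out);
            clone(fileSystem);
        }
    }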
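    /**
     * Fetch the second block (128M - 256M)
     */
    @Test
    public void copyToLocalFileIO02() throws Exception {
        // 3 blocks
        FSDataInputStream in = fileSystem.open(new Path("/jdk1.8.0_121.zip"));
        FileOutputStream out = new FileOutputStream(new File("output/2.zip"));
        // move to the offset where reading starts
        in.seek(1024L * 1024 * 128);
        byte[] buffer = new byte[1024];
        // 128M - 256M: exactly one 128M block
        long remaining = 1024L * 1024 * 128;
        while (remaining > 0) {
            // write only the bytes actually read
            int n = in.read(buffer, 0, (int) Math.min(buffer.length, remaining));
            if (n < 0) {
                break; // end of file
            }
            out.write(buffer, 0, n);
            remaining -= n;
        }
        IOUtils.closeStream(in);
        IOUtils.closeStream(out);
    }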
    /**
     * Fetch the remainder of the file after the first two blocks
     */
    @Test
    public void copyToLocalFileIO03() throws Exception {
        // 3 blocks
        FSDataInputStream in = fileSystem.open(new Path("/jdk1.8.0_121.zip"));
        FileOutputStream out = new FileOutputStream(new File("output/jdk.tgz.part2"));
        // move to offset 256M, where the third block starts
        in.seek(1024L * 1024 * 256);
        // copy everything from that offset to the end of the file
        IOUtils.copyBytes(in, out, 1024);
        IOUtils.closeStream(in);
        IOUtils.closeStream(out);
    }
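    /**
     * Close the FileSystem (despite its name, this method closes; it does not copy anything)
     * @param fileSystem the client to close; may be null
     */
    public void clone(FileSystem fileSystem){
        try {
            if(fileSystem!=null){
                fileSystem.close();
            }
        }catch (Exception e){
            e.printStackTrace();
        }
    }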
}
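
The block-by-block downloads above all come down to one operation: copy at most N bytes from the current position of an input stream. That can be factored into a helper that honors the return value of `read()`, which may be smaller than the buffer. The sketch below uses only `java.io`, so it runs without a cluster. `RangeCopy`, `copyRange`, and `slice` are illustrative names; `slice` uses `skip` on an in-memory stream as a stand-in for `seek` on an `FSDataInputStream`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

class RangeCopy {
    /** Copies up to length bytes from in to out; returns how many bytes were copied. */
    static long copyRange(InputStream in, OutputStream out, long length) throws IOException {
        byte[] buffer = new byte[1024];
        long copied = 0;
        while (copied < length) {
            int want = (int) Math.min(buffer.length, length - copied);
            int n = in.read(buffer, 0, want);
            if (n < 0) {
                break;               // end of stream before length bytes were available
            }
            out.write(buffer, 0, n); // write only the bytes actually read
            copied += n;
        }
        return copied;
    }

    /** In-memory demo: returns data[offset, offset + length), cut short at the end of data. */
    static byte[] slice(byte[] data, int offset, int length) {
        try {
            InputStream in = new ByteArrayInputStream(data);
            in.skip(offset);         // stands in for seek(offset) on an HDFS stream
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            copyRange(in, out, length);
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        System.out.println(java.util.Arrays.toString(slice(data, 4, 3)));
        System.out.println(java.util.Arrays.toString(slice(data, 8, 5)));
    }
}
```

Because `FSDataInputStream` is an `InputStream`, the same `copyRange` call could follow a `seek` in the tests above, with a length of `1024L * 1024 * 128` for one full block.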