**1. Setting Up IDEA**
IDEA + Maven
Without Maven: pick the jar files out by hand, put them under lib, then add them to the classpath;
this quickly leads to jar conflicts.
A Maven project can also be opened directly in Eclipse.
Maven: downloads the jars we need from the central repository (this requires internet access).
Apache top-level projects: Hadoop, Hive, Spark, Flink, Maven
maven.apache.org
Source code: github.com/apache/spark
Installing Maven
JDK: we develop on a Windows machine, so download the Windows JDK .exe installer.
Download: apache-maven-3.6.3-bin.tar.gz
http://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.tar.gz
apache-maven-3.6.3-bin.zip
http://mirrors.tuna.tsinghua.edu.cn/apache/maven/maven-3/3.6.3/binaries/apache-maven-3.6.3-bin.zip
Extract it.
Note:
do not put it under a path containing Chinese characters
do not put it under a path containing spaces
Edit the configuration file:
conf/settings.xml
Default: ${user.home}/.m2/repository
<localRepository>/path/to/local/repo</localRepository>
${user.home} is usually on the C: drive.
You should change this, because:
1) C: drive space is limited
2) a repository outside C: survives an OS reinstall
==>
Windows:
<localRepository>D:\software\maven_repository</localRepository>
Linux:
<localRepository>/home/hadoop/maven_repos</localRepository>
IDEA: install from the .exe installer.
IDEA + Maven integration:
File ==> Settings ==> search for "maven" ==> point it at the Maven we just installed and its conf/settings.xml
If you used Eclipse before, how do you give IDEA the same shortcuts as Eclipse?
File ==> Settings ==> Keymap ==> choose Eclipse
How do you create a Java project with IDEA + Maven?
File ==> New ==> Project
Fill in the GAV (groupId/artifactId/version): the coordinates Maven uses to locate and download the dependencies we need (this project uses com.ccj.pxj.api / HadoopApi / 1.0-SNAPSHOT, see the pom.xml below).
src
main: production code
java: Java sources
scala: Scala sources
test: test code (test cases)
pom.xml
Maven dependency lookup: https://mvnrepository.com/
Notes on configuration
Defaults shipped with Hadoop:
core-default.xml
hdfs-default.xml
mapred-default.xml
yarn-default.xml
Configuration files on the server:
core-site.xml
hdfs-site.xml
mapred-site.xml
yarn-site.xml
The client can override settings as well,
by calling set(...) on a Configuration object.
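A minimal sketch of that precedence (a hypothetical ConfigDemo class, assuming the Hadoop client jars from the pom.xml below are on the classpath): values in *-site.xml override *-default.xml, and a client-side set(...) overrides both.

import org.apache.hadoop.conf.Configuration;

public class ConfigDemo {
    public static void main(String[] args) {
        // new Configuration() loads core-default.xml and then core-site.xml
        // from the classpath; site values override default values
        Configuration conf = new Configuration();
        // a client-side override wins over both xml layers
        conf.set("dfs.replication", "1");
        System.out.println("dfs.replication = " + conf.get("dfs.replication"));
    }
}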
copy vs. mv (copy keeps the source; mv removes it)
IO
Byte streams:
InputStream
OutputStream
Character streams:
Reader
Writer
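A minimal, self-contained sketch contrasting the two stream families (the paths data/1.txt and output/1-copy.txt are placeholders):

import java.io.*;

public class StreamDemo {
    public static void main(String[] args) throws IOException {
        // byte streams: copy raw bytes; works for any file type
        try (InputStream in = new FileInputStream("data/1.txt");
             OutputStream out = new FileOutputStream("output/1-copy.txt")) {
            byte[] buffer = new byte[1024];
            int len;
            while ((len = in.read(buffer)) != -1) {
                out.write(buffer, 0, len);
            }
        }
        // character streams: read text line by line
        try (BufferedReader reader = new BufferedReader(new FileReader("data/1.txt"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}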
Unit-testing framework: JUnit
Production code:
Cal
Test code:
CalTest
Any method you want to test must carry the @Test annotation.
Assert.assertEquals(5, result);   // the expected value comes first
@Before and @After run before and after each test method respectively,
i.e. once per test method.
@BeforeClass and @AfterClass run before and after all the test methods,
i.e. exactly once per class.
Methods annotated with @BeforeClass and @AfterClass must be static.
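A minimal sketch of the CalTest described above; Cal itself is not shown in the source, so a plain addition stands in for a hypothetical Cal.add(2, 3):

import org.junit.*;

public class CalTest {
    @BeforeClass
    public static void beforeClass() { // once, before all tests; must be static
        System.out.println("@BeforeClass");
    }
    @AfterClass
    public static void afterClass() {  // once, after all tests; must be static
        System.out.println("@AfterClass");
    }
    @Before
    public void setUp() {              // before EACH test method
        System.out.println("@Before");
    }
    @After
    public void tearDown() {           // after EACH test method
        System.out.println("@After");
    }
    @Test
    public void testAdd() {
        int result = 2 + 3;             // stands in for Cal.add(2, 3)
        Assert.assertEquals(5, result); // expected value first, then actual
    }
}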
**2. Hadoop API Programming**
pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.ccj.pxj.api</groupId>
<artifactId>HadoopApi</artifactId>
<version>1.0-SNAPSHOT</version>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<maven.compiler.source>1.8</maven.compiler.source>
<maven.compiler.target>1.8</maven.compiler.target>
<hadoop.version>2.6.0-cdh5.16.2</hadoop.version>
</properties>
<repositories>
<repository>
<id>cloudera</id>
<url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
</repository>
</repositories>
<dependencies>
<!-- Hadoop dependency -->
<!-- https://mvnrepository.com/artifact/org.apache.hadoop/hadoop-client -->
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>${hadoop.version}</version>
</dependency>
<!-- https://mvnrepository.com/artifact/junit/junit -->
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
</dependencies>
<build>
<pluginManagement><!-- lock down plugins versions to avoid using Maven defaults (may be moved to parent pom) -->
<plugins>
<!-- clean lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#clean_Lifecycle -->
<plugin>
<artifactId>maven-clean-plugin</artifactId>
<version>3.1.0</version>
</plugin>
<!-- default lifecycle, jar packaging: see https://maven.apache.org/ref/current/maven-core/default-bindings.html#Plugin_bindings_for_jar_packaging -->
<plugin>
<artifactId>maven-resources-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.0</version>
</plugin>
<plugin>
<artifactId>maven-surefire-plugin</artifactId>
<version>2.22.1</version>
</plugin>
<plugin>
<artifactId>maven-jar-plugin</artifactId>
<version>3.0.2</version>
</plugin>
<plugin>
<artifactId>maven-install-plugin</artifactId>
<version>2.5.2</version>
</plugin>
<plugin>
<artifactId>maven-deploy-plugin</artifactId>
<version>2.8.2</version>
</plugin>
<!-- site lifecycle, see https://maven.apache.org/ref/current/maven-core/lifecycles.html#site_Lifecycle -->
<plugin>
<artifactId>maven-site-plugin</artifactId>
<version>3.7.1</version>
</plugin>
<plugin>
<artifactId>maven-project-info-reports-plugin</artifactId>
<version>3.0.0</version>
</plugin>
</plugins>
</pluginManagement>
</build>
</project>
API programming
package com.pxj.ccj.api;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.*;
import org.apache.hadoop.fs.permission.FsPermission;
import org.apache.hadoop.io.IOUtils;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;
import java.io.*;
import java.net.URI;
/**
* Author: pxj
*/
public class HDFSAPITest {
FileSystem fileSystem;
URI uri;
/**
* Initialize the FileSystem in @Before
*/
@Before
public void setUp(){
try{
uri=new URI("hdfs://pxj:9000");
Configuration configuration = new Configuration();
configuration.set("dfs.client.use.datanode.hostname","true");
configuration.set("dfs.replication","1");
fileSystem = FileSystem.get(uri, configuration,"pxj");
}catch (Exception e){
e.printStackTrace();
}
}
/**
* Close the FileSystem in @After
*/
@After
public void tearDown()throws Exception {
if(null != fileSystem) {
fileSystem.close();
}
}
@Test
public void mkdir() {
try {
Configuration configuration = new Configuration();
// get the FileSystem client
FileSystem fileSystem = FileSystem.get(uri, configuration, "pxj");
// create the directory
fileSystem.mkdirs(new Path("/hdfsapiv1"));
}catch (Exception e){
e.printStackTrace();
}finally {
if (null!=fileSystem){
try {
fileSystem.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
}
@Test
public void mkdir2(){
try {
fileSystem.mkdirs(new Path("/test3"));
}catch (Exception e){
e.printStackTrace();
}finally {
try {
fileSystem.close();
} catch (IOException e) {
e.printStackTrace();
}
}
}
/**
* Upload a file
*/
@Test
public void copyFromLocalFile(){
try {
Path src=new Path("data/1.txt");
Path dst=new Path("/test3");
fileSystem.copyFromLocalFile(src,dst);
}catch (Exception e){
e.printStackTrace();
}finally {
clone(fileSystem);
}
}
/**
* Download a file
*/
@Test
public void copyToLocalFile(){
try{
Path src=new Path("/test3/1.txt");
Path dst=new Path("output/1.txt");
fileSystem.copyToLocalFile(src,dst);
}catch (Exception e){
e.printStackTrace();
}finally {
clone(fileSystem);
}
}
/**
* Rename a file
*/
@Test
public void rename(){
try {
Path src=new Path("/test3/1.txt");
Path dst=new Path("/test3/1-rename.txt");
fileSystem.rename(src,dst);
}catch (Exception e){
e.printStackTrace();
}
}
@Test
public void listFile(){
try{
RemoteIterator<LocatedFileStatus> files = fileSystem.listFiles(new Path("/test3"), true);
while (files.hasNext()){
LocatedFileStatus fileStatus = files.next();
String path = fileStatus.getPath().toString();
long len = fileStatus.getLen();
FsPermission permission = fileStatus.getPermission();
short replication = fileStatus.getReplication();
String isDir = fileStatus.isDirectory() ? "directory" : "file";
System.out.println(path + "\t" + len + "\t" + replication + "\t" + permission + "\t" + isDir);
BlockLocation[] blockLocations = fileStatus.getBlockLocations();
for (BlockLocation location : blockLocations) {
String[] hosts = location.getHosts();
for (String host : hosts) {
System.out.println(host + "................");
}
}
}
}catch (Exception e){
e.printStackTrace();
}finally {
clone(fileSystem);
}
}
/**
* Delete a file or directory
*/
@Test
public void dele(){
try {
fileSystem.delete(new Path("/hdfsapiv1"), true); // recursive delete; the single-argument delete(Path) is deprecated
}catch (Exception e){
e.printStackTrace();
}finally {
clone(fileSystem);
}
}
/**
* Upload using raw IO streams
*/
@Test
public void copyFromLocalFileIo(){
try{
BufferedInputStream in = new BufferedInputStream(new FileInputStream(new File("data/2.txt")));
FSDataOutputStream out = fileSystem.create(new Path("/test3/pxj.txt"));
// In=>out
IOUtils.copyBytes(in,out,1024);
IOUtils.closeStream(in);
IOUtils.closeStream(out);
}catch (Exception e){
e.printStackTrace();
}finally {
clone(fileSystem);
}
}
@Test
public void copyToLocalFileIO01() {
FileOutputStream out=null;
FSDataInputStream in=null;
try {
// the file spans 2 blocks
in = fileSystem.open(new Path("/jdk1.8.0_121.zip"));
/**
* Requirement: download each block separately
*/
out = new FileOutputStream(new File("output/1.zip"));
byte[] buffer = new byte[1024];
// 0 - 128M: copy the first block, 1KB at a time
for(int i=0; i<1024*128; i++){
int len = in.read(buffer);
if (len < 0) break;
out.write(buffer, 0, len); // write only the bytes actually read
}
}catch (Exception e){
e.printStackTrace();
}finally {
IOUtils.closeStream(in);
IOUtils.closeStream(out);
clone(fileSystem);
}
}
@Test
public void copyToLocalFileIO02() throws Exception {
// 3 blocks
FSDataInputStream in = fileSystem.open(new Path("/jdk1.8.0_121.zip"));
FileOutputStream out = new FileOutputStream(new File("output/2.zip"));
// seek past the first block (128M)
in.seek(1024*1024*128);
byte[] buffer = new byte[1024];
// 128M - 256M: the second block only
for(int i=0; i<1024*128; i++){
int len = in.read(buffer);
if (len < 0) break;
out.write(buffer, 0, len); // write only the bytes actually read
}
IOUtils.closeStream(in);
IOUtils.closeStream(out);
}
@Test
public void copyToLocalFileIO03() throws Exception {
// 3 blocks
FSDataInputStream in = fileSystem.open(new Path("/jdk1.8.0_121.zip"));
FileOutputStream out = new FileOutputStream(new File("output/jdk.tgz.part2"));
// seek to offset 256M (past the first two blocks), then copy everything after it
in.seek(1024*1024*256);
IOUtils.copyBytes(in, out, 1024);
IOUtils.closeStream(in);
IOUtils.closeStream(out);
}
/**
* Close the given FileSystem (helper called from the finally blocks)
* @param fileSystem
*/
public void clone(FileSystem fileSystem){
try {
if(fileSystem!=null){
fileSystem.close();
}
}catch (Exception e){
e.printStackTrace();
}
}
}
Author: Pan Chen (pxj)