Learning Hadoop: Avro

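This article takes a closer look at Apache Avro: its features, data types, schema-based reading and writing, data file operations, and schema resolution. Avro is a language-independent data serialization system that supports rich data types and a fast, compressible binary format, which makes it suitable for cross-language interchange and RPC. Worked examples show Python and Java reading and writing each other's data files, and explain how different schemas can be used flexibly for reading and writing.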

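About Avro

Apache Avro is a language-neutral data serialization system. The project was created by Doug Cutting, the creator of Hadoop, to address the main shortcoming of Hadoop's Writable types: their lack of language portability.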

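1. Features of Avro

  1. Independent of any programming language.
  2. Rich data structure types.
  3. A fast, compressible binary data format. (This matters a great deal for MapReduce input formats.)
  4. Avro can be used for RPC.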

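2. Avro data types and schemas

2-1. Primitive types

Type      Description                                               Schema example
null      The absence of a value                                    "null"
boolean   A binary value                                            "boolean"
int       32-bit signed integer                                     "int"
long      64-bit signed integer                                     "long"
float     Single-precision (32-bit) IEEE 754 floating-point number  "float"
double    Double-precision (64-bit) IEEE 754 floating-point number  "double"
bytes     Sequence of 8-bit unsigned bytes                          "bytes"
string    Sequence of Unicode characters                            "string"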
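
To make the int and long rows concrete, here is a minimal Python sketch of how the Avro specification serializes these two types in its binary encoding: the value is zig-zag mapped to an unsigned number and then written in 7-bit groups, least significant group first. The function names zigzag and encode_long are illustrative only and do not come from any Avro library.

```python
def zigzag(n: int) -> int:
    """Map a signed 64-bit value to an unsigned one: 0, -1, 1, -2 -> 0, 1, 2, 3."""
    return (n << 1) ^ (n >> 63)


def encode_long(n: int) -> bytes:
    """Encode an Avro int or long: zig-zag first, then 7-bit variable-length groups."""
    z = zigzag(n) & 0xFFFFFFFFFFFFFFFF
    out = bytearray()
    while z > 0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits with the continuation bit set
        z >>= 7
    out.append(z)  # final group, continuation bit clear
    return bytes(out)


if __name__ == "__main__":
    for value in (0, -1, 1, -2, 2, -64, 64):
        print(value, encode_long(value).hex())
```

Small magnitudes, positive or negative, take a single byte, which is one reason the binary format stays compact.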

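2-2. Avro complex types

Type: array
Description: An ordered collection of objects.
Schema example:
{
  "type": "array",
  "items": "long"
}

Type: map
Description: An unordered collection of key-value pairs. Keys must be strings.
Schema example:
{
  "type": "map",
  "values": "string"
}

Type: record
Description: A collection of named fields of any type.
Schema example:
{
  "type": "record",
  "name": "WeatherRecord",
  "doc": "A weather reading.",
  "fields": [
    {
      "name": "year",
      "type": "int"
    },
    {
      "name": "temperature",
      "type": "int"
    },
    {
      "name": "stationId",
      "type": "string"
    }
  ]
}

Type: enum
Description: An enumeration (a set of named values).
Schema example:
{
  "type": "enum",
  "name": "Cutlery",
  "doc": "An eating utensil.",
  "symbols": ["KNIFE", "FORK", "SPOON"]
}

Type: fixed
Description: A fixed number of 8-bit unsigned bytes.
Schema example:
{
  "type": "fixed",
  "name": "Md5Hash",
  "size": 16
}

Type: union
Description: A union of schemas, written as a JSON array.
Schema example:
[
  "null",
  "string",
  {
    "type": "map",
    "values": "string"
  }
]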
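
As a rough illustration of what these schemas mean for actual values, the following Python sketch checks whether a plain Python value fits a schema given as parsed JSON. It is a simplified stand-in written for this article, not the validation logic of the Avro library: the function name conforms is made up, references to named types by name are not supported, and float and double accept only Python floats.

```python
WEATHER = {
    "type": "record",
    "name": "WeatherRecord",
    "fields": [
        {"name": "year", "type": "int"},
        {"name": "temperature", "type": "int"},
        {"name": "stationId", "type": "string"},
    ],
}


def conforms(schema, datum) -> bool:
    """Return True if datum is an acceptable value for the given schema."""
    if isinstance(schema, list):  # union: any one branch may match
        return any(conforms(branch, datum) for branch in schema)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return datum is None
    if kind == "boolean":
        return isinstance(datum, bool)
    if kind in ("int", "long"):
        bits = 31 if kind == "int" else 63
        return (isinstance(datum, int) and not isinstance(datum, bool)
                and -(1 << bits) <= datum < (1 << bits))
    if kind in ("float", "double"):
        return isinstance(datum, float)
    if kind == "bytes":
        return isinstance(datum, bytes)
    if kind == "string":
        return isinstance(datum, str)
    if kind == "array":
        return (isinstance(datum, list)
                and all(conforms(schema["items"], item) for item in datum))
    if kind == "map":
        return (isinstance(datum, dict)
                and all(isinstance(key, str) and conforms(schema["values"], value)
                        for key, value in datum.items()))
    if kind == "record":
        return (isinstance(datum, dict)
                and all(conforms(field["type"], datum.get(field["name"]))
                        for field in schema["fields"]))
    if kind == "enum":
        return datum in schema["symbols"]
    if kind == "fixed":
        return isinstance(datum, bytes) and len(datum) == schema["size"]
    raise ValueError("unsupported schema: %r" % (schema,))


if __name__ == "__main__":
    reading = {"year": 1950, "temperature": 22, "stationId": "011990-99999"}
    print(conforms(WEATHER, reading))
    print(conforms(["null", "string", {"type": "map", "values": "string"}], 3))
```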

2-3. Reading and writing Avro data

2-3-1. Schema-based reading and writing of Avro data

First we create a schema file with the .avsc extension. Here we name it StringPair.avsc.

{
	"namespace":"schemas",
	"name":"StringPair",
	"type":"record",
	"doc":"This is a String Pair.",
	"fields":[
		{"name":"left", "type":"string"},
		{"name":"right", "type":"string"}
	]
}
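namespace: can be thought of as the folder (the Java package) in which the generated file is placed.
name: the file name, which is also the class name (so it should follow Java class naming conventions).
type: the Avro data type.
doc: a documentation string.
fields: the field definitions.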
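
Because an .avsc file is ordinary JSON, these attributes can be inspected without any Avro library. The sketch below, which uses only the Python standard library, derives the full name that namespace and name produce together; the helper name describe is hypothetical.

```python
import json

STRING_PAIR_AVSC = """
{
  "namespace": "schemas",
  "name": "StringPair",
  "type": "record",
  "doc": "This is a String Pair.",
  "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"}
  ]
}
"""


def describe(avsc_text):
    """Summarize a record schema: full name, type, doc, and (name, type) of each field."""
    schema = json.loads(avsc_text)
    name = schema["name"]
    namespace = schema.get("namespace")
    # A name that already contains a dot is treated as a full name.
    full_name = name if "." in name or not namespace else namespace + "." + name
    return {
        "full_name": full_name,
        "type": schema["type"],
        "doc": schema.get("doc"),
        "fields": [(field["name"], field["type"]) for field in schema["fields"]],
    }


if __name__ == "__main__":
    print(describe(STRING_PAIR_AVSC))
```

The full name schemas.StringPair is what later shows up as the Java import schemas.StringPair in the generated-class examples.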

The demo code is as follows:

package avro;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Parser;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class AvroDemo {
	
	private ByteArrayOutputStream out = new ByteArrayOutputStream();
	
	private Parser parser = new Schema.Parser();
	
	public static void main(String[] args) throws IOException {
		AvroDemo demo = new AvroDemo();
		demo.writeAvroData();
		demo.readAvroData();
	}
	
	public void writeAvroData() throws IOException {
		Parser parser = new Schema.Parser();
		Schema schema = parser.parse(getClass().getResourceAsStream("/StringPair.avsc"));
		GenericRecord datum = new GenericData.Record(schema);
		datum.put("left", "L");
		datum.put("right", "R");
		DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
		Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
		writer.write(datum, encoder);
		encoder.flush();
		out.close();
	}
	
	public void readAvroData() throws IOException {
		Schema schema = parser.parse(getClass().getResourceAsStream("/StringPair.avsc"));
		ByteArrayInputStream input = new ByteArrayInputStream(out.toByteArray());
		GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema);
		Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
		GenericRecord result = reader.read(null, decoder);
		input.close();
		System.out.println(result.get("left"));
		System.out.println(result.get("right"));
	}
}
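
To show what the binaryEncoder in the demo above puts into the byte stream, here is a small Python sketch that hand-encodes the same StringPair record according to the Avro specification: a record is just its fields in schema order, and each string is a zig-zag varint byte count followed by UTF-8 bytes. All helper names here are invented for illustration, and the sketch handles only this two-string record.

```python
import io

FIELDS = ("left", "right")  # field order taken from StringPair.avsc


def write_long(out, n):
    """Write n as a zig-zag, variable-length integer."""
    z = ((n << 1) ^ (n >> 63)) & 0xFFFFFFFFFFFFFFFF
    while z > 0x7F:
        out.write(bytes([(z & 0x7F) | 0x80]))
        z >>= 7
    out.write(bytes([z]))


def read_long(buf):
    """Read a zig-zag, variable-length integer."""
    shift, z = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        z |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def encode_pair(datum):
    """Serialize a {'left': ..., 'right': ...} dict field by field."""
    out = io.BytesIO()
    for name in FIELDS:
        data = datum[name].encode("utf-8")
        write_long(out, len(data))  # byte count first
        out.write(data)             # then the UTF-8 bytes
    return out.getvalue()


def decode_pair(data):
    """Inverse of encode_pair; fields are read in schema order."""
    buf = io.BytesIO(data)
    result = {}
    for name in FIELDS:
        length = read_long(buf)
        result[name] = buf.read(length).decode("utf-8")
    return result


if __name__ == "__main__":
    raw = encode_pair({"left": "L", "right": "R"})
    print(raw.hex(), decode_pair(raw))
```

Note that the bytes carry no field names or type tags, which is why the reader needs a schema to make sense of them.
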
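
2-3-2. Reading and writing Avro data with classes generated from the schema

We can generate a StringPair class from the schema. There are several ways to do this; here we use Maven.
First, configure pom.xml so that Maven obtains the jars we need (Maven downloads them automatically based on the configuration below).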

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>learn.hadoop.eco</groupId>
  <artifactId>HadoopEcoTest</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <dependencies>
  	<dependency>
  		<groupId>org.apache.avro</groupId>
  		<artifactId>avro</artifactId>
  		<version>1.9.0</version>
  	</dependency>
  </dependencies>
  <build>
  	<plugins>
  		<plugin>
  			<groupId>org.apache.avro</groupId>
  			<artifactId>avro-maven-plugin</artifactId>
  			<version>1.9.0</version>
  			<executions>
  				<execution>
  					<phase>generate-sources</phase>
  					<goals>
  						<goal>schema</goal>
  					</goals>
  				</execution>
  			</executions>
  			<configuration>
  			    <!-- ${project.basedir} is the project root; the rest of the path depends on your own directory layout. -->
  				<sourceDirectory>${project.basedir}/src/main/resources</sourceDirectory>
  				<outputDirectory>${project.basedir}/src/main/java</outputDirectory>
  			</configuration>
  		</plugin>
  		<plugin>
  			<groupId>org.apache.maven.plugins</groupId>
  			<artifactId>maven-compiler-plugin</artifactId>
  			<configuration>
  				<source>1.8</source>
  				<target>1.8</target>
  			</configuration>
  		</plugin>
  	</plugins>
  </build>
</project>

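Note: after this configuration is added, the IDE reports an error in pom.xml. Move the cursor to the line with the error and press Ctrl+F1 to show the quick-fix suggestions. Any of them will do; the first one was chosen here.
After a quick fix is applied, the corresponding entries are added to pom.xml automatically and the file no longer shows an error. If the project itself still shows an error, right-click the project and choose Maven → Update Project..., then click OK in the dialog that appears.
Next, run a Maven build (with Goals set to compile) and click Run. When the build finishes, the class file has been generated.
We now modify the demo above slightly so that it reads and writes Avro data using the class generated from the schema.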

package avro;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.specific.SpecificDatumReader;
import org.apache.avro.specific.SpecificDatumWriter;

import schemas.StringPair;

public class AvroDemo2 {
	
	private ByteArrayOutputStream out = new ByteArrayOutputStream();
	
	public static void main(String[] args) throws IOException {
		AvroDemo2 demo = new AvroDemo2();
		demo.writeAvroData();
		demo.readAvroData();
	}
	
	public void writeAvroData() throws IOException {
		StringPair datum = new StringPair();
		datum.setLeft("Left");
		datum.setRight("Right");
		SpecificDatumWriter<StringPair> writer = new SpecificDatumWriter<StringPair>(StringPair.class);
		Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
		writer.write(datum, encoder);
		encoder.flush();
		out.close();
	}
	
	public void readAvroData() throws IOException {
		ByteArrayInputStream input = new ByteArrayInputStream(out.toByteArray());
		SpecificDatumReader<StringPair> reader = new SpecificDatumReader<StringPair>(StringPair.class);
		Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
		StringPair result = reader.read(null, decoder);
		System.out.println(result.getLeft());
		System.out.println(result.getRight());
		input.close();
	}
}

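3. Avro data files

Avro data files are mainly used to store sequences of Avro objects. They are very similar to Hadoop's SequenceFile. The biggest difference is that Avro data files are designed for cross-language use, so we can write a file in one language and read it in another. (Python is used as the example here.)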
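
One detail that makes these files recognizable from any language: according to the Avro specification, an object container file begins with the four bytes 'O', 'b', 'j', 1, followed by file metadata (including the schema) and a sync marker. The Python sketch below checks only that leading magic; the function name is_avro_container is hypothetical, and the demo writes a stand-in file rather than a real Avro file.

```python
import os
import tempfile

AVRO_MAGIC = b"Obj\x01"  # 'O', 'b', 'j' followed by the byte 1


def is_avro_container(path):
    """Return True if the file at path begins with the Avro container magic."""
    with open(path, "rb") as f:
        return f.read(len(AVRO_MAGIC)) == AVRO_MAGIC


if __name__ == "__main__":
    # Stand-in file: only the magic is real, the rest is placeholder bytes.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(AVRO_MAGIC + b"placeholder")
    print(is_avro_container(tmp.name))
    os.remove(tmp.name)
```

A check like this is a quick sanity test before handing a file to a reader in either language.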

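3-1. Writing with Python and reading with Java

First, install Python (via yum, or by downloading and unpacking a distribution). Then install the avro package for Python with the following command:

easy_install avro

Next, we write a Python script that creates an Avro data file.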

import sys

from avro import schema
from avro import io
from avro import datafile

if __name__ == '__main__':
  if len(sys.argv) != 2:
    sys.exit('Usage: %s <data_file>' % sys.argv[0])
  avro_file = sys.argv[1]
  # Note: Avro data files are binary, so open the file in binary mode.
  writer = open(avro_file, 'wb')
  datum_writer = io.DatumWriter()
  schema_object = schema.parse(' \
  { "type": "record", \
    "name": "StringPair", \
    "doc": "a pair of strings.", \
    "fields": [ \
      {"name": "left", "type": "string"}, \
      {"name": "right", "type": "string"} \
    ] \
  }')
  dfw = datafile.DataFileWriter(writer, datum_writer, schema_object)
  # Each input line has the form "left,right".
  for line in sys.stdin.readlines():
    (left, right) = line.strip().split(',')
    dfw.append({'left': left, 'right': right})
  dfw.close()
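
The script above assumes every input line contains exactly one comma. As a hedged sketch of how that parsing step could be made more forgiving, the standalone helper below (parse_pairs is a name chosen here, not part of the avro package) skips blank lines and reports malformed ones with their line number. It uses only the standard library, so it can be tried without Avro installed.

```python
def parse_pairs(lines):
    """Turn 'left,right' text lines into dicts shaped like StringPair records."""
    records = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text:
            continue  # ignore blank lines
        parts = text.split(",")
        if len(parts) != 2:
            raise ValueError("line %d: expected 'left,right', got %r" % (number, text))
        records.append({"left": parts[0], "right": parts[1]})
    return records


if __name__ == "__main__":
    for record in parse_pairs(["a,1\n", "\n", "c,2\n"]):
        print(record)
```

Each returned dict could then be passed to the data file writer's append call in place of the inline tuple unpacking.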

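Once the script is running, enter the left and right values on the console, separated by a comma (one pair per line), and press Ctrl+D to finish.
Next, we write a demo on the Java side that reads the Avro data file produced by the Python script above.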

package avro;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileReader;
import org.apache.avro.specific.SpecificDatumReader;

import schemas.StringPair;

public class AvroDemo3 {
	
	public static void main(String[] args) throws IOException {
		AvroDemo3 demo = new AvroDemo3();
		demo.readAvroFile();
	}
	
	public void readAvroFile() throws IOException {
		File file = new File("/home/hadoop/demo/python/pairs.avro");
		SpecificDatumReader<StringPair> reader = new SpecificDatumReader<StringPair>(StringPair.class);
		DataFileReader<StringPair> file_reader = new DataFileReader<StringPair>(file, reader);
		StringPair result = null;
		while(file_reader.hasNext()) {
			result = file_reader.next(result);
			System.out.println("left : " + result.getLeft() + ", right : " + result.getRight());
		}
		file_reader.close();
	}
}

Running the program prints one line of the form left : ..., right : ... for each record in the file.

3-2. Writing with Java and reading with Python

This time we do the reverse of the previous example: Java code generates the file, and a Python script reads its contents.

package avro;

import java.io.File;
import java.io.IOException;

import org.apache.avro.file.DataFileWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.specific.SpecificDatumWriter;

import schemas.StringPair;

public class AvroDemo4 {
	
	public static void main(String[] args) throws IOException {
		AvroDemo4 demo = new AvroDemo4();
		demo.writeAvroFile();
	}
	
	public void writeAvroFile() throws IOException {
		File file = new File("/home/hadoop/demo/java/pairs_java.avro");
		DatumWriter<StringPair> writer = new SpecificDatumWriter<StringPair>(StringPair.class);
		DataFileWriter<StringPair> file_writer = new DataFileWriter<StringPair>(writer);
		StringPair datum = new StringPair();
		file_writer.create(datum.getSchema(), file);
		for(int i =1; i < 10; i++) {
			datum.setLeft("left" + i);
			datum.setRight("right" + i);
			file_writer.append(datum);
		}
		file_writer.flush();
		file_writer.close();
	}
}

With the file generated, we now write a Python script to read the Avro data file produced by the Java program.

import sys

from avro import schema
from avro import io
from avro import datafile

from json import dumps

if __name__ == '__main__':
  if len(sys.argv) != 2:
    sys.exit('Usage: %s <data_file>' % sys.argv[0])
  avro_file = sys.argv[1]
  reader = open(avro_file, 'rb')
  schema_object = schema.parse(' \
  { "type": "record", \
    "name": "StringPair", \
    "doc": "a pair of strings.", \
    "fields": [ \
      {"name": "left", "type": "string"}, \
      {"name": "right", "type": "string"} \
    ] \
  }')
  datum_reader = io.DatumReader(schema_object)
  dwr = datafile.DataFileReader(reader, datum_reader)
  for line in dwr:
    print(dumps(line))
  dwr.close()


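4. Schema resolution

Avro's schema resolution is very flexible: data can be read with a schema different from the one used to write it.

4-1. The reader's schema has more fields than the writer's

Starting from StringPair.avsc, we create a new schema file, StringPairAdded.avsc, that adds one more field.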

{
	"namespace":"schemas",
	"name":"StringPairAdd",
	"type":"record",
	"doc":"This is another String Pair.",
	"fields":[
		{"name":"left", "type":"string"},
		{"name":"right", "type":"string"},
		{"name":"middle", "type":"string", "default":"This is middle"}
	]
}

We modify the earlier AvroDemo class slightly to see the effect.

package avro;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Parser;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class AvroDemo5 {
	
	private ByteArrayOutputStream out = new ByteArrayOutputStream();
	
	private Parser parser = new Schema.Parser();
	
	public static void main(String[] args) throws IOException {
		AvroDemo5 demo = new AvroDemo5();
		demo.writeAvroData();
		demo.readAvroData();
	}
	
	public void writeAvroData() throws IOException {
		Parser parser = new Schema.Parser();
		Schema schema = parser.parse(getClass().getResourceAsStream("/StringPair.avsc"));
		GenericRecord datum = new GenericData.Record(schema);
		datum.put("left", "L");
		datum.put("right", "R");
		DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
		Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
		writer.write(datum, encoder);
		encoder.flush();
		out.close();
	}
	
	public void readAvroData() throws IOException {
		Schema schema_write = parser.parse(getClass().getResourceAsStream("/StringPair.avsc"));
		Schema schema_read = parser.parse(getClass().getResourceAsStream("/StringPairAdded.avsc"));
		ByteArrayInputStream input = new ByteArrayInputStream(out.toByteArray());
		GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema_write, schema_read);
		Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
		GenericRecord result = reader.read(null, decoder);
		System.out.println(result.get("left"));
		System.out.println(result.get("right"));
		System.out.println(result.get("middle"));
		input.close();
	}
}

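The program prints L, R, and This is middle, the last being the default value of the newly added field.
Note: the schema used for reading is newer than the one used for writing, so the field newly added to the reader's schema (StringPairAdded.avsc) must declare a default attribute; otherwise reading fails with an error.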
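
The rule about defaults can be sketched in a few lines of Python. This is a simplified model of record resolution operating on already-decoded dicts, written for illustration and not the Avro library's implementation; resolve_record and the two schema constants are names chosen here.

```python
WRITER = {"type": "record", "name": "StringPair", "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"},
]}

READER = {"type": "record", "name": "StringPairAdd", "fields": [
    {"name": "left", "type": "string"},
    {"name": "right", "type": "string"},
    {"name": "middle", "type": "string", "default": "This is middle"},
]}


def resolve_record(writer_schema, reader_schema, datum):
    """Build the reader's view of a record that was written with writer_schema."""
    written = {field["name"] for field in writer_schema["fields"]}
    result = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in written:
            result[name] = datum[name]        # present in the data
        elif "default" in field:
            result[name] = field["default"]   # absent: fall back to the default
        else:
            raise ValueError("field %r was not written and has no default" % name)
    return result


if __name__ == "__main__":
    print(resolve_record(WRITER, READER, {"left": "L", "right": "R"}))
```

Removing the default from the middle field makes the function raise, mirroring the error described in the note.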

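4-2. The reader's schema has fewer fields than the writer's

This case is often called projection.
We create another schema, ProjectedStringPair.avsc, this time removing one field from StringPair.avsc.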

{
	"namespace":"schemas",
	"name":"ProjectedStringPair",
	"type":"record",
	"doc":"This is a Project String Pair.",
	"fields":[
		{"name":"right", "type":"string"}
	]
}

The Java code follows.

package avro;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

import org.apache.avro.Schema;
import org.apache.avro.Schema.Parser;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.Encoder;
import org.apache.avro.io.EncoderFactory;

public class AvroDemo6 {
	
	private ByteArrayOutputStream out = new ByteArrayOutputStream();
	
	private Parser parser = new Schema.Parser();
	
	public static void main(String[] args) throws IOException {
		AvroDemo6 demo = new AvroDemo6();
		demo.writeAvroData();
		demo.readAvroData();
	}
	
	public void writeAvroData() throws IOException {
		Parser parser = new Schema.Parser();
		Schema schema = parser.parse(getClass().getResourceAsStream("/StringPair.avsc"));
		GenericRecord datum = new GenericData.Record(schema);
		datum.put("left", "L");
		datum.put("right", "R");
		DatumWriter<GenericRecord> writer = new GenericDatumWriter<GenericRecord>(schema);
		Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
		writer.write(datum, encoder);
		encoder.flush();
		out.close();
	}
	
	public void readAvroData() throws IOException {
		Schema schema_write = parser.parse(getClass().getResourceAsStream("/StringPair.avsc"));
		Schema schema_read = parser.parse(getClass().getResourceAsStream("/ProjectedStringPair.avsc"));
		ByteArrayInputStream input = new ByteArrayInputStream(out.toByteArray());
		GenericDatumReader<GenericRecord> reader = new GenericDatumReader<GenericRecord>(schema_write, schema_read);
		Decoder decoder = DecoderFactory.get().binaryDecoder(input, null);
		GenericRecord result = reader.read(null, decoder);
		System.out.println(result.get("left"));
		System.out.println(result.get("right"));
		System.out.println(result.getSchema());
		input.close();
	}
}

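Running the program (with the Avro 1.9.0 dependency configured earlier) shows that left is null and that the schema printed for the result contains no left field.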
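
Projection also explains why the reader is given the writer's schema at all: the bytes for left are still in the stream and have to be consumed before right can be found. The Python sketch below illustrates this for the two-string record, assuming the binary layout from the Avro specification (zig-zag varint length, then UTF-8 bytes). The helper names are invented and only string fields are handled.

```python
import io


def read_long(buf):
    """Read a zig-zag, variable-length integer."""
    shift, z = 0, 0
    while True:
        chunk = buf.read(1)
        if not chunk:
            raise EOFError("truncated varint")
        z |= (chunk[0] & 0x7F) << shift
        if not chunk[0] & 0x80:
            return (z >> 1) ^ -(z & 1)
        shift += 7


def read_string(buf):
    """Read a length-prefixed UTF-8 string."""
    length = read_long(buf)
    return buf.read(length).decode("utf-8")


def read_projected(data, writer_fields, reader_fields):
    """Decode string fields in the writer's order, keeping only the reader's fields."""
    buf = io.BytesIO(data)
    wanted = set(reader_fields)
    result = {}
    for name in writer_fields:
        value = read_string(buf)  # must be consumed even when it is not wanted
        if name in wanted:
            result[name] = value
    return result


if __name__ == "__main__":
    encoded = b"\x02L\x02R"  # left="L", right="R" as written with StringPair.avsc
    record = read_projected(encoded, ["left", "right"], ["right"])
    print(record, record.get("left"))
```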

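5. Other features

Beyond the above, Avro also supports aliases, sort order, MapReduce integration, and more. These are only introduced briefly here rather than covered one by one.

5-1. Aliases

Avro allows a reader to use field names different from those used when writing to read the same data.
Note: once aliases are specified for reading, the writer's field names can no longer be used when reading. (In the schema below, the reader can use only the first and second fields, not left and right.)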

{
	"type": "record",
	"name": "StringPair",
	"doc": "A pair of strings with aliased field names.",
	"fields": [
		{"name": "first", "type": "string", "aliases": ["left"]},
		{"name": "second", "type": "string", "aliases": ["right"]}
	]
}
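
A minimal way to picture what aliases do is a renaming step applied to each record as it is read. The Python sketch below models that step on plain dicts; it is an illustration of the idea, not the Avro library's resolution code, and rename_with_aliases is a hypothetical name.

```python
READER = {"type": "record", "name": "StringPair", "fields": [
    {"name": "first", "type": "string", "aliases": ["left"]},
    {"name": "second", "type": "string", "aliases": ["right"]},
]}


def rename_with_aliases(reader_schema, written):
    """Expose written fields under the reader's field names, matching via aliases."""
    result = {}
    for field in reader_schema["fields"]:
        # The reader's own name is tried first, then each alias in order.
        for key in [field["name"]] + field.get("aliases", []):
            if key in written:
                result[field["name"]] = written[key]
                break
    return result


if __name__ == "__main__":
    print(rename_with_aliases(READER, {"left": "L", "right": "R"}))
```

The result has only the keys first and second, which matches the note that left and right are no longer available on the reading side.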

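5-2. Sort order

Avro's sort order can be controlled by specifying the order attribute on a field in the schema. It has three possible values:
① ascending (the default): ascending order
② descending: descending order
③ ignore: the field is ignored when sorting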

{
	"type": "record",
	"name": "StringPair",
	"doc": "A pair of strings, sorted by right field descending.",
	"fields": [
		{"name": "left", "type": "string", "order": "ignore"},
		{"name": "right", "type": "string", "order": "descending"}
	]
}
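
The effect of the order attribute can be sketched with an ordinary comparator. The Python example below compares records field by field in schema order, skipping ignore fields and reversing descending ones; it works on decoded dicts and is only a model of the behavior, with make_comparator being a name chosen for this sketch.

```python
from functools import cmp_to_key

SCHEMA = {"type": "record", "name": "StringPair", "fields": [
    {"name": "left", "type": "string", "order": "ignore"},
    {"name": "right", "type": "string", "order": "descending"},
]}


def make_comparator(schema):
    """Build a comparison function that honors each field's order attribute."""
    def compare(a, b):
        for field in schema["fields"]:
            order = field.get("order", "ascending")
            if order == "ignore":
                continue  # this field does not take part in sorting
            x, y = a[field["name"]], b[field["name"]]
            if x != y:
                result = -1 if x < y else 1
                return -result if order == "descending" else result
        return 0
    return compare


if __name__ == "__main__":
    pairs = [
        {"left": "a", "right": "1"},
        {"left": "b", "right": "3"},
        {"left": "c", "right": "2"},
    ]
    for pair in sorted(pairs, key=cmp_to_key(make_comparator(SCHEMA))):
        print(pair)
```

With this schema the records come out ordered by right from largest to smallest, regardless of what left contains.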
e\hadoop\yarn\hadoop-yarn-api-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-applications-distributedshell-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-applications-unmanaged-am-launcher-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-client-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-common-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-registry-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-applicationhistoryservice-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-common-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-nodemanager-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-resourcemanager-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-router-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-sharedcachemanager-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-tests-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-timeline-pluginstorage-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-server-web-proxy-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-services-api-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-services-core-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\yarn\hadoop-yarn-submarine-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\lib\hamcrest-core-1.3.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\lib\junit-4.11.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-app-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-common-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-core-3.2.2.jar;D:\pys
park\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-hs-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-hs-plugins-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-jobclient-3.2.2-tests.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-jobclient-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-nativetask-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-shuffle-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-client-uploader-3.2.2.jar;D:\pyspark\Hadoop\hadoop-3.2.2\share\hadoop\mapreduce\hadoop-mapreduce-examples-3.2.2.jar STARTUP_MSG: build = Unknown -r 7a3bc90b05f257c8ace2f76d74264906f0f7a932; compiled by 'hexiaoqiao' on 2021-01-03T09:26Z STARTUP_MSG: java = 1.8.0_281 ************************************************************/ 2025-06-18 16:50:45,335 INFO checker.ThrottledAsyncChecker: Scheduling a check for [DISK]file:/D:/hadoop-3.2.2/data/datanode 2025-06-18 16:50:45,420 INFO impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties 2025-06-18 16:50:45,483 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s). 2025-06-18 16:50:45,484 INFO impl.MetricsSystemImpl: DataNode metrics system started 2025-06-18 16:50:46,677 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. Disabling file IO profiling 2025-06-18 16:50:46,689 INFO datanode.BlockScanner: Initialized block scanner with targetBytesPerSec 1048576 2025-06-18 16:50:46,692 INFO datanode.DataNode: Configured hostname is LAPTOP-FK5QKFGQ 2025-06-18 16:50:46,693 INFO common.Util: dfs.datanode.fileio.profiling.sampling.percentage set to 0. 
Disabling file IO profiling 2025-06-18 16:50:46,695 INFO datanode.DataNode: Starting DataNode with maxLockedMemory = 0 2025-06-18 16:50:46,709 INFO datanode.DataNode: Opened streaming server at /0.0.0.0:9866 2025-06-18 16:50:46,710 INFO datanode.DataNode: Balancing bandwidth is 10485760 bytes/s 2025-06-18 16:50:46,710 INFO datanode.DataNode: Number threads for balancing is 50 2025-06-18 16:50:46,741 INFO util.log: Logging initialized @7589ms to org.eclipse.jetty.util.log.Slf4jLog 2025-06-18 16:50:51,787 INFO server.AuthenticationFilter: Unable to initialize FileSignerSecretProvider, falling back to use random secrets. 2025-06-18 16:50:51,821 INFO http.HttpRequestLog: Http request log for http.requests.datanode is not defined 2025-06-18 16:50:51,828 INFO http.HttpServer2: Added global filter 'safety' (class=org.apache.hadoop.http.HttpServer2$QuotingInputFilter) 2025-06-18 16:50:51,829 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context datanode 2025-06-18 16:50:51,829 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs 2025-06-18 16:50:51,830 INFO http.HttpServer2: Added filter static_user_filter (class=org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter) to context static 2025-06-18 16:50:51,848 INFO http.HttpServer2: Jetty bound to port 38751 2025-06-18 16:50:51,849 INFO server.Server: jetty-9.4.20.v20190813; built: 2019-08-13T21:28:18.144Z; git: 84700530e645e812b336747464d6fbbf370c9a20; jvm 1.8.0_281-b09 2025-06-18 16:50:51,865 INFO server.session: DefaultSessionIdManager workerName=node0 2025-06-18 16:50:51,865 INFO server.session: No SessionScavenger set, using defaults 2025-06-18 16:50:51,867 INFO server.session: node0 Scavenging every 660000ms 2025-06-18 16:50:51,874 INFO handler.ContextHandler: Started 
o.e.j.s.ServletContextHandler@2421cc4{logs,/logs,file:///D:/pyspark/Hadoop/hadoop-3.2.2/logs/,AVAILABLE} 2025-06-18 16:50:51,874 INFO handler.ContextHandler: Started o.e.j.s.ServletContextHandler@21ba0741{static,/static,file:///D:/pyspark/Hadoop/hadoop-3.2.2/share/hadoop/hdfs/webapps/static/,AVAILABLE} 2025-06-18 16:50:51,926 INFO util.TypeUtil: JVM Runtime does not support Modules 2025-06-18 16:50:51,932 INFO handler.ContextHandler: Started o.e.j.w.WebAppContext@43f82e78{datanode,/,file:///D:/pyspark/Hadoop/hadoop-3.2.2/share/hadoop/hdfs/webapps/datanode/,AVAILABLE}{file:/D:/pyspark/Hadoop/hadoop-3.2.2/share/hadoop/hdfs/webapps/datanode} 2025-06-18 16:50:51,939 INFO server.AbstractConnector: Started ServerConnector@1e097d59{HTTP/1.1,[http/1.1]}{localhost:38751} 2025-06-18 16:50:51,940 INFO server.Server: Started @12789ms 2025-06-18 16:50:52,540 INFO web.DatanodeHttpServer: Listening HTTP traffic on /0.0.0.0:9864 2025-06-18 16:50:52,545 INFO util.JvmPauseMonitor: Starting JVM pause monitor 2025-06-18 16:50:52,545 INFO datanode.DataNode: dnUserName = aaa 2025-06-18 16:50:52,546 INFO datanode.DataNode: supergroup = supergroup 2025-06-18 16:50:52,573 INFO ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue, queueCapacity: 1000, scheduler: class org.apache.hadoop.ipc.DefaultRpcScheduler, ipcBackoff: false. 
2025-06-18 16:50:52,583 INFO ipc.Server: Starting Socket Reader #1 for port 9867 2025-06-18 16:50:52,720 INFO datanode.DataNode: Opened IPC server at /0.0.0.0:9867 2025-06-18 16:50:52,729 INFO datanode.DataNode: Refresh request received for nameservices: null 2025-06-18 16:50:52,735 INFO datanode.DataNode: Starting BPOfferServices for nameservices: <default> 2025-06-18 16:50:52,740 INFO datanode.DataNode: Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000 starting to offer service 2025-06-18 16:50:52,745 INFO ipc.Server: IPC Server Responder: starting 2025-06-18 16:50:52,745 INFO ipc.Server: IPC Server listener on 9867: starting 2025-06-18 16:50:52,954 INFO datanode.DataNode: Acknowledging ACTIVE Namenode during handshakeBlock pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000 2025-06-18 16:50:52,956 INFO common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1) 2025-06-18 16:50:52,965 INFO common.Storage: Lock on D:\hadoop-3.2.2\data\datanode\in_use.lock acquired by nodename 16808@LAPTOP-FK5QKFGQ 2025-06-18 16:50:52,970 WARN common.Storage: Failed to add storage directory [DISK]file:/D:/hadoop-3.2.2/data/datanode java.io.IOException: Incompatible clusterIDs in D:\hadoop-3.2.2\data\datanode: namenode clusterID = CID-0243def2-304c-4ffd-871c-57b2cdf0182f; datanode clusterID = CID-a6ff55fc-9daf-4605-8a53-edaae5a9f8de at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:744) at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:294) at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:407) at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:387) at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:559) at 
org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1748) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1684) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:392) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:748) 2025-06-18 16:50:52,973 ERROR datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid cd899db0-fd95-4996-8250-261d1d36dbda) service to localhost/127.0.0.1:9000. Exiting. java.io.IOException: All specified directories have failed to load. at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:560) at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1748) at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1684) at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:392) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:282) at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:829) at java.lang.Thread.run(Thread.java:748) 2025-06-18 16:50:52,973 WARN datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid cd899db0-fd95-4996-8250-261d1d36dbda) service to localhost/127.0.0.1:9000 2025-06-18 16:50:52,974 INFO datanode.DataNode: Removed Block pool <registering> (Datanode Uuid cd899db0-fd95-4996-8250-261d1d36dbda) 2025-06-18 16:50:54,974 WARN datanode.DataNode: Exiting Datanode 2025-06-18 16:50:54,976 INFO datanode.DataNode: SHUTDOWN_MSG: /************************************************************ SHUTDOWN_MSG: Shutting down DataNode at LAPTOP-FK5QKFGQ/192.168.10.1 
************************************************************/