[Notes] An introduction to Avro, with the official examples

[b]Apache Avro[/b] is a language-independent data serialization system. It was designed to address a shortcoming of Hadoop's Writable types: their lack of language portability. Avro data is self-describing and always travels with its schema. Schemas can be loaded and mapped dynamically at runtime (the generic mapping), and code generation from a schema is also supported (the specific mapping).
[i]The official description:[/i]
[quote]Apache Avro™ is a data serialization system. Avro provides:
[*]Rich data structures.
[*]A compact, fast, binary data format.
[*]A container file, to store persistent data.
[*]Remote procedure call (RPC).
[*]Simple integration with dynamic languages. Code generation is not required to read or write data files nor to use or implement RPC protocols. Code generation as an optional optimization, only worth implementing for statically typed languages.
[/quote]
[b]The official example:[/b]
[i]Dependency[/i]
<dependency>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro</artifactId>
    <version>${avro.version}</version>
</dependency>

[i]Plugin[/i]

<plugin>
    <groupId>org.apache.avro</groupId>
    <artifactId>avro-maven-plugin</artifactId>
    <version>${avro.version}</version>
    <executions>
        <execution>
            <phase>generate-sources</phase>
            <goals>
                <goal>schema</goal>
            </goals>
            <configuration>
                <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
                <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
            </configuration>
        </execution>
    </executions>
</plugin>

[i]Schema: (src/main/avro/user.avsc)[/i]

{
  "namespace": "com.sanss.hadoop.demos.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
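One detail from the Avro specification worth noting (not shown in the tutorial schema above): a union such as ["int", "null"] makes the field nullable, but if the field should also carry a default of null for schema evolution, null must be listed first, because a union field's default value must match the first branch of the union. A hedged sketch of what such a field declaration would look like:

```json
{"name": "favorite_number", "type": ["null", "int"], "default": null}
```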

[*][b]Specific Java Mapping[/b]
[i]Generate the Java classes:[/i]

mvn clean compile

[i]Creating objects[/i]

User user1 = new User();
user1.setName("Alyssa");
user1.setFavoriteNumber(256);
// Leave favorite color null

// Alternate constructor
User user2 = new User("Ben", 7, "red");

// Construct via builder
User user3 = User.newBuilder()
    .setName("Charlie")
    .setFavoriteColor("blue")
    .setFavoriteNumber(null)
    .build();

[i]Serialization[/i]

// Serialize to disk
File file = new File("users.avro");
DatumWriter<User> userDatumWriter = new SpecificDatumWriter<User>(User.class);
try (DataFileWriter<User> dataFileWriter = new DataFileWriter<User>(userDatumWriter)) {
    dataFileWriter.create(User.SCHEMA$, file);
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
    dataFileWriter.append(user3);
    // no explicit close() needed: try-with-resources flushes and closes the writer
}

[i]Deserialization[/i]

// Deserialize Users from disk
DatumReader<User> userDatumReader = new SpecificDatumReader<User>(User.class);
try (DataFileReader<User> dataFileReader = new DataFileReader<User>(file, userDatumReader)) {
    User user = null;
    while (dataFileReader.hasNext()) {
        // Reuse user object by passing it to next(). This saves us from
        // allocating and garbage collecting many objects for files with
        // many items.
        user = dataFileReader.next(user);
        System.out.println(user);
    }
}

[i]Output:[/i]

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}
{"name": "Charlie", "favorite_number": null, "favorite_color": "blue"}
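Under the hood, the container file stores each record in Avro's binary encoding: a record is the concatenation of its field encodings, a string is a zigzag-varint length followed by UTF-8 bytes, a union is a zigzag-varint branch index followed by the value, and null occupies zero bytes (the file also carries a header and sync markers, which are omitted here). The sketch below uses plain Java with no Avro dependency, and the class name UserWireFormat is made up for illustration; it reproduces by hand the record-body bytes of user1 above:

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class UserWireFormat {
    // Avro's int/long encoding: zigzag, then base-128 varint, least-significant group first
    static void writeLong(long n, ByteArrayOutputStream out) {
        long z = (n << 1) ^ (n >> 63);            // zigzag: 0,-1,1,-2,... -> 0,1,2,3,...
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80)); // continuation bit set
            z >>>= 7;
        }
        out.write((int) z);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // field "name": string = zigzag-varint length, then UTF-8 bytes
        byte[] name = "Alyssa".getBytes(StandardCharsets.UTF_8);
        writeLong(name.length, out);
        out.write(name, 0, name.length);
        // field "favorite_number": union ["int","null"] -> branch 0, then int 256
        writeLong(0, out);
        writeLong(256, out);
        // field "favorite_color": union ["string","null"] -> branch 1 (null), no value bytes
        writeLong(1, out);

        StringBuilder hex = new StringBuilder();
        for (byte b : out.toByteArray()) hex.append(String.format("%02x ", b));
        System.out.println(hex.toString().trim());
        // prints: 0c 41 6c 79 73 73 61 00 80 04 02
    }
}
```

Note how compact this is: the whole record is 11 bytes, and the int 256 costs only two.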

[*][b]Generic Java Mapping[/b]
[i]Creating objects[/i]

Schema schema = new Schema.Parser().parse(new File(
        GenericJavaMappingDemo.class.getClassLoader()
                .getResource("user.avsc").toURI()));
GenericRecord user1 = new GenericData.Record(schema);
user1.put("name", "Alyssa");
user1.put("favorite_number", 256);
// Leave favorite color null

GenericRecord user2 = new GenericData.Record(schema);
user2.put("name", "Ben");
user2.put("favorite_number", 7);
user2.put("favorite_color", "red");

[i]Serialization[/i]

// Serialize users to disk
File file = new File("users.avro");
DatumWriter<GenericRecord> datumWriter = new GenericDatumWriter<GenericRecord>(schema);
try (DataFileWriter<GenericRecord> dataFileWriter = new DataFileWriter<GenericRecord>(datumWriter)) {
    dataFileWriter.create(schema, file);
    dataFileWriter.append(user1);
    dataFileWriter.append(user2);
    // closed (and flushed) automatically by try-with-resources
}

[i]Deserialization[/i]

// Deserialize users from disk
DatumReader<GenericRecord> datumReader = new GenericDatumReader<GenericRecord>(schema);
try (DataFileReader<GenericRecord> dataFileReader = new DataFileReader<GenericRecord>(file, datumReader)) {
    GenericRecord user = null;
    while (dataFileReader.hasNext()) {
        // Reuse user object by passing it to next(). This saves us from
        // allocating and garbage collecting many objects for files with
        // many items.
        user = dataFileReader.next(user);
        System.out.println(user);
    }
}

[i]Output:[/i]

{"name": "Alyssa", "favorite_number": 256, "favorite_color": null}
{"name": "Ben", "favorite_number": 7, "favorite_color": "red"}

[*][b]Schemas[/b]
Avro relies on schemas, which are written in JSON. The primitive types are [b]null, boolean, int, long, float, double, bytes, string[/b]; the complex types are [b]record, enum, array, map, union, fixed[/b]. From a schema, Avro can generate code to represent the data types (the specific Java mapping), or map them dynamically at runtime (the generic Java mapping). (There is also a reflect Java mapping, which is generally not recommended.)
[table]
|Type|Description|
|null|The absence of a value|
|boolean|A binary (true/false) value|
|int|A 32-bit signed integer|
|long|A 64-bit signed integer|
|float|A single-precision 32-bit IEEE 754 floating-point number|
|double|A double-precision 64-bit IEEE 754 floating-point number|
|bytes|A sequence of 8-bit unsigned bytes|
|string|A sequence of Unicode characters|
|record|A named collection of fields of any type, represented as a JSON object|
|enum|A named set of values|
|array|An ordered collection of objects; every item must conform to the same schema|
|map|An unordered set of key/value pairs; keys must be strings, and all values in a map must share the same schema|
|union|A union of schemas, written as a JSON array whose elements are schemas|
|fixed|A fixed number of 8-bit unsigned bytes|
[/table]
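The int and long types in the table are not stored as fixed 4- or 8-byte values: per the Avro specification, the binary encoding uses variable-length zigzag coding (implemented in Avro by org.apache.avro.io.BinaryEncoder / BinaryDecoder), so values near zero, whether positive or negative, take a single byte. A self-contained sketch of the scheme and its inverse (the class name ZigZag is ours, not Avro's):

```java
import java.util.Arrays;

public class ZigZag {
    // encode: zigzag, then base-128 varint, least-significant 7-bit group first
    static byte[] encode(long n) {
        long z = (n << 1) ^ (n >> 63);       // 0,-1,1,-2,... -> 0,1,2,3,...
        byte[] buf = new byte[10];           // a 64-bit value needs at most 10 groups
        int i = 0;
        while ((z & ~0x7FL) != 0) {
            buf[i++] = (byte) ((z & 0x7F) | 0x80); // continuation bit set
            z >>>= 7;
        }
        buf[i++] = (byte) z;
        return Arrays.copyOf(buf, i);
    }

    // decode: reassemble the 7-bit groups, then undo the zigzag
    static long decode(byte[] b) {
        long z = 0;
        int shift = 0;
        for (byte x : b) {
            z |= (long) (x & 0x7F) << shift;
            shift += 7;
        }
        return (z >>> 1) ^ -(z & 1);
    }

    public static void main(String[] args) {
        for (long n : new long[]{0, -1, 1, 63, -64, 256, Long.MIN_VALUE}) {
            System.out.println(n + " -> " + encode(n).length + " byte(s)");
        }
        // 0, -1, 1, 63 and -64 all fit in one byte; Long.MIN_VALUE needs ten
    }
}
```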
[*][b]Binary encoding to a byte array[/b]
The container files written above embed the schema alongside the data. A single datum can also be serialized straight to a byte array with a binary Encoder/Decoder; such bytes carry no schema, so the reader must already know it. Using the generated User class:

// Serialize one User to a byte array (no container file, no embedded schema)
ByteArrayOutputStream out = new ByteArrayOutputStream();
DatumWriter<User> writer = new SpecificDatumWriter<User>(User.class);
Encoder encoder = EncoderFactory.get().binaryEncoder(out, null);
writer.write(user1, encoder);
encoder.flush();
byte[] bytes = out.toByteArray();

// Deserialize: a binary Decoder over the same bytes, read with the same schema
DatumReader<User> reader = new SpecificDatumReader<User>(User.class);
Decoder decoder = DecoderFactory.get().binaryDecoder(bytes, null);
User decoded = reader.read(null, decoder);