Implementation of Hadoop's Text Type

Licensed under CC 4.0 BY-SA

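This article describes how Hadoop's Text type is implemented, focusing on how a string is converted to a byte array using UTF-8 encoding. It explains how the CharsetEncoder class is used and walks through the encoding flow and its error handling.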

Hadoop's Text type stores a string as a byte array by encoding it with UTF-8. The conversion is done by the static encode method:

/**
 * Converts the provided String to bytes using the
 * UTF-8 encoding. If <code>replace</code> is true, then
 * malformed input is replaced with the
 * substitution character, which is U+FFFD. Otherwise the
 * method throws a MalformedInputException.
 * @return ByteBuffer: the bytes are stored in ByteBuffer.array()
 *         and the valid length is ByteBuffer.limit()
 */
public static ByteBuffer encode(String string, boolean replace)
    throws CharacterCodingException {
  // Fetch the shared encoder; by default it reports errors as exceptions.
  CharsetEncoder encoder = ENCODER_FACTORY.get();
  if (replace) {
    // Temporarily switch to lenient mode: substitute bad input instead of throwing.
    encoder.onMalformedInput(CodingErrorAction.REPLACE);
    encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
  }
  ByteBuffer bytes =
      encoder.encode(CharBuffer.wrap(string.toCharArray()));
  if (replace) {
    // Restore strict mode so later callers of the same encoder see REPORT again.
    encoder.onMalformedInput(CodingErrorAction.REPORT);
    encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
  }
  return bytes;
}
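
To try the encoding flow outside of Hadoop, the method above can be approximated with JDK classes only. The following is a minimal sketch, not Hadoop's actual source: the class name Utf8EncodeSketch and the helper toBytes are hypothetical, and because the excerpt does not show how ENCODER_FACTORY is defined, the sketch assumes it is a per-thread CharsetEncoder that defaults to CodingErrorAction.REPORT.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8EncodeSketch {

    // Assumption: a stand-in for ENCODER_FACTORY. CharsetEncoder is not
    // thread-safe, so each thread gets its own instance, strict by default.
    private static final ThreadLocal<CharsetEncoder> ENCODER_FACTORY =
        ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT));

    // Same flow as the excerpt: optionally switch to REPLACE, encode, switch back.
    public static ByteBuffer encode(String string, boolean replace)
            throws CharacterCodingException {
        CharsetEncoder encoder = ENCODER_FACTORY.get();
        if (replace) {
            encoder.onMalformedInput(CodingErrorAction.REPLACE);
            encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
        }
        ByteBuffer bytes = encoder.encode(CharBuffer.wrap(string.toCharArray()));
        if (replace) {
            encoder.onMalformedInput(CodingErrorAction.REPORT);
            encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
        }
        return bytes;
    }

    // Hypothetical helper: copy only the valid bytes. The backing array may be
    // larger than limit(), so array().length is not the encoded length.
    public static byte[] toBytes(ByteBuffer buffer) {
        byte[] out = new byte[buffer.remaining()];
        buffer.duplicate().get(out);
        return out;
    }

    public static void main(String[] args) throws CharacterCodingException {
        // Two CJK characters (U+4F60, U+597D), three UTF-8 bytes each.
        ByteBuffer ok = encode("\u4F60\u597D", false);
        System.out.println("valid bytes: " + ok.limit()
            + ", backing array length: " + ok.array().length);

        // A lone high surrogate is malformed UTF-16 input.
        String malformed = "a\uD800b";
        try {
            encode(malformed, false);
        } catch (CharacterCodingException e) {
            System.out.println("strict mode rejected input: " + e);
        }

        // With replace=true the same input is encoded without an exception.
        byte[] lenient = toBytes(encode(malformed, true));
        System.out.println("lenient mode produced " + lenient.length + " bytes");
    }
}
```

Two details are worth checking when running this. First, callers should read only the first limit() bytes of the returned buffer, because the encoder may allocate a larger backing array than it fills. Second, on the JDK the UTF-8 encoder's default replacement appears to be the single byte for '?', so the lenient call in this sketch yields the bytes of "a?b" rather than an encoded U+FFFD; U+FFFD is the replacement used on the decoding side.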