Hadoop's Text type stores a string as a byte array by encoding it in UTF-8. The static method Text.encode below performs that conversion.
/**
 * Converts the provided String to bytes using the UTF-8 encoding.
 * If <code>replace</code> is true, then malformed input is replaced with the
 * substitution character, which is U+FFFD. Otherwise the
 * method throws a MalformedInputException.
 * @return ByteBuffer: the bytes are stored in ByteBuffer.array()
 *         and the length is ByteBuffer.limit()
 */
public static ByteBuffer encode(String string, boolean replace)
    throws CharacterCodingException {
  // Reuse the encoder cached by ENCODER_FACTORY.
  CharsetEncoder encoder = ENCODER_FACTORY.get();
  if (replace) {
    // Lenient mode: substitute bad input instead of throwing.
    encoder.onMalformedInput(CodingErrorAction.REPLACE);
    encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
  }
  ByteBuffer bytes =
      encoder.encode(CharBuffer.wrap(string.toCharArray()));
  if (replace) {
    // Switch the shared encoder back to strict mode for the next caller.
    encoder.onMalformedInput(CodingErrorAction.REPORT);
    encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
  }
  return bytes;
}
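To try the same flow without a Hadoop dependency, here is a minimal sketch that uses only the JDK. The class name Utf8EncodeSketch and the ENCODER field are hypothetical stand-ins; the sketch assumes ENCODER_FACTORY behaves like a per-thread UTF-8 encoder that reports errors by default. It is not Hadoop's code, and unlike the method above it restores strict mode in a finally block.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8EncodeSketch {
    // Hypothetical stand-in for ENCODER_FACTORY: one strict UTF-8 encoder per thread.
    private static final ThreadLocal<CharsetEncoder> ENCODER =
        ThreadLocal.withInitial(() -> StandardCharsets.UTF_8.newEncoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT));

    public static ByteBuffer encode(String s, boolean replace)
            throws CharacterCodingException {
        CharsetEncoder encoder = ENCODER.get();
        if (replace) {
            // Lenient mode: substitute bad input instead of throwing.
            encoder.onMalformedInput(CodingErrorAction.REPLACE);
            encoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
        }
        try {
            // The valid bytes run from index 0 up to limit(); array() may be longer.
            return encoder.encode(CharBuffer.wrap(s));
        } finally {
            // Always hand the cached encoder back in strict mode.
            encoder.onMalformedInput(CodingErrorAction.REPORT);
            encoder.onUnmappableCharacter(CodingErrorAction.REPORT);
        }
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "Hadoop " is 7 ASCII bytes; the two CJK characters take 3 bytes each.
        ByteBuffer ok = encode("Hadoop \u6587\u672C", false);
        System.out.println(ok.limit()); // prints 13

        String malformed = "a\uD800b"; // lone high surrogate, not valid UTF-16
        try {
            encode(malformed, false);
        } catch (CharacterCodingException e) {
            System.out.println("strict mode rejected the input: " + e);
        }
        // With replace=true the same input encodes without an exception.
        System.out.println(encode(malformed, true).limit());
    }
}
```

When reading the result, copy only the first limit() bytes: the backing array returned by array() can be larger than the encoded data.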
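This post looks at how Hadoop's Text type is implemented, focusing on how a string is converted to a byte array with UTF-8 encoding. It shows how CharsetEncoder is used, along with the encoding flow and its error handling.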