To learn how Unicode text is encoded in UTF-8 and UTF-16, I wrote the following program.
import java.nio.charset.Charset;

public class MyStudy {
    // Left-aligned column, 20 characters wide
    public static String field = "%-20s";

    public static void main(String[] args) {
        // Header row
        System.out.format(field, "utf-16 length");
        System.out.format(field, "utf-8 length");
        System.out.format(field, "utf-16");
        System.out.format(field, "utf-8");
        System.out.format(field, "text");
        System.out.println();

        String[] arr = {"瓴", "龍", "瓴龍", "一", "一二", "一二三", "a", "ab", "abc"};
        for (String str : arr) {
            // Encode each string once per charset, then print length and hex bytes
            byte[] utf16 = str.getBytes(Charset.forName("UTF-16"));
            byte[] utf8 = str.getBytes(Charset.forName("UTF-8"));
            System.out.format(field, utf16.length);
            System.out.format(field, utf8.length);
            System.out.format(field, toHex(utf16));
            System.out.format(field, toHex(utf8));
            System.out.format(field, str);
            System.out.println();
        }
    }

    // Render a byte array as lowercase hex, two digits per byte
    public static String toHex(byte[] b) {
        StringBuilder builder = new StringBuilder();
        for (byte value : b) {
            builder.append(String.format("%02x", value));
        }
        return builder.toString();
    }
}
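To see where the UTF-8 byte counts come from, here is a minimal sketch that encodes a single code point by hand using the standard UTF-8 bit layouts (1 to 4 bytes depending on the code point's range) and compares the result with the JDK's getBytes. The class Utf8ByHand and its method encodeUtf8 are hypothetical names for illustration; the sketch assumes valid code points (no lone surrogates) and is not a replacement for the JDK encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ByHand {
    // Hypothetical hand-rolled UTF-8 encoder for one code point (illustration only).
    // Assumes cp is a valid Unicode scalar value (not a lone surrogate).
    public static byte[] encodeUtf8(int cp) {
        if (cp < 0x80) {                 // 1 byte:  0xxxxxxx
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {         // 2 bytes: 110xxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {       // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {                         // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }

    public static void main(String[] args) {
        String text = "\u4E00\u4E8C\u4E09abc"; // the characters 一二三 followed by abc
        // Iterate by code point, so a surrogate pair would count as one character
        text.codePoints().forEach(cp -> {
            byte[] mine = encodeUtf8(cp);
            byte[] jdk = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            System.out.printf("U+%04X -> %d byte(s), matches JDK: %b%n",
                    cp, mine.length, Arrays.equals(mine, jdk));
        });
    }
}
```

Each Chinese character here falls in the range U+0800 to U+FFFF and therefore takes the 3-byte form, while ASCII letters take the 1-byte form.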
The program's output is:
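From this I drew the following conclusions:
1. UTF-16 uses 2-byte code units. Every character tested here lies in the Basic Multilingual Plane and takes one unit (2 bytes); characters outside it take two units (a surrogate pair, 4 bytes).
2. UTF-8 uses 1-byte code units, and a character takes 1 to 4 of them: ASCII letters such as "a" take 1 byte, while the Chinese characters here take 3 bytes each.
3. The UTF-16 output starts with "feff". This is the byte order mark (BOM, U+FEFF): the bytes fe ff mean the data that follows is Big-Endian, while ff fe would mean Little-Endian. Java's "UTF-16" charset always encodes Big-Endian and writes this BOM, so every UTF-16 length above includes 2 extra bytes.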
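To check the byte-order point, the sketch below encodes "一" (U+4E00) with Java's three UTF-16 charsets and then decodes a Little-Endian byte sequence that starts with ff fe. The class BomDemo and its helper hex are illustrative names, not from the original program; hex does the same job as toHex above.

```java
import java.nio.charset.StandardCharsets;

public class BomDemo {
    // Lowercase hex, two digits per byte
    public static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u4E00"; // the character 一
        // UTF-16: Java encodes Big-Endian and prepends the BOM fe ff
        System.out.println("UTF-16   " + hex(s.getBytes(StandardCharsets.UTF_16)));
        // UTF-16BE / UTF-16LE: byte order is fixed by the charset name, so no BOM is written
        System.out.println("UTF-16BE " + hex(s.getBytes(StandardCharsets.UTF_16BE)));
        System.out.println("UTF-16LE " + hex(s.getBytes(StandardCharsets.UTF_16LE)));
        // When decoding, UTF-16 honours a leading BOM: ff fe switches to Little-Endian
        byte[] le = {(byte) 0xFF, (byte) 0xFE, 0x00, 0x4E};
        System.out.println(new String(le, StandardCharsets.UTF_16).equals(s));
    }
}
```

Running it prints feff4e00, 4e00 and 004e for the three encodings, followed by true: the same character with and without a BOM, and with its two bytes swapped in Little-Endian order.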
See also: https://zh.wikipedia.org/wiki/UTF-16