import java.io.UnsupportedEncodingException;

public class EncodingLength {
    public static void main(String[] args) {
        try {
            // "Unicode" is a Java alias for UTF-16, which writes a byte-order mark
            byte[] b = "汉字a".getBytes("Unicode");
            System.out.println(b.length);
            b = "汉字a".getBytes("GBK");
            System.out.println(b.length);
            b = "汉字a".getBytes("UTF-8");
            System.out.println(b.length);
        } catch (UnsupportedEncodingException e) {
            e.printStackTrace();
        }
    }
}
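Unicode (UTF-16 in Java): 2 bytes per character (for characters in the Basic Multilingual Plane, which includes these), plus a 2-byte header (the byte-order mark FE FF)
UTF-8: 3 bytes per Chinese character, 1 byte per ASCII character
GBK/GB2312: 2 bytes per Chinese character, 1 byte per ASCII character
So the output is
8 (2*3 + 2)
5 (2*2 + 1)
7 (3*2 + 1)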
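To check the byte-order-mark explanation, here is a small sketch that compares UTF-16 (which, when encoding, writes a big-endian BOM) against UTF-16BE (explicit byte order, no BOM), and predicts the UTF-8 length from code-point ranges. The class name `ByteCounts` and the helper `utf8Length` are hypothetical, not from the original note. It uses the `Charset` overload of `getBytes`, which throws no checked exception.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ByteCounts {
    // Hypothetical helper: predicts the UTF-8 length of a string from its code points,
    // following the standard UTF-8 ranges (1, 2, 3 or 4 bytes per code point).
    static int utf8Length(String s) {
        int total = 0;
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (cp < 0x80) total += 1;          // ASCII, e.g. 'a'
            else if (cp < 0x800) total += 2;
            else if (cp < 0x10000) total += 3;  // most CJK characters, e.g. 汉, 字
            else total += 4;                    // supplementary characters
            i += Character.charCount(cp);
        }
        return total;
    }

    public static void main(String[] args) {
        String s = "汉字a";
        // UTF-16 writes a 2-byte BOM (FE FF) before the data: 2 + 3 * 2
        System.out.println(s.getBytes(StandardCharsets.UTF_16).length);   // 8
        // UTF-16BE has a fixed byte order, so no BOM: 3 * 2
        System.out.println(s.getBytes(StandardCharsets.UTF_16BE).length); // 6
        System.out.println(s.getBytes(Charset.forName("GBK")).length);    // 5
        System.out.println(s.getBytes(StandardCharsets.UTF_8).length);    // 7
        System.out.println(utf8Length(s));                                // 7
    }
}
```

The 2-byte difference between the first two lines is exactly the header mentioned above.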
-----
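// Prints each byte in binary; & 0xff treats the byte as unsigned (0..255)
// so a negative byte is not sign-extended when widened to int.
static void show(byte[] b) {
    for (int i = 0; i < b.length; i++) {
        System.out.print(Integer.toBinaryString(b[i] & 0xff) + " ");
    }
}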
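Taking the character 汉 (U+6C49) as an example, show() prints:
11111110 11111111 1101100 1001001 --- 4 bytes, Unicode (FE FF byte-order mark + 6C 49)
10111010 10111010 --- 2 bytes, GBK (BA BA)
11100110 10110001 10001001 --- 3 bytes, UTF-8 (E6 B1 89)
(toBinaryString drops leading zeros, so 1101100 and 1001001 are really 01101100 and 01001001.)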
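Because `Integer.toBinaryString` drops leading zeros, the UTF-16 row above shows 7-digit bytes. A variant that pads every byte to 8 bits makes the rows line up; `BinaryDump` and `toBits` are hypothetical names used only for this sketch.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.StringJoiner;

public class BinaryDump {
    // Hypothetical helper: formats each byte as exactly 8 binary digits.
    static String toBits(byte[] b) {
        StringJoiner out = new StringJoiner(" ");
        for (byte x : b) {
            // & 0xff gives the unsigned value 0..255; | 0x100 sets a 9th bit so
            // toBinaryString always returns 9 digits, then the leading 1 is dropped.
            out.add(Integer.toBinaryString((x & 0xff) | 0x100).substring(1));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String han = "汉";
        System.out.println(toBits(han.getBytes(StandardCharsets.UTF_16)));
        // 11111110 11111111 01101100 01001001
        System.out.println(toBits(han.getBytes(Charset.forName("GBK"))));
        // 10111010 10111010
        System.out.println(toBits(han.getBytes(StandardCharsets.UTF_8)));
        // 11100110 10110001 10001001
    }
}
```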
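UTF-8 encodes 汉 in 3 bytes: the first byte starts with three 1s (one per byte in the sequence) followed by a 0, and each continuation byte starts with 10:
1110xxxx
10xxxxxx
10xxxxxx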
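The 3-byte pattern can be applied by hand: the 16 bits of a code point in U+0800..U+FFFF are split 4 + 6 + 6 and placed into the x positions. This is a minimal sketch assuming a BMP, non-surrogate code point; `Utf8ThreeByte` and `encode3` are hypothetical names, and the result is compared against the JDK's own UTF-8 encoder.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ThreeByte {
    // Hypothetical helper: encodes a code point in U+0800..U+FFFF (excluding
    // surrogates) with the pattern 1110xxxx 10xxxxxx 10xxxxxx.
    static byte[] encode3(int cp) {
        if (cp < 0x800 || cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw new IllegalArgumentException("not a 3-byte UTF-8 code point: " + cp);
        }
        return new byte[] {
            (byte) (0xE0 | (cp >> 12)),          // 1110 + top 4 bits
            (byte) (0x80 | ((cp >> 6) & 0x3F)),  // 10 + middle 6 bits
            (byte) (0x80 | (cp & 0x3F))          // 10 + low 6 bits
        };
    }

    public static void main(String[] args) {
        int han = "汉".codePointAt(0); // U+6C49 = 0110 110001 001001
        byte[] manual = encode3(han);  // E6 B1 89
        byte[] jdk = "汉".getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(manual, jdk)); // true
    }
}
```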