Some notes on byte encodings in Java:
With Eclipse's default encoding:
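String s1 = "寂寞love";
// getBytes() with no argument uses the platform default charset; in this Eclipse setup that is GBK, where each Chinese character takes 2 bytes
byte[] b1 = s1.getBytes();
for (byte b : b1) {
    System.out.print(Integer.toHexString(b & 0xff) + " "); // & 0xff strips the six leading f's that sign extension adds to negative bytes
}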
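To see what the `& 0xff` in the loop is doing, here is a minimal sketch (the class `MaskDemo` and its two helpers are hypothetical names for illustration). A Java `byte` is signed, so when it is widened to `int` for `Integer.toHexString`, a negative byte is sign-extended; the mask keeps only the low 8 bits.

```java
public class MaskDemo {
    // Unsigned hex, as in the loop above: & 0xff keeps only the low 8 bits.
    static String unsignedHex(byte b) {
        return Integer.toHexString(b & 0xff);
    }

    // Without the mask, a negative byte keeps its sign-extended high bits.
    static String signedHex(byte b) {
        return Integer.toHexString(b);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xC5; // high bit set, so this byte is -59 in Java
        System.out.println(signedHex(b));   // ffffffc5
        System.out.println(unsignedHex(b)); // c5
    }
}
```

For bytes below 0x80 (such as the ASCII letters) both helpers print the same thing; only bytes with the high bit set are affected.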
Output:
After converting to UTF-8:
// In UTF-8 each Chinese character takes 3 bytes and each letter 1
byte[] b2 = s1.getBytes("utf-8");
for (byte b : b2) {
    System.out.print(Integer.toHexString(b & 0xff) + " ");
}
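A small sketch to check those byte counts per character, assuming the source file is saved and compiled as UTF-8 so the literal survives intact (the class and method names are made up for this example). It encodes each code point on its own and reports how many UTF-8 bytes it needs.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Widths {
    // For each code point of s, how many bytes it needs in UTF-8.
    static int[] utf8Widths(String s) {
        return s.codePoints()
                .map(cp -> new String(Character.toChars(cp))
                        .getBytes(StandardCharsets.UTF_8).length)
                .toArray();
    }

    public static void main(String[] args) {
        String s1 = "寂寞love";
        // CJK ideographs in the Basic Multilingual Plane need 3 bytes, ASCII letters 1.
        System.out.println(Arrays.toString(utf8Widths(s1)));            // [3, 3, 1, 1, 1, 1]
        System.out.println(s1.getBytes(StandardCharsets.UTF_8).length); // 10
    }
}
```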
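Output:
After converting to UTF-16BE:
// Java's char type is a UTF-16 code unit; in UTF-16BE both Chinese characters and letters take 2 bytes (true for every character in the Basic Multilingual Plane)
byte[] b3 = s1.getBytes("utf-16be");
for (byte b : b3) {
    System.out.print(Integer.toHexString(b & 0xff) + " ");
}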
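The 2-bytes-per-character rule holds only for characters in the Basic Multilingual Plane, and only for the BOM-less UTF-16BE encoding. The sketch below (hypothetical class name; it uses the `StandardCharsets` constants rather than charset-name strings, and assumes the source is compiled as UTF-8) compares UTF-16BE with plain UTF-16, which Java encodes big-endian with a leading byte-order mark, and with a character outside the BMP.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf16Lengths {
    // Number of bytes s occupies when encoded with cs.
    static int encodedLength(String s, Charset cs) {
        return s.getBytes(cs).length;
    }

    public static void main(String[] args) {
        String s1 = "寂寞love";
        // UTF-16BE: 2 bytes per BMP character, no byte-order mark.
        System.out.println(encodedLength(s1, StandardCharsets.UTF_16BE)); // 12
        // UTF-16: Java's encoder writes a big-endian BOM (FE FF) first.
        System.out.println(encodedLength(s1, StandardCharsets.UTF_16));   // 14

        // U+1F600 lies outside the BMP: a surrogate pair of two chars, 4 bytes.
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length());                                  // 2
        System.out.println(encodedLength(emoji, StandardCharsets.UTF_16BE)); // 4
    }
}
```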
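Result:
When converting a byte sequence back into a string, you must decode it with the same encoding that produced it; otherwise the text comes out garbled:
String str = new String(b2); // b2 holds UTF-8 bytes, but this constructor decodes with the default charset (GBK here)
System.out.println(str);
Output:
So the conversion should name the charset explicitly:
String str2 = new String(b2, "utf-8");
System.out.println(str2);
Result: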
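A minimal sketch of the mismatch, with one substitution: instead of the platform default (GBK in the setup above), it decodes the UTF-8 bytes as ISO-8859-1, because every JVM is required to support that charset while GBK is not guaranteed. `DecodeDemo` and `decode` are hypothetical names, and the source is assumed to be compiled as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Turn bytes back into a String using the given charset.
    static String decode(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        String s1 = "寂寞love";
        byte[] b2 = s1.getBytes(StandardCharsets.UTF_8);

        // Wrong charset: ISO-8859-1 maps each of the 10 bytes to its own char.
        String wrong = decode(b2, StandardCharsets.ISO_8859_1);
        // Same charset that produced the bytes: the original text comes back.
        String right = decode(b2, StandardCharsets.UTF_8);

        System.out.println(wrong.equals(s1)); // false
        System.out.println(wrong.length());   // 10
        System.out.println(right.equals(s1)); // true
    }
}
```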
Here is the complete code:
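import java.io.UnsupportedEncodingException;

public class IOByte {
    public static void main(String[] args) throws UnsupportedEncodingException {
        String s1 = "寂寞love";
        // getBytes() without an argument uses the platform default charset (GBK here): 2 bytes per Chinese character
        byte[] b1 = s1.getBytes();
        for (byte b : b1) {
            System.out.print(Integer.toHexString(b & 0xff) + " "); // & 0xff strips the sign-extended "ffffff" prefix
        }
        System.out.println();
        // In UTF-8 each Chinese character takes 3 bytes and each letter 1
        byte[] b2 = s1.getBytes("utf-8");
        for (byte b : b2) {
            System.out.print(Integer.toHexString(b & 0xff) + " ");
        }
        System.out.println();
        // In UTF-16BE (Java's char is a UTF-16 code unit) Chinese characters and letters both take 2 bytes
        byte[] b3 = s1.getBytes("utf-16be");
        for (byte b : b3) {
            System.out.print(Integer.toHexString(b & 0xff) + " ");
        }
        System.out.println();
        // Decoding bytes with a different charset than the one that encoded them produces garbled text
        String str = new String(b2);
        System.out.println(str);
        // Correct: decode with the same charset (UTF-8)
        String str2 = new String(b2, "utf-8");
        System.out.println(str2);
    }
}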
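A possible variant of the program, offered as a sketch rather than a replacement: since Java 7 the `java.nio.charset.StandardCharsets` constants can be passed to `getBytes` and `new String`, so there is no checked `UnsupportedEncodingException` to declare. It also zero-pads each byte with `String.format("%02x", ...)`, because `Integer.toHexString` prints a 0x00 byte (as in the UTF-16BE bytes of the letters) as a single `0`. `IOByteStd` and `toHex` are made-up names, and `Charset.defaultCharset()` reports whatever the platform and JDK version use, which is not necessarily GBK.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class IOByteStd {
    // Two-digit, zero-padded hex for each byte, separated by spaces (0x00 -> "00").
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s1 = "寂寞love";
        // The charset that getBytes() with no argument would use on this JVM.
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println(toHex(s1.getBytes(StandardCharsets.UTF_8)));
        System.out.println(toHex(s1.getBytes(StandardCharsets.UTF_16BE)));
        // Decode with the same charset that was used to encode.
        byte[] b2 = s1.getBytes(StandardCharsets.UTF_8);
        System.out.println(new String(b2, StandardCharsets.UTF_8));
    }
}
```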
Full output: