What new String(str.getBytes("ISO8859-1"), "GBK") really does

License: CC 4.0 BY-SA

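This article looks at what happens when a browser sends GBK-encoded data to middleware that misreads it as ISO-8859-1, and gives a Java example showing how to convert the encoding correctly and recover the original Chinese string.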


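import java.io.UnsupportedEncodingException;

public class Main {
	public static void main(String[] args) throws UnsupportedEncodingException {
		// The GBK bytes sent by the browser: CC E1 BD BB
		byte[] b = new byte[]{(byte) 0xcc, (byte) 0xe1, (byte) 0xbd, (byte) 0xbb};

		// The middleware wrongly decodes them as ISO-8859-1: each byte becomes one
		// char (U+00CC, U+00E1, U+00BD, U+00BB), i.e. the mojibake "Ìá½»"
		String s = new String(b, "ISO8859-1");
		System.out.println(s);

		// Re-encoding with ISO-8859-1 gives back the original bytes: cc e1 bd bb
		print(s.getBytes("ISO8859-1"));
		// Re-encoding with GBK does not; characters GBK cannot represent become '?' (3f)
		print(s.getBytes("GBK"));
		// Java's UTF-16 encoder writes big-endian with a BOM: fe ff 00 cc 00 e1 00 bd 00 bb
		print(s.getBytes("UTF-16"));

		// The fix: recover the raw bytes, then decode them with the charset actually used
		System.out.println(new String(s.getBytes("ISO8859-1"), "GBK"));

		String ss = "中文";
		// fe ff 4e 2d 65 87
		print(ss.getBytes("UTF-16"));
		// 3f 3f: Chinese characters cannot be encoded in ISO-8859-1, so each becomes '?'
		print(ss.getBytes("ISO8859-1"));
	}

	// Prints each byte as two lowercase hex digits followed by a space
	static void print(byte[] b) {
		for (byte _b : b) {
			String s = Integer.toHexString(_b & 0xff);
			if (s.length() == 1) {
				s = "0" + s;
			}
			System.out.print(s + " ");
		}
		System.out.println();
	}
}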

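The browser sends GBK bytes to the middleware, and the middleware treats every byte as one ISO-8859-1 character. Printing the string built with new String(b, "ISO8859-1") therefore shows mojibake. The characters are not missing: ISO-8859-1 assigns a character to every byte value from 0x00 to 0xFF, so (byte) 0xcc, (byte) 0xe1, (byte) 0xbd, (byte) 0xbb decode to Ì, á, ½ and ». They are simply the wrong characters, because the bytes are GBK-encoded Chinese, not Latin-1 text.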
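Why does going back through ISO-8859-1 work at all? Because ISO-8859-1 maps each of the 256 byte values to exactly one character (U+0000 to U+00FF), decoding and then re-encoding with it never loses information. The sketch below checks that property and, for contrast, shows that the same four GBK bytes do not survive a UTF-8 decode: 0xCC followed by 0xE1 is not a valid UTF-8 sequence, so Java replaces it with U+FFFD. The class and method names (`Latin1RoundTrip`, `roundTrips`, `allBytes`) are made up for this illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Latin1RoundTrip {
    // Decodes the bytes with cs, re-encodes the result with cs, and reports
    // whether exactly the original bytes come back.
    public static boolean roundTrips(byte[] original, Charset cs) {
        String decoded = new String(original, cs);
        return Arrays.equals(original, decoded.getBytes(cs));
    }

    // Every possible byte value, 0x00 through 0xFF.
    public static byte[] allBytes() {
        byte[] b = new byte[256];
        for (int i = 0; i < b.length; i++) {
            b[i] = (byte) i;
        }
        return b;
    }

    public static void main(String[] args) {
        // The GBK bytes from the example above
        byte[] gbkBytes = {(byte) 0xcc, (byte) 0xe1, (byte) 0xbd, (byte) 0xbb};

        // ISO-8859-1 maps each byte to one char and back, so nothing is lost
        System.out.println(roundTrips(allBytes(), StandardCharsets.ISO_8859_1)); // true
        System.out.println(roundTrips(gbkBytes, StandardCharsets.ISO_8859_1));   // true

        // UTF-8 rejects 0xCC followed by 0xE1 and substitutes U+FFFD,
        // so the original bytes cannot be recovered
        System.out.println(roundTrips(gbkBytes, StandardCharsets.UTF_8));        // false
    }
}
```

The practical consequence: the repair trick depends on the middleware having decoded with ISO-8859-1. Had it decoded with UTF-8, the original bytes would already be gone and no re-encoding could bring them back.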

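To fix this we have to pull the original GBK bytes back out of the string and decode them again to rebuild the original Chinese string. Because ISO-8859-1 maps bytes to characters one-to-one, s.getBytes("ISO8859-1") returns exactly the bytes the browser sent, and new String(..., "GBK") then decodes them correctly. That is the whole substance of new String(str.getBytes("ISO8859-1"), "GBK"), as the code above shows.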
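The same two steps can be wrapped in a small helper. This is a minimal sketch, assuming the string really was produced by an ISO-8859-1 decode and that the runtime provides the GBK charset (GBK is not one of the six charsets every Java implementation is required to support). `MojibakeRepair` and `repairMojibake` are hypothetical names, not part of any library. Using `StandardCharsets.ISO_8859_1` and `Charset` objects instead of charset-name strings means no `UnsupportedEncodingException` has to be declared.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeRepair {
    // Hypothetical helper: undoes a wrong ISO-8859-1 decode by recovering the
    // raw bytes and decoding them with the charset the sender actually used.
    public static String repairMojibake(String misdecoded, Charset realCharset) {
        // A char above U+00FF cannot come from an ISO-8859-1 decode, and
        // getBytes(ISO_8859_1) would silently turn it into '?' (0x3F).
        for (int i = 0; i < misdecoded.length(); i++) {
            if (misdecoded.charAt(i) > 0xFF) {
                throw new IllegalArgumentException(
                        "char at index " + i + " is above U+00FF; not an ISO-8859-1 decode");
            }
        }
        // Step 1: one char -> one byte, exactly the bytes that were received
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        // Step 2: decode those bytes with the real charset
        return new String(raw, realCharset);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        String original = "\u4e2d\u6587"; // "中文", escaped so source-file encoding does not matter

        // Simulate the middleware: GBK bytes wrongly decoded as ISO-8859-1
        String garbled = new String(original.getBytes(gbk), StandardCharsets.ISO_8859_1);

        String repaired = repairMojibake(garbled, gbk);
        System.out.println(repaired.equals(original)); // true
    }
}
```

The guard is a design choice: without it, feeding the helper a string that was already decoded correctly would quietly produce question marks instead of failing loudly.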