Garbled Text in Java: A Byte-Level Analysis

Latest recommended article published 2023-04-20 21:37:49

License: CC 4.0 BY-SA

Tags: #Java #Web

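This article walks through a Java code example that converts a string to byte arrays using different charsets (ISO-8859-1 and GBK) and shows how Chinese characters are represented under each encoding. Comparing the two results explains where garbled text comes from.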

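Garbled text is a common problem in web development. The fix is to specify the correct encoding explicitly whenever a byte array is decoded into a string.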
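As a minimal sketch of what "specifying the encoding when decoding" means, the snippet below decodes the same GBK bytes twice: once with the correct charset and once with the wrong one. The class name `DecodeDemo` and the helper `decode` are hypothetical names chosen for illustration, and the sketch assumes the JDK in use ships the GBK charset (standard desktop and server JDKs do). Unicode escapes stand in for the Chinese characters so the result does not depend on the source file encoding.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {

    // Hypothetical helper: decode raw bytes with an explicitly named charset
    // instead of relying on the platform default.
    public static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Assumption: GBK is available in this JDK.
        Charset gbk = Charset.forName("GBK");

        // "\u4f60\u597d" is the two-character greeting used in the article.
        byte[] raw = "\u4f60\u597d".getBytes(gbk); // 4 bytes: c4 e3 ba c3

        // Correct charset: the original two characters come back.
        String good = decode(raw, gbk);

        // Wrong charset: each byte becomes one Latin-1 character,
        // so the result is four unrelated characters.
        String bad = decode(raw, StandardCharsets.ISO_8859_1);

        System.out.println(good.length()); // 2
        System.out.println(bad.length());  // 4
    }
}
```

The bytes are identical in both calls; only the charset passed to `new String` decides whether they are read as two Chinese characters or as four Latin-1 characters.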

Testing...

public class CodeTest {

	public static void main(String[] args) {
		execute();
	}

	private static void execute() {
		String s = "hello，你好！";
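		// Passing a Charset avoids the checked UnsupportedEncodingException and
		// the risk of continuing with null arrays after a swallowed exception.
		// Characters the charset cannot represent are replaced with '?'.
		byte[] bytesISO8859 = s.getBytes(java.nio.charset.StandardCharsets.ISO_8859_1);
		// GBK has no constant in StandardCharsets, so look it up by name.
		byte[] bytesGBK = s.getBytes(java.nio.charset.Charset.forName("GBK"));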
		System.out.println("--------------\n 8859 bytes:");
		System.out.println("bytes is:     " + arrayToString(bytesISO8859));
		System.out.println("hex format is:" + encodeHex(bytesISO8859));
		System.out.println();
		System.out.println("--------------\n GBK bytes:");
		System.out.println("bytes is:" + arrayToString(bytesGBK));
		System.out.println("hex format is:" + encodeHex(bytesGBK));
	}

	public static final String encodeHex(byte[] bytes) {
		StringBuilder buff = new StringBuilder(bytes.length * 3);
		for (int i = 0; i < bytes.length; i++) {
			// A byte is 8 bits, i.e. two hex digits. Integer.toHexString widens it
			// to a 4-byte int with sign extension, so a negative byte would print
			// as "ffffffa3". Masking with 0xff keeps only the low 8 bits, and
			// %02x pads values below 0x10 with a leading zero.
			buff.append(String.format("%02x", bytes[i] & 0xff));
			buff.append(" ");
		}
		return buff.toString();
	}

	public static final String arrayToString(byte[] bytes) {
		// StringBuilder is sufficient here; no synchronization is needed.
		StringBuilder buff = new StringBuilder();
		for (int i = 0; i < bytes.length; i++) {
			buff.append(bytes[i]).append(" ");
		}
		return buff.toString();
	}

}

Result:

--------------
8859 bytes:
bytes is:          104 101 108 108 111 63 63 63 63 
hex format is:     68  65  6c  6c  6f  3f 3f 3f 3f 

--------------
GBK bytes:
bytes is:          104 101 108 108 111 -93 -84 -60 -29 -70 -61 -93 -95 
hex format is:     68  65  6c  6c  6f  a3  ac  c4  e3  ba  c3  a3  a1

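As the output shows, the ISO-8859-1 byte array extracted from s has length 9: five bytes for "hello" and one byte for each of the four remaining characters (the full-width comma, 你, 好, and the full-width exclamation mark). Each of those four became 63, which is the ASCII code for "?". ISO-8859-1 cannot represent these characters, so getBytes substitutes "?" for each of them and the original text is lost. In GBK, by contrast, each of the four characters is encoded as two bytes, giving 13 bytes in total. This is why software written abroad often shows text full of "?" when it runs in a Chinese-language environment: the encoding was not handled correctly.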
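The "?" substitution described above is lossy, and that can be checked before encoding. The following is a minimal sketch, not part of the original test program: `RoundTripDemo`, `canEncode`, and `roundTrip` are hypothetical names, and GBK is again assumed to be available in the JDK. It uses `CharsetEncoder.canEncode` to ask whether a charset can represent a string, and an encode-then-decode round trip to show what survives.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class RoundTripDemo {

    // Hypothetical helper: true if every character of text is representable in charset.
    public static boolean canEncode(String text, Charset charset) {
        return charset.newEncoder().canEncode(text);
    }

    // Hypothetical helper: encode and then decode with the same charset.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // Same text as s in the article, written with Unicode escapes:
        // "hello", full-width comma, two Chinese characters, full-width exclamation mark.
        String s = "hello\uff0c\u4f60\u597d\uff01";
        Charset gbk = Charset.forName("GBK"); // assumption: GBK is available

        System.out.println(canEncode(s, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(s, gbk));                         // true

        // The four unmappable characters each come back as '?'.
        System.out.println(roundTrip(s, StandardCharsets.ISO_8859_1)); // hello????

        // GBK preserves the text exactly.
        System.out.println(roundTrip(s, gbk).equals(s));               // true
    }
}
```

Once a character has been replaced by "?" during encoding, no choice of charset at decode time can bring it back, so a check like `canEncode` has to happen before the bytes are produced.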