Java character encoding: a small function for exploring UTF-16, UTF-16BE, UTF-16LE, and Unicode

License: CC 4.0 BY-SA

Tags: #java #byte #string

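This article looks at the UTF-16 encoding and its two byte orders (big-endian and little-endian). Sample code shows how Java handles UTF-16 data and how byte sequences that carry a byte order mark (BOM) are decoded into strings.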
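As a minimal sketch of the decoding behaviour described above: the class name `Utf16DecodeSketch` and the helper `decodeUtf16` are made up for illustration, and `StandardCharsets.UTF_16` is used in place of the charset name string so that no checked exception has to be declared. The same character, U+554A, is decoded from a big-endian BOM stream, a little-endian BOM stream, and a stream with no BOM.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf16DecodeSketch {

    // Decodes bytes with the BOM-aware "UTF-16" charset.
    // A leading FE FF or FF FE selects the byte order and is dropped from the result;
    // without a BOM, Java's decoder assumes big-endian.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        byte[] withBeBom = {(byte) 0xFE, (byte) 0xFF, 0x55, 0x4A};
        byte[] withLeBom = {(byte) 0xFF, (byte) 0xFE, 0x4A, 0x55};
        byte[] noBom = {0x55, 0x4A};

        String expected = "\u554A";
        System.out.println(decodeUtf16(withBeBom).equals(expected)); // true
        System.out.println(decodeUtf16(withLeBom).equals(expected)); // true
        System.out.println(decodeUtf16(noBom).equals(expected));     // true
    }
}
```

All three inputs decode to the same one-character string, which is why the BOM-aware charset is a reasonable default when the byte order of the input is not known in advance.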

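import java.io.UnsupportedEncodingException;

// Wrapper class and main method added so the snippet compiles on its own; the class name is arbitrary.
class Utf16Explore {

  /**
   * The UTF-16 charset uses 16-bit code units, so it is sensitive to byte order. The byte
   * order of a stream can be indicated by an initial byte order mark (BOM), the Unicode
   * character U+FEFF.
   *
   * UTF-16BE: 16-bit UCS Transformation Format, big-endian byte order (the high-order
   * byte is stored at the lowest address).
   *
   * UTF-16LE: 16-bit UCS Transformation Format, little-endian byte order (the low-order
   * byte is stored at the lowest address).
   *
   * In Java, when the input has no BOM, the "utf-16" decoder assumes big-endian, as if
   * the data began with FE FF. When encoding, "utf-16" writes a big-endian BOM (FE FF).
   *
   * @throws UnsupportedEncodingException if a charset name is not supported
   */
  void unicodeShow() throws UnsupportedEncodingException {
    String shz;
    byte[] hz;

    // "utf-16" with a big-endian BOM (FE FF): the BOM is consumed. Prints 啊
    hz = new byte[] {(byte) 0xfe, (byte) 0xff, 0x55, 0x4a};
    shz = new String(hz, "utf-16");
    System.out.println(shz);

    // "utf-16" without a BOM: big-endian is assumed. Prints 啊
    hz = new byte[] {0x55, 0x4a};
    shz = new String(hz, "utf-16");
    System.out.println(shz);

    // Explicit "utf-16be", no BOM needed. Prints 啊
    hz = new byte[] {0x55, 0x4a};
    shz = new String(hz, "utf-16be");
    System.out.println(shz);

    // "utf-16" with a little-endian BOM (FF FE): bytes are read low byte first. Prints 啊
    hz = new byte[] {(byte) 0xff, (byte) 0xfe, 0x4a, 0x55};
    shz = new String(hz, "utf-16");
    System.out.println(shz);

    // Explicit "utf-16le", no BOM needed. Prints 啊
    hz = new byte[] {0x4a, 0x55};
    shz = new String(hz, "utf-16le");
    System.out.println(shz);

    // The char value itself is the code point U+554A: high byte 55, low byte 4a.
    System.out.println("啊 UNICODE:U+554A");
    System.out.print(Integer.toHexString("啊".charAt(0) >> 8 & 0xff));
    System.out.print(" ");
    System.out.print(Integer.toHexString("啊".charAt(0) & 0xff));
    System.out.println();

    // Encoding with "utf-16" prepends a big-endian BOM: fe ff 55 4a
    for (byte i : "啊".getBytes("utf-16"))
      System.out.print(Integer.toHexString(i & 0xff) + " ");
    System.out.println();
    // Encoding with "utf-16be" writes no BOM: 55 4a
    for (byte i : "啊".getBytes("utf-16be"))
      System.out.print(Integer.toHexString(i & 0xff) + " ");
    System.out.println();
    // Encoding with "utf-16le" writes no BOM: 4a 55
    for (byte i : "啊".getBytes("utf-16le"))
      System.out.print(Integer.toHexString(i & 0xff) + " ");
    System.out.println();
  }

  public static void main(String[] args) throws UnsupportedEncodingException {
    new Utf16Explore().unicodeShow();
  }
}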
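
Two details of the code above may be worth a separate look. First, `Integer.toHexString` drops leading zeros, so a byte such as 0x0a prints as `a`. Second, only the `utf-16` charset strips a leading BOM when decoding; the explicit-endian charsets keep U+FEFF in the result as an ordinary character. The following is a rough sketch of both points; the class name `Utf16BomSketch` and the helper `hex` are invented for this example and are not part of the original function.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class Utf16BomSketch {

    // Formats bytes as zero-padded, space-separated lowercase hex, e.g. "fe ff 55 4a".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xFE, (byte) 0xFF, 0x55, 0x4A};

        // BOM-aware charset: the BOM is consumed, one char remains.
        String viaUtf16 = new String(withBom, StandardCharsets.UTF_16);
        // Explicit big-endian charset: the BOM is kept as U+FEFF, two chars remain.
        String viaUtf16Be = new String(withBom, StandardCharsets.UTF_16BE);
        System.out.println(viaUtf16.length());   // 1
        System.out.println(viaUtf16Be.length()); // 2

        String a = "\u554A";
        System.out.println(hex(a.getBytes(StandardCharsets.UTF_16)));   // fe ff 55 4a
        System.out.println(hex(a.getBytes(StandardCharsets.UTF_16BE))); // 55 4a
        System.out.println(hex(a.getBytes(StandardCharsets.UTF_16LE))); // 4a 55
        System.out.println(hex(new byte[] {0x0a}));                     // 0a
    }
}
```

If data that starts with a BOM has to be decoded with `UTF_16BE` or `UTF_16LE`, one option is to check for a leading U+FEFF in the decoded string and drop it explicitly.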