Byte Counts of Chinese and English Characters

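This article uses a small Java program to show how many bytes the English character "A" and the Chinese character "人" occupy under different character encodings. The experiment covers the common Chinese encodings GB2312, GBK, and GB18030, along with ISO-8859-1 and the international standards UTF-8 and UTF-16 (including the UTF-16BE and UTF-16LE variants).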

package com.test;

import java.io.UnsupportedEncodingException;

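public class ByteCountTest {

    // Prints the number of bytes that str occupies when encoded with the given charset.
    public static void printByteCount(String str, String encoding) {
        try {
            int length = str.getBytes(encoding).length;
            System.out.println(encoding + " : " + length);
        } catch (UnsupportedEncodingException e) {
            // Report the problem instead of printing a misleading length of 0.
            System.out.println(encoding + " : unsupported encoding");
        }
    }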

    public static void main(String[] args) {
        String en = "A";
        String ch = "人";
        printByteCount(en, "GB2312");
        printByteCount(en, "GBK");
        printByteCount(en, "GB18030");
        printByteCount(en, "ISO-8859-1");
        printByteCount(en, "UTF-8");
        printByteCount(en, "UTF-16");
        printByteCount(en, "UTF-16BE");
        printByteCount(en, "UTF-16LE");
        System.out.println("-------------------------");
        printByteCount(ch, "GB2312");
        printByteCount(ch, "GBK");
        printByteCount(ch, "GB18030");
        printByteCount(ch, "ISO-8859-1");
        printByteCount(ch, "UTF-8");
        printByteCount(ch, "UTF-16");
        printByteCount(ch, "UTF-16BE");
        printByteCount(ch, "UTF-16LE");
    }

}

Output (the first group is for "A", the second for "人"):

GB2312 : 1
GBK : 1
GB18030 : 1
ISO-8859-1 : 1
UTF-8 : 1
UTF-16 : 4
UTF-16BE : 2
UTF-16LE : 2
-------------------------
GB2312 : 2
GBK : 2
GB18030 : 2
ISO-8859-1 : 1
UTF-8 : 3
UTF-16 : 4
UTF-16BE : 2
UTF-16LE : 2
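
Two of the counts above can look surprising: "UTF-16" reports 4 bytes for both characters, and ISO-8859-1 reports 1 byte for "人". The sketch below prints the encoded bytes in hex so the cause is visible. It is an illustration, not part of the original program: the class name `ByteDump` and the helper `hex` are hypothetical, and it assumes the runtime provides the GBK charset (typical for a standard JDK, but not guaranteed by the Java specification).

```java
import java.nio.charset.Charset;

final class ByteDump {

    // Encodes str with the named charset and formats the bytes as upper-case hex.
    // Charset.forName throws an unchecked exception if the charset is unavailable.
    static String hex(String str, String encoding) {
        byte[] bytes = str.getBytes(Charset.forName(encoding));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // GBK is assumed to be available; the remaining charsets are required on every JVM.
        String[] encodings = {"GBK", "ISO-8859-1", "UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"};
        // "\u4EBA" is the character 人, written as an escape so the source file encoding does not matter.
        for (String s : new String[] {"A", "\u4EBA"}) {
            for (String encoding : encodings) {
                System.out.println(s + " " + encoding + " : " + hex(s, encoding));
            }
        }
    }
}
```

With Java's "UTF-16" charset the encoder prepends a 2-byte byte order mark (FE FF), so "A" becomes FE FF 00 41 and "人" becomes FE FF 4E BA. The explicit UTF-16BE and UTF-16LE variants write no mark, which is why they report 2 bytes. ISO-8859-1 cannot represent "人", so `getBytes` substitutes the single byte 3F ("?"). The count of 1 therefore reflects a lossy replacement, not a real encoding of the character.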