Converting between UTF-8, UTF-16, and multibyte encodings (such as GBK)

Original post, published 2025-07-28 14:56:52

License: CC 4.0 BY-SA

Tags: #mfc #c++

1. UTF-8 ↔ UTF-16 conversion

Method 1: Windows API (recommended on Windows)

#include <windows.h>
#include <string>

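// UTF-8 → UTF-16
std::wstring UTF8ToUTF16(const std::string& utf8) {
    if (utf8.empty()) return L"";
    // Pass the explicit length instead of -1 so the result does not
    // end with an extra embedded L'\0'.
    const int srcLen = static_cast<int>(utf8.size());
    int len = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), srcLen, NULL, 0);
    if (len <= 0) return L"";  // conversion failed
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), srcLen, &utf16[0], len);
    return utf16;
}

// UTF-16 → UTF-8
std::string UTF16ToUTF8(const std::wstring& utf16) {
    if (utf16.empty()) return "";
    const int srcLen = static_cast<int>(utf16.size());
    int len = WideCharToMultiByte(CP_UTF8, 0, utf16.data(), srcLen, NULL, 0, NULL, NULL);
    if (len <= 0) return "";  // conversion failed
    std::string utf8(len, '\0');
    WideCharToMultiByte(CP_UTF8, 0, utf16.data(), srcLen, &utf8[0], len, NULL, NULL);
    return utf8;
}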

2. Multibyte (e.g. GBK) ↔ UTF-8 conversion

Method: go through UTF-16 as an intermediate step (best compatibility)

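// Multibyte (GBK) → UTF-8
std::string MBToUTF8(const std::string& gbk) {
    if (gbk.empty()) return "";
    // Step 1: multibyte → UTF-16 (explicit length, so no trailing L'\0').
    const int srcLen = static_cast<int>(gbk.size());
    int len = MultiByteToWideChar(CP_ACP, 0, gbk.data(), srcLen, NULL, 0);
    if (len <= 0) return "";  // conversion failed
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, gbk.data(), srcLen, &utf16[0], len);
    // Step 2: UTF-16 → UTF-8
    return UTF16ToUTF8(utf16);
}

// UTF-8 → multibyte (GBK)
std::string UTF8ToMB(const std::string& utf8) {
    // Step 1: UTF-8 → UTF-16
    std::wstring utf16 = UTF8ToUTF16(utf8);
    if (utf16.empty()) return "";
    // Step 2: UTF-16 → multibyte
    const int srcLen = static_cast<int>(utf16.size());
    int len = WideCharToMultiByte(CP_ACP, 0, utf16.data(), srcLen, NULL, 0, NULL, NULL);
    if (len <= 0) return "";  // conversion failed
    std::string gbk(len, '\0');
    WideCharToMultiByte(CP_ACP, 0, utf16.data(), srcLen, &gbk[0], len, NULL, NULL);
    return gbk;
}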

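Key parameters:

CP_ACP: the system's ANSI code page (GBK, code page 936, on Simplified Chinese Windows). The functions in this article therefore handle GBK only on such a system; pass 936 explicitly to convert GBK regardless of the system locale.
CP_UTF8: selects UTF-8.

3. UTF-16 ↔ multibyte (e.g. GBK) conversion

Method: call the Windows API directly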

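// UTF-16 → multibyte (GBK)
std::string UTF16ToMB(const std::wstring& utf16) {
    if (utf16.empty()) return "";
    // Explicit length, so the result has no trailing '\0'.
    const int srcLen = static_cast<int>(utf16.size());
    int len = WideCharToMultiByte(CP_ACP, 0, utf16.data(), srcLen, NULL, 0, NULL, NULL);
    if (len <= 0) return "";  // conversion failed
    std::string gbk(len, '\0');
    // Characters the code page cannot represent become the default character.
    WideCharToMultiByte(CP_ACP, 0, utf16.data(), srcLen, &gbk[0], len, NULL, NULL);
    return gbk;
}

// Multibyte (GBK) → UTF-16
std::wstring MBToUTF16(const std::string& gbk) {
    if (gbk.empty()) return L"";
    const int srcLen = static_cast<int>(gbk.size());
    int len = MultiByteToWideChar(CP_ACP, 0, gbk.data(), srcLen, NULL, 0);
    if (len <= 0) return L"";  // conversion failed
    std::wstring utf16(len, L'\0');
    MultiByteToWideChar(CP_ACP, 0, gbk.data(), srcLen, &utf16[0], len);
    return utf16;
}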

4. Complete example

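#include <iostream>
#include <locale>
#include <string>
#include <windows.h>

// Assumes the conversion functions from sections 1-3 are defined above.
int main() {
    // std::wcout prints non-ASCII text only if its locale can encode it;
    // here the user's default locale is used, so output still fails on a
    // system whose code page cannot represent the text.
    std::wcout.imbue(std::locale(""));

    // UTF-8 ↔ UTF-16
    // The cast keeps this line valid in C++20, where u8 literals are char8_t.
    std::string utf8 = reinterpret_cast<const char*>(u8"中文测试");
    std::wstring utf16 = UTF8ToUTF16(utf8);
    std::string backToUTF8 = UTF16ToUTF8(utf16);
    std::wcout << L"UTF-16: " << utf16 << std::endl;
    // UTF-8 bytes display correctly only if the console uses UTF-8.
    std::cout << "Back to UTF-8: " << backToUTF8 << std::endl;

    // Multibyte (GBK) ↔ UTF-8
    std::string gbk = UTF8ToMB(utf8);
    std::string utf8FromGBK = MBToUTF8(gbk);
    std::cout << "GBK: " << gbk << std::endl; // console must use the local code page
    std::cout << "Back to UTF-8: " << utf8FromGBK << std::endl;

    return 0;
}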
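
Because what the console displays depends on its code page, comparing raw bytes is often a more reliable way to check a conversion than reading the printed text. The following is a minimal sketch of a hex-dump helper; the name `ToHex` is hypothetical and not part of the original code. For the string "中文", the UTF-8 bytes are E4 B8 AD E6 96 87 and the GBK bytes are D6 D0 CE C4.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: formats each byte of `bytes` as two uppercase hex
// digits, separated by single spaces.
std::string ToHex(const std::string& bytes) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i != 0) out += ' ';
        out += digits[b >> 4];
        out += digits[b & 0x0F];
    }
    return out;
}
```

Printing `ToHex(gbk)` and `ToHex(utf8FromGBK)` in the example above would show the bytes without involving the console's encoding at all.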

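5. Key caveats

Encoding consistency:
- Make sure the actual encoding of the source data matches the conversion function; treating GBK bytes as UTF-8, for example, produces mojibake.
BOM handling:
- A UTF-8 file may begin with a BOM (\xEF\xBB\xBF), which must be skipped manually before converting.
Cross-platform compatibility:
- Linux and macOS typically use UTF-8 by default, so the multibyte conversion is usually unnecessary there. The Windows API functions used here are not available on those platforms.
Performance:
- When converting frequently, reuse buffers to avoid repeated memory allocation.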
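
The first two caveats can be sketched in portable C++. The helper names `HasUtf8Bom`, `StripUtf8Bom`, and `IsValidUtf8` are hypothetical and not part of the original code. The validity check is only a heuristic for encoding consistency: bytes that fail it are certainly not UTF-8, but bytes that pass are not guaranteed to have been intended as UTF-8.

```cpp
#include <cstddef>
#include <string>

// Returns true if `s` starts with the UTF-8 byte order mark EF BB BF.
bool HasUtf8Bom(const std::string& s) {
    return s.size() >= 3 &&
           static_cast<unsigned char>(s[0]) == 0xEF &&
           static_cast<unsigned char>(s[1]) == 0xBB &&
           static_cast<unsigned char>(s[2]) == 0xBF;
}

// Returns a copy of `s` without a leading UTF-8 BOM, if one is present.
std::string StripUtf8Bom(const std::string& s) {
    return HasUtf8Bom(s) ? s.substr(3) : s;
}

// Returns true if `s` is well-formed UTF-8: rejects bad lead bytes,
// truncated sequences, overlong forms, surrogates, and values > U+10FFFF.
bool IsValidUtf8(const std::string& s) {
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char b0 = static_cast<unsigned char>(s[i]);
        std::size_t extra;    // continuation bytes expected after b0
        unsigned long cp;     // code point being decoded
        unsigned long minCp;  // smallest code point valid for this length
        if (b0 < 0x80)                { extra = 0; cp = b0;        minCp = 0; }
        else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; minCp = 0x80; }
        else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; minCp = 0x800; }
        else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; minCp = 0x10000; }
        else return false;  // stray continuation byte or invalid lead byte
        if (n - i - 1 < extra) return false;  // sequence cut off at the end
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char b = static_cast<unsigned char>(s[i + k]);
            if ((b & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < minCp) return false;                    // overlong encoding
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;  // UTF-16 surrogate
        if (cp > 0x10FFFF) return false;                 // beyond Unicode
        i += extra + 1;
    }
    return true;
}
```

One possible use is to call `StripUtf8Bom` on file contents and then check `IsValidUtf8` before handing the bytes to a UTF-8 conversion; the GBK bytes for "中文" (D6 D0 CE C4), for instance, fail the check.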

Summary

Direction	Recommended method	Core functions
UTF-8 ↔ UTF-16	Windows API or `<codecvt>`	`MultiByteToWideChar` / `WideCharToMultiByte`
Multibyte ↔ UTF-8	Via UTF-16 as an intermediate	Same as above
UTF-16 ↔ multibyte	Windows API directly	Same as above
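
The table lists `<codecvt>` as an alternative for UTF-8 ↔ UTF-16, but the article does not show it. The following is a minimal sketch under some assumptions: the function names `Utf8ToUtf16Std` and `Utf16ToUtf8Std` are hypothetical, and `char16_t` is used instead of `wchar_t` so the result is UTF-16 on every platform. Note that `std::wstring_convert` and `<codecvt>` have been deprecated since C++17, so compilers may warn about them.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// UTF-8 → UTF-16 with the standard library.
// from_bytes throws std::range_error if the input is not valid UTF-8.
std::u16string Utf8ToUtf16Std(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);
}

// UTF-16 → UTF-8 with the standard library.
// to_bytes throws std::range_error if the input is not valid UTF-16.
std::string Utf16ToUtf8Std(const std::u16string& utf16) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.to_bytes(utf16);
}
```

Unlike the Windows API functions, this approach covers only UTF-8 ↔ UTF-16; it does not help with GBK or other multibyte code pages.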