UTF_8ToUnicode and UnicodeToGB2312 on Windows

liublue66

Published 2023-09-04 13:52:45

Tags: windows c++

Article link: https://blog.youkuaiyun.com/m0_73915763/article/details/132667646

Below is a C++ implementation of the UTF_8ToUnicode and UnicodeToGB2312 functions:

#include <iostream>
#include <string>
#include <vector>
#ifdef _WIN32
#include <Windows.h>  // WideCharToMultiByte (Windows only)
#endif

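// Decode UTF-8 into a wide string. wchar_t is 16 bits on Windows (UTF-16) and
// 32 bits on Linux/macOS (UTF-32), so code points above U+FFFF are written as
// surrogate pairs only when wchar_t is 16 bits.
void UTF_8ToUnicode(std::wstring& pOut, const std::string& pText) {
    pOut.clear();
    const size_t n = pText.length();
    // Read bytes as unsigned char so that values >= 0x80 are not sign-extended.
    auto byteAt = [&pText](size_t k) { return static_cast<unsigned char>(pText[k]); };
    // True if byte k exists and is a continuation byte (10xxxxxx).
    auto isCont = [&](size_t k) { return k < n && (byteAt(k) & 0xC0) == 0x80; };

    for (size_t i = 0; i < n; ++i) {
        const unsigned char c = byteAt(i);
        char32_t cp = 0xFFFD;  // U+FFFD (replacement character) for malformed input

        if (c < 0x80) {
            // 1-byte UTF-8 sequence (ASCII)
            cp = c;
        }
        else if ((c & 0xE0) == 0xC0 && isCont(i + 1)) {
            // 2-byte UTF-8 sequence
            cp = ((c & 0x1Fu) << 6) | (byteAt(i + 1) & 0x3Fu);
            i += 1;
        }
        else if ((c & 0xF0) == 0xE0 && isCont(i + 1) && isCont(i + 2)) {
            // 3-byte UTF-8 sequence
            cp = ((c & 0x0Fu) << 12) | ((byteAt(i + 1) & 0x3Fu) << 6) | (byteAt(i + 2) & 0x3Fu);
            i += 2;
        }
        else if ((c & 0xF8) == 0xF0 && isCont(i + 1) && isCont(i + 2) && isCont(i + 3)) {
            // 4-byte UTF-8 sequence (U+10000 and above, e.g. emoji)
            cp = ((c & 0x07u) << 18) | ((byteAt(i + 1) & 0x3Fu) << 12) |
                 ((byteAt(i + 2) & 0x3Fu) << 6) | (byteAt(i + 3) & 0x3Fu);
            i += 3;
            if (cp > 0x10FFFF) cp = 0xFFFD;  // beyond the Unicode range
        }
        // Any other byte is invalid UTF-8 and is replaced with U+FFFD.

        if (sizeof(wchar_t) == 2 && cp > 0xFFFF) {
            // 16-bit wchar_t (Windows): store the code point as a UTF-16 surrogate pair
            cp -= 0x10000;
            pOut += static_cast<wchar_t>(0xD800 + (cp >> 10));
            pOut += static_cast<wchar_t>(0xDC00 + (cp & 0x3FF));
        }
        else {
            pOut += static_cast<wchar_t>(cp);
        }
    }
}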

void UnicodeToGB2312(std::string& pOut, const std::wstring& pText) {
    pOut.clear();
    if (pText.empty()) {
        return;
    }
#ifdef _WIN32
    // Convert the whole string in one call so that surrogate pairs are never split.
    // 936 is the GBK code page (a superset of GB2312). CP_ACP would only mean
    // GBK on Simplified Chinese Windows.
    const int length = static_cast<int>(pText.length());
    int bufferSize = WideCharToMultiByte(936, 0, pText.data(), length, nullptr, 0, nullptr, nullptr);

    if (bufferSize > 0) {
        std::vector<char> buffer(bufferSize);
        WideCharToMultiByte(936, 0, pText.data(), length, buffer.data(), bufferSize, nullptr, nullptr);
        pOut.assign(buffer.begin(), buffer.end());
    }
#else
    // Implement the Unicode -> GB2312 conversion for non-Windows platforms here
    // ...
#endif
}

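int main() {
    // UTF-8 encoded text. The cast keeps this compiling under C++20, where u8"" literals are char8_t.
    std::string utf8Text = reinterpret_cast<const char*>(u8"你好，世界！");
    std::wstring unicodeText;
    std::string gb2312Text;

    UTF_8ToUnicode(unicodeText, utf8Text);
    UnicodeToGB2312(gb2312Text, unicodeText);

    // Mixing std::cout and std::wcout on the same console is unreliable, so the
    // wide string is summarized by its length instead of being printed.
    std::cout << "UTF-8: " << utf8Text << std::endl;
    std::cout << "Unicode: " << unicodeText.size() << " wide characters" << std::endl;
    std::cout << "GB2312: " << gb2312Text << std::endl;

    return 0;
}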

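In the code above, UTF_8ToUnicode takes a UTF-8 encoded string pText and stores the converted result in pOut (a std::wstring). It uses bit operations to decode UTF-8 byte sequences of different lengths into the corresponding Unicode code points. The high bits of the lead byte give the sequence length (0xxxxxxx, 110xxxxx, 1110xxxx and 11110xxx for 1 to 4 bytes), and each continuation byte (10xxxxxx) contributes 6 more bits. Keep in mind that wchar_t is 16 bits on Windows, where a std::wstring holds UTF-16, but 32 bits on Linux and macOS, where it holds UTF-32. On Windows, code points above U+FFFF must therefore be stored as surrogate pairs.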
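To make the bit arithmetic concrete, here is a small sketch. It decodes the three UTF-8 bytes of 你 (E4 BD A0) the same way as the 3-byte branch. It also splits a code point above U+FFFF into the UTF-16 surrogate pair that a 16-bit wchar_t requires. The helper names Decode3 and ToSurrogates are hypothetical and not part of the original code, and input validity is assumed.

```cpp
#include <cassert>
#include <utility>

// Hypothetical helper: decode one valid 3-byte UTF-8 sequence
// (1110xxxx 10xxxxxx 10xxxxxx) into a code point.
constexpr char32_t Decode3(unsigned char b0, unsigned char b1, unsigned char b2) {
    return (char32_t(b0 & 0x0F) << 12) | (char32_t(b1 & 0x3F) << 6) | char32_t(b2 & 0x3F);
}

// Checked at compile time: E4 BD A0 is U+4F60.
static_assert(Decode3(0xE4, 0xBD, 0xA0) == 0x4F60, "E4 BD A0 should decode to U+4F60");

// Hypothetical helper: split a code point in U+10000..U+10FFFF into a UTF-16
// high/low surrogate pair, as needed when wchar_t is 16 bits (Windows).
std::pair<char16_t, char16_t> ToSurrogates(char32_t cp) {
    cp -= 0x10000;  // 20 bits remain: the top 10 go to the high surrogate, the low 10 to the low one
    return {char16_t(0xD800 + (cp >> 10)), char16_t(0xDC00 + (cp & 0x3FF))};
}
```

For 你, (0xE4 & 0x0F) << 12 = 0x4000, (0xBD & 0x3F) << 6 = 0xF40, and 0xA0 & 0x3F = 0x20, which sum to 0x4F60. Likewise, ToSurrogates(0x1F600) yields the pair D83D DE00.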

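UnicodeToGB2312 takes a Unicode string pText (a std::wstring) and stores the GB2312-encoded result in pOut (a std::string). On Windows it performs the conversion with WideCharToMultiByte, and the code page argument determines the target encoding. CP_ACP is the system's ANSI code page. It is 936 (GBK, a superset of GB2312) only on Simplified Chinese Windows, so passing 936 explicitly makes the result independent of the system locale. In the resulting bytes, ASCII characters remain single bytes and each Chinese character becomes two bytes, each in the range 0xA1 to 0xFE.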
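To check what UnicodeToGB2312 produced, the following sketch splits a GB2312 byte string (in the two-byte EUC-CN form described above) back into characters and recovers each character's row/cell ("区位") number by subtracting 0xA0 from each byte. The names Gb2312Char and SplitGb2312 are hypothetical. The sketch assumes pure GB2312 input: it skips any byte that does not form a valid pair, so it does not handle GBK-only characters.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical result type: one decoded GB2312 character.
struct Gb2312Char {
    bool ascii;  // true for a single-byte ASCII character
    int row;     // 1..94 for double-byte characters, 0 for ASCII
    int cell;    // 1..94 for double-byte characters, the ASCII value otherwise
};

// Hypothetical helper: split EUC-CN (GB2312) bytes into characters.
std::vector<Gb2312Char> SplitGb2312(const std::string& bytes) {
    std::vector<Gb2312Char> out;
    for (size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (b < 0x80) {
            out.push_back({true, 0, b});  // ASCII is unchanged in GB2312
        } else if (b >= 0xA1 && b <= 0xFE && i + 1 < bytes.size()) {
            const unsigned char t = static_cast<unsigned char>(bytes[i + 1]);
            if (t >= 0xA1 && t <= 0xFE) {
                // Both bytes are offset by 0xA0 from the row and cell numbers.
                out.push_back({false, b - 0xA0, t - 0xA0});
                ++i;  // consume the trail byte
            }
            // Otherwise the lead byte is skipped in this sketch.
        }
        // Bytes outside these ranges are skipped in this sketch.
    }
    return out;
}
```

For 你 (C4 E3), it reports row 36 and cell 67, which is the GB2312 row/cell code 3667.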

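The program includes <iostream>, <string> and <vector> for the standard library classes it uses, plus <Windows.h> for WideCharToMultiByte. main first calls UTF_8ToUnicode to convert the UTF-8 string to a wide string. It then calls UnicodeToGB2312 to convert that wide string to GB2312, and finally prints the results. Whether the printed text displays correctly depends on the console. A Simplified Chinese Windows console (code page 936) shows the GB2312 bytes correctly but shows the UTF-8 bytes as mojibake. With MSVC, also compile with /utf-8 (or save the source file as UTF-8 with a BOM) so that the Chinese string literal is read correctly.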
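Because the console code page affects what appears on screen, printing the bytes in hexadecimal is a more reliable way to check the conversion. HexDump below is a hypothetical helper and is not part of the original program.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: render each byte as two uppercase hex digits,
// separated by spaces, independent of the console code page.
std::string HexDump(const std::string& bytes) {
    static const char kDigits[] = "0123456789ABCDEF";
    std::string out;
    for (size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (i > 0) out += ' ';
        out += kDigits[b >> 4];    // high nibble
        out += kDigits[b & 0x0F];  // low nibble
    }
    return out;
}
```

For example, `std::cout << HexDump(gb2312Text) << std::endl;` in main should print output that starts with C4 E3 BA C3 (你好) when the conversion uses code page 936, while HexDump(utf8Text) starts with E4 BD A0 E5 A5 BD.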

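Note that the #ifdef / #else block provides separate implementations for Windows and non-Windows platforms. Test the predefined macro _WIN32 for this purpose. _WINDOWS_ is only the include guard of <Windows.h>, so it is defined only after that header has been included, and <Windows.h> does not exist on other platforms. On Windows, the conversion is done with WideCharToMultiByte. On other platforms, put your own implementation of the conversion in the #else branch of UnicodeToGB2312.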
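One possible way to fill the #else branch is to use POSIX iconv, as in the sketch below. It assumes glibc's iconv, which accepts the encoding names "WCHAR_T" and "GB2312"; other iconv implementations may name or support these encodings differently. Throwing std::runtime_error on failure is also an assumption of this sketch, not something the original code does.

```cpp
#include <cassert>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Sketch of a non-Windows UnicodeToGB2312 based on iconv (glibc assumed).
void UnicodeToGB2312(std::string& pOut, const std::wstring& pText) {
    pOut.clear();
    if (pText.empty()) {
        return;
    }
    iconv_t cd = iconv_open("GB2312", "WCHAR_T");  // to-encoding first, then from-encoding
    if (cd == (iconv_t)(-1)) {
        throw std::runtime_error("iconv_open: WCHAR_T -> GB2312 not supported");
    }
    // iconv takes non-const pointers and advances them as it converts.
    char* in = reinterpret_cast<char*>(const_cast<wchar_t*>(pText.data()));
    size_t inLeft = pText.size() * sizeof(wchar_t);
    // A GB2312 character needs at most 2 bytes, so 2 bytes per wchar_t suffice.
    std::string buffer(pText.size() * 2, '\0');
    char* out = &buffer[0];
    size_t outLeft = buffer.size();

    const size_t rc = iconv(cd, &in, &inLeft, &out, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<size_t>(-1)) {
        throw std::runtime_error("iconv: input not representable in GB2312");
    }
    buffer.resize(buffer.size() - outLeft);  // drop the unused tail
    pOut = buffer;
}
```

The two versions differ in how they handle unmappable characters. The Windows version substitutes a default character for them, while iconv stops with an error, which this sketch turns into an exception.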