Using the WideCharToMultiByte and MultiByteToWideChar Functions

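This article shows how to convert strings between ANSI and Unicode and between UTF-8 and Unicode. It gives concrete code and explains the role of the two functions involved, MultiByteToWideChar and WideCharToMultiByte.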

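The code page argument of WideCharToMultiByte identifies the code page of the newly converted (output) multibyte string.
The code page argument of MultiByteToWideChar identifies the code page of the multibyte (input) string.

Two code pages are commonly used: CP_ACP and CP_UTF8.
CP_ACP converts between ANSI (the system's current ANSI code page) and Unicode.
CP_UTF8 converts between UTF-8 and Unicode.
Throughout this article, "Unicode" means a wide-character (wchar_t) string, which is UTF-16 on Windows.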
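
To make the difference between the two encodings concrete, here is a small portable sketch that encodes one Unicode code point as UTF-8. It is not part of the original article and is not how Windows implements the conversion; EncodeUtf8 is a hypothetical helper written only for illustration.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point (up to U+10FFFF) as UTF-8.
// Illustration only; it does not reject surrogate code points.
std::string EncodeUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {                       // 1 byte: 0xxxxxxx
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {               // 2 bytes: 110xxxxx 10xxxxxx
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {             // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {                               // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

The character U+4E2D, for example, is a single wchar_t in a Windows wide string but three bytes in UTF-8. Because the element count changes during conversion, the output size has to be queried rather than assumed.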

The Windows implementations are as follows:

1. ANSI to Unicode

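#include <windows.h>
#include <string>

std::wstring ANSIToUnicode( const std::string& str )
{
 // First call: the output buffer is NULL and its size is 0, so no conversion
 // is performed and the function returns the number of wide characters
 // required, including the terminating null (the input length is -1).
 int unicodeLen = ::MultiByteToWideChar( CP_ACP,
            0,
            str.c_str(),
            -1,
            NULL,  // buffer that receives the converted string
            0 );   // size of that buffer, in wide characters
 if ( unicodeLen <= 0 )
  return std::wstring();  // conversion failed; call GetLastError() for details

 // Second call: convert directly into the wstring's storage.
 std::wstring rt( unicodeLen, L'\0' );
 ::MultiByteToWideChar( CP_ACP,
         0,
         str.c_str(),
         -1,
         &rt[0],
         unicodeLen );
 rt.resize( unicodeLen - 1 );  // drop the terminating null counted above
 return rt;
}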

2. Unicode to ANSI

std::string UnicodeToANSI( const std::wstring& str )
{
 // First call: with a NULL buffer and size 0, the function returns the
 // number of bytes required, including the terminating null.
 int iTextLen = ::WideCharToMultiByte( CP_ACP,
         0,
         str.c_str(),
         -1,
         NULL,
         0,
         NULL,
         NULL );
 if ( iTextLen <= 0 )
  return std::string();  // conversion failed; call GetLastError() for details

 // Second call: convert directly into the string's storage.
 std::string strText( iTextLen, '\0' );
 ::WideCharToMultiByte( CP_ACP,
         0,
         str.c_str(),
         -1,
         &strText[0],
         iTextLen,
         NULL,
         NULL );
 strText.resize( iTextLen - 1 );  // drop the terminating null counted above
 return strText;
}

3. UTF-8 to Unicode

std::wstring UTF8ToUnicode( const std::string& str )
{
 // First call: query the number of wide characters required,
 // including the terminating null.
 int unicodeLen = ::MultiByteToWideChar( CP_UTF8,
            0,
            str.c_str(),
            -1,
            NULL,
            0 );
 if ( unicodeLen <= 0 )
  return std::wstring();  // conversion failed; call GetLastError() for details

 // Second call: convert directly into the wstring's storage.
 std::wstring rt( unicodeLen, L'\0' );
 ::MultiByteToWideChar( CP_UTF8,
         0,
         str.c_str(),
         -1,
         &rt[0],
         unicodeLen );
 rt.resize( unicodeLen - 1 );  // drop the terminating null counted above
 return rt;
}

4. Unicode to UTF-8

std::string UnicodeToUTF8( const std::wstring& str )
{
 // First call: query the number of bytes required, including the
 // terminating null. With CP_UTF8 the last two arguments must be NULL.
 int iTextLen = ::WideCharToMultiByte( CP_UTF8,
         0,
         str.c_str(),
         -1,
         NULL,
         0,
         NULL,
         NULL );
 if ( iTextLen <= 0 )
  return std::string();  // conversion failed; call GetLastError() for details

 // Second call: convert directly into the string's storage.
 std::string strText( iTextLen, '\0' );
 ::WideCharToMultiByte( CP_UTF8,
         0,
         str.c_str(),
         -1,
         &strText[0],
         iTextLen,
         NULL,
         NULL );
 strText.resize( iTextLen - 1 );  // drop the terminating null counted above
 return strText;
}

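Notes:

1. How do you get the number of characters in a string that contains both single-byte and double-byte characters?

The Microsoft Visual C++ run-time library provides the _mbslen function for working with multibyte strings (strings that mix single-byte and double-byte characters).

Calling strlen does not tell you how many characters the string really holds; it only reports the number of bytes before the terminating 0.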
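
_mbslen is specific to the Microsoft run-time library and depends on the current multibyte code page. As a portable analogue of the same bytes-versus-characters distinction, the sketch below counts characters in a UTF-8 string by skipping continuation bytes. This is a substitute technique for illustration, not what _mbslen does internally; CountUtf8Chars is a hypothetical name, and the input is assumed to be valid UTF-8.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: count the characters (code points) in a
// null-terminated string that is assumed to be valid UTF-8.
std::size_t CountUtf8Chars(const char* s)
{
    std::size_t count = 0;
    for (; *s != '\0'; ++s) {
        // Continuation bytes look like 10xxxxxx; any other byte starts a character.
        if ((static_cast<unsigned char>(*s) & 0xC0) != 0x80)
            ++count;
    }
    return count;
}
```

For a string made of 'a', the three-byte sequence E4 B8 AD, and 'b', strlen reports 5 bytes while CountUtf8Chars reports 3 characters.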

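2. Why use Unicode?

a. It makes it easy to exchange data between different languages.

b. It lets you ship a single binary .exe or DLL file that supports all languages.

c. It improves the application's efficiency: Windows NT-based systems use Unicode internally, so the ANSI version of an API has to convert its string arguments to Unicode before doing the work.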

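3. Which Unicode data types does Windows define?

WCHAR A Unicode character

PWSTR A pointer to a Unicode string

PCWSTR A pointer to a constant Unicode string

The corresponding ANSI data types are CHAR, LPSTR and LPCSTR.

The generic ANSI/Unicode data types are TCHAR, PTSTR and LPCTSTR.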

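4. How do you operate on Unicode strings?

Character set    Function prefix    Example
ANSI             str                strcpy
Unicode          wcs                wcscpy
ANSI/Unicode     _tcs               _tcscpy (C run-time library)
ANSI/Unicode     lstr               lstrcpy (Windows function)

The ANSI version of a Windows function ends in A; the Unicode version ends in W. Windows defines the generic name like this: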

#ifdef UNICODE 
#define CreateWindowEx CreateWindowExW 
#else 
#define CreateWindowEx CreateWindowExA 
#endif
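
The same mechanism can be tried outside the Windows headers. In this minimal sketch, GetGreetingA, GetGreetingW and the generic name GetGreeting are hypothetical; only the #ifdef UNICODE pattern comes from the text above.

```cpp
#include <string>

// Hypothetical pair of functions that follows the A/W naming convention.
std::string  GetGreetingA() { return "hello"; }
std::wstring GetGreetingW() { return L"hello"; }

// The generic name is resolved at compile time, as in the Windows headers:
// building with UNICODE defined selects the W version, otherwise the A version.
#ifdef UNICODE
#define GetGreeting GetGreetingW
#else
#define GetGreeting GetGreetingA
#endif
```

Code that calls GetGreeting() compiles against either version without changes, which is the purpose of the generic names.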

5. How do you write Unicode string constants?

ANSI "string"

Unicode L"string"

ANSI/Unicode _T("string") or TEXT("string")
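
A short sketch of what the three forms mean in practice. MY_TEXT is a hypothetical stand-in for the _T and TEXT macros so that the example compiles without the Windows headers; the real macros are defined in tchar.h and winnt.h.

```cpp
#include <cstddef>

// The same six characters plus a terminating null, as narrow and wide literals.
const char    kNarrow[] = "string";   // 7 elements of type char
const wchar_t kWide[]   = L"string";  // 7 elements of type wchar_t

// Hypothetical stand-in for _T / TEXT: choose the literal form at compile time.
#ifdef UNICODE
#define MY_TEXT(s) L##s
#else
#define MY_TEXT(s) s
#endif
```

Both arrays hold seven elements, but kWide occupies 7 * sizeof(wchar_t) bytes. A literal written as MY_TEXT("string") has seven elements in either build.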

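6. Why should you prefer the operating system's functions where possible?

Doing so helps application performance slightly, because the operating system's string functions are used by large applications such as the shell process Explorer.exe. Since these functions are used so heavily, they are probably already loaded in RAM while your application runs.
Examples include StrCat, StrChr, StrCmp and StrCpy.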

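7. How do you write an application that builds as both ANSI and Unicode?

a. Treat text strings as arrays of characters, not as arrays of chars or arrays of bytes.

b. Use the generic data types (such as TCHAR and PTSTR) for text characters and strings.

c. Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers and data buffers.

d. Use the TEXT macro for literal characters and strings.

e. Perform global replacements (for example, replace PSTR with PTSTR).

f. Fix string arithmetic problems. For example, functions usually expect a buffer size in characters, not in bytes, so pass sizeof(szBuffer)/sizeof(TCHAR) rather than sizeof(szBuffer). Conversely, if you need to allocate a memory block for a string and you know the number of characters in it, remember that memory is allocated in bytes: call malloc(nCharacters * sizeof(TCHAR)), not malloc(nCharacters).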
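
The arithmetic in item f can be sketched in portable C++, using wchar_t as a stand-in for TCHAR in a Unicode build. CountOf and AllocWideString are hypothetical helper names, and the extra element reserved for the terminating null is an addition not mentioned in the text.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cwchar>

// Number of elements in a fixed-size array: a size in characters, not bytes.
// Equivalent to sizeof(szBuffer) / sizeof(szBuffer[0]).
template <typename T, std::size_t N>
constexpr std::size_t CountOf(const T (&)[N]) { return N; }

// Allocate room for nCharacters wide characters plus a terminating null.
// malloc takes a byte count, so the character count is scaled by the
// element size. The caller releases the block with std::free.
wchar_t* AllocWideString(std::size_t nCharacters)
{
    wchar_t* p = static_cast<wchar_t*>(
        std::malloc((nCharacters + 1) * sizeof(wchar_t)));
    if (p != nullptr)
        p[0] = L'\0';  // start out as an empty string
    return p;
}
```

For wchar_t szBuffer[100], CountOf(szBuffer) is 100 while sizeof(szBuffer) is 100 * sizeof(wchar_t). Passing the latter where a character count is expected would overstate the buffer size.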

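8. How do you tell whether a text file is ANSI or Unicode?

If the first two bytes of the file are 0xFF and 0xFE (the UTF-16 little-endian byte order mark), the file is Unicode; otherwise treat it as ANSI. This is only a heuristic: a file without a byte order mark may still be UTF-8 or UTF-16.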
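
A minimal sketch of this check on the first bytes of a file. It extends the rule above to the UTF-16 big-endian and UTF-8 byte order marks, which the original text does not cover; TextEncoding and DetectBom are hypothetical names.

```cpp
#include <cstddef>

enum class TextEncoding { Utf8Bom, Utf16LE, Utf16BE, Unknown };

// Inspect the leading bytes of a buffer for a byte order mark (BOM).
// Unknown means that no BOM was found; the data may be ANSI, BOM-less UTF-8,
// or something else. A UTF-32 little-endian BOM (FF FE 00 00) is not
// distinguished here and would be reported as Utf16LE.
TextEncoding DetectBom(const unsigned char* data, std::size_t size)
{
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return TextEncoding::Utf8Bom;
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return TextEncoding::Utf16LE;   // the case described in the text
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return TextEncoding::Utf16BE;
    return TextEncoding::Unknown;
}
```

A caller would read the first few bytes of the file in binary mode and fall back to the ANSI code page when the result is Unknown, accepting that this guess can be wrong.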

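9. How do you convert strings between Unicode and ANSI?

The Windows function MultiByteToWideChar converts a multibyte string to a wide-character string; WideCharToMultiByte converts a wide-character string to its multibyte equivalent.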