While doing WinCE programming over the past few days, I kept running into conversions between character sets, so here is a summary.
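A Unicode string is a wide-char string and uses a double-byte encoding (on Windows, wchar_t is 16 bits and holds UTF-16). An ANSI string uses the system's ANSI code page; for English text this is plain single-byte ASCII, which is why such encodings are also called Single-byte Character Sets (SBCS). To extend SBCS, Microsoft also uses Double-byte Character Sets (DBCS), in which a character takes one or two bytes. Both ANSI and DBCS strings can be converted to Unicode through corresponding API functions. In general, "MultiByte" refers to DBCS and also covers ANSI: under a DBCS code page, English characters are single-byte while Chinese characters take two bytes, so both are collectively called MultiByte.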
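To make the SBCS/DBCS distinction concrete, here is a minimal sketch that counts characters in a GBK (code page 936, Simplified Chinese) string. It assumes the GBK rule that bytes 0x81 to 0xFE are lead bytes of a two-byte character; trail bytes are not validated, and the helper names are hypothetical, not Windows APIs (on Windows, IsDBCSLeadByteEx answers the lead-byte question for a given code page).

```c
#include <stddef.h>

/* Hypothetical helper (not a Windows API): returns 1 if byte b can start a
 * two-byte GBK (code page 936) character. GBK lead bytes are 0x81-0xFE. */
static int is_gbk_lead_byte(unsigned char b)
{
    return b >= 0x81 && b <= 0xFE;
}

/* Counts characters in a NUL-terminated GBK string. A lead byte plus the
 * byte after it forms one double-byte character; any other byte is treated
 * as a single-byte character. Trail-byte ranges are not checked.
 * Returns (size_t)-1 if a lead byte is followed by the terminator. */
size_t count_dbcs_chars(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t count = 0;

    while (*p != '\0') {
        if (is_gbk_lead_byte(*p)) {
            if (p[1] == '\0')
                return (size_t)-1;   /* truncated double-byte character */
            p += 2;                  /* lead byte + trail byte */
        } else {
            p += 1;                  /* single-byte (e.g. ASCII) character */
        }
        count++;
    }
    return count;
}
```

For example, "A" followed by the bytes 0xD6 0xD0 (the GBK encoding of 中) is three bytes but two characters, which is exactly why byte length and character count diverge in MultiByte strings.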
Conversions are usually done in one of two ways:
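The first way is to call the API functions provided by Microsoft, mainly MultiByteToWideChar (multibyte to Unicode) and WideCharToMultiByte (Unicode to multibyte).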
These functions are declared in Stringapiset.h (include Windows.h).
#include <windows.h>
//-------------------------------------------------------------------------------------
//Description:
// This function maps a character (ANSI) string to a wide-character (Unicode) string
//
//Parameters:
// lpcszStr: [in] Pointer to the character string to be converted
// lpwszStr: [out] Pointer to a buffer that receives the translated string.
// dwSize: [in] Size of the buffer, in WCHARs (not bytes)
//
//Return Values:
// TRUE: Succeed
// FALSE: Failed
//
//Example:
// AnsiToUnicode(szA,szW,sizeof(szW)/sizeof(szW[0]));
//---------------------------------------------------------------------------------------
BOOL AnsiToUnicode(LPCSTR lpcszStr, LPWSTR lpwszStr, DWORD dwSize)
{
    // Get the required size (in WCHARs, including the terminating null)
    // of the buffer that receives the Unicode string.
    DWORD dwMinSize;
    dwMinSize = MultiByteToWideChar(CP_ACP, 0, lpcszStr, -1, NULL, 0);
    if (dwMinSize == 0 || dwSize < dwMinSize)
    {
        return FALSE;   // conversion error or buffer too small
    }
    // Convert the string from ANSI to Unicode.
    if (MultiByteToWideChar(CP_ACP, 0, lpcszStr, -1, lpwszStr, dwMinSize) == 0)
    {
        return FALSE;
    }
    return TRUE;
}
//-------------------------------------------------------------------------------------
//Description:
// This function maps a wide-character (Unicode) string to a new character string
//
//Parameters:
// lpcwszStr: [in] Pointer to the wide-character string to be converted
// lpszStr: [out] Pointer to a buffer that receives the translated string.
// dwSize: [in] Size of the buffer, in bytes
//
//Return Values:
// TRUE: Succeed
// FALSE: Failed
//
//Example:
// UnicodeToAnsi(szW,szA,sizeof(szA)/sizeof(szA[0]));
//---------------------------------------------------------------------------------------
BOOL UnicodeToAnsi(LPCWSTR lpcwszStr, LPSTR lpszStr, DWORD dwSize)
{
    // Get the required size (in bytes, including the terminating null).
    // dwFlags is a DWORD, so pass 0 rather than NULL.
    DWORD dwMinSize;
    dwMinSize = WideCharToMultiByte(CP_ACP, 0, lpcwszStr, -1, NULL, 0, NULL, NULL);
    if (dwMinSize == 0 || dwSize < dwMinSize)
    {
        return FALSE;   // conversion error or buffer too small
    }
    // Convert the string from Unicode to ANSI.
    if (WideCharToMultiByte(CP_ACP, 0, lpcwszStr, -1, lpszStr, dwMinSize, NULL, NULL) == 0)
    {
        return FALSE;
    }
    return TRUE;
}
The second way is to call C run-time library functions:
size_t wcstombs(
char *mbstr,
const wchar_t *wcstr,
size_t count
);
size_t mbstowcs(
wchar_t *wcstr,
const char *mbstr,
size_t count
);
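These functions are declared in <stdlib.h>. The count parameter is the maximum number of bytes (wcstombs) or wide characters (mbstowcs) to store, and the conversion follows the LC_CTYPE category of the current locale, so call setlocale first when the text is not plain ASCII.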
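The sample further down only covers wcstombs. For the opposite direction, the following sketch mirrors the size check in AnsiToUnicode using mbstowcs. The function name ansi_to_wide is hypothetical, and it relies on mbstowcs accepting a null destination to report the required length, which POSIX and the Microsoft CRT both document.

```c
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical portable counterpart of AnsiToUnicode() built on mbstowcs().
 * dst_count is the size of dst in wchar_t elements, including room for the
 * terminating L'\0'. Returns 1 on success, 0 on failure. The conversion uses
 * the LC_CTYPE category of the current locale (see setlocale). */
int ansi_to_wide(const char *src, wchar_t *dst, size_t dst_count)
{
    /* With a null destination, mbstowcs() only computes the number of wide
     * characters needed, excluding the terminator. */
    size_t needed = mbstowcs(NULL, src, 0);
    if (needed == (size_t)-1)
        return 0;                  /* invalid multibyte sequence */
    if (dst_count < needed + 1)
        return 0;                  /* buffer too small, as in AnsiToUnicode */

    /* needed + 1 leaves room so the terminator is converted as well. */
    mbstowcs(dst, src, needed + 1);
    return 1;
}
```

As with AnsiToUnicode, the buffer size is counted in wide characters, so a caller would pass sizeof(buf)/sizeof(buf[0]).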
Sample code:
// crt_wcstombs.c
// compile with: /W3
// This example demonstrates the use
// of wcstombs, which converts a string
// of wide characters to a string of
// multibyte characters.
#include <stdlib.h>
#include <stdio.h>
#define BUFFER_SIZE 100
int main( void )
{
    size_t count;
    char *pMBBuffer = (char *)malloc( BUFFER_SIZE );
    const wchar_t *pWCBuffer = L"Hello, world.";
    if (pMBBuffer == NULL)
        return 1;
    printf("Convert wide-character string:\n" );
    count = wcstombs(pMBBuffer, pWCBuffer, BUFFER_SIZE ); // C4996
    // Note: MSVC flags wcstombs as unsafe (warning C4996); consider using wcstombs_s instead
    if (count == (size_t)-1)
    {
        printf(" Conversion failed.\n");   // a character could not be represented
        free(pMBBuffer);
        return 1;
    }
    pMBBuffer[BUFFER_SIZE - 1] = '\0';     // wcstombs does not terminate a full buffer
    printf(" Bytes converted: %u\n",
        (unsigned int)count );
    printf(" Multibyte character: %s\n\n",
        pMBBuffer );
    free(pMBBuffer);
    return 0;
}
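The sample above converts into a fixed-size buffer. A more defensive variant, sketched below under the assumption that the caller wants a heap-allocated result, first queries the required byte count (wcstombs with a null destination), checks for (size_t)-1, and always reserves room for the terminating null. wide_to_multibyte_alloc is a hypothetical name.

```c
#include <stdlib.h>

/* Hypothetical helper: converts a wide string to a newly allocated multibyte
 * string in the current locale. Returns NULL if a character cannot be
 * represented or memory runs out; the caller must free() the result. */
char *wide_to_multibyte_alloc(const wchar_t *src)
{
    /* Number of bytes needed, excluding the terminating '\0'. */
    size_t needed = wcstombs(NULL, src, 0);
    char *dst;

    if (needed == (size_t)-1)
        return NULL;               /* unrepresentable character */
    dst = (char *)malloc(needed + 1);
    if (dst == NULL)
        return NULL;

    /* needed + 1 bytes of room, so the terminator is stored too. */
    wcstombs(dst, src, needed + 1);
    return dst;
}
```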
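WinCE supports only the Unicode character set (its Win32 APIs take wide-character strings), so you must always keep character-set conversion in mind.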
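In addition, to write portable code that builds either as ANSI or as Unicode, use the generic-text data types and the generic-text functions wherever possible.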
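With the Microsoft C run-time library, use TCHAR as the character type and the _tcs-prefixed functions (for example _tcslen, _tcscpy). To build for Unicode, add these macro definitions before including any header: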
#define _UNICODE   // for the C run-time library (tchar.h)
#define UNICODE    // for the Windows headers (windows.h)
#include <tchar.h>
#include <wchar.h>
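To illustrate how the generic-text mapping works, the sketch below re-creates the tchar.h idea with hypothetical MY_-prefixed names, so it does not clash with the real header (which is Microsoft-specific). The same source compiles as char code by default and as wchar_t code when _UNICODE is defined. In the real headers the corresponding names are _TCHAR (TCHAR in the Windows headers), _T and _tcslen.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical re-creation of the tchar.h mapping: one source, two builds. */
#ifdef _UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T(x)     L##x        /* "abc" becomes L"abc" */
#define my_tcslen   wcslen
#else
typedef char MY_TCHAR;
#define MY_T(x)     x           /* "abc" stays "abc" */
#define my_tcslen   strlen
#endif

/* Written once against the generic names; compiles as char or wchar_t code
 * depending on whether _UNICODE is defined. */
size_t count_chars(const MY_TCHAR *s)
{
    return my_tcslen(s);
}
```

Because the literal and the function are both chosen by the same macro, count_chars(MY_T("hello")) compiles and returns 5 in either build.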