The ICU4C function ucnv_convert


Licensed under CC 4.0 BY-SA


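This article shows how to convert between character encodings with the ICU4C library: where to download it, how to call ucnv_convert to convert from one encoding to another, and how the parameters and error codes behave.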

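Getting ICU4C:
The libraries and header files for Windows, and of course the DLLs :), can be downloaded from: http://site.icu-project.org/download
The version 4.2 used for the tests in this article comes from: http://icu-project.org/download/4.2.html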

The ucnv_convert function:
Header file: unicode\ucnv.h
Library file: lib\icuuc.lib

Function declaration and test code:

    // Declaration in unicode/ucnv.h (ICU4C 4.2)
    U_STABLE int32_t U_EXPORT2
    ucnv_convert(const char *toConverterName,
                 const char *fromConverterName,
                 char *target,
                 int32_t targetCapacity,
                 const char *source,
                 int32_t sourceLength,
                 UErrorCode *pErrorCode);

    // Test code for ICU4C 4.2 (needs #include <unicode/ucnv.h>)
    const char *toConverterName = "utf8";     // "utf8" and "utf-8" select the same converter
    const char *fromConverterName = "gb2312"; // GBK is a superset of GB2312, so "gbk" is the safer choice
    char target[100];
    int32_t targetCapacity = 100;
    const char *source = "呵呵";              // assumes this source file is saved as GB2312/GBK
    int32_t sourceLength = -1;                // -1 means source is a NUL-terminated string
    // Must be initialized to U_ZERO_ERROR: on a clean conversion the function leaves
    // the value untouched, and it returns at once if the value already holds a failure code.
    UErrorCode ErrorCode = U_ZERO_ERROR;
    int32_t ret = ucnv_convert(toConverterName,
                               fromConverterName,
                               target,
                               targetCapacity, // see below
                               source,
                               sourceLength,
                               &ErrorCode);    // must not be NULL, otherwise the function returns 0 and converts nothing
    // ret is the length of the whole converted source, even when targetCapacity is too small

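How targetCapacity relates to the space the conversion needs:
If targetCapacity is smaller than the space needed:
1. ret is the number of bytes the whole of source converts to, not counting a terminating 0
2. target receives the first targetCapacity converted bytes, and no terminating 0 is written
3. ErrorCode is set to U_BUFFER_OVERFLOW_ERROR

If targetCapacity is equal to the space needed:
1. ret is the number of bytes the whole of source converts to, not counting a terminating 0
2. target receives all targetCapacity converted bytes, and no terminating 0 is written
3. ErrorCode is set to U_STRING_NOT_TERMINATED_WARNING

If targetCapacity is larger than the space needed:
1. ret is the number of bytes the whole of source converts to, not counting a terminating 0
2. target receives ret bytes, and target[ret] is set to 0 to terminate the string
3. ErrorCode stays U_ZERO_ERROR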
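
The three cases above come down to comparing ret with targetCapacity. The following is a minimal sketch in plain C that restates that rule from the caller's side; it does not call ICU, and ConvertResult and interpret_result are hypothetical names made up for this illustration, not ICU identifiers.

```c
#include <stdint.h>

/* Hypothetical helper type: what the caller can conclude after ucnv_convert returns. */
typedef struct {
    int32_t bytesInTarget; /* how many bytes of target hold converted data */
    int complete;          /* 1 if the whole converted string fits in target */
    int terminated;        /* 1 if target[ret] was set to 0 */
} ConvertResult;

/* ret: the value returned by ucnv_convert (full converted length, without the 0).
   targetCapacity: the capacity that was passed to the call. */
static ConvertResult interpret_result(int32_t ret, int32_t targetCapacity)
{
    ConvertResult r;
    /* target never holds more than targetCapacity bytes */
    r.bytesInTarget = (ret < targetCapacity) ? ret : targetCapacity;
    /* capacity < ret is the U_BUFFER_OVERFLOW_ERROR case */
    r.complete = (ret <= targetCapacity);
    /* capacity == ret is the U_STRING_NOT_TERMINATED_WARNING case */
    r.terminated = (ret < targetCapacity);
    return r;
}
```

For the test string above, which is 6 bytes long in UTF-8, a capacity of 4 gives an incomplete result, a capacity of 6 gives a complete but unterminated one, and only a capacity of 7 or more gives a complete, 0-terminated string. In practice this means passing a capacity of at least ret + 1.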

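Other notes:
target may be NULL (in which case targetCapacity should of course be 0).
In that case:
1. ret is the space the conversion needs (not counting a terminating 0)
2. ErrorCode is set to U_BUFFER_OVERFLOW_ERROR
This can be used to compute the space needed before allocating, but it rarely seems worth it, because the source then has to be converted twice.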
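
The two-call pattern built on the NULL target case can be sketched as follows. To keep the sketch compilable without ICU, it uses toy_convert, a hypothetical stand-in that only copies bytes but follows the same capacity rules; the ST_* values and convert_alloc are likewise made up for this illustration. In real code the calls would go to ucnv_convert with a UErrorCode.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in status values; real code would use ICU's UErrorCode constants. */
enum { ST_OK = 0, ST_NOT_TERMINATED = -1, ST_OVERFLOW = 1 };

/* Hypothetical stand-in for ucnv_convert: "converts" by copying bytes,
   but reports length and status the way the text above describes. */
static int32_t toy_convert(char *target, int32_t capacity,
                           const char *source, int32_t sourceLength,
                           int *status)
{
    int32_t needed = (sourceLength < 0) ? (int32_t)strlen(source) : sourceLength;
    int32_t n = (needed < capacity) ? needed : capacity;
    if (target != NULL && n > 0) {
        memcpy(target, source, (size_t)n);
    }
    if (needed < capacity) {
        target[needed] = '\0';       /* room left: terminate the string */
        *status = ST_OK;
    } else if (needed == capacity) {
        *status = ST_NOT_TERMINATED; /* exact fit: no terminator written */
    } else {
        *status = ST_OVERFLOW;       /* too small: output truncated */
    }
    return needed;
}

/* Call 1 measures with a NULL target, call 2 converts into a buffer of needed + 1 bytes.
   Returns a malloc'ed, 0-terminated string, or NULL on failure. */
static char *convert_alloc(const char *source, int32_t *outLength)
{
    int status = ST_OK;
    int32_t needed = toy_convert(NULL, 0, source, -1, &status);
    /* Overflow is the expected outcome here; an empty source gives an exact fit instead. */
    if (status != ST_OVERFLOW && status != ST_NOT_TERMINATED) {
        return NULL;
    }
    char *buffer = malloc((size_t)needed + 1);
    if (buffer == NULL) {
        return NULL;
    }
    status = ST_OK; /* reset: the measuring call left an error status behind */
    toy_convert(buffer, needed + 1, source, -1, &status);
    if (status != ST_OK) {
        free(buffer);
        return NULL;
    }
    if (outLength != NULL) {
        *outLength = needed;
    }
    return buffer;
}
```

The reset before the second call matters with ICU as well: the error code still holds U_BUFFER_OVERFLOW_ERROR after the measuring call, and a call that starts with a failure code returns without converting anything. The caller owns the returned buffer and releases it with free.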