Jeffrey Richter's Recommendations on Unicode


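This article explains the advantages of using Unicode characters and strings, including easier globalization, better application efficiency, and simpler code integration, and gives concrete development guidelines.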


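This is an important part of Chapter 2 of Windows via C/C++, and my reading notes keep Jeffrey's recommendations in their original wording. The post was originally prepared as a side-by-side translation for colleagues who get dizzy at the sight of English, with the English original kept for comparison in case my translation fell short :)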

Why You Should Use Unicode

When developing an application, we highly recommend that you use Unicode characters and strings. Here are some of the reasons why:

Unicode makes it easy for you to localize your application to world markets.

Unicode allows you to distribute a single binary (.exe or DLL) file that supports all languages.

Unicode improves the efficiency of your application because the code performs faster and uses less memory. Windows internally does everything with Unicode characters and strings, so when you pass an ANSI character or string, Windows must allocate memory and convert the ANSI character or string to its Unicode equivalent.

Using Unicode ensures that your application can easily call all nondeprecated Windows functions, as some Windows functions offer versions that operate only on Unicode characters and strings.

Using Unicode ensures that your code easily integrates with COM (which requires the use of Unicode characters and strings).

Using Unicode ensures that your code easily integrates with the .NET Framework (which also requires the use of Unicode characters and strings).

Using Unicode ensures that your code easily manipulates your own resources (where strings are always persisted as Unicode).

How We Recommend Working with Characters and Strings

Based on what you've read in this chapter, the first part of this section summarizes what you should always keep in mind when developing your code. The second part of the section provides tips and tricks for better Unicode and ANSI string manipulations. It's a good idea to start converting your application to be Unicode-ready even if you don't plan to use Unicode right away. Here are the basic guidelines you should follow:

Start thinking of text strings as arrays of characters, not as arrays of chars or arrays of bytes.

Use generic data types (such as TCHAR/PTSTR) for text characters and strings.

Use explicit data types (such as BYTE and PBYTE) for bytes, byte pointers, and data buffers.

Use the TEXT or _T macro for literal characters and strings, but avoid mixing both for the sake of consistency and for better readability.
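To make the idea of generic types and literal macros concrete, here is a minimal sketch of how a TCHAR-like type and a TEXT-like macro can switch between narrow and wide characters at compile time. MY_TCHAR, MY_TEXT, kGreeting and the two functions are hypothetical names chosen so the sketch builds without the Windows headers; the real TCHAR and TEXT definitions live in the Windows SDK and may differ in detail.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical stand-ins for TCHAR and TEXT, selected by the UNICODE symbol. */
#ifdef UNICODE
typedef wchar_t MY_TCHAR;      /* wide build: each character is a wchar_t */
#define MY_TEXT(s) L##s        /* paste an L prefix onto the literal */
#else
typedef char MY_TCHAR;         /* ANSI build: each character is a char */
#define MY_TEXT(s) s           /* leave the literal unchanged */
#endif

/* Written once; compiles as char[] or wchar_t[] depending on the build. */
static const MY_TCHAR kGreeting[] = MY_TEXT("hello");

/* Number of characters in the greeting, excluding the terminator. */
size_t greeting_length(void) {
    return sizeof(kGreeting) / sizeof(kGreeting[0]) - 1;
}

/* Size in bytes of one generic character in this build. */
size_t generic_char_size(void) {
    return sizeof(MY_TCHAR);
}
```

The character count stays the same in both builds, while the byte size of each character changes, which is why the guidelines ask you to think in characters rather than bytes.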

Perform global replaces. (For example, replace PSTR with PTSTR.)

Modify string arithmetic problems. For example, functions usually expect you to pass a buffer's size in characters, not bytes. This means you should pass _countof(szBuffer) instead of sizeof(szBuffer). Also, if you need to allocate a block of memory for a string and you have the number of characters in the string, remember that you allocate memory in bytes. This means that you must call malloc(nCharacters * sizeof(TCHAR)) and not call malloc(nCharacters). Of all the guidelines I've just listed, this is the most difficult one to remember, and the compiler offers no warnings or errors if you make a mistake. This is a good opportunity to define your own macros, such as the following:

#define chmalloc(nCharacters) (TCHAR*)malloc(nCharacters * sizeof(TCHAR))
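The difference between a character count and a byte count can be sketched in portable C. This is only an illustration under stated assumptions: wchar_t stands in for a wide TCHAR, COUNT_OF is a hand-written stand-in for the Microsoft-specific _countof, and alloc_wide_chars and the two demo functions are hypothetical names that do not come from the book.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Stand-in for _countof: element count of a true array (not a pointer). */
#define COUNT_OF(arr) (sizeof(arr) / sizeof((arr)[0]))

/* Hypothetical counterpart of chmalloc: the caller thinks in characters,
   the function converts to bytes and guards against overflow. */
wchar_t *alloc_wide_chars(size_t nCharacters) {
    if (nCharacters > (size_t)-1 / sizeof(wchar_t)) {
        return NULL;  /* nCharacters * sizeof(wchar_t) would overflow */
    }
    return (wchar_t *)malloc(nCharacters * sizeof(wchar_t));
}

/* Size of a 16-character buffer, measured in characters. */
size_t demo_buffer_chars(void) {
    wchar_t szBuffer[16];
    return COUNT_OF(szBuffer);
}

/* Size of the same buffer, measured in bytes. */
size_t demo_buffer_bytes(void) {
    wchar_t szBuffer[16];
    return sizeof(szBuffer);
}
```

Passing the byte count where a function expects a character count overstates the buffer size by a factor of sizeof(wchar_t), which is exactly the kind of mistake the compiler will not flag.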

Avoid printf family functions, especially by using %s and %S field types to convert ANSI to Unicode strings and vice versa. Use MultiByteToWideChar and WideCharToMultiByte instead, as shown in the book's section "Translating Strings Between Unicode and ANSI" (not reproduced in this post).
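As a rough sketch of the conversion the guideline points to, the function below widens an ANSI string with the usual two-call pattern: ask MultiByteToWideChar for the required size first, then convert. The name widen_ansi is hypothetical, CP_ACP is an assumed choice of code page, and the non-Windows branch (using mbstowcs) is not from the book; it exists only so the sketch also builds outside Windows.

```c
#include <stdlib.h>
#include <wchar.h>
#ifdef _WIN32
#include <windows.h>
#endif

/* Hypothetical helper: returns a newly allocated wide copy of src,
   or NULL on failure. The caller frees the result with free(). */
wchar_t *widen_ansi(const char *src) {
    if (src == NULL) {
        return NULL;
    }
#ifdef _WIN32
    /* First call: -1 means "src is null-terminated"; the returned count
       is in characters and includes the terminator. */
    int n = MultiByteToWideChar(CP_ACP, 0, src, -1, NULL, 0);
    if (n <= 0) {
        return NULL;
    }
    wchar_t *dst = (wchar_t *)malloc((size_t)n * sizeof(wchar_t));
    if (dst == NULL) {
        return NULL;
    }
    /* Second call: perform the conversion into the sized buffer. */
    if (MultiByteToWideChar(CP_ACP, 0, src, -1, dst, n) == 0) {
        free(dst);
        return NULL;
    }
    return dst;
#else
    /* Assumed stand-in for other platforms: same two-step shape. */
    size_t n = mbstowcs(NULL, src, 0);   /* count excludes the terminator */
    if (n == (size_t)-1) {
        return NULL;
    }
    wchar_t *dst = (wchar_t *)malloc((n + 1) * sizeof(wchar_t));
    if (dst == NULL) {
        return NULL;
    }
    if (mbstowcs(dst, src, n + 1) == (size_t)-1) {
        free(dst);
        return NULL;
    }
    return dst;
#endif
}
```

Note that the allocation multiplies the character count by sizeof(wchar_t), following the string arithmetic guideline above.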

Always specify both UNICODE and _UNICODE symbols or neither of them.
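The reason for this rule is that the Windows headers test UNICODE while the C run-time headers test _UNICODE, so defining only one leaves the two sets of headers disagreeing about the character type. One possible way to catch the mistake early is a compile-time guard such as the sketch below; the guard and the function is_unicode_build are illustrative additions, not something the book prescribes.

```c
/* Stop the build if only one of the two symbols is defined. */
#if defined(UNICODE) && !defined(_UNICODE)
#error "UNICODE is defined but _UNICODE is not"
#elif defined(_UNICODE) && !defined(UNICODE)
#error "_UNICODE is defined but UNICODE is not"
#endif

/* Hypothetical helper: 1 for a Unicode build, 0 for an ANSI build.
   After the guard above, checking either symbol is sufficient. */
int is_unicode_build(void) {
#if defined(UNICODE)
    return 1;
#else
    return 0;
#endif
}
```

Placing the guard in a header that every source file includes would make an inconsistent configuration fail at compile time instead of producing subtle type mismatches.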

In terms of string manipulation functions, here are the basic guidelines that you should follow:

Always work with safe string manipulation functions such as those suffixed with _s or prefixed with StringCch. Use the latter for explicit truncation handling, but prefer the former otherwise.

Don't use the unsafe C run-time library string manipulation functions. (See the previous recommendation.) In a more general way, don't use or implement any buffer manipulation routine that would not take the size of the destination buffer as a parameter. The C run-time library provides a replacement for buffer manipulation functions such as memcpy_s, memmove_s, wmemcpy_s, or wmemmove_s. All these methods are available when the __STDC_WANT_SECURE_LIB__ symbol is defined, which is the case by default in CrtDefs.h. So don't undefine __STDC_WANT_SECURE_LIB__. (Translator's note: in other words, leave __STDC_WANT_SECURE_LIB__ defined.)
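The rule about always passing the destination size can be illustrated with a small hand-written routine. This is a sketch of the idea only: copy_chars is a hypothetical function, not one of the _s or StringCch functions, and its return codes are invented for the example. The real functions should be preferred in Windows code.

```c
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: copies src into dst without ever writing more than
   dst_chars characters, terminator included.
   Returns 0 on success, 1 if the text was truncated, -1 on bad arguments. */
int copy_chars(wchar_t *dst, size_t dst_chars, const wchar_t *src) {
    if (dst == NULL || dst_chars == 0 || src == NULL) {
        return -1;
    }
    size_t i = 0;
    /* Leave room for the terminator: stop one short of the buffer size. */
    while (i + 1 < dst_chars && src[i] != L'\0') {
        dst[i] = src[i];
        ++i;
    }
    dst[i] = L'\0';                      /* always terminate the result */
    return src[i] == L'\0' ? 0 : 1;      /* leftover input means truncation */
}
```

Because the size is expressed in characters, a caller would pass the element count of the buffer, not sizeof, and can decide explicitly what to do when the function reports truncation.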

Take advantage of the /GS (http://msdn2.microsoft.com/en-us/library/aa290051(VS.71).aspx) and /RTCs compiler flags to automatically detect buffer overruns.

Don't use Kernel32 methods for string manipulation such as lstrcat and lstrcpy.

There are two kinds of strings that we compare in our code. Programmatic strings are file names, paths, XML elements and attributes, and registry keys/values. For these, use CompareStringOrdinal, as it is very fast and does not take the user's locale into account. This is good because these strings remain the same no matter where your application is running in the world. User strings are typically strings that appear in the user interface. For these, call CompareString(Ex), as it takes the locale into account when comparing strings.
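The split between programmatic and user strings might look like the following sketch. The function names are hypothetical, the case-sensitive ordinal comparison is an assumption that suits XML element names (file names on Windows would normally ignore case), and the non-Windows branches use wcscmp and wcscoll purely as stand-ins so the sketch builds elsewhere.

```c
#ifdef _WIN32
#ifndef _WIN32_WINNT
#define _WIN32_WINNT 0x0600   /* CompareStringOrdinal/Ex need Windows Vista or later */
#endif
#include <windows.h>
#endif
#include <wchar.h>

/* Programmatic strings: locale-independent, code-unit comparison.
   Returns 1 if the two strings are identical, 0 otherwise. */
int same_programmatic_string(const wchar_t *a, const wchar_t *b) {
#ifdef _WIN32
    /* -1 lengths mean null-terminated; FALSE keeps the comparison case-sensitive. */
    return CompareStringOrdinal(a, -1, b, -1, FALSE) == CSTR_EQUAL;
#else
    return wcscmp(a, b) == 0;            /* assumed stand-in */
#endif
}

/* User strings: comparison that follows the user's locale.
   Returns -1, 0 or 1; on Windows, -2 signals that the API call failed. */
int compare_user_strings(const wchar_t *a, const wchar_t *b) {
#ifdef _WIN32
    int r = CompareStringEx(LOCALE_NAME_USER_DEFAULT, 0,
                            a, -1, b, -1, NULL, NULL, 0);
    return r - 2;   /* CSTR_LESS_THAN/EQUAL/GREATER_THAN are 1/2/3; 0 is failure */
#else
    int r = wcscoll(a, b);               /* assumed stand-in, uses the C locale settings */
    return (r > 0) - (r < 0);
#endif
}
```

A sort routine for a list shown in the user interface would call compare_user_strings, while a lookup of an XML element name would call same_programmatic_string.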

You don't have a choice: as a professional developer, you can't write code based on unsafe buffer manipulation functions. And this is the reason why all the code in this book relies on these safer functions from the C run-time library.

OK, translation finished, time to call it a day :)