[Reading Notes] An Introduction to Unicode (Chapter 2)

Chapter 2 – An Introduction to Unicode

    ·Unicode is an extension of the ASCII character encoding.

    ·Strict ASCII uses 7 bits per character, usually stored in an 8-bit byte; Unicode uses a full 16 bits for character encoding.

    ·This allows Unicode to represent all the letters, ideographs, and other symbols used in the world's written languages that are likely to appear in computer communication.

    ·Unicode was initially intended to supplement ASCII and, with any luck, eventually to replace it.

    ·The C programming language as formalized by ANSI inherently supports Unicode through its support of wide characters.

 

A brief history of character sets

Morse code (telegraph encoding), invented between 1838 and 1854

·Each letter of the alphabet corresponds to a series of short and long pulses (dots and dashes)
·No distinction between uppercase and lowercase letters, but numbers and punctuation marks have their own codes

Braille, developed between 1821 and 1824

·Essentially a 6-bit code that encodes letters, common letter combinations, common words, and punctuation

Telex codes, standardized in 1931

·5-bit codes that include letter shifts and figure shifts

BCDIC

·Binary-Coded Decimal Interchange Code

8-bit EBCDIC, 1960s

·An 8-bit extension of BCDIC

ASCII, originated in the late 1950s and finalized in 1967

·A total of 128 codes
·The 26 letter codes are contiguous
·The codes for the 10 digits are easily derived from the values of the digits

ANSI character set, 1985

Double-Byte Character Sets (DBCS)

·Handle the large character sets of languages that need more than 256 characters
·Build on the code-page concept
·Mix 1-byte codes (the ASCII characters) with 2-byte codes, so a string's character count cannot be read from its byte count
·Insufficient and awkward

Unicode

·Allows the representation of 65,536 characters
·Sufficient for all the characters and ideographs
·Compatible with ASCII (the first 128 codes are the ASCII codes)
·No ambiguity, because there is only one character set

 

Wide character set in C

ANSI C also supports multibyte character sets, and wide characters aren't necessarily Unicode; Unicode is one possible wide-character encoding.

 

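The char Data Type

A char is stored in one byte. It is defined like so:

char c = 'A';                  /* 1 byte */

char* p = "Hello, World!";     /* the string occupies 14 bytes (13 characters plus the terminating '\0'); the pointer itself is 4 bytes on 32-bit Windows */

char a[] = "Hello, World!";    /* sizeof(a) is 14 bytes, including the terminating '\0' */

char a[10];                    /* sizeof(a) is 10 bytes */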
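The byte counts for char strings can be checked directly with sizeof and strlen. The following is a minimal sketch; the helper function names are hypothetical and exist only for this illustration.

```c
#include <string.h>

/* Hypothetical helper names, used only for this illustration. */

/* Size in bytes of an array initialized from the literal. */
size_t hello_array_size(void)
{
    char a[] = "Hello, World!";   /* 13 characters plus the terminating '\0' */
    return sizeof(a);
}

/* Number of characters before the terminator. */
size_t hello_string_length(void)
{
    const char *p = "Hello, World!";
    return strlen(p);
}

/* Size of an explicitly dimensioned array. */
size_t fixed_array_size(void)
{
    char a[10];
    return sizeof(a);
}
```

On a conforming compiler, hello_array_size() yields 14, hello_string_length() yields 13, and fixed_array_size() yields 10.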

 

Wide characters

The wide-character type in C is wchar_t, which is defined in several header files, including <wchar.h>. In Microsoft's headers the definition is:

typedef unsigned short wchar_t;

We can use the following statements to define wide characters:

wchar_t c = 'A';                    /* 2 bytes; equivalent to wchar_t c = L'A'; */

wchar_t* p = L"Hello, World!";      /* the string occupies 28 bytes (14 wide characters of 2 bytes each) */

wchar_t a[] = L"Hello, World!";     /* sizeof(a) is 28 bytes, including the terminating wide '\0' */
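The sizes quoted for wide strings assume a 16-bit wchar_t, as on Windows. On other platforms wchar_t is commonly 32 bits, so a portable check is better expressed in units of sizeof(wchar_t). A small sketch, with hypothetical helper names:

```c
#include <wchar.h>

/* Hypothetical helper names, used only for this illustration. */

/* Number of wchar_t elements in the array, terminator included. */
size_t wide_hello_elements(void)
{
    wchar_t a[] = L"Hello, World!";
    return sizeof(a) / sizeof(a[0]);   /* 13 characters plus the terminator */
}

/* Size of the array in bytes; depends on the platform's wchar_t width. */
size_t wide_hello_bytes(void)
{
    wchar_t a[] = L"Hello, World!";
    return sizeof(a);   /* 14 * sizeof(wchar_t); 28 where wchar_t is 16 bits */
}
```

The element count is 14 everywhere; only the byte count changes with the width of wchar_t.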

 

Wide character functions library

Applying the ordinary char string functions to a wide string causes trouble, as shown below:

char *pc = "Hello!";

wchar_t *pw = L"Hello!";

int iLength = strlen(pc);

iLength = strlen(pw) is a type mismatch: strlen() is declared to take a const char *, while pw is a wchar_t * (that is, an unsigned short *). The compiler reports this statement as a warning or an error.

 

The form of the string stored in memory:

The 6 characters of the character string "Hello!" have the 16-bit values:

0x0048 0x0065 0x006C 0x006C 0x006F 0x0021

and are stored in memory by Intel processors in this form:

48 00 65 00 6C 00 6C 00 6F 00 21 00

If the compiler accepts iLength = strlen(pw), iLength is assigned 1, because strlen stops counting at the first zero byte.
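The result of 1 can be reproduced without relying on the platform's wchar_t by spelling out the little-endian byte layout by hand and passing it to strlen. This is only a sketch; the function name is hypothetical.

```c
#include <string.h>

/* Hypothetical helper: runs strlen over the bytes of a 16-bit,
   little-endian "Hello!" exactly as laid out in memory. */
size_t strlen_on_utf16le_hello(void)
{
    /* Six 16-bit characters, low byte first, then a 16-bit terminator. */
    static const unsigned char bytes[] = {
        0x48, 0x00, 0x65, 0x00, 0x6C, 0x00, 0x6C, 0x00,
        0x6F, 0x00, 0x21, 0x00, 0x00, 0x00
    };

    /* strlen sees 0x48 ('H') and then stops at the 0x00 that follows. */
    return strlen((const char *) bytes);
}
```

The function returns 1: the high byte of the first character is zero, and strlen treats it as the end of the string.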

      

Wide character functions in C

The 1-byte character functions have counterparts that work on the wchar_t data type. These functions are declared both in <wchar.h> and in the header file where the normal function is declared.

1-byte char functions          Wide-character functions
strlen( const char*)           wcslen( const wchar_t*)
printf( const char*, ...)      wprintf( const wchar_t*, ...)
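As a quick illustration of the pairing, each function counts characters of its own type. The wrapper names below are hypothetical; strlen and wcslen are the standard library functions.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical wrappers showing which function goes with which string type. */

/* Length of a char string, in characters. */
size_t narrow_length(const char *s)
{
    return strlen(s);
}

/* Length of a wide string, in wide characters (not bytes). */
size_t wide_length(const wchar_t *s)
{
    return wcslen(s);
}
```

Both narrow_length("Hello!") and wide_length(L"Hello!") give 6, even though the wide string occupies more bytes.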

Maintain a single source code

    ·One option is to maintain two versions of the source code: one compiled for ASCII (char) strings and the other compiled for wide-character strings.

    ·A better option is to keep a single version of the source code by using the <TCHAR.H> header file, which is supplied with Microsoft Visual C++ and is not part of the ANSI C standard.

    How to use TCHAR.H?

    TCHAR.H contains some very useful definitions:

    #ifdef _UNICODE
    typedef wchar_t TCHAR ;
    #define __T(x) L##x
    #define _tcslen wcslen
    #else
    typedef char TCHAR ;
    #define __T(x) x
    #define _tcslen strlen
    #endif      /* _UNICODE */
    #define _T(x) __T(x)
    #define _TEXT(x) __T(x)

So we can call _tcslen on strings whether they are char or wide-character strings; the preprocessor maps it to wcslen or strlen automatically. To use the wide-character functions in our program, we only need to define _UNICODE, for example by passing the option "-D _UNICODE" to the compiler.

We can make declarations like so:

TCHAR *pstr = _TEXT("Hello, World!");
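Because TCHAR.H is Microsoft-specific, the mechanism can be tried on any compiler by reproducing it under other names. The sketch below uses hypothetical MY_ names (MY_UNICODE, MY_TCHAR, MY_TEXT, my_tcslen) that are not part of any real header; they only mirror the pattern.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical stand-ins for the TCHAR.H definitions. */
#ifdef MY_UNICODE
typedef wchar_t MY_TCHAR;
#define MY_T_IMPL(x) L##x      /* paste an L prefix onto the literal */
#define my_tcslen    wcslen
#else
typedef char MY_TCHAR;
#define MY_T_IMPL(x) x         /* leave the literal unchanged */
#define my_tcslen    strlen
#endif

/* The extra macro level lets an argument that is itself a macro
   be expanded before the ## pasting takes place. */
#define MY_TEXT(x) MY_T_IMPL(x)

/* Same source line for both builds; only the types underneath change. */
size_t generic_hello_length(void)
{
    const MY_TCHAR *pstr = MY_TEXT("Hello, World!");
    return my_tcslen(pstr);
}
```

Compiling with or without -DMY_UNICODE switches the string type, and generic_hello_length() returns 13 in both builds.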

 

Wide Characters and Windows

Windows NT supports both the ASCII character set and Unicode, so its functions can accept both 8-bit and 16-bit character strings.

Windows 98 has much less Unicode support than Windows NT. Only a few Windows 98 function calls support wide-character strings.

 

Windows Header File Types

A Windows program includes the header file WINDOWS.H. This file includes a number of other header files, including WINDEF.H, which has many of the basic type definitions used in Windows and which itself includes WINNT.H. WINNT.H handles the basic Unicode support.

WINNT.H defines some new data types and useful macros:

These definitions let you mix ASCII and Unicode character strings in the same program or write a single program that can be compiled for either ASCII or Unicode.

typedef char CHAR ;
typedef wchar_t WCHAR ;     // wc
typedef CHAR * PCHAR, * LPCH, * PCH, * NPSTR, * LPSTR, * PSTR ;
typedef CONST CHAR * LPCCH, * PCCH, * LPCSTR, * PCSTR ;

typedef WCHAR * PWCHAR, * LPWCH, * PWCH, * NWPSTR, * LPWSTR, * PWSTR ;
typedef CONST WCHAR * LPCWCH, * PCWCH, * LPCWSTR, * PCWSTR ;

#ifdef  UNICODE
typedef WCHAR TCHAR, * PTCHAR ;
typedef LPWSTR LPTCH, PTCH, PTSTR, LPTSTR ;
typedef LPCWSTR LPCTSTR ;
#define __TEXT(quote) L##quote
#else
typedef char TCHAR, * PTCHAR ;
typedef LPSTR LPTCH, PTCH, PTSTR, LPTSTR ;
typedef LPCSTR LPCTSTR ;
#define __TEXT(quote) quote
#endif

#define TEXT(quote) __TEXT(quote)
 

·For 8-bit character variables and strings, use CHAR, PCHAR (or one of the others).

·For explicit 16-bit character variables and strings, use WCHAR, PWCHAR, and put an L before the quotation marks.

·For variables and strings that are 8-bit or 16-bit depending on the definition of the UNICODE identifier, use TCHAR, PTCHAR, and the TEXT macro.

 

 

Windows' String Functions

Microsoft C includes wide-character and generic versions of all C run-time library functions that require character string arguments. Windows also defines its own set of string functions:

iLength = lstrlen (pString) ;
pString = lstrcpy (pString1, pString2) ;
pString = lstrcpyn (pString1, pString2, iCount) ;
pString = lstrcat (pString1, pString2) ;
iComp = lstrcmp (pString1, pString2) ;
iComp = lstrcmpi (pString1, pString2) ;

These work much the same as their C library equivalents. They accept wide-character strings if the UNICODE identifier is defined and regular strings if not.

 

 

Using printf in Windows

The C printf() function cannot be used in a Windows GUI program, because such a program has no standard output.

Use the fprintf() function to write to files.

Use the sprintf() function to format a string, which can then be passed to MessageBox().

char szBuffer [100] ;
sprintf (szBuffer, "The sum of %i and %i is %i", 5, 3, 5+3) ;
puts (szBuffer) ;

 

The sprintf function itself can be written in terms of vsprintf like so:

int sprintf (char * szBuffer, const char * szFormat, ...)
{
     int     iReturn ;
     va_list pArgs ;
     va_start (pArgs, szFormat) ;
     iReturn = vsprintf (szBuffer, szFormat, pArgs) ;
     va_end (pArgs) ;
     return iReturn ;
}

The va_start macro sets pArgs to point to the variable on the stack right above the szFormat argument.
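The same va_start / va_end pattern works for any formatting wrapper of your own. Here is a minimal sketch, assuming a C99 compiler so that vsnprintf is available; the function name format_into is hypothetical.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper: formats into buf, writing at most size bytes
   (terminator included), and returns what vsnprintf returns. */
int format_into(char *buf, size_t size, const char *fmt, ...)
{
    va_list args;
    int     written;

    va_start(args, fmt);                        /* args refers to the arguments after fmt */
    written = vsnprintf(buf, size, fmt, args);  /* C99: bounded, always terminates if size > 0 */
    va_end(args);                               /* every va_start needs a matching va_end */

    return written;
}
```

For example, format_into(buf, sizeof buf, "The sum of %i and %i is %i", 5, 3, 5 + 3) leaves "The sum of 5 and 3 is 8" in buf and returns 23, provided buf is large enough.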
         
         
        

 

 

                                ASCII        Wide-Character   Generic

Variable Number of Arguments
  Standard Version              sprintf      swprintf         _stprintf
  Max-Length Version            _snprintf    _snwprintf       _sntprintf
  Windows Version               wsprintfA    wsprintfW        wsprintf

Pointer to Array of Arguments
  Standard Version              vsprintf     vswprintf        _vstprintf
  Max-Length Version            _vsnprintf   _vsnwprintf      _vsntprintf
  Windows Version               wvsprintfA   wvsprintfW       wvsprintf
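The point of the max-length versions is that the buffer size is passed in, so an overlong result is cut off instead of overrunning the buffer. The sketch below uses the standard C99 snprintf as a stand-in, since the Microsoft _snprintf family is compiler-specific and its truncation behavior differs (it may leave the buffer without a terminator). The helper name was_truncated is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: formats "value=<n>" into buf and reports
   whether the full text did not fit in size bytes. */
int was_truncated(char *buf, size_t size, int value)
{
    /* C99 snprintf returns the length the full text would have had. */
    int needed = snprintf(buf, size, "value=%d", value);

    return needed < 0 || (size_t) needed >= size;
}
```

With an 8-byte buffer and the value 123456 the function reports truncation and the buffer holds the terminated prefix "value=1"; with a 32-byte buffer and the value 42 nothing is lost.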

 

A Formatting Message Box

SCRNSIZE.C

#include <windows.h>
#include <tchar.h>
#include <stdio.h>

int CDECL MessageBoxPrintf (TCHAR * szCaption, TCHAR * szFormat, ...)
{
     TCHAR   szBuffer [1024] ;
     va_list pArgList ;

          // The va_start macro (defined in STDARG.H) is usually equivalent to:
          // pArgList = (char *) &szFormat + sizeof (szFormat) ;

     va_start (pArgList, szFormat) ;

          // The last argument to _vsntprintf points to the arguments;
          // the second argument is the buffer size in characters, not bytes

     _vsntprintf (szBuffer, sizeof (szBuffer) / sizeof (TCHAR),
                  szFormat, pArgList) ;

          // The va_end macro pairs with va_start; here it just zeroes out pArgList

     va_end (pArgList) ;

     return MessageBox (NULL, szBuffer, szCaption, 0) ;
}

int WINAPI WinMain (HINSTANCE hInstance, HINSTANCE hPrevInstance,
                    PSTR szCmdLine, int iCmdShow)
{
     int cxScreen, cyScreen ;

          // Query the screen width and height in pixels

     cxScreen = GetSystemMetrics (SM_CXSCREEN) ;
     cyScreen = GetSystemMetrics (SM_CYSCREEN) ;

     MessageBoxPrintf (TEXT ("ScrnSize"),
                       TEXT ("The screen is %i pixels wide by %i pixels high."),
                       cxScreen, cyScreen) ;
     return 0 ;
}

      
      
       
       
      
      
 