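Open http://inamidst.com/stuff/unidata/ to browse Unicode code points and their corresponding characters.

Clicking a character takes you to http://www.fileformat.info, which shows detailed
information about that character, including its Unicode Data, its Encodings, and how to write it in HTML, C, C++, Java, and Python source.
For example, here is the information for the dollar sign:

Unicode Data | |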
---|---|
Name | DOLLAR SIGN |
Block | Basic Latin |
Category | Symbol, Currency [Sc] |
Combine | 0 |
BIDI | European Number Terminator [ET] |
Mirror | N |
Index entries | milreis; DOLLAR SIGN; escudo |
Comments | milreis, escudo; glyph may have one or two vertical bars; other currency symbol characters: U+20A0-U+20B8 |
See Also | currency sign U+00A4; heavy dollar sign U+1F4B2 |
Version | Unicode 1.1.0 (June, 1993) |

Encodings | |
---|---|
HTML Entity (decimal) | `&#36;` |
HTML Entity (hex) | `&#x24;` |
How to type in Microsoft Windows | Alt +0024; Alt 036; Alt 36 |
UTF-8 (hex) | 0x24 (24) |
UTF-8 (binary) | 00100100 |
UTF-16 (hex) | 0x0024 (0024) |
UTF-16 (decimal) | 36 |
UTF-32 (hex) | 0x00000024 (0024) |
UTF-32 (decimal) | 36 |
C/C++/Java source code | "\u0024" |
Python source code | u"\u0024" |

Java Data | |
---|---|
string.toUpperCase() | $ |
string.toLowerCase() | $ |
Character.UnicodeBlock | BASIC_LATIN |
Character.charCount() | 1 |
Character.getDirectionality() | DIRECTIONALITY_EUROPEAN_NUMBER_TERMINATOR [5] |
Character.getNumericValue() | -1 |
Character.getType() | 26 |
Character.isDefined() | Yes |
Character.isDigit() | No |
Character.isIdentifierIgnorable() | No |
Character.isISOControl() | No |
Character.isJavaIdentifierPart() | Yes |
Character.isJavaIdentifierStart() | Yes |
Character.isLetter() | No |
Character.isLetterOrDigit() | No |
Character.isLowerCase() | No |
Character.isMirrored() | No |
Character.isSpaceChar() | No |
Character.isSupplementaryCodePoint() | No |
Character.isTitleCase() | No |
Character.isUnicodeIdentifierPart() | No |
Character.isUnicodeIdentifierStart() | No |
Character.isUpperCase() | No |
Character.isValidCodePoint() | Yes |
Character.isWhitespace() | No |
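
Most of the values in the Unicode Data and Encodings tables above can also be reproduced locally. The following is a minimal sketch, assuming Python 3 and only the standard `unicodedata` module; the variable names (`properties`, `encodings`, `html_decimal`, `html_hex`) are made up for this illustration and are not something fileformat.info provides.

```python
import unicodedata

ch = "$"  # U+0024 DOLLAR SIGN

# Unicode Character Database properties from the standard unicodedata module.
properties = {
    "name": unicodedata.name(ch),
    "category": unicodedata.category(ch),
    "combining": unicodedata.combining(ch),
    "bidirectional": unicodedata.bidirectional(ch),
    "mirrored": unicodedata.mirrored(ch),
}

# Encoded byte sequences as hex (big-endian codecs avoid a byte-order mark).
encodings = {
    "utf-8": ch.encode("utf-8").hex(),
    "utf-16-be": ch.encode("utf-16-be").hex(),
    "utf-32-be": ch.encode("utf-32-be").hex(),
}

# HTML numeric character references built from the code point.
html_decimal = "&#{};".format(ord(ch))
html_hex = "&#x{:x};".format(ord(ch))

for key, value in properties.items():
    print(key, "=", value)
for key, value in encodings.items():
    print(key, "=", value)
print(html_decimal, html_hex)
```

The category `Sc` and bidirectional class `ET` printed here correspond to the "Symbol, Currency [Sc]" and "European Number Terminator [ET]" rows in the table.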
Wikipedia's explanation of code points:
The Unicode code space is divided into 17 planes (the Basic Multilingual Plane and 16 supplementary planes), each with 65,536 (= 2^16) code points.
Thus the total size of the Unicode code space is 17 × 65,536 = 1,114,112.
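
The arithmetic is easy to check in Python. This is a small sketch, assuming Python 3.3 or later (where `sys.maxunicode` is always 0x10FFFF); the constants and the `plane_of` helper are hypothetical names introduced here for illustration.

```python
import sys

PLANE_SIZE = 2 ** 16   # 65,536 code points per plane
PLANE_COUNT = 17       # plane 0 (the BMP) plus 16 supplementary planes

total = PLANE_COUNT * PLANE_SIZE


def plane_of(code_point):
    """Return the plane number (0-16) that contains the given code point."""
    return code_point // PLANE_SIZE


print(total)                # 1114112
print(hex(sys.maxunicode))  # 0x10ffff, i.e. total - 1
print(plane_of(ord("$")))   # 0: the dollar sign is in the BMP
print(plane_of(0x1F4B2))    # 1: heavy dollar sign is in a supplementary plane
```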
In Python, you can obtain a character from its Unicode name. For example, the name 'dollar sign'
yields the dollar symbol:
----------------------------------------------------------------------------------------------------------
>>> dollar = u"\N{dollar sign}"
>>> print(dollar)
$
----------------------------------------------------------------------------------------------------------
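
The `\N{...}` escape only works with a name written literally in the source. When the name is only known at run time, the standard `unicodedata` module can do the same lookup, and can also go the other way from a character to its name. A minimal sketch, assuming Python 3 (the variable names are illustrative):

```python
import unicodedata

# lookup() resolves a name at run time; the match is case-insensitive.
dollar = unicodedata.lookup("dollar sign")

# name() goes the other way, from a character to its official name.
official_name = unicodedata.name(dollar)

# A default value avoids ValueError for characters that have no name,
# such as the control character U+0000.
unnamed = unicodedata.name("\x00", "<no name>")

print(dollar, official_name, unnamed)
```

`unicodedata.lookup()` raises `KeyError` when no character has the given name, so input from users is best wrapped in a `try`/`except`.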