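01. Purpose
Count the letters, digits, Chinese characters, punctuation marks, and total characters in a document.
Note: spaces and English (ASCII) punctuation in the document are classified as "other characters".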
02. Main approach
(1) Read the text through an InputStreamReader, one line at a time with str = buf.readLine(),
and examine every character of the line with str.charAt(i);
(2) Test for a letter: (str.charAt(i) >= 'A' && str.charAt(i) <= 'Z') || (str.charAt(i) >= 'a' && str.charAt(i) <= 'z')
(3) Test for a digit: str.charAt(i) >= '0' && str.charAt(i) <= '9'
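(4) Test for a Chinese character (Hanzi): str.charAt(i) >= 0x4e00 && str.charAt(i) <= 0x9fbb
Test for Chinese text in general (including Chinese punctuation): str.charAt(i) >= 0x0391 && str.charAt(i) <= 0xFFE5
This second range is a coarse heuristic: it starts at U+0391 (Greek capital alpha), so it also matches many non-Chinese characters.
In this example, Chinese characters and punctuation are tested separately.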
(5) Test for Chinese punctuation:
Reference: "Detecting Chinese symbols in Java: notes on the CJK blocks in Character.UnicodeBlock"
Character.UnicodeBlock pun = Character.UnicodeBlock.of(str.charAt(i)); // get the UnicodeBlock of this character
if (pun == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS ||
    pun == Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS ||
    pun == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A ||
    pun == Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_B ||
    pun == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION ||
    pun == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS ||
    pun == Character.UnicodeBlock.GENERAL_PUNCTUATION)
The condition also lists the ideograph blocks, so it matches Chinese characters as well as punctuation; to count punctuation only, it has to be applied after the Chinese-character test in (4).
Notes on the CJK-related blocks in Character.UnicodeBlock:
- Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS: 4E00-9FFF: CJK Unified Ideographs
- Character.UnicodeBlock.CJK_COMPATIBILITY_IDEOGRAPHS: F900-FAFF: CJK Compatibility Ideographs
- Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS_EXTENSION_A: 3400-4DBF: CJK Unified Ideographs Extension A
- Character.UnicodeBlock.GENERAL_PUNCTUATION: 2000-206F: General Punctuation
- Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION: 3000-303F: CJK Symbols and Punctuation
- Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS: FF00-FFEF: Halfwidth and Fullwidth Forms
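
Steps (1) to (5) can be tied together roughly as follows. This is only a minimal sketch under stated assumptions, not the program listed in the next section: the class name CharCountSketch, the Kind enum, and the helper methods are hypothetical, the input is assumed to be UTF-8, and the punctuation test keeps only the three punctuation-bearing blocks from the condition in (5) because Chinese characters are already handled by the range test in (4). One side effect of that choice is that everything in HALFWIDTH_AND_FULLWIDTH_FORMS (for example fullwidth letters) and the ideographic space U+3000 are tallied as punctuation.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

final class CharCountSketch {
    // Hypothetical categories; OTHER holds spaces and ASCII punctuation.
    enum Kind { LETTER, DIGIT, HANZI, CJK_PUNCTUATION, OTHER }

    // Classify one UTF-16 code unit, testing in the order of steps (2) to (5).
    static Kind classify(char c) {
        if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')) return Kind.LETTER;
        if (c >= '0' && c <= '9') return Kind.DIGIT;
        if (c >= 0x4e00 && c <= 0x9fbb) return Kind.HANZI;
        Character.UnicodeBlock block = Character.UnicodeBlock.of(c);
        // Assumption: only the punctuation-bearing blocks are checked here.
        if (block == Character.UnicodeBlock.CJK_SYMBOLS_AND_PUNCTUATION
                || block == Character.UnicodeBlock.HALFWIDTH_AND_FULLWIDTH_FORMS
                || block == Character.UnicodeBlock.GENERAL_PUNCTUATION) {
            return Kind.CJK_PUNCTUATION;
        }
        return Kind.OTHER;
    }

    // Add the characters of one line to counts, indexed by Kind.ordinal().
    static void countLine(String str, int[] counts) {
        for (int i = 0; i < str.length(); i++) {
            counts[classify(str.charAt(i)).ordinal()]++;
        }
    }

    // Read line by line; readLine() drops line terminators, so they are not counted.
    static int[] count(BufferedReader buf) throws IOException {
        int[] counts = new int[Kind.values().length];
        String str;
        while ((str = buf.readLine()) != null) {
            countLine(str, counts);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // With a path argument, read that file as UTF-8 (assumed encoding);
        // otherwise fall back to a small built-in sample line.
        try (BufferedReader buf = args.length > 0
                ? new BufferedReader(new InputStreamReader(
                        Files.newInputStream(Paths.get(args[0])), StandardCharsets.UTF_8))
                : new BufferedReader(new StringReader("Ab1 \u4f60\u597d\uff0c\u3002!"))) {
            int[] counts = count(buf);
            int total = 0;
            for (Kind k : Kind.values()) {
                System.out.println(k + ": " + counts[k.ordinal()]);
                total += counts[k.ordinal()];
            }
            System.out.println("TOTAL: " + total);
        }
    }
}
```

Keeping the per-character decision in classify separates it from the I/O loop, so the ranges and blocks can be adjusted without touching the reading code. Because the sketch works on char values, characters outside the Basic Multilingual Plane (such as those in CJK Unified Ideographs Extension B) appear as two surrogates and fall into OTHER.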
03. Program code (charCount.java)