Methods for detecting Chinese characters

Latest recommended article published 2020-08-21 10:41:33

License: CC 4.0 BY-SA


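This article introduces three ways to detect Chinese characters: matching against the whole string with a regular expression, checking the string one character at a time, and comparing the string's byte length with its character count. Each suits a different text-processing scenario.

The range commonly used for Chinese characters is U+4E00 to U+9FA5 (0x4e00 ~ 0x9fa5).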
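The range above covers the commonly used ideographs, but not the extension blocks that Unicode added later. As a rough sketch of an alternative, the JDK's `Character.UnicodeScript` can classify a code point by script instead of by a hard-coded range. The names `HanScriptCheck`, `isHan`, and `inBasicRange` are illustrative and do not come from the article.

```java
final class HanScriptCheck {
    /** True if the JDK's Unicode data assigns the code point to the Han script. */
    static boolean isHan(int codePoint) {
        return Character.UnicodeScript.of(codePoint) == Character.UnicodeScript.HAN;
    }

    /** True if the code point lies in the U+4E00..U+9FA5 range quoted in the article. */
    static boolean inBasicRange(int codePoint) {
        return codePoint >= 0x4E00 && codePoint <= 0x9FA5;
    }

    public static void main(String[] args) {
        // '\u4e2d' is the character 中
        System.out.println(isHan('\u4e2d') + " " + inBasicRange('\u4e2d'));
        // U+20000 is an ideograph from an extension block, outside the basic range
        System.out.println(isHan(0x20000) + " " + inBasicRange(0x20000));
    }
}
```

The script check is broader than the fixed range, so the two can disagree on rare characters. Which one is appropriate depends on what the application counts as "Chinese".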

The regular-expression syntax rules are described in the documentation for the Pattern class.

There are three methods:

① Regular expression:

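    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    String test = "中文";
    // Character class covering U+4E00 to U+9FA5
    String regEx = "[\\u4e00-\\u9fa5]";
    Pattern pn = Pattern.compile(regEx);
    Matcher mr = pn.matcher(test);
    while (mr.find()) {
        // group() returns the input subsequence matched by the previous find()
        System.out.println(mr.group());
    }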
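The regular-expression approach can be wrapped in a small reusable helper. This is a minimal sketch, not the author's code: the class name `ChineseRegex` and the methods `containsChinese` and `chineseRuns` are hypothetical. The pattern is compiled once and uses `+` so that consecutive characters are returned as one run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ChineseRegex {
    // One or more consecutive characters in U+4E00..U+9FA5
    private static final Pattern HAN_RUN = Pattern.compile("[\\u4e00-\\u9fa5]+");

    /** True if the text contains at least one character in the range. */
    static boolean containsChinese(String text) {
        return HAN_RUN.matcher(text).find();
    }

    /** Returns every maximal run of in-range characters, in order of appearance. */
    static List<String> chineseRuns(String text) {
        List<String> runs = new ArrayList<>();
        Matcher m = HAN_RUN.matcher(text);
        while (m.find()) {
            runs.add(m.group());
        }
        return runs;
    }
}
```

Use `find()` here rather than `String.matches`, because `matches` succeeds only when the entire string fits the pattern.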

② Still a regular expression, but this time each character is tested against it individually:

    String test = "Chinese 中文";
    for (int i = 0; i < test.length(); i++) {
        // Take a one-character substring and test it on its own
        String temp = test.substring(i, i + 1);
        if (temp.matches("[\\u4e00-\\u9fa5]")) {
            System.out.println(temp);
        }
    }
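The per-character check does not strictly need a regular expression, since the test is a numeric range comparison. The following sketch assumes Java 8 or later and uses illustrative names (`PerCharacterCheck`, `extractChinese`). It walks the string by code point and keeps those that fall in the article's range.

```java
final class PerCharacterCheck {
    /** Returns only the characters of text that lie in U+4E00..U+9FA5, in order. */
    static String extractChinese(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints()
            .filter(cp -> cp >= 0x4E00 && cp <= 0x9FA5)
            .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is 中文 and "\u7b54" is 答
        System.out.println(extractChinese("Chinese \u4e2d\u6587 123 as \u7b54"));
    }
}
```

This avoids creating a substring and running a regex match for every position in the string.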

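③ Byte-length comparison. ASCII characters such as English letters occupy one byte each, whereas a Chinese character occupies more than one (2 bytes in GBK, 3 in UTF-8), so comparing the string's length with its byte count shows whether multi-byte characters are present. Note that getBytes() without an argument uses the platform's default charset, and that any non-ASCII character, not only a Chinese one, makes the two counts differ.

    String str1 = "Chinese 中文"; // sample input
    // Prints the string if it contains multi-byte characters, otherwise an empty line
    System.out.println(str1.getBytes().length == str1.length() ? "" : str1);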
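Because the result of the byte-length comparison depends on the encoding, it can help to name the charset explicitly. The sketch below uses illustrative names (`ByteLengthCheck`, `hasMultiByteUtf8`) and fixes the encoding to UTF-8. It also shows two limitations: an accented Latin letter is reported too, and the comparison stops working under a fixed-width encoding such as UTF-16.

```java
import java.nio.charset.StandardCharsets;

final class ByteLengthCheck {
    /** True if the UTF-8 encoding of text uses more bytes than text has chars. */
    static boolean hasMultiByteUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length != text.length();
    }

    public static void main(String[] args) {
        String chinese = "\u4e2d\u6587"; // 中文
        String accented = "\u00e9";      // é, which is not Chinese
        System.out.println(hasMultiByteUtf8(chinese));
        System.out.println(hasMultiByteUtf8(accented));
        System.out.println(hasMultiByteUtf8("abc"));
        // In UTF-16BE even ASCII letters take two bytes each
        System.out.println("abc".getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

In effect this method answers "does the string contain non-ASCII characters?" rather than "does it contain Chinese?". It is best treated as a quick pre-filter.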

    // Method 1: find every run of Chinese characters in the input
    // (Constants.wordString is defined outside this snippet)
    String temp = Constants.wordString;
    String regEx = "[\\u4e00-\\u9fa5]+";
    Pattern pattern = Pattern.compile(regEx);
    Matcher matcher = pattern.matcher(temp);
    while (matcher.find()) {
        System.out.println(matcher.group());
    }

    // Method 2: test each character against the regular expression
    String test = "Chinese 中文";
    for (int i = 0; i < test.length(); i++) {
        String temp = test.substring(i, i + 1);
        if (temp.matches("[\\u4e00-\\u9fa5]"))
            System.out.println(temp);
    }

    // Method 3: compare byte count and length for each character
    String test = "Chinese 中文 123 as 答";
    for (int i = 0; i < test.length(); i++) {
        String temp = test.substring(i, i + 1);
        System.out.println(temp.getBytes().length == temp.length() ? "" : temp);
    }