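Given a list of strings, some in Chinese characters and some in pinyin (Latin letters), we want to group them by first letter and sort them.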
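Tool used: the pinyin4j library (the test code relies on its PinyinHelper and HanyuPinyinOutputFormat classes).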
The bean class:
public class NameBean implements Comparable<NameBean> {
    // Chinese text (stays null for plain Latin entries and for group headers)
    private String nameGBk;
    // Pinyin, or the original string when it is already Latin text
    private String namePY;

    public NameBean() {
        super();
    }

    public String getNameGBk() {
        return nameGBk;
    }

    public void setNameGBk(String nameGBk) {
        this.nameGBk = nameGBk;
    }

    public String getNamePY() {
        return namePY;
    }

    public void setNamePY(String namePY) {
        this.namePY = namePY;
    }

    // Order entries by their pinyin string
    @Override
    public int compareTo(NameBean other) {
        return getNamePY().compareTo(other.getNamePY());
    }

    // Show the Chinese text when present, otherwise the pinyin
    @Override
    public String toString() {
        return getNameGBk() != null ? getNameGBk() : getNamePY();
    }
}
Test:
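import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.TreeSet;
import java.util.regex.Pattern;

import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

public class TestMain {
    // Returns true if the string contains a Chinese character (U+4E00 to U+9FA5).
    // It is called with a one-character string, so it effectively tests the first character.
    public static boolean isChineseChar(String str) {
        Pattern p = Pattern.compile("[\u4e00-\u9fa5]");
        return p.matcher(str).find();
    }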
    // Convert a string to pinyin; non-Chinese characters are copied through unchanged
    public static String getPinYin(String strs) {
        HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
        format.setCaseType(HanyuPinyinCaseType.LOWERCASE);
        format.setToneType(HanyuPinyinToneType.WITH_TONE_MARK);
        format.setVCharType(HanyuPinyinVCharType.WITH_U_UNICODE);
        char[] ch = strs.trim().toCharArray();
        StringBuilder buffer = new StringBuilder();
        try {
            for (int i = 0; i < ch.length; i++) {
                if (Character.toString(ch[i]).matches("[\u4e00-\u9fa5]+")) {
                    String[] temp = PinyinHelper.toHanyuPinyinStringArray(ch[i], format);
                    if (temp != null && temp.length > 0) {
                        // Use the first reading of the character, followed by a space
                        buffer.append(temp[0]);
                        buffer.append(" ");
                    } else {
                        // No reading available: keep the character itself
                        buffer.append(ch[i]);
                    }
                } else {
                    buffer.append(ch[i]);
                }
            }
        } catch (BadHanyuPinyinOutputFormatCombination e) {
            e.printStackTrace();
        }
        return buffer.toString();
    }
    /**
     * @param args unused
     */
    public static void main(String[] args) {
        String[] data = {
                "水无月", "android", "杰克森", "news",
                "baidu", "location", "oberser", "mary", "next", "ruby",
                "money", "lucy", "very", "thunder", "object", "lily", "jay",
                "answer", "layout", "demos", "com", "collect", "custom",
                "blog", "round", "redirect", "ground", "gray", "blue", "zone",
                "james", "zhang", "阿", "喔", "额", "哦", "不败", "坝堤", "布鲁斯", "da",
                "大", "易", };
        // Set of group keys (first letters) seen so far
        TreeSet<String> treeData = new TreeSet<String>();
        // All entries: group headers plus the names themselves
        List<NameBean> nData = new LinkedList<NameBean>();
        String str = "";
        String let = "";
        NameBean b;
        // Walk through the input strings and build one bean per string
        for (int m = 0; m < data.length; m++) {
            b = new NameBean();
            // Current string
            str = data[m].trim();
            // Its first character
            let = String.valueOf(str.charAt(0));
            // First character is Chinese
            if (isChineseChar(let)) {
                b.setNameGBk(str);
                // Convert the whole string to pinyin
                b.setNamePY(getPinYin(str));
                // First letter of the pinyin
                let = String.valueOf(getPinYin(let).charAt(0));
                if (!treeData.contains(let)) {
                    // New first letter: record it and add a header entry for it
                    treeData.add(let);
                    NameBean ins = new NameBean();
                    ins.setNamePY(let);
                    nData.add(ins);
                }
            }
            // First character is already a Latin letter
            else {
                b.setNamePY(str);
                if (!treeData.contains(let)) {
                    treeData.add(let);
                    NameBean ins = new NameBean();
                    ins.setNamePY(let);
                    nData.add(ins);
                }
            }
            nData.add(b);
        }
        // Sort ascending by pinyin
        Collections.sort(nData);
        for (int m = 0; m < nData.size(); m++) {
            System.out.print(";");
            System.out.print(nData.get(m));
        }
    }
}
Result:
;a;android;answer;b;baidu;blog;blue;坝堤;不败;布鲁斯;c;collect;com;custom;d;da;demos;大;g;gray;ground;j;james;jay;杰克森;l;layout;lily;location;lucy;m;mary;money;n;news;next;o;oberser;object;r;redirect;round;ruby;s;水无月;t;thunder;v;very;w;喔;y;易;z;zhang;zone;é;额;哦;ā;阿
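As you can see in the result, ā is not the same character as a (and the same goes for the other tone-marked vowels). String.compareTo compares character code values, and vowels such as ā (U+0101) and é (U+00E9) have higher values than z, so those groups end up at the very end. The simplest fix is to find all the tone-marked vowels and map them back to their base letters before sorting; there are not many of them.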
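
One possible way to do that mapping without listing every vowel by hand is Unicode normalization from the JDK. The sketch below is only an illustration: the class name ToneStripper and the method stripToneMarks are hypothetical helpers, not part of pinyin4j or of the code above. It decomposes each accented letter into base letter plus combining mark, then drops the marks. Note one assumption: ü also collapses to plain u, which may or may not be what you want.

```java
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ToneStripper {
    // Hypothetical helper: split accented letters into base letter + combining
    // marks (NFD form), then remove every combining mark.
    public static String stripToneMarks(String pinyin) {
        String decomposed = Normalizer.normalize(pinyin, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // Sample keys written with escapes: e-acute, b, a-macron, zhang, d + a-grave
        List<String> keys = Arrays.asList("\u00e9", "b", "\u0101", "zhang", "d\u00e0");
        List<String> plain = new ArrayList<String>();
        for (String k : keys) {
            plain.add(stripToneMarks(k));
        }
        // With the marks removed, the keys sort among the ordinary letters
        Collections.sort(plain);
        System.out.println(plain); // prints [a, b, da, e, zhang]
    }
}
```

In the test program, such a helper could be applied to the pinyin string before calling setNamePY and before taking the first letter. Alternatively, pinyin4j itself offers HanyuPinyinToneType.WITHOUT_TONE, which avoids producing tone marks in the first place if you do not need them for display.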