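Today my girlfriend asked me to compile statistics from a data file. The data is a .txt file in which every line looks like "733026057 辽宁鞍山,上海上海,上海上海,". The required result is "733026057;2; 上海上海[2];辽宁鞍山[1]", that is, the ID, the number of distinct addresses, and then each address with its count, most frequent first.
I thought I could knock it out quickly, but it took me half an hour to get the code working. How embarrassing ⊙﹏⊙b! Looking back, the main problems were:
(1) I had not written file I/O with java.io.* in ages, so I was slow to get going. (The source text is encoded in GB2312, so the encoding has to be handled explicitly.)
(2) String handling took a while. For example, when extracting the number "733026057", what follows it is not a space but a Tab character. In the end I solved it with regular-expression matching. (I have never looked at this area carefully and really must make time to study it properly.)
(3) A Map works well for counting, but sorting by frequency is awkward with it, so I created a throwaway MapSort class and sorted with Collections.sort().
Note: my girlfriend was in a hurry, so the code was written in haste. I am posting it mainly to remind myself that I need to improve! Java experts, please just ignore this post~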
Pattern utility class:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternUtils {
    // Compiled once and reused; Pattern instances are immutable and thread-safe
    private static final Pattern DIGITS = Pattern.compile("\\d+");
    private static final Pattern NON_DIGITS = Pattern.compile("\\D+");

    // Returns the first run of digits in content, or "" if there is none
    public static String getNumbers(String content) {
        Matcher matcher = DIGITS.matcher(content);
        return matcher.find() ? matcher.group().trim() : "";
    }

    // Returns the first run of non-digit characters in content (trimmed), or "" if there is none
    public static String splitNotNumber(String content) {
        Matcher matcher = NON_DIGITS.matcher(content);
        return matcher.find() ? matcher.group().trim() : "";
    }
}
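Since the separator turned out to be a Tab, a plain split is probably enough for this file and avoids the regex entirely. A minimal sketch, assuming every line has the form ID, Tab, addresses; the TabSplit class and its method name are made up for illustration and are not part of the code in this post:

```java
final class TabSplit {
    // Splits "id<TAB>addresses" into {id, addresses}.
    // The limit of 2 keeps any further tabs inside the address part.
    // addresses is "" when the line has no tab at all.
    static String[] splitIdAndAddresses(String line) {
        String[] parts = line.split("\t", 2);
        String id = parts[0].trim();
        String addresses = parts.length > 1 ? parts[1].trim() : "";
        return new String[] { id, addresses };
    }

    public static void main(String[] args) {
        String[] fields = splitIdAndAddresses("733026057\t辽宁鞍山,上海上海,上海上海,");
        System.out.println(fields[0]);
        System.out.println(fields[1]);
    }
}
```

Unlike the "\\D+" pattern, this would not cut an address short if it happened to contain a digit.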
FileUtils:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class FileUtils {
    // Opens filePath for reading, decoding its bytes with the given charset
    public static BufferedReader getBufferedReader(String filePath, String charset) throws IOException {
        // Byte stream over the file
        FileInputStream fis = new FileInputStream(filePath);
        // Bridge from bytes to characters; the charset must match the file's real encoding
        InputStreamReader isr = new InputStreamReader(fis, charset);
        // Buffered wrapper, which also provides readLine()
        return new BufferedReader(isr);
    }

    // Opens filePath for writing, encoding characters with the given charset
    public static BufferedWriter getBufferedWirtie(String filePath, String charset) throws IOException {
        // FileOutputStream creates the file if it is missing and truncates it otherwise
        Writer writer = new OutputStreamWriter(new FileOutputStream(filePath), charset);
        return new BufferedWriter(writer);
    }
}
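For the encoding problem in point (1), java.nio.file.Files (Java 7 and later) can open a reader or writer with an explicit Charset in one call. A rough sketch, not what I actually used; the Transcode class and its copy method are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;

final class Transcode {
    // Copies a text file line by line, decoding with `from` and encoding with `to`.
    // Returns the number of lines copied.
    static int copy(Path source, Charset from, Path target, Charset to) throws IOException {
        int lines = 0;
        // try-with-resources closes (and flushes) both streams, even on an exception
        try (BufferedReader reader = Files.newBufferedReader(source, from);
             BufferedWriter writer = Files.newBufferedWriter(target, to)) {
            String line;
            while ((line = reader.readLine()) != null) {
                writer.write(line);
                writer.newLine();
                lines++;
            }
        }
        return lines;
    }
}
```

For the file in this post the call would be something like Transcode.copy(Paths.get("E:\\juan.txt"), Charset.forName("GB2312"), Paths.get("E:\\result.txt"), StandardCharsets.UTF_8), assuming the JDK in use ships the GB2312 charset.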
MapSort:
public class MapSort implements Comparable<MapSort> {
    private String key;
    private Integer num;

    public MapSort(String key, Integer num) {
        this.key = key;
        // Treat a missing count as 0
        this.num = num == null ? 0 : num;
    }

    // Orders by count, descending, so Collections.sort() puts the most frequent first.
    // Integer.compare avoids the overflow risk of subtracting the two counts.
    @Override
    public int compareTo(MapSort o) {
        return Integer.compare(o.getNum(), this.num);
    }

    public String getKey() {
        return key;
    }

    public void setKey(String key) {
        this.key = key;
    }

    public Integer getNum() {
        return num;
    }

    public void setNum(Integer num) {
        this.num = num;
    }
}
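The wrapper class is not strictly needed: the Map's entries can be copied into a list and sorted with a comparator. A small sketch, assuming Java 8 or later for List.sort and lambdas; FrequencySort is a made-up name, and the tie-break by key is my own addition so that equal counts come out in a stable order:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class FrequencySort {
    // Returns the map's entries ordered by count (descending), ties broken by key
    static List<Map.Entry<String, Integer>> byCountDescending(Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("辽宁鞍山", 1);
        counts.put("上海上海", 2);
        for (Map.Entry<String, Integer> e : byCountDescending(counts)) {
            System.out.println(e.getKey() + "[" + e.getValue() + "]");
        }
    }
}
```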
Main:
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StaticMain {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes both streams even if an exception is thrown;
        // closing the writer also flushes it
        try (BufferedReader br = FileUtils.getBufferedReader("E:\\juan.txt", "GB2312");
             BufferedWriter bw = FileUtils.getBufferedWirtie("E:\\result.txt", "UTF-8")) {
            String str;
            while ((str = br.readLine()) != null) {
                // Read the ID and the address list
                str = str.trim();
                String id = PatternUtils.getNumbers(str);
                String addrStr = PatternUtils.splitNotNumber(str);
                // Count the occurrences of each address
                Map<String, Integer> addrMap = new HashMap<String, Integer>();
                for (String addr : addrStr.split(",")) {
                    addr = addr.trim();
                    // Skip empty items, e.g. those produced by two commas in a row
                    if (addr.isEmpty()) {
                        continue;
                    }
                    Integer num = addrMap.get(addr);
                    addrMap.put(addr, num == null ? 1 : num + 1);
                }
                // Sort the Map's entries by count, descending
                List<MapSort> list = new ArrayList<MapSort>();
                for (Map.Entry<String, Integer> entry : addrMap.entrySet()) {
                    list.add(new MapSort(entry.getKey(), entry.getValue()));
                }
                Collections.sort(list);
                // Build the result line
                StringBuilder result = new StringBuilder(id + ";" + addrMap.size() + ";");
                for (MapSort ad : list) {
                    result.append(ad.getKey()).append("[").append(ad.getNum()).append("];");
                }
                System.out.println(result);
                // Write the line to the output file
                bw.write(result.toString());
                bw.newLine();
            }
        }
    }
}
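The counting, sorting, and formatting for one line can also be written as a single stream pipeline and pulled out of main, which makes it easy to check against the sample line without touching any files. This is only a sketch of a different technique (Java 8 streams), not the code I used; LineSummary and summarize are hypothetical names, and the tie-break by address is an assumption of mine:

```java
import java.util.Arrays;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

final class LineSummary {
    // Builds "id;distinctCount;addr[count];..." with the most frequent address first
    static String summarize(String id, String addrStr) {
        // Count each non-empty, trimmed address
        Map<String, Long> counts = Arrays.stream(addrStr.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
        // Sort by count descending (ties by address) and format each entry
        String body = counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Long.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .map(e -> e.getKey() + "[" + e.getValue() + "];")
                .collect(Collectors.joining());
        return id + ";" + counts.size() + ";" + body;
    }

    public static void main(String[] args) {
        System.out.println(summarize("733026057", "辽宁鞍山,上海上海,上海上海,"));
    }
}
```

Like the program above, this ends every entry with a semicolon, including the last one.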