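This article implements a word-count (wordCount) function over the text files in a given directory. It assumes the files contain only English words separated by spaces. The function returns a Map<String, Long> whose keys are the words and whose values are their counts.
Without further ado, here is the code: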
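// Imports needed by the three methods in this article.
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Counts word occurrences in all .txt files under a directory (searched recursively).
 *
 * @param dirPath directory to scan
 * @return map from lower-cased word to its number of occurrences
 */
public Map<String, Long> wordCount(String dirPath) {
    Map<String, Long> map = new HashMap<>();
    // A word is a maximal run of ASCII letters; any other character acts as a separator.
    Pattern expression = Pattern.compile("[a-zA-Z]+");
    List<String> fileList = new ArrayList<>();
    getFileList(fileList, dirPath, ".txt");
    for (String filePath : fileList) {
        String fileContent;
        try {
            fileContent = readFile(filePath);
        } catch (IOException ex) {
            // Skip files that cannot be read instead of continuing with null content.
            continue;
        }
        Matcher matcher = expression.matcher(fileContent.toLowerCase());
        while (matcher.find()) {
            // Store 1 for a new word, otherwise add 1 to the existing count.
            map.merge(matcher.group(), 1L, Long::sum);
        }
    }
    return map;
}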
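To make the counting step easier to try in isolation, here is a minimal sketch that applies the same regular expression and the same lower-casing to an in-memory string. The class name CountSketch and the helper countWords are hypothetical names used only for this illustration. The sketch uses a TreeMap so the printed order is predictable, and Locale.ROOT so lower-casing does not depend on the machine's locale.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CountSketch {
    // Same pattern as in wordCount: a word is a run of ASCII letters.
    private static final Pattern WORD = Pattern.compile("[a-zA-Z]+");

    // Hypothetical helper: counts the words of a single string.
    static Map<String, Long> countWords(String text) {
        Map<String, Long> counts = new TreeMap<>(); // sorted keys, for readable output
        Matcher matcher = WORD.matcher(text.toLowerCase(Locale.ROOT));
        while (matcher.find()) {
            counts.merge(matcher.group(), 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {blink=1, cat=2, don=1, other=1, saw=1, t=1, the=2}
        System.out.println(countWords("The cat saw the other cat. Don't blink!"));
    }
}
```

Note that "Don't" is counted as the two words "don" and "t": the pattern splits on every non-letter character, not only on spaces, so punctuation and digits also act as separators.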
/**
 * Recursively collects the paths of all files whose names end with the given suffix.
 *
 * @param fileList output list that receives the absolute file paths
 * @param dirPath  directory to traverse
 * @param fileType file name suffix to match, e.g. ".txt"
 */
private void getFileList(List<String> fileList, String dirPath, String fileType) {
    File[] files = new File(dirPath).listFiles();
    if (files == null) {
        // listFiles() returns null when dirPath is not a directory or cannot be read.
        return;
    }
    for (File file : files) {
        if (file.isDirectory()) {
            // Descend into subdirectories.
            getFileList(fileList, file.getAbsolutePath(), fileType);
        } else if (file.getName().endsWith(fileType)) {
            fileList.add(file.getAbsolutePath());
        }
    }
}
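As a possible alternative to the hand-written recursion, the same traversal can be sketched with java.nio.file.Files.walk. This is not the approach used in the project; the class name ListFilesSketch and the method listFiles are hypothetical, and the demo creates its own temporary directory so it can run anywhere.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ListFilesSketch {
    // Hypothetical helper: all regular files under dirPath whose name ends with suffix.
    static List<Path> listFiles(String dirPath, String suffix) throws IOException {
        // The stream holds open directory handles, so close it with try-with-resources.
        try (Stream<Path> paths = Files.walk(Paths.get(dirPath))) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(suffix))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Build a small throwaway tree: a.txt, c.log, sub/b.txt
        Path root = Files.createTempDirectory("wc-demo");
        Files.createDirectories(root.resolve("sub"));
        Files.write(root.resolve("a.txt"), "one".getBytes());
        Files.write(root.resolve("c.log"), "two".getBytes());
        Files.write(root.resolve("sub").resolve("b.txt"), "three".getBytes());

        // a.txt and sub/b.txt match; c.log does not.
        for (Path p : listFiles(root.toString(), ".txt")) {
            System.out.println(root.relativize(p));
        }
    }
}
```

One difference in behavior: getFileList silently skips a directory it cannot list, while Files.walk reports such a failure as an exception, so callers of this variant would need to decide how to handle it.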
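/**
 * Reads a whole file into a single string, joining its lines with spaces.
 *
 * @param filePath path of the file to read
 * @return the file content, with each line followed by a space
 * @throws IOException if the file cannot be opened or read
 */
private String readFile(String filePath) throws IOException {
    // try-with-resources closes the reader even if reading fails.
    // FileReader decodes the file with the platform default charset.
    try (BufferedReader br = new BufferedReader(new FileReader(filePath))) {
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = br.readLine()) != null) {
            // Append a space so words on adjacent lines do not run together.
            sb.append(line).append(' ');
        }
        return sb.toString();
    }
}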
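Because readFile relies on the platform default charset, the same file may decode differently on different machines. The sketch below shows one way to read a file with an explicit charset instead. It assumes the input files are UTF-8 encoded, which the article does not state; ReadFileSketch and readFileUtf8 are hypothetical names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadFileSketch {
    // Hypothetical helper: reads the whole file and decodes it as UTF-8 (an assumption).
    static String readFileUtf8(String filePath) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filePath));
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Write a two-line temporary file, then read it back.
        Path tmp = Files.createTempFile("wc-demo", ".txt");
        Files.write(tmp, "hello world\nhello again".getBytes(StandardCharsets.UTF_8));
        System.out.println(readFileUtf8(tmp.toString()));
        Files.delete(tmp);
    }
}
```

Unlike readFile, this variant keeps the original line breaks rather than replacing them with spaces. That makes no difference to the count, since the word pattern treats a line break like any other non-letter character. Both versions load the entire file into memory, so very large files would call for a line-by-line approach.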
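Project source code:
CSDN: https://download.youkuaiyun.com/download/zhyzcl/12366312
GitHub: https://github.com/276255322/WordCount.git
Note: the word-count feature is built on the Spring Boot framework. You can open the project in IntelliJ IDEA, then build and run it. The default test URL is http://127.0.0.1:8080/WordCount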