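In the previous article, "Multithreaded K-Nearest Neighbors (II): Fine-Grained Concurrent Version", I briefly described a fine-grained version of the k-nearest neighbors algorithm built on an executor. You may have spotted a problem with that approach: it creates too many tasks, one per training example. Since the executor has at most numThreads worker threads, an alternative is to launch only numThreads tasks and split the training data into numThreads groups, with each task computing the distances between the input example and the training examples in its own group.
Following this design, we can adapt the earlier KnnClassifierParallelIndividual class. The main change is in the classify method. The code is as follows: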
import com.Knnclassifier.Distance;
import com.Knnclassifier.Sample;

import java.util.*;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;

public class KnnClassifierParallelGroup {

    private final List<? extends Sample> dataSet;
    private final int k;
    private ThreadPoolExecutor executor;
    private final int numThreads;
    private final boolean parallelSort;

    public KnnClassifierParallelGroup(List<? extends Sample> dataSet, int k, int factor, boolean parallelSort) {
        this.dataSet = dataSet;
        this.k = k;
        // One worker thread per task: factor * number of available cores.
        this.numThreads = factor * (Runtime.getRuntime().availableProcessors());
        this.executor = (ThreadPoolExecutor) Executors.newFixedThreadPool(numThreads);
        this.parallelSort = parallelSort;
    }

    public String classify(Sample sample) throws Exception {
        Distance[] distances = new Distance[dataSet.size()];
        // The latch is released once all numThreads group tasks have finished.
        CountDownLatch endController = new CountDownLatch(numThreads);

        // Each task handles the half-open index range [startIndex, endIndex).
        int length = dataSet.size() / numThreads;
        int startIndex = 0, endIndex = length;

        for (int i = 0; i < numThreads; i++) {
            GroupDistanceTask task = new GroupDistanceTask(distances, startIndex, endIndex,
                    sample, dataSet, endController);
            // Prepare the range of the next task; the last task runs up to
            // dataSet.size() so that the division remainder is not lost.
            startIndex = endIndex;
            if (i < numThreads - 2) {
                endIndex = endIndex + length;
            } else {
                endIndex = dataSet.size();
            }
            executor.execute(task);
        }

        // Wait until every group has filled in its slice of the array.
        endController.await();

        if (parallelSort) {
            Arrays.parallelSort(distances);
        } else {
            Arrays.sort(distances);
        }

        // Note: shutting down here makes this classifier instance single-use;
        // a second classify call would be rejected by the executor.
        executor.shutdown();

        // Count the tags of the k nearest training examples.
        Map<String, Integer> results = new HashMap<>();
        for (int i = 0; i < k; i++) {
            Sample localExample = dataSet.get(distances[i].getIndex());
            String tag = localExample.getTag();
            results.merge(tag, 1, (a, b) -> a + b);
        }

        // Return the most frequent tag.
        return Collections.max(results.entrySet(),
                Map.Entry.comparingByValue()).getKey();
    }
}
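The index arithmetic inside classify is easy to get wrong, so it can help to look at it in isolation. The following is a minimal sketch, not part of the original classifier: the class name RangePartitionSketch and the method split are hypothetical, and the sketch only reproduces the same half-open ranges that the loop above hands to each task.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; it is not part of the article's code.
final class RangePartitionSketch {

    /** Returns one {start, end} pair per group, covering the half-open range [0, size). */
    static List<int[]> split(int size, int groups) {
        List<int[]> ranges = new ArrayList<>();
        int length = size / groups;
        int start = 0;
        for (int i = 0; i < groups; i++) {
            // The last group runs up to size, so it absorbs the division remainder.
            int end = (i == groups - 1) ? size : start + length;
            ranges.add(new int[] {start, end});
            start = end;
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 10 samples in 4 groups
        for (int[] range : split(10, 4)) {
            System.out.println(range[0] + " to " + range[1]);
        }
    }
}
```

With 10 samples and 4 groups the ranges are [0, 2), [2, 4), [4, 6) and [6, 10). The last group can therefore be larger than the others by up to groups - 1 elements, which is negligible when the training set is much larger than the number of threads.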
The distance task, GroupDistanceTask, has to change in the same way. The revised code is as follows:
import com.Knnclassifier.Distance;
import com.Knnclassifier.EuclideanDistanceCalculator;
import com.Knnclassifier.Sample;

import java.util.List;
import java.util.concurrent.CountDownLatch;

public class GroupDistanceTask implements Runnable {

    private final Distance[] distances;
    private final int startIndex, endIndex;
    private final Sample example;
    private final List<? extends Sample> dataSet;
    private final CountDownLatch endController;

    public GroupDistanceTask(Distance[] distances, int startIndex,
                             int endIndex, Sample example,
                             List<? extends Sample> dataSet, CountDownLatch endController) {
        this.distances = distances;
        this.startIndex = startIndex;
        this.endIndex = endIndex;
        this.example = example;
        this.dataSet = dataSet;
        this.endController = endController;
    }

    @Override
    public void run() {
        try {
            // Each task writes only to its own slice [startIndex, endIndex),
            // so the shared array needs no extra synchronization.
            for (int index = startIndex; index < endIndex; index++) {
                Sample localExample = dataSet.get(index);
                distances[index] = new Distance();
                distances[index].setIndex(index);
                distances[index].setDistance(EuclideanDistanceCalculator
                        .calculate(localExample, example));
            }
        } finally {
            // Count down even if a calculation fails, so that the thread
            // waiting in classify is not blocked forever.
            endController.countDown();
        }
    }
}
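The task depends on Distance and EuclideanDistanceCalculator, which come from the earlier articles and are not reproduced here. As a rough idea of what they need to provide, here is a minimal sketch under stated assumptions: the names DistanceSketch and EuclideanSketch are hypothetical, the examples are assumed to be plain double[] vectors, and the real classes may differ. The one firm requirement is that the distance type is Comparable, because classify sorts the array with Arrays.sort or Arrays.parallelSort.

```java
// Hypothetical stand-ins for illustration; the article's real classes may differ.
final class DistanceSketch implements Comparable<DistanceSketch> {
    private int index;        // position of the training example in the data set
    private double distance;  // distance from that example to the input example

    int getIndex() { return index; }
    void setIndex(int index) { this.index = index; }
    double getDistance() { return distance; }
    void setDistance(double distance) { this.distance = distance; }

    // Sorting an array of distances puts the nearest neighbors first.
    @Override
    public int compareTo(DistanceSketch other) {
        return Double.compare(this.distance, other.distance);
    }
}

final class EuclideanSketch {
    /** Euclidean distance between two vectors of equal length. */
    static double calculate(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }
}
```

After sorting, the first k entries hold the indices of the k nearest training examples, which is exactly what the voting loop in classify reads through getIndex.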
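The launcher class only needs a small change to use the new classifier, so it is not shown here. Running the code gives the following result: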