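The top-level interface is Model, the most abstract layer of the hierarchy. As its definition in the code shows, a model is a probability distribution over data.
A model can return the probability of an observation (pdf), absorb an observation with an optional weight (observe), recompute its parameters (computeParameters), report how many observations it has seen (count), and return a sample drawn from the posterior (sampleFromPosterior).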
public interface Model<O> extends Writable {
    double pdf(O x);
    void observe(O x);
    void observe(O x, double weight);
    void computeParameters();
    long count();
    Model<VectorWritable> sampleFromPosterior();
}
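To make the contract concrete, here is a minimal sketch of a class that follows the same observe / computeParameters / pdf cycle. SimpleModel and GaussianModel are hypothetical names invented for this illustration and are not part of Mahout; the sketch drops Writable and sampleFromPosterior and assumes a one-dimensional Gaussian.

```java
// Hypothetical, simplified stand-in for Model<O>: no Writable, no posterior sampling.
interface SimpleModel<O> {
    double pdf(O x);
    void observe(O x);
    void observe(O x, double weight);
    void computeParameters();
    long count();
}

// One-dimensional Gaussian that accumulates weighted sums and refits on demand.
class GaussianModel implements SimpleModel<Double> {
    private double mean = 0.0;
    private double variance = 1.0;
    private double s0; // sum of weights
    private double s1; // weighted sum of x
    private double s2; // weighted sum of x squared
    private long n;    // number of observe() calls

    @Override
    public double pdf(Double x) {
        double d = x - mean;
        return Math.exp(-d * d / (2.0 * variance)) / Math.sqrt(2.0 * Math.PI * variance);
    }

    @Override
    public void observe(Double x) {
        observe(x, 1.0);
    }

    @Override
    public void observe(Double x, double weight) {
        s0 += weight;
        s1 += weight * x;
        s2 += weight * x * x;
        n++;
    }

    @Override
    public void computeParameters() {
        if (s0 <= 0.0) {
            return; // nothing observed yet, keep the current parameters
        }
        mean = s1 / s0;
        // Floor the variance so pdf() never divides by zero.
        variance = Math.max(s2 / s0 - mean * mean, 1e-9);
    }

    @Override
    public long count() {
        return n;
    }
}
```

Observations only change the accumulated sums; the density returned by pdf changes once computeParameters is called, which mirrors the separation between the two methods in the interface.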
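A Cluster is one kind of model. It can return its id, center, radius, and number of points, and can produce a human-readable string description.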
public interface Cluster extends Model<VectorWritable>, Parametered {
    int getId();
    Vector getCenter();
    Vector getRadius();
    long getNumPoints();
    String asFormatString(String[] bindings);
}
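The abstract class AbstractCluster implements this interface. Because it is declared abstract it cannot be instantiated directly, only through a subclass; its constructors are all protected and are meant to be called from subclasses. The Abstract prefix in the class name signals this.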
public abstract class AbstractCluster implements Cluster
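Because a cluster is still a Writable underneath, the class implements the read and write methods. readFields consumes the fields in the same order that write emits them: id, numPoints, center, radius.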
@Override
public void readFields(DataInput in) throws IOException {
    this.id = in.readInt();
    this.numPoints = in.readLong();
    VectorWritable temp = new VectorWritable();
    temp.readFields(in);
    this.center = temp.get();
    temp.readFields(in);
    this.radius = temp.get();
}
@Override
public void write(DataOutput out) throws IOException {
    out.writeInt(id);
    out.writeLong(numPoints);
    VectorWritable.writeVector(out, center);
    VectorWritable.writeVector(out, radius);
}
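The write / readFields pair can be tried out without Hadoop, using only java.io. The sketch below is an illustration under assumptions: TinyCluster is a hypothetical class, and the center is stored as a length-prefixed double[] instead of a VectorWritable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a Writable cluster: id, point count, and a center vector.
class TinyCluster {
    int id;
    long numPoints;
    double[] center = new double[0];

    void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeLong(numPoints);
        out.writeInt(center.length); // length prefix so the reader knows how many doubles follow
        for (double c : center) {
            out.writeDouble(c);
        }
    }

    void readFields(DataInput in) throws IOException {
        // Fields are read back in exactly the order write() produced them.
        id = in.readInt();
        numPoints = in.readLong();
        center = new double[in.readInt()];
        for (int i = 0; i < center.length; i++) {
            center[i] = in.readDouble();
        }
    }

    // Serializes to an in-memory buffer and deserializes into a fresh instance.
    static TinyCluster roundTrip(TinyCluster c) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            c.write(new DataOutputStream(bytes));
            TinyCluster copy = new TinyCluster();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Nothing in the byte stream names the fields, so the only thing keeping the two methods in agreement is that they list the fields in the same order.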
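In Java, an abstract class does not have to implement every method of the interfaces it declares. AbstractCluster leaves some of them unimplemented, pdf for example.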
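A small sketch of this language rule, using hypothetical names (MiniCluster, AbstractMiniCluster, DistanceMiniCluster) that only echo the shape of the real classes. The pdf formula 1 / (1 + distance) is an assumption chosen for the illustration, with Euclidean distance hard-coded.

```java
// Hypothetical miniature of the Cluster interface.
interface MiniCluster {
    int getId();
    double pdf(double[] x);
}

// Implements getId() but leaves pdf() to subclasses, which is legal for an abstract class.
abstract class AbstractMiniCluster implements MiniCluster {
    private final int id;
    protected final double[] center;

    protected AbstractMiniCluster(int id, double[] center) {
        this.id = id;
        this.center = center.clone();
    }

    @Override
    public int getId() {
        return id;
    }
    // pdf(double[]) is intentionally not implemented here.
}

// A concrete subclass must supply every method that is still missing.
class DistanceMiniCluster extends AbstractMiniCluster {
    DistanceMiniCluster(int id, double[] center) {
        super(id, center);
    }

    @Override
    public double pdf(double[] x) {
        double sum = 0.0;
        for (int i = 0; i < center.length; i++) {
            double d = x[i] - center[i];
            sum += d * d;
        }
        // Assumed scoring rule: closer points get a value nearer to 1.
        return 1.0 / (1.0 + Math.sqrt(sum));
    }
}
```

Removing pdf from DistanceMiniCluster produces a compile error, while AbstractMiniCluster compiles without it.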
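DistanceMeasureCluster is a concrete implementation that adds a distance measure. Many different measures can be used for clustering, so the measure class is loaded by name through a ClassLoader.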
public class DistanceMeasureCluster extends AbstractCluster
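A non-abstract class must implement every method of its interfaces, so this class provides the pdf method that had been left unimplemented.
Its read and write methods call the super versions and add to them: the class name of the measure is written first, followed by the fields handled by the parent class.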
@Override
public void readFields(DataInput in) throws IOException {
    String dm = in.readUTF();
    try {
        ClassLoader ccl = Thread.currentThread().getContextClassLoader();
        this.measure = ccl.loadClass(dm).asSubclass(DistanceMeasure.class).newInstance();
    } catch (InstantiationException e) {
        throw new IllegalStateException(e);
    } catch (IllegalAccessException e) {
        throw new IllegalStateException(e);
    } catch (ClassNotFoundException e) {
        throw new IllegalStateException(e);
    }
    super.readFields(in);
}
@Override
public void write(DataOutput out) throws IOException {
    out.writeUTF(measure.getClass().getName());
    super.write(out);
}
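The technique of storing only a class name and re-creating the object reflectively can be sketched in isolation. Measure, ManhattanMeasure and MeasureIO are hypothetical names for this example. Two details differ from the excerpt above and are my own choices: the sketch uses getDeclaredConstructor().newInstance() with a single ReflectiveOperationException catch, and it falls back to the defining class loader when the context class loader cannot find the class.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a distance measure interface.
interface Measure {
    double distance(double[] a, double[] b);
}

class ManhattanMeasure implements Measure {
    @Override
    public double distance(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += Math.abs(a[i] - b[i]);
        }
        return sum;
    }
}

class MeasureIO {
    // Only the class name goes on the wire; the measure itself carries no state here.
    static void write(DataOutput out, Measure m) throws IOException {
        out.writeUTF(m.getClass().getName());
    }

    static Measure read(DataInput in) throws IOException {
        String name = in.readUTF();
        try {
            // asSubclass fails fast if the named class is not a Measure.
            return load(name).asSubclass(Measure.class).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // Tries the thread context class loader first, then the loader of this class (assumed fallback).
    private static Class<?> load(String name) throws ClassNotFoundException {
        ClassLoader ccl = Thread.currentThread().getContextClassLoader();
        if (ccl != null) {
            try {
                return ccl.loadClass(name);
            } catch (ClassNotFoundException ignored) {
                // fall through to the defining loader
            }
        }
        return Class.forName(name, true, MeasureIO.class.getClassLoader());
    }

    static Measure roundTrip(Measure m) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            write(new DataOutputStream(bytes), m);
            return read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This approach requires the measure class to have a no-argument constructor and to be on the classpath of whatever process reads the data back.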
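The k-means Cluster class goes one step further and adds convergence checking. Its read and write methods again call super, then append the new converged flag.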
public class Cluster extends DistanceMeasureCluster
@Override
public void write(DataOutput out) throws IOException {
    super.write(out);
    out.writeBoolean(converged);
}
@Override
public void readFields(DataInput in) throws IOException {
    super.readFields(in);
    this.converged = in.readBoolean();
}
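The way each level contributes its own bytes around the super call can be seen in a three-level sketch. BaseRecord, MeasuredRecord, ConvergedRecord and RecordIO are hypothetical names, and the fields are reduced to an int, a String and a boolean. The middle level writes before calling super and the leaf writes after, matching the order used in the excerpts above.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class BaseRecord {
    int id;

    void write(DataOutput out) throws IOException {
        out.writeInt(id);
    }

    void readFields(DataInput in) throws IOException {
        id = in.readInt();
    }
}

class MeasuredRecord extends BaseRecord {
    String measureName = "";

    @Override
    void write(DataOutput out) throws IOException {
        out.writeUTF(measureName); // this level's field goes first
        super.write(out);
    }

    @Override
    void readFields(DataInput in) throws IOException {
        measureName = in.readUTF(); // so it must also be read first
        super.readFields(in);
    }
}

class ConvergedRecord extends MeasuredRecord {
    boolean converged;

    @Override
    void write(DataOutput out) throws IOException {
        super.write(out);
        out.writeBoolean(converged); // this level's field goes last
    }

    @Override
    void readFields(DataInput in) throws IOException {
        super.readFields(in);
        converged = in.readBoolean();
    }
}

class RecordIO {
    static byte[] toBytes(ConvergedRecord r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            r.write(new DataOutputStream(bytes));
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static ConvergedRecord fromBytes(byte[] data) {
        try {
            ConvergedRecord r = new ConvergedRecord();
            r.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return r;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The resulting layout is: measure name, then id, then the converged flag. Each readFields override has to place its own read on the same side of the super call as the matching write, otherwise the stream is misread.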
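Taken together, these classes form a complete hierarchy that moves from abstract to concrete, extending the base functionality step by step.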