Machine Learning: AdaBoosting (300 Lines of Java a Day, Days 63-65)

Article link: https://blog.youkuaiyun.com/weixin_49592304/article/details/124725156

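1. What is AdaBoosting

AdaBoosting, short for Adaptive Boosting, is an iterative boosting algorithm. It is a machine learning meta-algorithm proposed by Yoav Freund and Robert Schapire in 1997, for which they received the Gödel Prize in 2003. AdaBoosting is a well-known member of the Boosting family, and any discussion of Boosting has to start with ensemble learning.

· Ensemble learning

Ensemble learning completes a learning task by building and combining multiple learners. It is sometimes also called a multi-classifier system or committee-based learning. The general structure of ensemble learning is shown in the figure: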

[Figure: general structure of ensemble learning]
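Ensemble learning works as follows: first produce a set of individual learners, then combine them with some strategy. An individual learner is usually produced from the training data by an existing learning algorithm. When the ensemble contains only individual learners of the same type, it is called homogeneous; an ensemble may also contain individual learners of different types, in which case it is called heterogeneous.
Individual learners are generated in one of two ways: (1) sequential methods, where the learners depend on each other and must be generated serially; (2) parallel methods, where the learners are independent and can be generated at the same time. By generation method, ensemble learning falls into two broad classes: (1) Boosting; (2) Bagging and Random Forest. AdaBoosting, the subject of this post, is a representative of the Boosting family.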

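· AdaBoosting

The principle of AdaBoosting can be summed up in one phrase: strength in numbers. The algorithm trains several classifiers on the same data set and then combines these weak classifiers (classifiers whose accuracy is only slightly above 50%) into a stronger final classifier.
The algorithm starts by classifying the initial data set with the first classifier. After that round, the data weights are changed according to which instances were classified correctly, and the reweighted training set is passed to the next classifier for training. Finally, the classifiers obtained in all rounds are merged into the final decision classifier.
The AdaBoost family of algorithms mainly addresses two-class problems, multi-class single-label problems, multi-class multi-label problems, large-class single-label problems, and regression problems.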

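2. The idea behind the AdaBoosting algorithm

The idea of AdaBoosting is to train weak classifiers iteratively, using repeated testing and filtering to pick out the "elite", and to stop iterating once the classification accuracy reaches some threshold. The training of each weak classifier depends on the data weights, so effort is targeted: more work goes into the instances that were misclassified, and less attention is paid to those already classified correctly. The sample weights differ in every round. In the end all classifiers are used for classification and their weighted sum is taken. The "elite" get more voting power, poorly performing base models get less, and the final result combines everyone's votes.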
[Figure: schematic of the AdaBoosting algorithm]
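As the schematic shows, in AdaBoosting the training of the base classifiers and the adjustment of the sample weights are serial, while the combination of the different classifiers is parallel.

For an instance, being misclassified increases its weight so that later rounds can improve on it. For a base classifier, the higher its classification accuracy, the higher its weight.

Let the training set be $\left \{ (x_{1},y_{1}),(x_{2},y_{2}),...,(x_{N},y_{N}) \right \}$, where $x_{i}$ is a column vector with $d$ elements and $y_{i}\in\left \{ -1,+1 \right \}$ is the class label of instance $i$ (see the original reference). The concrete steps of the AdaBoosting algorithm are as follows: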

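1. Initialize the sample weights

To be fair, every instance gets the same initial weight, the reciprocal of the number of samples, so that all sample weights sum to 1.
$D_{1}=\left ( w_{11},w_{12},...,w_{1N} \right ),\quad w_{1i}=\frac{1}{N},\quad i=1,2,...,N \tag{1}$

2. Train on the data under the sample weight distribution $D_{m}$ to obtain the $m$-th base learner $G_{m}(x)$, $m = 1, 2, ..., M$;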

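3. Compute the classification error rate of $G_{m}(x)$ on the training set:

$e_{m}=P(G_{m}(x_{i})\neq y_{i})=\sum_{i=1}^{N}w_{mi}I(G_{m}(x_{i})\neq y_{i}) \tag{2}$
That is, add up the weights of all misclassified instances. Because the weights sum to 1, this equals the fraction of misclassified instances only when all weights are equal, as in the first round. In the formula, $I (\cdot)$ is the indicator function.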

4. Compute the coefficient of $G_{m}(x)$, that is, the weight of this base learner:

$\alpha _{m}=\frac{1}{2} \ln \frac{1-e_{m}}{e_{m}} \tag{3}$

5. Update the weights of the training samples:

$D_{m+1}=(w_{m+1,1},w_{m+1,2},...,w_{m+1,N})\tag{4}$
where $w_{m+1,i}=\frac{w_{mi}}{Z_{m}}\exp(-\alpha _{m}y_{i}G_{m}(x_{i}))$ and $Z_{m}$ is a normalization factor that makes all sample weights sum to 1. $Z_{m}$ is given by:
$Z_{m}=\sum_{i=1}^{N}w_{mi}\exp (-\alpha _{m}y_{i}G_{m}(x_{i})) \tag{5}$

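6. Check the termination condition: if the classification accuracy reaches 100% or all $M$ base learners have been trained, stop training; otherwise return to step 2.

7. Build the final classifier as a linear combination:

$f(x)=\sum_{m=1}^{M}\alpha _{m}G_{m}(x) \tag{6}$
The final classifier is:
$G(x)=\operatorname{sign}(f(x))=\operatorname{sign}\left(\sum_{m=1}^{M}\alpha _{m}G_{m}(x) \right)\tag{7}$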
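
Steps 1 to 7 can be traced on a small example. The following is a minimal sketch, not the implementation used later in this post: it assumes a hypothetical toy set of ten labels and a hypothetical fixed pool of three weak classifiers stored as prediction arrays, and in each round it simply picks the candidate with the smallest weighted error. The names `AdaBoostSketch`, `Y` and `CANDIDATES` are made up for illustration.

```java
import java.util.Arrays;

final class AdaBoostSketch {
    // Hypothetical toy data: ten samples with labels in {-1, +1}.
    static final int[] Y = {1, 1, 1, -1, -1, -1, 1, 1, 1, -1};
    // Hypothetical pool of weak classifiers, stored as their predictions on the ten samples.
    static final int[][] CANDIDATES = {
        {1, 1, 1, -1, -1, -1, -1, -1, -1, -1},  // +1 for the first three samples
        {1, 1, 1, 1, 1, 1, 1, 1, 1, -1},        // +1 for all but the last sample
        {-1, -1, -1, -1, -1, -1, 1, 1, 1, 1},   // +1 for the last four samples
    };

    /** Weighted error e_m: total weight of the misclassified samples (Eq. 2). */
    static double weightedError(double[] w, int[] pred, int[] y) {
        double e = 0;
        for (int i = 0; i < y.length; i++) {
            if (pred[i] != y[i]) e += w[i];
        }
        return e;
    }

    /** Classifier coefficient alpha_m (Eq. 3), using the natural logarithm. */
    static double alpha(double e) {
        return 0.5 * Math.log((1 - e) / e);
    }

    /** Weight update followed by normalization with Z_m (Eqs. 4 and 5). */
    static double[] updateWeights(double[] w, int[] pred, int[] y, double a) {
        double[] next = new double[w.length];
        double z = 0;
        for (int i = 0; i < w.length; i++) {
            next[i] = w[i] * Math.exp(-a * y[i] * pred[i]);
            z += next[i];
        }
        for (int i = 0; i < next.length; i++) next[i] /= z;
        return next;
    }

    /** Runs up to `rounds` rounds and returns f(x_i) (Eq. 6) for every training sample. */
    static double[] boost(int[][] candidates, int[] y, int rounds) {
        int n = y.length;
        double[] w = new double[n];
        Arrays.fill(w, 1.0 / n);  // Eq. 1: uniform initial weights
        double[] f = new double[n];
        for (int m = 0; m < rounds; m++) {
            int best = 0;  // candidate with the smallest weighted error under the current weights
            for (int c = 1; c < candidates.length; c++) {
                if (weightedError(w, candidates[c], y) < weightedError(w, candidates[best], y)) best = c;
            }
            double e = weightedError(w, candidates[best], y);
            if (e <= 0 || e >= 0.5) break;  // perfect, or no better than chance: stop in this sketch
            double a = alpha(e);
            for (int i = 0; i < n; i++) f[i] += a * candidates[best][i];
            w = updateWeights(w, candidates[best], y, a);
        }
        return f;
    }

    public static void main(String[] args) {
        double[] f = boost(CANDIDATES, Y, 3);
        for (int i = 0; i < Y.length; i++) {
            int predicted = f[i] >= 0 ? 1 : -1;  // G(x) = sign(f(x)), Eq. 7
            System.out.printf("sample %d: f = %.4f, predicted %d, actual %d%n", i, f[i], predicted, Y[i]);
        }
    }
}
```

On this toy set no single candidate is right everywhere, yet after three rounds the sign of the weighted sum matches every label, which is the point of combining weak classifiers.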

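3. Basic workflow and implementation of the algorithm

1. Handling weighted instances:

We start with the weight handling for instances. After each round of training, the weights of the training set are adjusted: if an instance was predicted wrongly its weight goes up, and if it was predicted correctly its weight goes down. The weights of all instances in the data set sum to 1, that is, they are normalized.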

import java.io.FileReader;
import java.util.Arrays;

import weka.core.Instances;

public class WeightedInstances extends Instances {// Instances with one weight per instance

	private static final long serialVersionUID = 11087456L;
	private double[] weights;// One weight per instance

	public WeightedInstances(FileReader paraFileReader) throws Exception {
		super(paraFileReader);
		setClassIndex(numAttributes() - 1);// The last attribute is the class; all others are conditional attributes
		initializeWeights();
	}

	public WeightedInstances(Instances paraInstances) {
		super(paraInstances);
		setClassIndex(numAttributes() - 1);// The last attribute is the class
		initializeWeights();
	}

	private void initializeWeights() {// Every instance starts with the same weight; the weights sum to 1
		weights = new double[numInstances()];
		Arrays.fill(weights, 1.0 / numInstances());
		System.out.println("Instances weight are: " + Arrays.toString(weights));
	}

	public double getWeight(int paraIndex) {
		return weights[paraIndex];
	}

	public void adjustWeights(boolean[] paraCorrectArray, double paraAlpha) {// Called after each round
		// Step 1. Compute the scaling factor e^alpha
		double tempIncrease = Math.exp(paraAlpha);

		// Step 2. Adjust
		double tempWeightsSum = 0;
		for (int i = 0; i < weights.length; i++) {
			if (paraCorrectArray[i]) {// Predicted correctly in the last round
				weights[i] /= tempIncrease;// Lower its weight
			} else {// Otherwise
				weights[i] *= tempIncrease;// Raise its weight
			}
			tempWeightsSum += weights[i];// Accumulate for normalization
		}// of for i

		// Step 3. Normalize so that all weights sum to 1
		for (int i = 0; i < weights.length; i++) {
			weights[i] /= tempWeightsSum;
		}// of for i
		System.out.println("After adjusting, Instances weights are: " + Arrays.toString(weights));
	}// of adjustWeights

	public void adjustWeightsTest() {// Test of the weight adjustment
		boolean[] tempCorrectArray = new boolean[numInstances()];
		for (int i = 0; i < tempCorrectArray.length / 2; i++) {
			tempCorrectArray[i] = true;// Mark the first half as correct
		}// of for i
		double tempAlpha = 0.3;// An arbitrary alpha for the test
		adjustWeights(tempCorrectArray, tempAlpha);
		System.out.println("After adjusting");
		System.out.println(toString());
	}

	public String toString() {
		String resultString = "I am a weighted Instances object.\r\n" + "I have " + numInstances() + " instances and "
				+ (numAttributes() - 1) + " conditional attributes.\r\n" + "My weights are: " + Arrays.toString(weights)
				+ "\r\n" + "My data are:\r\n" + super.toString();
		return resultString;
	}

	public static void main(String args[]) {
		WeightedInstances tempWeightedInstances = null;
		String tempFilename = "D:/software/eclipse/eclipse-workspace/day51/iris.arff";// Path of the data file
		try {
			FileReader fileReader = new FileReader(tempFilename);// Open the data set
			tempWeightedInstances = new WeightedInstances(fileReader);// Create the weighted instances
			fileReader.close();
		} catch (Exception ee) {
			System.out.println("Cannot read the file: " + tempFilename);
			return;// Nothing to print or test without data
		}
		System.out.println(tempWeightedInstances.toString());
		tempWeightedInstances.adjustWeightsTest();
	}
}
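
To connect `adjustWeights` with Eq. (4): for labels in {-1, +1}, the product y_i G_m(x_i) is +1 for a correct prediction and -1 for a wrong one, so dividing or multiplying by e^alpha and then normalizing is the same update. The sketch below is a Weka-free analogue on plain arrays; the class name `WeightAdjustSketch` and the four-instance data are assumptions made for illustration.

```java
import java.util.Arrays;

final class WeightAdjustSketch {
    /** Plain-array analogue of adjustWeights: divide by e^alpha if correct, multiply if wrong, then normalize. */
    static double[] adjust(double[] weights, boolean[] correct, double alpha) {
        double factor = Math.exp(alpha);
        double[] result = new double[weights.length];
        double sum = 0;
        for (int i = 0; i < weights.length; i++) {
            result[i] = correct[i] ? weights[i] / factor : weights[i] * factor;
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;  // normalize so the weights sum to 1
        }
        return result;
    }

    public static void main(String[] args) {
        double[] weights = {0.25, 0.25, 0.25, 0.25};   // hypothetical uniform start
        boolean[] correct = {true, true, true, false}; // one of four instances is misclassified
        double error = 0.25;                           // weighted error of that classifier
        double alpha = 0.5 * Math.log((1 - error) / error);
        System.out.println(Arrays.toString(adjust(weights, correct, alpha)));
    }
}
```

With alpha taken from Eq. (3), the misclassified instance ends up with weight 0.5 and the three correct ones share the other 0.5, so the next classifier cannot keep the same error by repeating the same mistake.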

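2. Build a simple classifier on top of the weighted instances:

The idea of the simple classifier is straightforward: train the classifier, then compute how well it does on the training data so that the classifier's weight can be set. It breaks down into the following steps:

2.1 Create the abstract class SimpleClassifier, initialize its variables, and read in the data: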

	// Body of: public abstract class SimpleClassifier (needs java.util.Random and weka.core.Instance)
	int selectedAttribute;// Index of the currently selected attribute
	WeightedInstances weightedInstances;// The weighted instances
	double trainingAccuracy;// Accuracy on the training set
	int numClasses;// Number of class labels
	int numInstances;// Number of instances
	int numConditions;// Number of conditional attributes
	Random random = new Random();// Random number generator

	public SimpleClassifier(WeightedInstances paraWeightedInstances) {
		weightedInstances = paraWeightedInstances;
		numConditions = weightedInstances.numAttributes() - 1;// Everything except the class attribute
		numInstances = weightedInstances.numInstances();
		numClasses = weightedInstances.classAttribute().numValues();
	}// of first constructor

	public abstract void train();// Training, implemented by subclasses

	public abstract int classify(Instance paraInstance);// Classification, implemented by subclasses

	public boolean[] computeCorrectnessArray() {// One flag per instance: was it classified correctly?
		boolean[] resultCorrectnessArray = new boolean[weightedInstances.numInstances()];
		for (int i = 0; i < resultCorrectnessArray.length; i++) {
			Instance tempInstance = weightedInstances.instance(i);// Take one instance
			if ((int) (tempInstance.classValue()) == classify(tempInstance)) {// Actual label equals predicted label
				resultCorrectnessArray[i] = true;
			}// of if
		}// of for i
		return resultCorrectnessArray;
	}

2.2 Compute the classifier's training accuracy and its weighted error:

	public double computeTrainingAccuracy() {// Unweighted accuracy on the training set
		double tempCorrect = 0;// Number of correct predictions
		boolean[] tempCorrectnessArray = computeCorrectnessArray();
		for (int i = 0; i < tempCorrectnessArray.length; i++) {
			if (tempCorrectnessArray[i]) {
				tempCorrect++;
			}// of if
		}// of for i
		double resultAccuracy = tempCorrect / tempCorrectnessArray.length;// Correct count divided by total count
		return resultAccuracy;
	}

	public double computeWeightedError() {// Sum of the weights of the misclassified instances
		double resultError = 0;
		boolean[] tempCorrectnessArray = computeCorrectnessArray();
		for (int i = 0; i < tempCorrectnessArray.length; i++) {
			if (!tempCorrectnessArray[i]) {// Misclassified
				resultError += weightedInstances.getWeight(i);// Add the weight of this instance
			}// of if
		}// of for i
		if (resultError < 1e-6) {// Floor the error so that the classifier weight stays finite
			resultError = 1e-6;
		}
		return resultError;
	}
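
The floor of 1e-6 in `computeWeightedError` matters because the classifier weight is later computed as `0.5*Math.log(1/tempError-1)`, which is infinite for an error of exactly 0. The sketch below illustrates this on a few hand-picked error values; the class and method names are hypothetical, and only the two formulas are taken from the code in this post.

```java
final class ErrorFloorSketch {
    static final double MIN_ERROR = 1e-6;  // same floor as in computeWeightedError

    /** Applies the floor to a raw weighted error. */
    static double flooredError(double rawError) {
        return Math.max(rawError, MIN_ERROR);
    }

    /** Classifier weight, written the same way as in the booster: 0.5 * ln(1/e - 1). */
    static double classifierWeight(double error) {
        return 0.5 * Math.log(1 / error - 1);
    }

    public static void main(String[] args) {
        double[] rawErrors = {0.0, 0.3, 0.5, 0.7};  // hand-picked values, not measured results
        for (double raw : rawErrors) {
            System.out.println("raw error " + raw
                    + ": weight without floor = " + classifierWeight(raw)
                    + ", with floor = " + classifierWeight(flooredError(raw)));
        }
    }
}
```

An error of 0.5 gives weight 0 and an error above 0.5 gives a negative weight, which is why the booster later clips small or negative weights to 0.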

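3. Build a stump classifier on top of the weighted instances:

3.1 Create the class StumpClassifier, which extends SimpleClassifier. Initialize its variables and read in the data: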

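	// Body of: public class StumpClassifier extends SimpleClassifier
	// (needs java.io.FileReader, java.util.Arrays and weka.core.Instance)
	double bestCut;// Best cut position
	int leftLeaflabel;// Label of the left leaf
	int rightLeaflabel;// Label of the right leaf

	public StumpClassifier(WeightedInstances paraWeightedInstances) {
		super(paraWeightedInstances);
	}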

3.2 Override the abstract methods to implement training and classification:

	@Override
	public void train() {
		// Step 1. Randomly select one conditional attribute
		selectedAttribute = random.nextInt(numConditions);
		// Step 2. Collect all values of that attribute and sort them
		double[] tempValuesArray = new double[numInstances];
		for (int i = 0; i < tempValuesArray.length; i++) {
			tempValuesArray[i] = weightedInstances.instance(i).value(selectedAttribute);
		}
		Arrays.sort(tempValuesArray);
		// Step 3. Baseline: give every instance the same label
		int tempNumLabels = numClasses;// Number of labels
		double[] tempLabelCountArray = new double[tempNumLabels];// Total instance weight per label
		int tempCurrentLabel;// Label of the instance being scanned

		// 3.1 Scan all instances and accumulate their weights by label
		for (int i = 0; i < numInstances; i++) {
			tempCurrentLabel = (int) weightedInstances.instance(i).classValue();// Class label of instance i
			tempLabelCountArray[tempCurrentLabel] += weightedInstances.getWeight(i);
		}// of for i

		// 3.2 Find the label with the largest total weight
		double tempMaxCorrect = 0;// Largest weighted count so far
		int tempBestLabel = -1;// Label that achieves it
		for (int i = 0; i < tempLabelCountArray.length; i++) {
			if (tempMaxCorrect < tempLabelCountArray[i]) {
				tempMaxCorrect = tempLabelCountArray[i];
				tempBestLabel = i;
			}// of if
		}// of for i

		// 3.3 Initial cut: slightly below the minimum value, so both leaves get the majority label
		bestCut = tempValuesArray[0] - 0.1;
		leftLeaflabel = tempBestLabel;
		rightLeaflabel = tempBestLabel;

		// Step 4. Try every candidate cut position
		double tempCut;
		double[][] tempLabelCountMatrix = new double[2][tempNumLabels];

		for (int i = 0; i < tempValuesArray.length - 1; i++) {
			// 4.1 Skip values that equal their neighbor: no cut fits between them
			if (tempValuesArray[i] == tempValuesArray[i + 1]) {
				continue;
			}// of if
			tempCut = (tempValuesArray[i] + tempValuesArray[i + 1]) / 2;// Midpoint of two adjacent values

			// 4.2 Reset the counts, then accumulate weights per side and label
			for (int j = 0; j < 2; j++) {// Two sides only: left (0) and right (1)
				for (int k = 0; k < tempNumLabels; k++) {
					tempLabelCountMatrix[j][k] = 0;
				}// of for k
			}// of for j

			for (int j = 0; j < numInstances; j++) {
				tempCurrentLabel = (int) weightedInstances.instance(j).classValue();// Label of instance j
				if (weightedInstances.instance(j).value(selectedAttribute) < tempCut) {// Below the cut: left side
					tempLabelCountMatrix[0][tempCurrentLabel] += weightedInstances.getWeight(j);
				} else {// Otherwise: right side
					tempLabelCountMatrix[1][tempCurrentLabel] += weightedInstances.getWeight(j);
				}// of if
			}// of for j

			// 4.3 Left leaf: label with the largest weight on the left
			double tempLeftMaxCorrect = 0;
			int tempLeftBestLabel = 0;
			for (int j = 0; j < tempLabelCountMatrix[0].length; j++) {
				if (tempLeftMaxCorrect < tempLabelCountMatrix[0][j]) {
					tempLeftBestLabel = j;
					tempLeftMaxCorrect = tempLabelCountMatrix[0][j];
				}// of if
			}// of for j

			// 4.4 Right leaf, analogous to 4.3
			double tempRightMaxCorrect = 0;
			int tempRightBestLabel = 0;
			for (int j = 0; j < tempLabelCountMatrix[1].length; j++) {
				if (tempRightMaxCorrect < tempLabelCountMatrix[1][j]) {
					tempRightBestLabel = j;
					tempRightMaxCorrect = tempLabelCountMatrix[1][j];
				}// of if
			}// of for j

			// 4.5 Keep this cut if it classifies more weight correctly than the best so far
			if (tempMaxCorrect < tempLeftMaxCorrect + tempRightMaxCorrect) {
				tempMaxCorrect = tempLeftMaxCorrect + tempRightMaxCorrect;
				bestCut = tempCut;
				leftLeaflabel = tempLeftBestLabel;
				rightLeaflabel = tempRightBestLabel;
			}// of if
		}// of for i
		System.out.println("Attribute = " + selectedAttribute + ", cut = " + bestCut + ", leftLeafLabel = "
				+ leftLeaflabel + ", rightLeafLabel = " + rightLeaflabel);
	}// of train

	@Override
	public int classify(Instance paraInstance) {
		// Values below the cut go to the left leaf, all others to the right leaf
		int resultLabel = -1;
		if (paraInstance.value(selectedAttribute) < bestCut) {// Left
			resultLabel = leftLeaflabel;
		} else {// Right
			resultLabel = rightLeaflabel;
		}
		return resultLabel;
	}// of classify
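
The cut search in `train` can be tried without Weka. The sketch below is a condensed analogue on plain arrays for a single attribute; the class `ArrayStump`, its method signatures and the four-value data set are assumptions made for illustration, while the search itself follows the same steps (baseline cut below the minimum, midpoints of adjacent distinct values, largest correctly classified weight wins).

```java
import java.util.Arrays;

final class ArrayStump {
    double cut;
    int leftLabel;
    int rightLabel;

    /** Index of the largest entry; the earliest index wins ties. */
    static int argMax(double[] a) {
        int best = 0;
        for (int i = 1; i < a.length; i++) {
            if (a[i] > a[best]) best = i;
        }
        return best;
    }

    void train(double[] values, int[] labels, double[] weights, int numClasses) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        // Baseline: cut below the minimum, so both leaves predict the heaviest label.
        double[] total = new double[numClasses];
        for (int i = 0; i < labels.length; i++) total[labels[i]] += weights[i];
        cut = sorted[0] - 0.1;
        leftLabel = rightLabel = argMax(total);
        double bestCorrect = total[leftLabel];
        for (int i = 0; i < sorted.length - 1; i++) {
            if (sorted[i] == sorted[i + 1]) continue;  // no cut fits between equal values
            double candidate = (sorted[i] + sorted[i + 1]) / 2;
            double[] left = new double[numClasses];
            double[] right = new double[numClasses];
            for (int j = 0; j < values.length; j++) {
                if (values[j] < candidate) left[labels[j]] += weights[j];
                else right[labels[j]] += weights[j];
            }
            int l = argMax(left);
            int r = argMax(right);
            if (left[l] + right[r] > bestCorrect) {  // more weight classified correctly
                bestCorrect = left[l] + right[r];
                cut = candidate;
                leftLabel = l;
                rightLabel = r;
            }
        }
    }

    int classify(double value) {
        return value < cut ? leftLabel : rightLabel;
    }

    public static void main(String[] args) {
        ArrayStump stump = new ArrayStump();
        stump.train(new double[] {1.0, 2.0, 3.0, 4.0}, new int[] {0, 0, 1, 1},
                new double[] {0.25, 0.25, 0.25, 0.25}, 2);
        System.out.println("cut = " + stump.cut + ", left = " + stump.leftLabel + ", right = " + stump.rightLabel);
    }
}
```

Unlike `StumpClassifier`, this sketch takes the attribute values directly instead of picking an attribute at random, which keeps the example deterministic.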

3.3 Test it and print the classification result:

	public String toString() {// Describe the trained stump as a string
		String resultString = "I am a stump classifier.\r\n" + "I choose attribute " + selectedAttribute
				+ " with cut value " + bestCut + "\r\n" + "The left and right leaf labels are " + leftLeaflabel
				+ " and " + rightLeaflabel + ", respectively.\r\n" + "My weighted error is: " + computeWeightedError()
				+ ".\r\n" + "My training accuracy is: " + computeTrainingAccuracy() + ".";
		return resultString;
	}

	public static void main(String args[]) {
		WeightedInstances tempWeightedInstances = null;
		String tempFilename = "D:/software/eclipse/eclipse-workspace/day51/iris.arff";
		try {
			FileReader tempFileReader = new FileReader(tempFilename);// Open the data file
			tempWeightedInstances = new WeightedInstances(tempFileReader);// Build the weighted instances
			tempFileReader.close();
		} catch (Exception ee) {
			System.out.println("Cannot read the file: " + tempFilename + "\r\n" + ee);
			System.exit(0);
		}
		StumpClassifier tempClassifier = new StumpClassifier(tempWeightedInstances);// Create the stump classifier
		tempClassifier.train();// Train it
		System.out.println(tempClassifier);// Print the trained stump

		System.out.println(Arrays.toString(tempClassifier.computeCorrectnessArray()));// Print which instances were classified correctly
	}

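4. Build the ensemble (booster):

4.1 Initialize the variables, read in the data, use the last attribute as the class attribute, and stop training once the accuracy reaches 100%: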

	// Body of: public class Booster (needs java.io.FileReader, weka.core.Instance and weka.core.Instances)
	SimpleClassifier[] classifiers;// The base classifiers to be combined
	int numClassifiers;// Number of classifiers
	boolean stopAfterConverge = false;// Whether to stop once the training accuracy reaches 100%
	double[] classifierWeights;// Classifier weights: which classifiers matter more
	Instances trainingData;// Training set
	Instances testingData;// Testing set

	public Booster(String paraTrainingFilename) {
		// Step 1. Read the training set
		try {
			FileReader tempFileReader = new FileReader(paraTrainingFilename);// Open the data file
			trainingData = new Instances(tempFileReader);// Parse it into instances
			tempFileReader.close();
		} catch (Exception ee) {
			System.out.println("Cannot read the file: " + paraTrainingFilename);
			System.exit(0);
		}// of try

		// Step 2. Use the last attribute as the class attribute
		trainingData.setClassIndex(trainingData.numAttributes() - 1);

		// Step 3. Set the testing set; here it is simply the training set
		testingData = trainingData;

		stopAfterConverge = true;// Stop once the accuracy reaches 100%

		System.out.println("**********Data**********\r\n" + trainingData);
	}

4.2 Set the number of base classifiers; this experiment uses 100:

	public void setNumBaseClassifiers(int paraNumBaseClassifiers) {// Set the number of base classifiers
		numClassifiers = paraNumBaseClassifiers;

		// 1. Allocate space for the classifiers
		classifiers = new SimpleClassifier[numClassifiers];

		// 2. Allocate the array of classifier weights
		classifierWeights = new double[numClassifiers];
	}

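4.3 Training and classification. Training sets each classifier's weight from its weighted error; classification is a weighted vote in which the label that collects the largest total classifier weight becomes the final prediction: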

	public void train() {
		// 1. Initialize
		WeightedInstances tempWeightedInstances = null;
		double tempError;
		numClassifiers = 0;// Number of classifiers trained so far

		// 2. Build the classifiers
		for (int i = 0; i < classifiers.length; i++) {
			// 2.1 Create the weighted instances, or adjust their weights
			if (i == 0) {// First round
				tempWeightedInstances = new WeightedInstances(trainingData);
			} else {
				// Adjust the instance weights using the previous classifier's correctness array and weight
				tempWeightedInstances.adjustWeights(classifiers[i - 1].computeCorrectnessArray(), classifierWeights[i - 1]);
			}// of if

			// 2.2 Train the next classifier
			classifiers[i] = new StumpClassifier(tempWeightedInstances);
			classifiers[i].train();
			// Weighted error of this classifier
			tempError = classifiers[i].computeWeightedError();
			// Classifier weight alpha = 0.5 * ln((1 - e) / e)
			classifierWeights[i] = 0.5 * Math.log(1 / tempError - 1);
			if (classifierWeights[i] < 1e-6) {// Tiny or negative weights are set to 0
				classifierWeights[i] = 0;
			}// of if

			System.out.println("Classifier #" + i + ", weighted error = " + tempError + ", weight = " + classifierWeights[i] + "\r\n");
			numClassifiers++;// One more classifier is in use

			// Stop once the ensemble classifies the training set perfectly
			if (stopAfterConverge) {
				double tempTrainingAccuracy = computeTrainingAccuracy();// Accuracy of the booster so far, not of classifier i alone
				System.out.println("The accuracy of the booster is: " + tempTrainingAccuracy + "\r\n");
				if (tempTrainingAccuracy > 0.999999) {
					System.out.println("Stop at the round " + i + " due to converge.\r\n");
					break;
				}// of if
			}// of if
		}// of for i
	}// of train

	public int classify(Instance paraInstance) {
		int resultLabel = -1;
		double[] tempLabelsCountArray = new double[trainingData.classAttribute().numValues()];// Total vote weight per label
		for (int i = 0; i < numClassifiers; i++) {
			int tempLabel = classifiers[i].classify(paraInstance);// Label predicted by classifier i
			tempLabelsCountArray[tempLabel] += classifierWeights[i];// Vote with the classifier's weight
		}// of for i
		double tempMax = -1;// Largest vote so far
		for (int i = 0; i < tempLabelsCountArray.length; i++) {
			if (tempMax < tempLabelsCountArray[i]) {
				tempMax = tempLabelsCountArray[i];
				resultLabel = i;
			}// of if
		}// of for i
		return resultLabel;
	}
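
The vote in `classify` is weighted, so one reliable classifier can outvote several weak ones. A minimal sketch of just that step, with hypothetical predictions and classifier weights (the class `WeightedVoteSketch` and its `vote` method are not part of the post's code):

```java
final class WeightedVoteSketch {
    /** Returns the label with the largest total classifier weight; the lowest label wins ties. */
    static int vote(int[] predictedLabels, double[] classifierWeights, int numClasses) {
        double[] score = new double[numClasses];
        for (int i = 0; i < predictedLabels.length; i++) {
            score[predictedLabels[i]] += classifierWeights[i];  // each classifier votes with its weight
        }
        int best = 0;
        for (int k = 1; k < numClasses; k++) {
            if (score[k] > score[best]) best = k;
        }
        return best;
    }

    public static void main(String[] args) {
        int[] predictions = {0, 1, 1};  // hypothetical labels predicted by three classifiers
        // One strong classifier outweighs two weak ones: 0.9 against 0.3 + 0.4.
        System.out.println(vote(predictions, new double[] {0.9, 0.3, 0.4}, 3));
        // With a weaker first classifier the majority label wins: 0.5 against 0.7.
        System.out.println(vote(predictions, new double[] {0.5, 0.3, 0.4}, 3));
    }
}
```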

4.4 Accuracy: count the correct predictions and divide by the total number of instances:

	public double computeTrainingAccuracy() {// Accuracy of the booster on the training set
		double tempCorrect = 0;// Number of correct predictions
		for (int i = 0; i < trainingData.numInstances(); i++) {// Check every training instance
			if (classify(trainingData.instance(i)) == (int) trainingData.instance(i).classValue()) {// Prediction equals the actual label
				tempCorrect++;
			}// of if
		}// of for i
		double tempAccuracy = tempCorrect / trainingData.numInstances();
		return tempAccuracy;
	}

4.5 A quick test that prints the results:

	public double test() {// Test on the stored testing set
		System.out.println("Testing on " + testingData.numInstances() + " instances.\r\n");
		return test(testingData);
	}

	public double test(Instances paraInstances) {// Accuracy of the booster on the given instances
		double tempCorrect = 0;// Number of correct predictions
		paraInstances.setClassIndex(paraInstances.numAttributes() - 1);// The last attribute is the class
		for (int i = 0; i < paraInstances.numInstances(); i++) {// Check every test instance
			Instance tempInstance = paraInstances.instance(i);
			if (classify(tempInstance) == (int) tempInstance.classValue()) {// Prediction equals the actual label
				tempCorrect++;
			}// of if
		}// of for i

		double resultAccuracy = tempCorrect / paraInstances.numInstances();
		System.out.println("The accuracy is: " + resultAccuracy);
		return resultAccuracy;
	}

	public static void main(String args[]) {
		System.out.println("Starting AdaBoosting...");
		Booster tempBooster = new Booster("D:/software/eclipse/eclipse-workspace/day51/iris.arff");// Load the data; it serves as both training and testing set
		tempBooster.setNumBaseClassifiers(100);// Up to 100 base classifiers
		tempBooster.train();// Train them
		System.out.println("The training accuracy is: " + tempBooster.computeTrainingAccuracy());
		tempBooster.test();
	}// of main

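4. Some open questions

1. Strengths and weaknesses of AdaBoosting

AdaBoosting makes good use of weak classifiers by cascading them: several weak classifiers are combined into a strong classifier, or into one that is more stable than a single strong classifier, which simplifies algorithm design. AdaBoosting also takes each classifier's weight fully into account and can reach high accuracy.
On the other hand, deciding how many weak classifiers to use is a major weakness of AdaBoosting, and the number of iterations is hard to set. Training is relatively time-consuming. If the prediction accuracy never reaches 100%, it is unclear what the stopping condition should be. An imbalanced training set can also lower the classification accuracy.

2. How to choose the number of classifiers (iterations)
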

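From the material I collected, the number of iterations or classifiers should be set with two factors in mind: (1) accuracy; (2) time.
If no stopping condition is fixed in advance, the classification accuracy can serve as the criterion. When the accuracy gain between the previous round and the current one falls below a chosen threshold, training stops and the best accuracy is reported as the final result. The threshold itself has to be set according to the needs of the task.
Cross-validation can also be used to determine the number of weak classifiers: plot the accuracy against the number of weak classifiers and pick the number that works best.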
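
Both rules can be sketched on an accuracy curve. The numbers below are hypothetical validation accuracies, not results from this experiment, and the class `RoundSelectionSketch` with its two methods is made up for illustration: one method picks the classifier count with the best accuracy, the other stops at the first round whose gain falls below a threshold.

```java
final class RoundSelectionSketch {
    /** Classifier count (1-based) with the highest accuracy; the earliest count wins ties. */
    static int bestCount(double[] accuracyByCount) {
        int best = 0;
        for (int i = 1; i < accuracyByCount.length; i++) {
            if (accuracyByCount[i] > accuracyByCount[best]) best = i;
        }
        return best + 1;
    }

    /** Number of classifiers kept when training stops at the first gain below minGain. */
    static int stopCount(double[] accuracyByCount, double minGain) {
        for (int i = 1; i < accuracyByCount.length; i++) {
            if (accuracyByCount[i] - accuracyByCount[i - 1] < minGain) {
                return i;  // keep the first i classifiers
            }
        }
        return accuracyByCount.length;
    }

    public static void main(String[] args) {
        // Hypothetical accuracies for ensembles of 1, 2, ..., 6 weak classifiers.
        double[] accuracy = {0.67, 0.80, 0.91, 0.95, 0.955, 0.95};
        System.out.println("best count: " + bestCount(accuracy));
        System.out.println("stop count with minGain 0.01: " + stopCount(accuracy, 0.01));
    }
}
```

The threshold rule stops earlier and saves training time, while the curve-based rule needs every round to be trained and evaluated first; which trade-off is acceptable depends on the task.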