jrae Source Code Analysis (Part 2)


This post looks in detail at the two classes introduced in the previous post: RAECost and SoftmaxCost.

SoftmaxCost

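As we already know, given the features and labels (with the hyperparameters fixed), the SoftmaxCost class measures the error value cost for a given weight matrix (hidden×catSize) and also returns the gradient with respect to the current weights. Here is the code.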

 
    @Override
    public double valueAt(double[] x)
    {
        if( !requiresEvaluation(x) )
            return value;
        int numDataItems = Features.columns;
        int[] requiredRows = ArraysHelper.makeArray(0, CatSize-2);
        ClassifierTheta Theta = new ClassifierTheta(x,FeatureLength,CatSize);
        DoubleMatrix Prediction = getPredictions (Theta, Features);
        double MeanTerm = 1.0 / (double) numDataItems;
        double Cost = getLoss (Prediction, Labels).sum() * MeanTerm;
        double RegularisationTerm = 0.5 * Lambda * DoubleMatrixFunctions.SquaredNorm(Theta.W);
        DoubleMatrix Diff = Prediction.sub(Labels).muli(MeanTerm);
        DoubleMatrix Delta = Features.mmul(Diff.transpose());
        DoubleMatrix gradW = Delta.getColumns(requiredRows);
        DoubleMatrix gradb = ((Diff.rowSums()).getRows(requiredRows));
        //Regularizing. Bias does not have one.
        gradW = gradW.addi(Theta.W.mul(Lambda));
        Gradient = new ClassifierTheta(gradW,gradb);
        value = Cost + RegularisationTerm;
        gradient = Gradient.Theta;
        return value;
    }

    public DoubleMatrix getPredictions (ClassifierTheta Theta, DoubleMatrix Features)
    {
        int numDataItems = Features.columns;
        DoubleMatrix Input = ((Theta.W.transpose()).mmul(Features)).addColumnVector(Theta.b);
        Input = DoubleMatrix.concatVertically(Input, DoubleMatrix.zeros(1,numDataItems));
        return Activation.valueAt(Input);
    }

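This is a typical two-layer neural network with no hidden layer. It first predicts the labels from the features, normalizes the predictions with softmax, and then backpropagates the error to obtain the weight gradient.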


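In this two-layer network the label is a column vector in which the target label is set to 1 and all other entries are 0. The transfer function is softmax, so the output is the probability of each label.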
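To make the forward pass concrete, here is a minimal sketch for a single sample using plain arrays instead of DoubleMatrix. The class SoftmaxForwardSketch and its methods are hypothetical and not part of jrae. It assumes that the zero row appended in getPredictions acts as a fixed score of 0 for the last class, and that Activation is a softmax.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of jrae: softmax prediction for one sample. */
final class SoftmaxForwardSketch {

    /**
     * w is featureLength x (numClasses - 1), b has numClasses - 1 entries.
     * The last class keeps a fixed score of 0 (assumption: this mirrors the
     * zero row appended in getPredictions before the activation is applied).
     */
    static double[] predict(double[][] w, double[] b, double[] x) {
        int numClasses = b.length + 1;
        double[] scores = new double[numClasses]; // scores[numClasses - 1] stays 0
        for (int j = 0; j < numClasses - 1; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) {
                s += w[i][j] * x[i];
            }
            scores[j] = s;
        }
        return softmax(scores);
    }

    /** Numerically stable softmax: subtract the maximum score before exponentiating. */
    static double[] softmax(double[] scores) {
        double max = Arrays.stream(scores).max().orElse(0.0);
        double[] p = new double[scores.length];
        double sum = 0.0;
        for (int j = 0; j < scores.length; j++) {
            p[j] = Math.exp(scores[j] - max);
            sum += p[j];
        }
        for (int j = 0; j < p.length; j++) {
            p[j] /= sum;
        }
        return p;
    }

    public static void main(String[] args) {
        double[][] w = {{0.5, -0.5}, {1.0, 0.0}};
        double[] b = {0.0, 0.1};
        double[] p = predict(w, b, new double[]{1.0, 2.0});
        System.out.println(Arrays.toString(p)); // three probabilities that sum to 1
    }
}
```

With all weights and biases at zero, every class gets the same score, so the sketch returns a uniform distribution.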

The cost is computed by getLoss. If the predicted output for the target label is $p^*$, the cost of each sample, that is, the error function, is:

$$
\mathit{cost} = E(p^*) = -\log(p^*)
$$
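As a small illustration of how this per-sample cost turns into the value returned by valueAt, the sketch below averages -log(p*) over the samples and adds the L2 penalty 0.5 * Lambda * ||W||^2. The class SoftmaxLossSketch and its array layout (one row per sample) are assumptions made for illustration; jrae itself works on DoubleMatrix objects.

```java
/** Hypothetical helper, not part of jrae: mean cross-entropy cost plus L2 penalty. */
final class SoftmaxLossSketch {

    /**
     * probs[n][j] is the predicted probability of class j for sample n,
     * target[n] is the index of the target label of sample n.
     * Returns the mean of -log(p*) over all samples.
     */
    static double meanLoss(double[][] probs, int[] target) {
        double total = 0.0;
        for (int n = 0; n < probs.length; n++) {
            total += -Math.log(probs[n][target[n]]);
        }
        return total / probs.length;
    }

    /** 0.5 * lambda * (sum of squared weights); the bias is not regularized. */
    static double l2Penalty(double[][] w, double lambda) {
        double squaredNorm = 0.0;
        for (double[] row : w) {
            for (double v : row) {
                squaredNorm += v * v;
            }
        }
        return 0.5 * lambda * squaredNorm;
    }

    public static void main(String[] args) {
        double[][] probs = {{0.5, 0.5}, {0.25, 0.75}};
        int[] target = {0, 1};
        double[][] w = {{1.0, 2.0}, {3.0, 4.0}};
        // Total value = data cost + regularisation term, as in valueAt.
        System.out.println(meanLoss(probs, target) + l2Penalty(w, 0.1));
    }
}
```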

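Following the backpropagation algorithm for neural networks described earlier, we obtain the following (here $h_j$ is the softmax output $p_j$, and $\mathit{label}_j$ is 1 when $j$ is the target label and 0 otherwise):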

$$
\frac{\partial E}{\partial w_{ij}} = \frac{\partial E}{\partial p_j}\,\frac{\partial h_j}{\partial \mathit{net}_j}\,x_i = -\frac{1}{p_j}\,p_j(1-p_j)\,x_i = -(1-p_j)\,x_i = -(\mathit{label}_j - p_j)\,\mathit{feature}_i
$$
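The chain above is written out for the target label. As a sketch of why the final form also holds for the other labels, let $t$ be the index of the target label, so that $p^* = p_t$, and use the standard softmax derivative. The symbols $t$ and $\delta_{tj}$ (1 if $t = j$, otherwise 0) are introduced here for illustration only.

$$
\frac{\partial p_t}{\partial \mathit{net}_j} = p_t(\delta_{tj} - p_j)
\quad\Longrightarrow\quad
\frac{\partial E}{\partial \mathit{net}_j} = -\frac{1}{p_t}\,p_t(\delta_{tj} - p_j) = -(\mathit{label}_j - p_j)
$$

Since $\mathit{net}_j$ depends on $w_{ij}$ only through the term $w_{ij}\,\mathit{feature}_i$, multiplying by $\mathit{feature}_i$ gives $-(\mathit{label}_j - p_j)\,\mathit{feature}_i$ for every $j$. For a non-target label this reduces to $p_j\,\mathit{feature}_i$.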

This explains the meaning of the following line of code:

 
    DoubleMatrix Delta = Features.mmul(Diff.transpose());
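Written with explicit loops, this matrix product is the sum over samples of feature_i * (p_j - label_j), scaled by the mean term. The sketch below uses a hypothetical class SoftmaxGradientSketch with one row per sample, whereas in the source Features and Diff store one column per sample. It leaves out the Lambda * W regularisation term that valueAt adds afterwards.

```java
/** Hypothetical helper, not part of jrae: loop form of Features.mmul(Diff.transpose()). */
final class SoftmaxGradientSketch {

    /**
     * features[n][i]: feature i of sample n.
     * probs[n][j], labels[n][j]: predicted probability and one-hot label of class j.
     * Returns a featureLength x numClasses matrix whose entry (i, j) is
     * the mean over samples of feature_i * (p_j - label_j).
     */
    static double[][] delta(double[][] features, double[][] probs, double[][] labels) {
        int numSamples = features.length;
        int featureLength = features[0].length;
        int numClasses = probs[0].length;
        double meanTerm = 1.0 / numSamples;
        double[][] delta = new double[featureLength][numClasses];
        for (int n = 0; n < numSamples; n++) {
            for (int j = 0; j < numClasses; j++) {
                double diff = (probs[n][j] - labels[n][j]) * meanTerm;
                for (int i = 0; i < featureLength; i++) {
                    delta[i][j] += features[n][i] * diff;
                }
            }
        }
        return delta;
    }

    public static void main(String[] args) {
        double[][] features = {{1.0, 2.0}};
        double[][] probs = {{0.7, 0.3}};
        double[][] labels = {{1.0, 0.0}};
        double[][] d = delta(features, probs, labels);
        System.out.println(java.util.Arrays.deepToString(d));
    }
}
```

In the source, only the columns selected by requiredRows are kept as gradW, which drops the column of the last class.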

RAECost

First, the implementation:

 
    @Override
    public double valueAt(double[] x)
    {
        if(!requiresEvaluation(x))
            return value;

        Theta Theta1 = new Theta(x,hiddenSize,visibleSize,dictionaryLength);
        FineTunableTheta Theta2 = new FineTunableTheta(x,hiddenSize,visibleSize,catSize,dictionaryLength);
        Theta2.setWe( Theta2.We.add(WeOrig) );

        final RAEClassificationCost classificationCost = new RAEClassificationCost(
                catSize, AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, Theta2);
        final RAEFeatureCost featureCost = new RAEFeatureCost(
                AlphaCat, Beta, dictionaryLength, hiddenSize, Lambda, f, WeOrig, Theta1);

        Parallel.For(DataCell,
            new Parallel.Operation<LabeledDatum<Integer,Integer>>() {
                public void perform(int index, LabeledDatum<Integer,Integer> Data)
                {
                    try {
                        LabeledRAETree Tree = featureCost.Compute(Data);
                        classificationCost.Compute(Data, Tree);
                    } catch (Exception e) {
                        System.err.println(e.getMessage());
                    }
                }
        });

        double costRAE = featureCost.getCost();
        double[] gradRAE = featureCost.getGradient().clone();

        double costSUP = classificationCost.getCost();
        gradient = classificationCost.getGradient();

        value = costRAE + costSUP;
        for(int i=0; i<gradRAE.length; i++)
            gradient[i] += gradRAE[i];

        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();
        System.gc();    System.gc();

        return value;
    }

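The cost consists of two parts, featureCost and classificationCost. The program iterates over the samples: featureCost.Compute(Data) builds a recursive tree and accumulates its cost and gradient, and classificationCost.Compute(Data, Tree) then computes and accumulates the cost and gradient based on that tree. The key classes are therefore RAEFeatureCost and RAEClassificationCost.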
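The accumulate-then-combine pattern described above can be sketched as follows. Everything here is a stand-in: CostAccumulator and CombinedCostSketch are hypothetical classes, a parallel IntStream replaces jrae's Parallel.For, and the per-sample contributions are dummy constants. The sketch assumes the accumulators must tolerate concurrent updates and uses synchronized methods for that; how jrae's own cost classes handle this is not shown in this post.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

/** Hypothetical driver, not a jrae class: sums two accumulated costs and gradients. */
final class CombinedCostSketch {

    /** Writes gradRAE + gradSUP into gradientOut and returns costRAE + costSUP. */
    static double combine(CostAccumulator featureCost,
                          CostAccumulator classificationCost,
                          double[] gradientOut) {
        double[] gradRAE = featureCost.getGradient();
        double[] gradSUP = classificationCost.getGradient();
        for (int i = 0; i < gradientOut.length; i++) {
            gradientOut[i] = gradRAE[i] + gradSUP[i];
        }
        return featureCost.getCost() + classificationCost.getCost();
    }

    public static void main(String[] args) {
        int numSamples = 1000;
        int numParams = 3;
        CostAccumulator featureCost = new CostAccumulator(numParams);
        CostAccumulator classificationCost = new CostAccumulator(numParams);

        // Stand-in for Parallel.For: every sample contributes to both accumulators.
        IntStream.range(0, numSamples).parallel().forEach(index -> {
            featureCost.add(1.0, new double[]{1.0, 0.0, 0.0});        // dummy values
            classificationCost.add(0.5, new double[]{0.0, 1.0, 0.0}); // dummy values
        });

        double[] gradient = new double[numParams];
        double value = combine(featureCost, classificationCost, gradient);
        System.out.println(value + " " + Arrays.toString(gradient));
    }
}

/** Hypothetical accumulator, not a jrae class: collects contributions from many threads. */
final class CostAccumulator {
    private double cost;
    private final double[] gradient;

    CostAccumulator(int size) {
        gradient = new double[size];
    }

    /** synchronized so that parallel workers do not lose updates. */
    synchronized void add(double sampleCost, double[] sampleGradient) {
        cost += sampleCost;
        for (int i = 0; i < gradient.length; i++) {
            gradient[i] += sampleGradient[i];
        }
    }

    synchronized double getCost() {
        return cost;
    }

    /** Returns a copy so callers cannot modify the accumulated gradient. */
    synchronized double[] getGradient() {
        return gradient.clone();
    }
}
```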

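In its Compute function, RAEFeatureCost calls the ForwardPropagate function of RAEPropagation to build a tree, and then calls BackPropagate to compute and accumulate the gradient. The details of the algorithm are covered in the next part.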