[Evolutionary Computation] [Paper Reading] Surrogate-Assisted Evolutionary DL Using E2E Random Forest-based Performance Predictor

This paper proposes an end-to-end, random forest-based performance predictor to assist evolutionary deep learning (EDL) algorithms, aiming to reduce the consumed computational resources while maintaining classification accuracy. By integrating it into an existing EDL algorithm, the method shows the potential to outperform 18 state-of-the-art peer competitors.


Paper page: https://ieeexplore.ieee.org/document/8744404


I. INTRODUCTION

The proposed performance predictor shows promising performance in terms of classification accuracy and consumed computational resources when compared with 18 state-of-the-art peer competitors, demonstrated by integrating it into an existing EDL algorithm as a case study.

Unfortunately, designing the architecture with the best performance for the investigated data requires extensive expertise in both CNNs and the data domain [7], which is not necessarily held by the interested users. Designing the best CNN architecture for the given data can be seen as an optimization problem, which can be mathematically formulated by (1):
$$\hat{\lambda} = \arg\min_{\lambda \in \Lambda} \mathcal{L}(A_{\lambda}, D_{train}, D_{test}) \tag{1}$$
where λ refers to the parameters related to the architectures of CNNs, such as the number of convolutional layers and the configurations of pooling layers;
Λ refers to the parameter space, $A_{\lambda}$ denotes the CNN algorithm A adopting the architecture parameters λ, and $\mathcal{L}$ is the performance measure of $A_{\lambda}$ on the test data $D_{test}$ after it has been trained on the training data $D_{train}$. Generally, λ takes discrete values.
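
To make the formulation in (1) concrete, the following minimal Python sketch evaluates one candidate architecture; every name here (build_cnn, train, measure, etc.) is an illustrative assumption rather than the authors' implementation.

```python
# Minimal sketch of the objective in Eq. (1): score an architecture
# parameterisation lambda by training the resulting CNN and measuring
# its performance on held-out data. All callables are placeholders.
def evaluate_architecture(lam, build_cnn, train, measure, D_train, D_test):
    model = build_cnn(lam)            # A_lambda: the CNN adopting architecture parameters lambda
    trained = train(model, D_train)   # train A_lambda from scratch on D_train
    return measure(trained, D_test)   # L(A_lambda, D_train, D_test)

# The best architecture is the arg min over the (discrete) parameter space Lambda, e.g.:
# best_lam = min(Lambda, key=lambda lam: evaluate_architecture(lam, build_cnn, train, measure, D_train, D_test))
```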

Our goal in this paper is to present an effective and efficient end-to-end performance predictor (named E2EPP for short) based on random forest.

II. LITERATURE REVIEW

In this section, the EDL, AE-CNN and random forest are first introduced as the base algorithms in Subsection II-A, which is helpful for understanding the details of the proposed performance predictor.

A. Background

Introduction to the definitions of the network components

The CNN architectures generated by AE-CNN are composed of DenseNet Blocks (DBs), ResNet Blocks (RBs) and Pooling Blocks (PBs). Each DB or RB is composed of multiple DenseNet Units (DUs) or ResNet Units (RUs), respectively, while a PB consists of only one pooling layer. Each DU, RU, or PB differs in terms of its parameter settings.

  • The parameters of an RU are the sizes of its input and output (denoted by in and out, respectively).
  • Those of a DU are the same as those of an RU, plus an increasing factor (denoted by k).
  • The parameter of a PB is only the pooling type (i.e., the maximal or mean pooling type) because its other parameters are all set to fixed values.
  • Because a DB/RB is composed of multiple DUs/RUs, the parameters of a DB/RB are the corresponding parameters of a DU/RU and the amount (denoted by amount) of DUs/RUs in a DB/RB.
  • In addition, as the DBs, RBs and PBs compose a CNN in a particular order, an extra parameter (denoted by id) is also used to represent the block's position in the CNN.

Obviously, a CNN generated by AE-CNN is composed of sequential blocks, which may be DBs, RBs or PBs.

  • When the block is an RB, the parameters are the id, amount, in and out;
  • When the block is a DB, the parameters are id, amount, k, in and out;
  • When the block is a PB, the parameter is the pooling type denoted by type.

Note that the in of the current RB/DB must equal the out of its previous RB/DB in order to construct a valid CNN.
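
As a small illustration (not part of the paper), a hypothetical helper that checks this chaining constraint on an ordered list of RB/DB parameter dictionaries might look like this:

```python
# Hypothetical helper illustrating the validity constraint: the `in` of each
# RB/DB must equal the `out` of its predecessor in the CNN.
def is_valid_chain(blocks):
    """blocks: RB/DB parameter dicts with 'in' and 'out' channel counts, in CNN order."""
    return all(curr["in"] == prev["out"] for prev, curr in zip(blocks, blocks[1:]))
```

For example, `is_valid_chain([{"in": 3, "out": 64}, {"in": 64, "out": 128}])` returns True, while mismatched channel counts would return False.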

EDL

tournament selection, crossover and mutation, environmental selection

AE-CNN

As the performance predictor is part of an EDL in this research, an EDL method should be specified before the performance predictor is detailed. In this work, the AE-CNN algorithm developed by the authors is selected as the representative EDL, mainly because: 1) it shows promising performance among existing EDLs [48], and 2) the source code of AE-CNN is publicly available. Note that the proposed performance predictor is applicable to any existing EDL.

The AE-CNN algorithm [48] is an automatic EDL algorithm based on the building blocks of the state-of-the-art ResNet [49] and DenseNet [50].

Random Forest

B. Related Work (pros and cons of FBO and Peephole)

The existing performance predictors can be classified into two different categories: performance predictors based on the learning curve and end-to-end performance predictors, both of which are based on the training-predicting computational paradigm.

  • The Peephole algorithm uses a number of CNN architectures and their corresponding performances, obtained by training the CNNs for T epochs, as the training samples to train a long short-term memory (LSTM) neural network [33]. The trained neural network directly predicts the performance of a new CNN based on its architecture, which is called the end-to-end mechanism because the input end is the raw architecture data while the output end is the classification accuracy. Because the architecture cannot be directly used as the input of the neural network, the Peephole algorithm employs the word-vector technique to map the CNN architecture to numerical values.

  • The major advantage of the FBO algorithm lies in its trained-CNN-free nature, i.e., it does not need any trained CNNs in advance. Because training a CNN is time-consuming, varying from several days to weeks, the FBO algorithm is efficient. However, it will not be effective when the learning curve is not smooth, because the curve fitting works under the assumption that the curve is smooth. In recent deep learning applications, the learning curve is usually not smooth because a schedule of learning rates is commonly used; once the learning rate is changed, the learning curve has a non-smooth segment.

  • Another limitation of the FBO algorithm is its non-end-to-end nature (i.e., in predicting the performance of each CNN, a part of the training data regarding this CNN must be collected for training the predictor), which requires much more labour when it is used.

  • Owing to the end-to-end nature, the Peephole algorithm is more convenient to use. However, the major drawback of Peephole lies in its requirement of a large number of training samples, which results in the computational cost of collecting training samples exceeding that of running the EDL without any performance predictor.

For example, Peephole used over 8,000 fully trained CNN architectures as the training data. However, EDLs generally achieve promising performance by evaluating only hundreds of individuals; if we had enough computational resources to evaluate 8,000 CNN architectures, we would not need to develop a performance predictor in the first place. Such a limitation is largely caused by the adopted regression model, i.e., the neural network-based algorithm, which typically relies heavily on a large amount of labelled training data.

III. PROPOSED ALGORITHM

[Fig. 1: the overall framework, composed of the data collection, E2EPP and EDL blocks]
As shown in Fig. 1, the framework is composed of three blocks, i.e., data collection, E2EPP and EDL. The proposed E2EPP performance predictor is part of the EDL.

Firstly, a set of training data is collected for training the random forest-based predictor, where the collection is achieved by performing the corresponding EDL without using E2EPP. Each data sample is composed of the CNN architecture and the corresponding classification accuracy that is obtained by training the CNN from scratch.

Secondly, those architectures are encoded into discrete codes (shown in Subsection III-A) for building the random forest-based predictor pool with a large number (say K) of regression trees (CARTs) [51] (shown in Subsection III-B). During each generation of the EDL, a newly generated CNN architecture is encoded as the input to the random forest, and its performance is predicted by an adaptive combination of CARTs from the predictor pool (shown in Subsection III-C). When the EDL terminates, the CNN architecture with the best predicted performance is output. Note that there is no further CNN training during the optimization process.
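
The following sketch outlines this workflow under stated assumptions; collect_samples, encode, train_pool, predict and edl_search are all placeholder callables, not the authors' API.

```python
# Rough sketch of the Fig. 1 workflow (data collection -> E2EPP -> EDL).
def run_edl_with_e2epp(collect_samples, encode, train_pool, predict, edl_search):
    archs, accs = collect_samples()         # (architecture, accuracy) pairs; each CNN trained from scratch once
    X = [encode(a) for a in archs]          # discrete encoding (Subsection III-A)
    pool = train_pool(X, accs)              # pool of K CARTs (Subsection III-B)
    # During the EDL search, fitness comes from the predictor pool (Subsection III-C);
    # no CNN is trained from this point on.
    return edl_search(fitness=lambda a: predict(pool, encode(a)))
```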

A. Encoding

Based on the description shown in Subsection II-A2, the collected training data are summarized as follows:

  1. RBs and DBs: Each generated CNN architecture is composed of at most four RBs and four DBs, and the number of output channels of each block varies within [32, 512], which is set based on the conventions of state-of-the-art CNNs [50], [54].
  2. PBs: Each generated CNN architecture contains at most four pooling layers. There are two types of PBs: MAX and MEAN.

Generally, we encode a CNN architecture into a chromosome with 3Nb + 2Np discrete variables, where the maximal number of RBs and DBs is Nb and the maximal number of PBs is Np.

  • For the first 3Nb variables, each RB or DB is encoded into a triplet [type, out, amount], where the block type for an RB is set to 1, and that for a DB is set to 12, 20 or 40 when k is equal to 12, 20 or 40, respectively. Note that the parameter in of each RB/DB is not encoded because it can be calculated from the out of its previous RB/DB, and a smaller number of decision variables can result in better performance of the regression model when the training data are limited [46].

  • For the following 2Np variables, each pooling layer is encoded into a pair [pooling type, layer position], where the maximal and mean pooling types are represented by 1 and 0, respectively.

If a CNN architecture has b RBs and DBs in total and p PBs, then its (3b+1)-th to (3Nb)-th variables are set to zeros, and its (3Nb+2p+1)-th to (3Nb+2Np)-th variables are set to zeros as well. Therefore, the random forest-based performance predictor takes input data with 3Nb + 2Np discrete decision variables, and its output is a continuous value within the range [0, 1]. Algorithm 2 shows the details of encoding a CNN architecture into data that can be directly used by the random forest, where |·| denotes the counting (cardinality) operator.

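As a concrete illustration of this encoding (a sketch, not the authors' Algorithm 2), the snippet below builds the 3Nb + 2Np vector for a hypothetical architecture, assuming Nb = 8 (at most four RBs plus four DBs) and Np = 4 as described above.

```python
import numpy as np

def encode_architecture(blocks, pools, Nb=8, Np=4):
    """Encode a CNN into a length-(3*Nb + 2*Np) vector; unused slots stay zero.

    blocks: list of (type, out, amount) triplets for RBs/DBs, where type is 1
            for an RB and 12/20/40 for a DB with k = 12/20/40.
    pools:  list of (pooling type, layer position) pairs, 1 = max, 0 = mean.
    """
    code = np.zeros(3 * Nb + 2 * Np)
    for i, triplet in enumerate(blocks):
        code[3 * i:3 * (i + 1)] = triplet                       # [type, out, amount]
    for j, pair in enumerate(pools):
        code[3 * Nb + 2 * j:3 * Nb + 2 * (j + 1)] = pair        # [pooling type, position]
    return code

# Example: two RBs, one DB with k = 12, and one max-pooling layer at position 2.
x = encode_architecture([(1, 64, 3), (1, 128, 4), (12, 256, 2)], [(1, 2)])
```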

B. Training of the Random Forest

A large number of CARTs are generated in the predictor pool. Each CART is trained on the whole training data but with a random subset of features (i.e., discrete variables), where each discrete variable is selected with a probability of 0.5 in order to maximize the diversity of the predictor pool [55].

Each node of a CART represents a rectangular region in the decision space. The mean squared error of the outputs of the samples falling in that region (node) determines whether the node needs splitting or not (i.e., if the mean squared error decrease is smaller than a preset threshold Ts [51], the node becomes a leaf node). Once the K CARTs are obtained, the predictor pool is ready for the optimizer. The details of training the CARTs are shown in Algorithm 3.
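
A minimal sketch of this training step, assuming scikit-learn's DecisionTreeRegressor as the CART implementation and its min_impurity_decrease parameter as a stand-in for the splitting threshold Ts:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def train_predictor_pool(X, y, K=1000, p_keep=0.5, Ts=1e-4, seed=0):
    """Train K CARTs, each on the whole training data but a random feature subset."""
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    pool = []
    for _ in range(K):
        mask = rng.random(X.shape[1]) < p_keep     # each variable selected with probability 0.5
        if not mask.any():
            mask[rng.integers(X.shape[1])] = True  # keep at least one feature
        cart = DecisionTreeRegressor(min_impurity_decrease=Ts)  # stop splitting below the threshold
        cart.fit(X[:, mask], y)
        pool.append((cart, mask))
    return pool
```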

C. Performance Prediction

To deal with the lack of training data, a large number of surrogate models are employed as ensemble members in a recent offline SAEA.

In each generation, all of the K trained CARTs re-estimate the performance of the CNN A_b that has the best predicted fitness value; then Q out of the K CARTs are uniformly selected from the K ordered CARTs based on their prediction values on A_b. The Q CARTs are combined as the ensemble performance predictor to evaluate both the parent and offspring populations. This selection is based on the performance diversity of the CARTs around the current best CNN architecture A_b. After that, the newly generated CNN architectures are evaluated by the ensemble predictor of Q CARTs. Thus, the adaptive predictor can balance the global tendency and local information in the fitness landscape, where the combination of the K CARTs predicts the global average landscape and that of the Q diverse CARTs refines the local landscape in a small area. The details of the prediction process in a generation are shown in Algorithm 4.
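
The sketch below is one interpretation of this adaptive selection (not a verbatim reimplementation of Algorithm 4): the K CARTs are ordered by their predictions on the encoding of the current best architecture A_b, Q of them are picked uniformly from that ordering, and their mean prediction scores the new architectures.

```python
import numpy as np

def adaptive_predict(pool, x_best, X_new, Q=100):
    """pool: list of (cart, mask) pairs; x_best: encoding of A_b; X_new: encodings to score."""
    x_best, X_new = np.asarray(x_best, dtype=float), np.asarray(X_new, dtype=float)
    # Order all K CARTs by their prediction on the current best architecture A_b.
    preds_on_best = np.array([cart.predict(x_best[mask].reshape(1, -1))[0] for cart, mask in pool])
    order = np.argsort(preds_on_best)
    # Uniformly pick Q CARTs from the ordered list to preserve prediction diversity around A_b.
    picked = order[np.linspace(0, len(pool) - 1, Q).astype(int)]
    # The ensemble prediction is the mean output of the Q selected CARTs.
    return np.mean([pool[i][0].predict(X_new[:, pool[i][1]]) for i in picked], axis=0)
```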

D. Strength and Weakness of E2EPP

As introduced in Subsection II-B, the limitations of the existing performance predictors are the non-end-to-end nature, the strict assumption on the smoothness of the learning curve, and the requirement of a large number of training samples. The proposed method has been carefully designed to address these limitations.

The proposed algorithm is end-to-end and does not rely on the learning curve, no matter whether it is smooth or not.

  • Firstly, the end-to-end nature is more convenient in practice because we do not need to prepare the training data in predicting the performance of each CNN.
  • Secondly, because the proposed algorithm does not need to fit the learning curve, the predicted performance is better than that of the existing approaches based on the learning curve.

This is theoretically evidenced by the universal approximation theorem [56]: the learning curve-based approaches can achieve promising performance only when the learning curve is smooth. However, in practice, the learning curve is not always smooth.

IV. EXPERIMENT DESIGN

Two experiments are performed in this paper:

  1. investigating the classification performance of the proposed performance predictor with AE-CNN.
  2. inspecting the efficiency of the proposed performance predictor.

In this section, the selected peer competitors and benchmark datasets, as well as the parameter settings for these two types of experiments, are detailed.

A. Peer Competitors

In comparing the classification performance, the chosen peer competitors are divided into three different categories:

  • the state-of-the-art CNNs whose architectures are manually designed.

    • The first category covers DenseNet [50], ResNet [54], Maxout [57], VGG [58], Network in Network [59], Highway Network [60], All-CNN [61], and FractalNet [62]. Considering the promising performance of ResNet, two versions with depths of 101 and 1,202 are used; for the convenience of the discussion, they are denoted as ResNet (depth=101) and ResNet (depth=1,202), respectively.
  • the state-of-the-art CNN architecture designs based on non-evolutionary algorithms (mainly based on reinforcement learning).

    • The second category consists of NAS [9], MetaQNN [8], EAS [9], and Block-QNN-S [10].
  • and the state-of-the-art EDL algorithms.

    • The third category includes Genetic CNN [11], Large-scale Evolution [12], Hierarchical Evolution [13], and CGP-CNN [14].

Considering that the proposed performance predictor is introduced via a case study on AE-CNN, AE-CNN combined with E2EPP (denoted as AE-CNN+E2EPP) is chosen to perform this experiment.

B. Benchmark Datasets

The CIFAR10 and CIFAR100 datasets are used as the benchmarks.

C. Parameter Settings

V. EXPERIMENTAL RESULTS

A. Overall Results

B. Efficiency of E2EPP

C. Effectiveness of E2EPP

D. Comparison to Radial Basis Network (RBN)

VI. CONCLUSIONS AND FUTURE WORK
