A Recipe for Training Neural Networks

This article takes a deep look at common pitfalls in deep-learning training and how to fix them, lays out a six-step recipe that runs from understanding the data to tuning the model, and stresses the importance of not introducing too much complexity at once.

http://karpathy.github.io/2019/04/25/recipe/#2-set-up-the-end-to-end-trainingevaluation-skeleton--get-dumb-baselines

 

Some few weeks ago I posted a tweet on “the most common neural net mistakes”, listing a few common gotchas related to training neural nets. The tweet got quite a bit more engagement than I anticipated (including a webinar :)). Clearly, a lot of people have personally encountered the large gap between “here is how a convolutional layer works” and “our convnet achieves state of the art results”.

So I thought it could be fun to brush off my dusty blog to expand my tweet to the long form that this topic deserves. However, instead of going into an enumeration of more common errors or fleshing them out, I wanted to dig a bit deeper and talk about how one can avoid making these errors altogether (or fix them very fast). The trick to doing so is to follow a certain process, which as far as I can tell is not very often documented. Let’s start with two important observations that motivate it.

1) Neural net training is a leaky abstraction

It is allegedly easy to get started with training neural nets. Numerous libraries and frameworks take pride in displaying 30-line miracle snippets that solve your data problems, giving the (false) impression that this stuff is plug and play. It’s common to see things like:

>>> your_data = # plug your awesome dataset here
>>> model = SuperCrossValidator(SuperDuper.fit, your_data, ResNet50, SGDOptimizer)
# conquer world here

These libraries and examples activate the part of our brain that is familiar with standard software - a place where clean APIs and abstractions are often attainable. Requests library to demonstrate:

>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200

That’s cool! A courageous developer has taken the burden of understanding query strings, urls, GET/POST requests, HTTP connections, and so on from you and largely hidden the complexity behind a few lines of code. This is what we are familiar with and expect. Unfortunately, neural nets are nothing like that. They are not “off-the-shelf” technology the second you deviate slightly from training an ImageNet classifier. I’ve tried to make this point in my post “Yes you should understand backprop” by picking on backpropagation and calling it a “leaky abstraction”, but the situation is unfortunately much more dire. Backprop + SGD does not magically make your network work. Batch norm does not magically make it converge faster. RNNs don’t magically let you “plug in” text. And just because you can formulate your problem as RL doesn’t mean you should. If you insist on using the technology without understanding how it works you are likely to fail. Which brings me to…

2) Neural net training fails silently

When you break or misconfigure code you will often get some kind of an exception. You plugged in an integer where something expected a string. The function only expected 3 arguments. This import failed. That key does not exist. The number of elements in the two lists isn’t equal. In addition, it’s often possible to create unit tests for a certain functionality.

This is just a start when it comes to training neural nets. Everything could be correct syntactically, but the whole thing isn’t arranged properly, and it’s really hard to tell. The “possible error surface” is large, logical (as opposed to syntactic), and very tricky to unit test. For example, perhaps you forgot to flip your labels when you left-right flipped the image during data augmentation. Your net can still (shockingly) work pretty well because your network can internally learn to detect flipped images and then it left-right flips its predictions. Or maybe your autoregressive model accidentally takes the thing it’s trying to predict as an input due to an off-by-one bug. Or you tried to clip your gradients but instead clipped the loss, causing the outlier examples to be ignored during training. Or you initialized your weights from a pretrained checkpoint but didn’t use the original mean. Or you just screwed up the settings for regularization strengths, learning rate, its decay rate, model size, etc. Therefore, your misconfigured neural net will throw exceptions only if you’re lucky; most of the time it will train but silently work a bit worse.

As a result, (and this is reeaally difficult to over-emphasize) a “fast and furious” approach to training neural networks does not work and only leads to suffering. Now, suffering is a perfectly natural part of getting a neural network to work well, but it can be mitigated by being thorough, defensive, paranoid, and obsessed with visualizations of basically every possible thing. The qualities that in my experience correlate most strongly to success in deep learning are patience and attention to detail.

The recipe

In light of the above two facts, I have developed a specific process for myself that I follow when applying a neural net to a new problem, which I will try to describe. You will see that it takes the two principles above very seriously. In particular, it builds from simple to complex and at every step of the way we make concrete hypotheses about what will happen and then either validate them with an experiment or investigate until we find some issue. What we try to prevent very hard is the introduction of a lot of “unverified” complexity at once, which is bound to introduce bugs/misconfigurations that will take forever to find (if ever). If writing your neural net code was like training one, you’d want to use a very small learning rate and guess and then evaluate the full test set after every iteration.

1. Become one with the data

The first step to training a neural net is to not touch any neural net code at all and instead begin by thoroughly inspecting your data. This step is critical. I like to spend copious amount of time (measured in units of hours) scanning through thousands of examples, understanding their distribution and looking for patterns. Luckily, your brain is pretty good at this. One time I discovered that the data contained duplicate examples. Another time I found corrupted images / labels. I look for data imbalances and biases. I will typically also pay attention to my own process for classifying the data, which hints at the kinds of architectures we’ll eventually explore. As an example - are very local features enough or do we need global context? How much variation is there and what form does it take? What variation is spurious and could be preprocessed out? Does spatial position matter or do we want to average pool it out? How much does detail matter and how far could we afford to downsample the images? How noisy are the labels?

In addition, since the neural net is effectively a compressed/compiled version of your dataset, you’ll be able to look at your network (mis)predictions and understand where they might be coming from. And if your network is giving you some prediction that doesn’t seem consistent with what you’ve seen in the data, something is off.

Once you get a qualitative sense it is also a good idea to write some simple code to search/filter/sort by whatever you can think of (e.g. type of label, size of annotations, number of annotations, etc.) and visualize their distributions and the outliers along any axis. The outliers especially almost always uncover some bugs in data quality or preprocessing.
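
To make this concrete, here is one possible minimal sketch of that kind of throwaway exploration code, assuming your annotations can be loaded into a pandas DataFrame; the file name and the columns (label, width, height) are placeholders for whatever your dataset actually contains:

import pandas as pd
import matplotlib.pyplot as plt

# hypothetical export of your labels: one row per annotation
df = pd.read_csv("annotations.csv")            # e.g. columns: image_id, label, width, height
df["area"] = df["width"] * df["height"]

print(df["label"].value_counts())              # class imbalance at a glance
print(df.isna().sum())                         # missing / corrupted fields

df["area"].hist(bins=50)                       # distribution along any axis you care about
plt.xlabel("annotation area (pixels^2)")
plt.show()

# the extremes along any axis are where data bugs usually hide
print(df.nsmallest(20, "area"))
print(df.nlargest(20, "area"))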

2. Set up the end-to-end training/evaluation skeleton + get dumb baselines

Now that we understand our data can we reach for our super fancy Multi-scale ASPP FPN ResNet and begin training awesome models? For sure no. That is the road to suffering. Our next step is to set up a full training + evaluation skeleton and gain trust in its correctness via a series of experiments. At this stage it is best to pick some simple model that you couldn’t possibly have screwed up somehow - e.g. a linear classifier, or a very tiny ConvNet. We’ll want to train it, visualize the losses, any other metrics (e.g. accuracy), model predictions, and perform a series of ablation experiments with explicit hypotheses along the way.

Tips & tricks for this stage:

  • fix random seed. Always use a fixed random seed to guarantee that when you run the code twice you will get the same outcome. This removes a factor of variation and will help keep you sane.
  • simplify. Make sure to disable any unnecessary fanciness. As an example, definitely turn off any data augmentation at this stage. Data augmentation is a regularization strategy that we may incorporate later, but for now it is just another opportunity to introduce some dumb bug.
  • add significant digits to your eval. When plotting the test loss run the evaluation over the entire (large) test set. Do not just plot test losses over batches and then rely on smoothing them in Tensorboard. We are in pursuit of correctness and are very willing to give up time for staying sane.
  • verify loss @ init. Verify that your loss starts at the correct loss value. E.g. if you initialize your final layer correctly you should measure -log(1/n_classes) on a softmax at initialization. The same default values can be derived for L2 regression, Huber losses, etc. (a small sketch of this check, together with the final-bias initialization from the next item, follows this list).
  • init well. Initialize the final layer weights correctly. E.g. if you are regressing some values that have a mean of 50 then initialize the final bias to 50. If you have an imbalanced dataset of a ratio 1:10 of positives:negatives, set the bias on your logits such that your network predicts probability of 0.1 at initialization. Setting these correctly will speed up convergence and eliminate “hockey stick” loss curves where in the first few iterations your network is basically just learning the bias.
  • human baseline. Monitor metrics other than loss that are human interpretable and checkable (e.g. accuracy). Whenever possible evaluate your own (human) accuracy and compare to it. Alternatively, annotate the test data twice and for each example treat one annotation as prediction and the second as ground truth.
  • input-independent baseline. Train an input-independent baseline (e.g. easiest is to just set all your inputs to zero). This should perform worse than when you actually plug in your data without zeroing it out. Does it? i.e. does your model learn to extract any information out of the input at all?
  • overfit one batch. Overfit a single batch of only a few examples (e.g. as little as two). To do so we increase the capacity of our model (e.g. add layers or filters) and verify that we can reach the lowest achievable loss (e.g. zero). I also like to visualize in the same plot both the label and the prediction and ensure that they end up aligning perfectly once we reach the minimum loss. If they do not, there is a bug somewhere and we cannot continue to the next stage.
  • verify decreasing training loss. At this stage you will hopefully be underfitting on your dataset because you’re working with a toy model. Try to increase its capacity just a bit. Did your training loss go down as it should?
  • visualize just before the net. The unambiguously correct place to visualize your data is immediately before your y_hat = model(x) (or sess.run in tf). That is - you want to visualize exactly what goes into your network, decoding that raw tensor of data and labels into visualizations. This is the only “source of truth”. I can’t count the number of times this has saved me and revealed problems in data preprocessing and augmentation.
  • visualize prediction dynamics. I like to visualize model predictions on a fixed test batch during the course of training. The “dynamics” of how these predictions move will give you incredibly good intuition for how the training progresses. Many times it is possible to feel the network “struggle” to fit your data if it wiggles too much in some way, revealing instabilities. Very low or very high learning rates are also easily noticeable in the amount of jitter.
  • use backprop to chart dependencies. Your deep learning code will often contain complicated, vectorized, and broadcasted operations. A relatively common bug I’ve come across a few times is that people get this wrong (e.g. they use view instead of transpose/permute somewhere) and inadvertently mix information across the batch dimension. It is a depressing fact that your network will typically still train okay because it will learn to ignore data from the other examples. One way to debug this (and other related problems) is to set the loss to be something trivial like the sum of all outputs of example i, run the backward pass all the way to the input, and ensure that you get a non-zero gradient only on the i-th input. The same strategy can be used to e.g. ensure that your autoregressive model at time t only depends on 1..t-1. More generally, gradients give you information about what depends on what in your network, which can be useful for debugging (a minimal version of this check is sketched in code after this list).
  • generalize a special case. This is a bit more of a general coding tip but I’ve often seen people create bugs when they bite off more than they can chew, writing a relatively general functionality from scratch. I like to write a very specific function to what I’m doing right now, get that to work, and then generalize it later making sure that I get the same result. Often this applies to vectorizing code, where I almost always write out the fully loopy version first and only then transform it to vectorized code one loop at a time.
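
To make the “verify loss @ init” and “init well” items concrete, here is a minimal PyTorch sketch, assuming a softmax classifier over n_classes and, for the bias example, a binary problem with roughly 10% positives; the toy linear layers are stand-ins for the final layer of your real network:

import math
import torch
import torch.nn as nn

n_classes = 10
final_layer = nn.Linear(128, n_classes)       # stand-in for your network's final layer
nn.init.zeros_(final_layer.weight)            # a "correctly initialized" final layer:
nn.init.zeros_(final_layer.bias)              # zero logits -> uniform softmax at init

x = torch.randn(64, 128)
y = torch.randint(0, n_classes, (64,))
loss = nn.CrossEntropyLoss()(final_layer(x), y)
print(loss.item(), "should equal", -math.log(1.0 / n_classes))   # log(10) ~ 2.3026

# "init well": for a binary head where roughly 1 in 10 examples is positive,
# set the final bias so the network predicts probability ~0.1 at initialization
p = 0.1
head = nn.Linear(128, 1)
nn.init.zeros_(head.weight)
nn.init.constant_(head.bias, math.log(p / (1 - p)))
print(torch.sigmoid(head(x)).mean().item())   # ~0.1 before any training

If the measured loss at initialization is far from the expected value, the final layer or the loss itself is probably misconfigured.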
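
Similarly, a minimal sketch of the “use backprop to chart dependencies” check, assuming a model that is supposed to process each example in the batch independently; the toy Sequential model below is just an illustration:

import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))   # toy per-example model

x = torch.randn(8, 16, requires_grad=True)
i = 3                                           # probe the i-th example
loss = model(x)[i].sum()                        # trivial loss: sum of outputs of example i
loss.backward()

per_example_grad = x.grad.abs().sum(dim=1)      # gradient magnitude reaching each input row
print(per_example_grad)
assert per_example_grad[i] > 0
mask = torch.arange(8) != i
assert torch.all(per_example_grad[mask] == 0), "information is leaking across the batch!"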

3. Overfit

At this stage we should have a good understanding of the dataset and we have the full training + evaluation pipeline working. For any given model we can (reproducibly) compute a metric that we trust. We are also armed with our performance for an input-independent baseline, the performance of a few dumb baselines (we better beat these), and we have a rough sense of the performance of a human (we hope to reach this). The stage is now set for iterating on a good model.

The approach I like to take to finding a good model has two stages: first get a model large enough that it can overfit (i.e. focus on training loss) and then regularize it appropriately (give up some training loss to improve the validation loss). The reason I like these two stages is that if we are not able to reach a low error rate with any model at all that may again indicate some issues, bugs, or misconfiguration.

A few tips & tricks for this stage:

  • picking the model. To reach a good training loss you’ll want to choose an appropriate architecture for the data. When it comes to choosing this my #1 advice is: Don’t be a hero. I’ve seen a lot of people who are eager to get crazy and creative in stacking up the lego blocks of the neural net toolbox in various exotic architectures that make sense to them. Resist this temptation strongly in the early stages of your project. I always advise people to simply find the most related paper and copy paste their simplest architecture that achieves good performance. E.g. if you are classifying images don’t be a hero and just copy paste a ResNet-50 for your first run. You’re allowed to do something more custom later and beat this.
  • adam is safe. In the early stages of setting baselines I like to use Adam with a learning rate of 3e-4. In my experience Adam is much more forgiving to hyperparameters, including a bad learning rate. For ConvNets a well-tuned SGD will almost always slightly outperform Adam, but the optimal learning rate region is much more narrow and problem-specific. (Note: If you are using RNNs and related sequence models it is more common to use Adam. At the initial stage of your project, again, don’t be a hero and follow whatever the most related papers do.)
  • complexify only one at a time. If you have multiple signals to plug into your classifier I would advise that you plug them in one by one and every time ensure that you get a performance boost you’d expect. Don’t throw the kitchen sink at your model at the start. There are other ways of building up complexity - e.g. you can try to plug in smaller images first and make them bigger later, etc.
  • do not trust learning rate decay defaults. If you are re-purposing code from some other domain always be very careful with learning rate decay. Not only would you want to use different decay schedules for different problems, but - even worse - in a typical implementation the schedule will be based on the current epoch number, which can vary widely simply depending on the size of your dataset. E.g. ImageNet would decay by 10 on epoch 30. If you’re not training ImageNet then you almost certainly do not want this. If you’re not careful your code could secretly be driving your learning rate to zero too early, not allowing your model to converge. In my own work I always disable learning rate decays entirely (I use a constant LR) and tune this all the way at the very end (a minimal sketch of this baseline optimizer setup follows this list).
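
A minimal sketch of this baseline optimizer setup, with a toy model and synthetic data standing in for your real pipeline:

import torch
import torch.nn as nn

# the point here is just the optimizer configuration: Adam at 3e-4,
# and no learning-rate schedule at all until everything else is settled
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(256, 20), torch.randint(0, 2, (256,))   # synthetic stand-in data
for step in range(100):
    optimizer.zero_grad()
    criterion(model(x), y).backward()
    optimizer.step()
# only at the very end of the project, consider adding an explicit scheduler
# (e.g. torch.optim.lr_scheduler.StepLR) tuned to your dataset, not ImageNet's epochs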

4. Regularize

Ideally, we are now at a place where we have a large model that is fitting at least the training set. Now it is time to regularize it and gain some validation accuracy by giving up some of the training accuracy. Some tips & tricks:

  • get more data. First, the by far best and preferred way to regularize a model in any practical setting is to add more real training data. It is a very common mistake to spend a lot of engineering cycles trying to squeeze juice out of a small dataset when you could instead be collecting more data. As far as I’m aware adding more data is pretty much the only guaranteed way to monotonically improve the performance of a well-configured neural network almost indefinitely. The other would be ensembles (if you can afford them), but that tops out after ~5 models.
  • data augment. The next best thing to real data is half-fake data - try out more aggressive data augmentation.
  • creative augmentation. If half-fake data doesn’t do it, fake data may also do something. People are finding creative ways of expanding datasets; For example, domain randomization, use of simulation, clever hybrids such as inserting (potentially simulated) data into scenes, or even GANs.
  • pretrain. It rarely ever hurts to use a pretrained network if you can, even if you have enough data.
  • stick with supervised learning. Do not get over-excited about unsupervised pretraining. Unlike what that blog post from 2008 tells you, as far as I know, no version of it has reported strong results in modern computer vision (though NLP seems to be doing pretty well with BERT and friends these days, quite likely owing to the more deliberate nature of text, and a higher signal to noise ratio).
  • smaller input dimensionality. Remove features that may contain spurious signal. Any added spurious input is just another opportunity to overfit if your dataset is small. Similarly, if low-level details don’t matter much try to input a smaller image.
  • smaller model size. In many cases you can use domain knowledge constraints on the network to decrease its size. As an example, it used to be trendy to use Fully Connected layers at the top of backbones for ImageNet but these have since been replaced with simple average pooling, eliminating a ton of parameters in the process.
  • decrease the batch size. Due to the normalization inside batch norm smaller batch sizes somewhat correspond to stronger regularization. This is because the batch empirical mean/std are more approximate versions of the full mean/std so the scale & offset “wiggles” your batch around more.
  • drop. Add dropout. Use dropout2d (spatial dropout) for ConvNets. Use this sparingly/carefully because dropout does not seem to play nice with batch normalization.
  • weight decay. Increase the weight decay penalty.
  • early stopping. Stop training based on your measured validation loss to catch your model just as it’s about to overfit (a sketch combining dropout, weight decay, and early stopping follows this list).
  • try a larger model. I mention this last and only after early stopping but I’ve found a few times in the past that larger models will of course overfit much more eventually, but their “early stopped” performance can often be much better than that of smaller models.
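
A minimal sketch combining a few of the knobs above (spatial dropout, weight decay, and early stopping on the validation loss); the tiny ConvNet and synthetic data are stand-ins for your real model and loaders:

import torch
import torch.nn as nn

# tiny illustrative ConvNet with spatial dropout (Dropout2d)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.Dropout2d(p=0.1),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(16, 10),
)
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4, weight_decay=1e-4)  # weight decay
criterion = nn.CrossEntropyLoss()

# synthetic train/val batches standing in for your real data loaders
xtr, ytr = torch.randn(128, 3, 32, 32), torch.randint(0, 10, (128,))
xva, yva = torch.randn(64, 3, 32, 32), torch.randint(0, 10, (64,))

best_val, patience, bad_epochs = float("inf"), 5, 0
for epoch in range(100):
    model.train()
    optimizer.zero_grad()
    criterion(model(xtr), ytr).backward()
    optimizer.step()

    model.eval()
    with torch.no_grad():
        val_loss = criterion(model(xva), yva).item()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        torch.save(model.state_dict(), "best.pt")   # keep the best checkpoint
    else:
        bad_epochs += 1
        if bad_epochs >= patience:                   # early stopping
            break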

Finally, to gain additional confidence that your network is a reasonable classifier, I like to visualize the network’s first-layer weights and make sure they show nice edges that make sense. If your first-layer filters look like noise then something could be off. Similarly, activations inside the net can sometimes display odd artifacts and hint at problems.
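
One way to do this, sketched with torchvision and assuming the first layer of your model is a standard Conv2d over RGB input (a ResNet-18 is used below purely as a stand-in for your own trained network):

import matplotlib.pyplot as plt
import torch
import torchvision

model = torchvision.models.resnet18(weights=None)     # stand-in; substitute your trained net
first_conv = next(m for m in model.modules() if isinstance(m, torch.nn.Conv2d))
w = first_conv.weight.detach().clone()                # (out_channels, 3, k, k)
w = (w - w.min()) / (w.max() - w.min() + 1e-8)        # rescale to [0, 1] for display
grid = torchvision.utils.make_grid(w, nrow=8, padding=1)
plt.imshow(grid.permute(1, 2, 0))                     # trained nets usually show oriented edges / color blobs
plt.axis("off")
plt.show()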

5. Tune

You should now be “in the loop” with your dataset exploring a wide model space for architectures that achieve low validation loss. A few tips and tricks for this step:

  • random over grid search. For simultaneously tuning multiple hyperparameters it may sound tempting to use grid search to ensure coverage of all settings, but keep in mind that it is best to use random search instead. Intuitively, this is because neural nets are often much more sensitive to some parameters than others. In the limit, if a parameter a matters but changing b has no effect then you’d rather sample a more thoroughly than at a few fixed points multiple times (a minimal random-search sketch follows this list).
  • hyper-parameter optimization. There is a large number of fancy bayesian hyper-parameter optimization toolboxes around and a few of my friends have also reported success with them, but my personal experience is that the state of the art approach to exploring a nice and wide space of models and hyperparameters is to use an intern :). Just kidding.
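
A minimal random-search sketch; the parameter ranges are illustrative and train_and_eval is a dummy stand-in for your actual training + validation run:

import random

def sample_config():
    # sample the sensitive parameters (like learning rate) log-uniformly,
    # rather than pinning them to a handful of grid points
    return {
        "lr": 10 ** random.uniform(-5, -2),
        "weight_decay": 10 ** random.uniform(-6, -3),
        "dropout": random.uniform(0.0, 0.5),
    }

def train_and_eval(cfg):
    # replace with your real pipeline; a random score here just makes the sketch runnable
    return random.random()

results = []
for trial in range(30):
    cfg = sample_config()
    results.append((train_and_eval(cfg), cfg))

best_metric, best_cfg = max(results, key=lambda r: r[0])
print(best_metric, best_cfg)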

6. Squeeze out the juice

Once you find the best types of architectures and hyper-parameters you can still use a few more tricks to squeeze the last bits of juice out of the system:

  • ensembles. Model ensembles are a pretty much guaranteed way to gain 2% of accuracy on anything. If you can’t afford the computation at test time look into distilling your ensemble into a network using dark knowledge (a minimal prediction-averaging sketch follows this list).
  • leave it training. I’ve often seen people tempted to stop the model training when the validation loss seems to be leveling off. In my experience networks keep training for an unintuitively long time. One time I accidentally left a model training during the winter break and when I got back in January it was SOTA (“state of the art”).
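
A minimal sketch of the simplest kind of ensemble, averaging the softmax probabilities of several independently trained models at test time; the untrained models below are stand-ins (e.g. for the same architecture trained with different random seeds). The averaged soft probabilities are also what you would distill a single student network against:

import torch
import torch.nn as nn

# stand-ins for several independently trained models
models = [nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 10)) for _ in range(5)]

x = torch.randn(32, 20)                                    # a test batch
with torch.no_grad():
    probs = torch.stack([torch.softmax(m(x), dim=-1) for m in models]).mean(dim=0)
ensemble_pred = probs.argmax(dim=-1)                       # ensemble prediction
print(ensemble_pred)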

Conclusion

Once you make it here you’ll have all the ingredients for success: You have a deep understanding of the technology, the dataset and the problem, you’ve set up the entire training/evaluation infrastructure and achieved high confidence in its accuracy, and you’ve explored increasingly more complex models, gaining performance improvements in ways you’ve predicted each step of the way. You’re now ready to read a lot of papers, try a large number of experiments, and get your SOTA results. Good luck!
