Deep Learning for Chatbots, Part 1 – Introduction

This article explores approaches to building chatbots with Deep Learning techniques, covering retrieval-based and generative models, and discusses the distinct challenges of short vs. long conversations and open vs. closed domains.

Chatbots, also called Conversational Agents or Dialog Systems, are a hot topic. Microsoft is making big bets on chatbots, and so are companies like Facebook (M), Apple (Siri), Google, WeChat, and Slack. There is a new wave of startups trying to change how consumers interact with services by building consumer apps like Operator or x.ai, bot platforms like Chatfuel, and bot libraries like Howdy’s Botkit. Microsoft recently released their own bot developer framework.

Many companies are hoping to develop bots to have natural conversations indistinguishable from human ones, and many are claiming to be using NLP and Deep Learning techniques to make this possible. But with all the hype around AI it’s sometimes difficult to tell fact from fiction.

In this series I want to go over some of the Deep Learning techniques that are used to build conversational agents, starting off by explaining where we are right now, what’s possible, and what will stay nearly impossible for at least a little while. This post will serve as an introduction, and we’ll get into the implementation details in upcoming posts.

A taxonomy of models

Retrieval-Based vs. Generative Models

Retrieval-based models (easier) use a repository of predefined responses and some kind of heuristic to pick an appropriate response based on the input and context. The heuristic could be as simple as a rule-based expression match, or as complex as an ensemble of Machine Learning classifiers. These systems don’t generate any new text; they just pick a response from a fixed set.
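To make the distinction concrete, here is a minimal sketch of a retrieval-based responder, assuming a toy repository of (context, response) pairs and TF-IDF cosine similarity as the matching heuristic. Real systems use far richer features and learned ranking models.

```python
# Minimal retrieval-based responder (illustrative sketch, not a production system).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Hypothetical repository of (context pattern, canned response) pairs.
repository = [
    ("what are your opening hours", "We're open 9am to 5pm, Monday to Friday."),
    ("how do i reset my password", "You can reset it from the account settings page."),
    ("thanks for the help", "You're welcome! Anything else I can do for you?"),
]

contexts = [context for context, _ in repository]
vectorizer = TfidfVectorizer()
context_vectors = vectorizer.fit_transform(contexts)

def respond(message: str) -> str:
    """Pick the canned response whose context best matches the input message."""
    scores = cosine_similarity(vectorizer.transform([message]), context_vectors)
    return repository[scores.argmax()][1]

print(respond("how can I reset my password?"))  # -> the password-reset response
```

Note that the bot can never say anything outside `repository`, which is exactly the fixed-set limitation described above.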

Generative models (harder) don’t rely on predefined responses. They generate new responses from scratch. Generative models are typically based on Machine Translation techniques, but instead of translating from one language to another, we “translate” from an input to an output (response).

(Figure: Neural Conversational Model)

Both approaches have some obvious pros and cons. Due to the repository of handcrafted responses, retrieval-based methods don’t make grammatical mistakes. However, they may be unable to handle unseen cases for which no appropriate predefined response exists. For the same reasons, these models can’t refer back to contextual entity information like names mentioned earlier in the conversation. Generative models are “smarter”. They can refer back to entities in the input and give the impression that you’re talking to a human. However, these models are hard to train, are quite likely to make grammatical mistakes (especially on longer sentences), and typically require huge amounts of training data.

Deep Learning techniques can be used for both retrieval-based and generative models, but research seems to be moving in the generative direction. Deep Learning architectures like Sequence to Sequence are uniquely suited for generating text, and researchers are hoping to make rapid progress in this area. However, we’re still at the early stages of building generative models that work reasonably well. Production systems are more likely to be retrieval-based for now.
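As a rough illustration of the generative approach, below is a bare-bones sequence-to-sequence skeleton in PyTorch. It is a sketch with made-up dimensions, not the architecture of any particular paper: an encoder RNN compresses the input message into a hidden state, and a decoder RNN "translates" that state into a response one token at a time.

```python
# Bare-bones sequence-to-sequence skeleton (illustrative sketch; all sizes are placeholders).
import torch
import torch.nn as nn

VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10000, 128, 256  # assumed toy dimensions

class Seq2Seq(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.encoder = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.decoder = nn.GRU(EMBED_DIM, HIDDEN_DIM, batch_first=True)
        self.out = nn.Linear(HIDDEN_DIM, VOCAB_SIZE)

    def forward(self, src_ids, tgt_ids):
        # Encode the input message; keep only the final hidden state.
        _, state = self.encoder(self.embed(src_ids))
        # Decode the response conditioned on that state (teacher forcing).
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # per-position logits over the vocabulary

model = Seq2Seq()
src = torch.randint(0, VOCAB_SIZE, (2, 12))  # a batch of 2 input messages
tgt = torch.randint(0, VOCAB_SIZE, (2, 9))   # a batch of 2 target responses
print(model(src, tgt).shape)                 # torch.Size([2, 9, 10000])
```

Training such a model means maximizing the likelihood of observed responses; generating a reply at inference time means decoding token by token, which is where the grammatical mistakes mentioned above tend to creep in.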

Long vs. Short Conversations

The longer the conversation, the more difficult it is to automate. On one side of the spectrum are Short-Text Conversations (easier), where the goal is to create a single response to a single input. For example, you may receive a specific question from a user and reply with an appropriate answer. Then there are long conversations (harder), where you go through multiple turns and need to keep track of what has been said. Customer support conversations are typically long conversational threads with multiple questions.

Open Domain vs. Closed Domain

In an open domain (harder) setting the user can take the conversation anywhere. There isn’t necessarily a well-defined goal or intention. Conversations on social media sites like Twitter and Reddit are typically open domain – they can go in all kinds of directions. The infinite number of topics and the fact that a certain amount of world knowledge is required to create reasonable responses make this a hard problem.

In a closed domain (easier) setting the space of possible inputs and outputs is somewhat limited because the system is trying to achieve a very specific goal. Technical Customer Support or Shopping Assistants are examples of closed domain problems. These systems don’t need to be able to talk about politics; they just need to fulfill their specific task as efficiently as possible. Sure, users can still take the conversation anywhere they want, but the system isn’t required to handle all these cases – and the users don’t expect it to.

Common Challenges

There are some obvious and not-so-obvious challenges when building conversational agents, most of which are active research areas.

Incorporating Context

To produce sensible responses, systems may need to incorporate both linguistic context and physical context. In long dialogs, people keep track of what has been said and what information has been exchanged. That’s an example of linguistic context. The most common approach is to embed the conversation into a vector, but doing that with long conversations is challenging. Experiments in Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models and Attention with Intention for a Neural Network Conversation Model both go in that direction. One may also need to incorporate other kinds of contextual data such as date/time, location, or information about a user.
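As a crude baseline for linguistic context (not how the hierarchical models cited above actually work), one can simply concatenate the most recent turns into the encoder input. The `<eot>` separator token below is an assumed convention for this sketch:

```python
# Naive linguistic-context handling: fold recent turns into one encoder input.
# The "<eot>" (end-of-turn) separator is a made-up convention for this sketch.
EOT = " <eot> "

def with_context(history: list, message: str, max_turns: int = 3) -> str:
    """Concatenate the last few turns with the new message into one input string."""
    return EOT.join(history[-max_turns:] + [message])

history = ["Hi, my name is Ana.", "Hello Ana, how can I help?"]
print(with_context(history, "What did I just say my name was?"))
# -> "Hi, my name is Ana. <eot> Hello Ana, how can I help? <eot> What did I just say my name was?"
```

This keeps the entity "Ana" visible to the encoder, but the input grows with every turn, which is exactly why embedding long conversations into a single vector is hard and why the cited papers encode turns hierarchically instead.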

Coherent Personality

When generating responses, the agent should ideally produce consistent answers to semantically identical inputs. For example, you want to get the same reply to “How old are you?” and “What is your age?”. This may sound simple, but incorporating such fixed knowledge or “personality” into models is very much a research problem. Many systems learn to generate linguistically plausible responses, but they are not trained to generate semantically consistent ones. Usually that’s because they are trained on a lot of data from multiple different users. Models like the one in A Persona-Based Neural Conversation Model are taking first steps toward explicitly modeling a personality.

(Figure: Example of incoherent responses from the Neural Conversational Model)
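Here is a sketch of the persona idea, loosely in the spirit of the persona-based model rather than its exact architecture: a learned per-speaker embedding is concatenated to every decoder input, so the same network can produce speaker-consistent answers. All dimensions and ids are placeholders.

```python
# Persona conditioning sketch (dimensions and speaker ids are placeholders,
# and this simplifies the actual persona-based model).
import torch
import torch.nn as nn

NUM_SPEAKERS, PERSONA_DIM = 100, 64
VOCAB_SIZE, EMBED_DIM, HIDDEN_DIM = 10000, 128, 256

persona = nn.Embedding(NUM_SPEAKERS, PERSONA_DIM)  # one learned vector per speaker
embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
decoder = nn.GRU(EMBED_DIM + PERSONA_DIM, HIDDEN_DIM, batch_first=True)

tgt_ids = torch.randint(0, VOCAB_SIZE, (2, 9))           # (batch, time)
speaker_ids = torch.tensor([3, 7])                       # one persona per dialog
p = persona(speaker_ids).unsqueeze(1).expand(-1, 9, -1)  # tile over time steps
dec_in = torch.cat([embed(tgt_ids), p], dim=-1)          # (batch, time, E + P)
dec_out, _ = decoder(dec_in)                             # persona-aware decoding
```

Because the persona vector is trained jointly with the rest of the model, consistent facts like the bot's age can, in principle, be absorbed into it.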

Evaluation of Models

The ideal way to evaluate a conversational agent is to measure whether or not it fulfills its task, e.g. solving a customer support problem, in a given conversation. But such labels are expensive to obtain because they require human judgment and evaluation. Sometimes there is no well-defined goal, as is the case with open-domain models. Common metrics such as BLEU that are used for Machine Translation and are based on text matching aren’t well suited, because sensible responses can contain completely different words or phrases. In fact, in How NOT To Evaluate Your Dialogue System: An Empirical Study of Unsupervised Evaluation Metrics for Dialogue Response Generation, researchers find that none of the commonly used metrics really correlate with human judgment.
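A quick demonstration of why overlap metrics mislead, using NLTK's BLEU implementation on invented token lists: a perfectly sensible paraphrase that shares no words with the reference scores worse than an off-topic response that happens to share a few.

```python
# BLEU rewards surface overlap, not meaning (toy example with invented sentences).
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

smooth = SmoothingFunction().method1  # avoids zero scores on short sentences
reference = [["i", "am", "twenty", "five", "years", "old"]]

on_topic = ["my", "age", "is", "25"]        # sensible answer, different wording
off_topic = ["i", "am", "a", "big", "fan"]  # shares words, wrong meaning

print(sentence_bleu(reference, on_topic, smoothing_function=smooth))   # near zero
print(sentence_bleu(reference, off_topic, smoothing_function=smooth))  # higher
```

The off-topic response wins on BLEU despite being a worse reply, which matches the paper's finding that these metrics don't correlate with human judgment.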

Intention and Diversity

A common problem with generative systems is that they tend to produce generic responses like “That’s great!” or “I don’t know” that work for a lot of input cases. Early versions of Google’s Smart Reply tended to respond with “I love you” to almost anything. That’s partly a result of how these systems are trained, both in terms of data and in terms of actual training objective/algorithm. Some researchers have tried to artificially promote diversity through various objective functions. However, humans typically produce responses that are specific to the input and carry an intention. Because generative systems (and particularly open-domain systems) aren’t trained to have specific intentions they lack this kind of diversity.
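One such diversity-promoting objective from the literature reranks candidate responses by an approximation of mutual information: trade the conditional likelihood log p(response | input) against a "generic-ness" prior log p(response). The sketch below uses invented log-probabilities in place of real model scores to show the effect.

```python
# Mutual-information-style reranking sketch; the log-probabilities are made-up
# stand-ins for scores from a real seq2seq model and a response-only language model.
candidates = {
    # response: (log p(response | input), log p(response))
    "I don't know": (-2.0, -1.0),               # plausible for any input: generic
    "That's great!": (-2.2, -1.2),
    "Try restarting the router": (-3.0, -6.5),  # specific to this input
}

LAMBDA = 0.5  # assumed weight on the anti-generic penalty

def mmi_score(cond_logp, prior_logp):
    """Higher is better: likely given the input, unlikely in general."""
    return cond_logp - LAMBDA * prior_logp

best = max(candidates, key=lambda r: mmi_score(*candidates[r]))
print(best)  # -> "Try restarting the router"
```

Pure likelihood would pick "I don't know"; penalizing generic responses lets the specific, intention-carrying answer win.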

How well does it actually work?

Given all the cutting edge research right now, where are we and how well do these systems actually work? Let’s consider our taxonomy again. A retrieval-based open domain system is obviously impossible because you can never handcraft enough responses to cover all cases. A generative open-domain system is almost Artificial General Intelligence (AGI) because it needs to handle all possible scenarios. We’re very far away from that as well (but a lot of research is going on in that area).

This leaves us with problems in restricted domains where both generative and retrieval based methods are appropriate. The longer the conversations and the more important the context, the more difficult the problem becomes.

In a recent interview, Andrew Ng, now chief scientist of Baidu, puts it well:

Most of the value of deep learning today is in narrow domains where you can get a lot of data. Here’s one example of something it cannot do: have a meaningful conversation. There are demos, and if you cherry-pick the conversation, it looks like it’s having a meaningful conversation, but if you actually try it yourself, it quickly goes off the rails.

Many companies start off by outsourcing their conversations to human workers, promising that they can “automate” them once they’ve collected enough data. That’s likely to happen only if they are operating in a pretty narrow domain – like a chat interface for calling an Uber, for example. Anything that’s a bit more open domain (like sales emails) is beyond what we can currently do. However, we can also use these systems to assist human workers by proposing and correcting responses. That’s much more feasible.

Grammatical mistakes in production systems are very costly and may drive away users. That’s why most systems are probably best off using retrieval-based methods that are free of grammatical errors and offensive responses. If companies can somehow get their hands on huge amounts of data then generative models become feasible – but they must be assisted by other techniques to prevent them from going off the rails like Microsoft’s Tay did.

Upcoming & Reading List

We’ll get into the technical details of how to implement retrieval-based and generative conversational models using Deep Learning in the next post, but if you’re interested in looking at some of the research then the following papers are a good starting point:

Source: http://www.wildml.com/2016/04/deep-learning-for-chatbots-part-1-introduction/
