Study notes for Backpropagation

These notes discuss multilayer feed-forward networks as linear and nonlinear classifiers and as nonlinear regressors, highlighting their high tolerance of noisy data and their ability to classify patterns they were not trained on. They cover the networks' advantages, including good suitability for continuous-valued inputs and outputs and inherent parallelism, as well as disadvantages such as long training times and the difficulty of determining the network structure. They also discuss how to define a network topology and offer advice on adjusting the learning rate.


A multilayer feed-forward network can be used as a linear or nonlinear classifier as well as for nonlinear regression (i.e., numeric prediction). Given enough hidden units and enough training samples, multilayer feed-forward networks can closely approximate any function.
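As a concrete illustration of that claim, here is a minimal NumPy sketch of a one-hidden-layer feed-forward network trained with backpropagation to approximate a nonlinear function. The network size, learning rate, and target function are arbitrary choices for the demo, not prescribed by these notes.

```python
import numpy as np

# Minimal sketch: a one-hidden-layer feed-forward network trained with
# backpropagation to approximate a nonlinear function (here sin(2*pi*x)).
# All sizes and hyperparameters are illustrative choices.

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy regression data: x in [0, 1], target is a nonlinear function of x.
X = rng.uniform(0.0, 1.0, size=(200, 1))
y = np.sin(2.0 * np.pi * X)

n_hidden = 16
W1 = rng.normal(0.0, 1.0, size=(1, n_hidden))   # input -> hidden weights
b1 = np.zeros(n_hidden)
W2 = rng.normal(0.0, 1.0, size=(n_hidden, 1))   # hidden -> output weights
b2 = np.zeros(1)

lr = 0.5
for epoch in range(5000):
    # Forward pass.
    H = sigmoid(X @ W1 + b1)          # hidden activations
    out = H @ W2 + b2                 # linear output unit (regression)

    # Backward pass: gradients of the mean squared error.
    grad_out = 2.0 * (out - y) / len(X)
    grad_W2 = H.T @ grad_out
    grad_b2 = grad_out.sum(axis=0)
    grad_H = grad_out @ W2.T
    grad_z1 = grad_H * H * (1.0 - H)  # derivative of the sigmoid
    grad_W1 = X.T @ grad_z1
    grad_b1 = grad_z1.sum(axis=0)

    # Gradient-descent weight updates.
    W1 -= lr * grad_W1; b1 -= lr * grad_b1
    W2 -= lr * grad_W2; b2 -= lr * grad_b2

final_mse = float(np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - y) ** 2))
print("final MSE:", final_mse)  # should be small if the fit succeeded
```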

1. Advantages

  1. High tolerance of noisy data, and the ability to classify patterns on which the network has not been trained.
  2. They can be used when you have little knowledge of the relationships between attributes and classes.
  3. Well suited for continuous-valued inputs and outputs.
  4. Neural network algorithms are inherently parallel; parallelization techniques can be used to speed up the computation.

2. Disadvantages

  1. Long training time;
  2. Requires a number of parameters that are typically best determined empirically, such as the network topology or "structure";
  3. Poor interpretability: it is hard to attach symbolic meaning to the learned weights and the "hidden units". Some remedies have been proposed for this issue, including extracting rules from trained networks and sensitivity analysis.

3. Defining a network topology

  1. Normalizing the input values measured in the training tuples to [0.0, 1.0] for each attribute will help speed up the learning phase (see the data-preparation sketch after this list).
  2. A single output unit may be used to represent two classes (the value 1 represents one class, and the value 0 represents the other): output values greater than or equal to 0.5 may be considered as belonging to the positive class, and values less than 0.5 to the negative class. If there are more than two classes, one output unit per class is used, and the output node with the highest value determines the predicted class label (both encodings are shown in the sketch after this list).
  3. There are no clear rules as to the "best" number of hidden-layer units. Some heuristics suggest roughly 2.5 times the number of input units, or the square of the number of input units; Andrew Ng notes that, usually, more hidden nodes are better. In a nutshell, it is always a trial-and-error process.
  4. The learning rate helps avoid getting stuck at a local minimum in decision space (i.e., where the weights appear to converge but are not the optimum solution) and encourages finding the global minimum. If the learning rate is too small, learning will occur at a very slow pace; if it is too large, oscillation between inadequate solutions may occur. A rule of thumb is to set the learning rate to 1/t, where t is the number of iterations through the training set so far (see the schedule sketch below).
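The normalization and output-encoding points above can be made concrete with a short sketch; the helper names and example values here are made up for illustration.

```python
import numpy as np

def min_max_normalize(X):
    """Rescale each attribute (column) to [0.0, 1.0] using the minimum
    and maximum observed in the training tuples. Assumes no column is
    constant (which would divide by zero in this simple sketch)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    return (X - col_min) / (col_max - col_min)

def predict_binary(output_value, threshold=0.5):
    """Single output unit for two classes: >= 0.5 -> positive class."""
    return 1 if output_value >= threshold else 0

def predict_multiclass(output_values):
    """One output unit per class: the highest-valued unit wins."""
    return int(np.argmax(output_values))

X_train = [[2.0, 30.0], [4.0, 10.0], [6.0, 20.0]]
print(min_max_normalize(X_train))           # every column now lies in [0, 1]
print(predict_binary(0.73))                 # -> 1 (positive class)
print(predict_multiclass([0.1, 0.8, 0.3]))  # -> 1 (index of the largest output)
```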
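And a minimal sketch of the 1/t rule of thumb from the last point; `gradient(w)` is a hypothetical placeholder for whatever backpropagation would compute for the real network.

```python
# Decay the learning rate as 1/t, where t counts the passes through the
# training set so far: large early steps, fine adjustments later on.

def train(w, gradient, epochs=100):
    for t in range(1, epochs + 1):
        eta = 1.0 / t              # rule of thumb: learning rate = 1/t
        w = w - eta * gradient(w)  # generic gradient-descent update
    return w

# Toy usage: minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3).
print(train(0.0, lambda w: 2.0 * (w - 3.0)))  # converges to 3.0
```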

4. Further Reading

  1. Rachel_zhang's lecture notes
  2. Backpropagation math