How many kinds of Kohonen networks exist?

 
(And what is k-means?)
======================

Teuvo Kohonen is one of the most famous and prolific researchers in
neurocomputing, and he has invented a variety of networks. But many people
refer to "Kohonen networks" without specifying which kind of Kohonen
network, and this lack of precision can lead to confusion. The phrase
"Kohonen network" most often refers to one of the following three types of
networks: 

o VQ: Vector Quantization--competitive networks that can be viewed as
     unsupervised density estimators or autoassociators (Kohonen, 1995/1997;
     Hecht-Nielsen 1990), closely related to k-means cluster analysis
     (MacQueen, 1967; Anderberg, 1973). Each competitive unit corresponds to a
     cluster, the center of which is called a "codebook vector". Kohonen's
     learning law is an on-line algorithm that finds the codebook vector
     closest to each training case and moves the "winning" codebook vector
     closer to the training case. The codebook vector is moved a certain
     proportion of the distance between it and the training case, the
     proportion being specified by the learning rate, that is: 

           new_codebook = old_codebook * (1-learning_rate)

                                     + data * learning_rate
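
     A minimal sketch of this update in Python (not Kohonen's reference
     implementation), assuming the codebook vectors are the rows of a NumPy
     array:

import numpy as np

def kohonen_vq_step(codebooks, case, learning_rate):
    """One on-line update: move the winning codebook vector toward the case.

    codebooks: (k, d) array of codebook vectors
    case:      (d,)   array, a single training case
    """
    # Find the codebook vector closest to the training case (the "winner").
    winner = np.argmin(np.sum((codebooks - case) ** 2, axis=1))
    # Move the winner a fraction of the way toward the case.
    codebooks[winner] = ((1 - learning_rate) * codebooks[winner]
                         + learning_rate * case)
    return winner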

     Numerous similar algorithms have been developed in the neural net and
     machine learning literature; see Hecht-Nielsen (1990) for a brief
     historical overview, and Kosko (1992) for a more technical overview of
     competitive learning. 

     MacQueen's on-line k-means algorithm is essentially the same as Kohonen's
     learning law except that the learning rate is the reciprocal of the
     number of cases that have been assigned to the winning cluster. Suppose
     that when processing a given training case, N cases have been previously
     assigned to the winning codebook vector. Then the codebook vector is
     updated as: 

           new_codebook = old_codebook * N/(N+1)

                                     + data * 1/(N+1)

     This reduction of the learning rate makes each codebook vector the mean
     of all cases assigned to its cluster and guarantees convergence of the
     algorithm to an optimum value of the error function (the sum of squared
     Euclidean distances between cases and codebook vectors) as the number of
     training cases goes to infinity. Kohonen's learning law with a fixed
     learning rate does not converge. As is well known from stochastic
     approximation theory, convergence requires the sum of the infinite
     sequence of learning rates to be infinite, while the sum of squared
     learning rates must be finite (Kohonen, 1995, p. 34). These requirements
     are satisfied by MacQueen's k-means algorithm. 
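
     The same sketch with MacQueen's 1/(N+1) learning rate, keeping a running
     win count for each cluster (again purely illustrative):

import numpy as np

def macqueen_step(codebooks, counts, case):
    """One on-line k-means update with the 1/(N+1) learning rate.

    codebooks: (k, d) array of cluster centers
    counts:    (k,)   array of how many cases each cluster has won so far
    """
    winner = np.argmin(np.sum((codebooks - case) ** 2, axis=1))
    n = counts[winner]
    # The 1/(N+1) rate makes each center the running mean of its cases.
    codebooks[winner] = codebooks[winner] * n / (n + 1) + case / (n + 1)
    counts[winner] = n + 1
    return winner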

     Kohonen VQ is often used for off-line learning, in which case the
     training data are stored and Kohonen's learning law is applied to each
     case in turn, cycling over the data set many times (incremental
     training). Convergence to a local optimum can be obtained as the training
     time goes to infinity if the learning rate is reduced in a suitable
     manner as described above. However, there are off-line k-means
     algorithms, both batch and incremental, that converge in a finite number
     of iterations (Anderberg, 1973; Hartigan, 1975; Hartigan and Wong, 1979).
     The batch algorithms such as Forgy's (1965; Anderberg, 1973) have the
     advantage for large data sets, since the incremental methods require you
     either to store the cluster membership of each case or to do two
     nearest-cluster computations as each case is processed. Forgy's algorithm
     is a simple alternating least-squares algorithm consisting of the
     following steps:

     1. Initialize the codebook vectors. 
     2. Repeat the following two steps until convergence:
           A. Read the data, assigning each case to the nearest (using Euclidean
           distance) codebook vector.
           B. Replace each codebook vector with the mean of the cases that were
           assigned to it. 
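
     A compact sketch of these two steps in Python (the random initialization
     and the convergence test are illustrative choices, not part of Forgy's
     description):

import numpy as np

def forgy_kmeans(data, k, max_iter=100, seed=0):
    """Batch (Forgy-style) k-means: alternate assignment and mean steps."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize the codebook vectors with k randomly chosen cases.
    codebooks = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    assignments = None
    for _ in range(max_iter):
        # Step 2A: assign each case to the nearest (Euclidean) codebook vector.
        dist2 = ((data[:, None, :] - codebooks[None, :, :]) ** 2).sum(axis=2)
        new_assignments = dist2.argmin(axis=1)
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break  # no case changed clusters, so the means cannot change either
        assignments = new_assignments
        # Step 2B: replace each codebook vector with the mean of its cases.
        for j in range(k):
            members = data[assignments == j]
            if len(members):
                codebooks[j] = members.mean(axis=0)
    return codebooks, assignments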

     Fastest training is usually obtained if MacQueen's on-line algorithm is
     used for the first pass and off-line k-means algorithms are applied on
     subsequent passes (Bottou and Bengio, 1995). However, these training
     methods do not necessarily converge to a global optimum of the error
     function. The chance of finding a global optimum can be improved by using
     rational initialization (SAS Institute, 1989, pp. 824-825), multiple
     random initializations, or various time-consuming training methods
     intended for global optimization (Ismail and Kamel, 1989; Zeger, Vaisey,
     and Gersho, 1992). 

     VQ has been a popular topic in the signal processing literature, which
     has been largely separate from the literature on Kohonen networks and
     from the cluster analysis literature in statistics and taxonomy. In
     signal processing, on-line methods such as Kohonen's and MacQueen's are
     called "adaptive vector quantization" (AVQ), while off-line k-means
     methods go by the names of "Lloyd" or "Lloyd I" (Lloyd, 1982) and "LBG"
     (Linde, Buzo, and Gray, 1980). There is a recent textbook on VQ by Gersho
     and Gray (1992) that summarizes these algorithms as information
     compression methods. 

     Kohonen's work emphasized VQ as density estimation and hence the
     desirability of equiprobable clusters (Kohonen 1984; Hecht-Nielsen 1990).
     However, Kohonen's learning law does not produce equiprobable
     clusters--that is, the proportions of training cases assigned to each
     cluster are not usually equal. If there are I inputs and the number of
     clusters is large, the density of the codebook vectors approximates the 
     I/(I+2) power of the density of the training data (Kohonen, 1995, p.
     35; Ripley, 1996, p. 202; Zador, 1982), so the clusters are approximately
     equiprobable only if the data density is uniform or the number of inputs
     is large. The most popular method for obtaining equiprobability is
     Desieno's (1988) algorithm which adds a "conscience" value to each
     distance prior to the competition. The conscience value for each cluster
     is adjusted during training so that clusters that win more often have
     larger conscience values and are thus handicapped to even out the
     probabilities of winning in later iterations. 
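
     A rough sketch of such a conscience mechanism; the particular bias form
     and constants below are one common reading of DeSieno's scheme and are
     illustrative rather than the published values:

import numpy as np

def conscience_winner(codebooks, case, win_freq, bias_factor=10.0,
                      freq_rate=0.0001):
    """Pick a winner with a "conscience" handicap in the spirit of DeSieno.

    win_freq: (k,) running estimate of how often each cluster wins; clusters
    winning more often than 1/k are penalized so win probabilities even out.
    """
    k = len(codebooks)
    dist2 = np.sum((codebooks - case) ** 2, axis=1)
    bias = bias_factor * (1.0 / k - win_freq)  # positive bias helps rare winners
    winner = np.argmin(dist2 - bias)           # compete on the biased distances
    # Update the running win frequencies toward the current outcome.
    outcome = np.zeros(k)
    outcome[winner] = 1.0
    win_freq += freq_rate * (outcome - win_freq)
    return winner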

     Kohonen's learning law is an approximation to the k-means model, which is
     an approximation to normal mixture estimation by maximum likelihood
     assuming that the mixture components (clusters) all have spherical
     covariance matrices and equal sampling probabilities. Hence if the
     population contains clusters that are not equiprobable, k-means will tend
     to produce sample clusters that are more nearly equiprobable than the
     population clusters. Corrections for this bias can be obtained by
     maximizing the likelihood without the assumption of equal sampling
     probabilities (Symons, 1981). Such corrections are similar to conscience
     but have the opposite effect. 

     In cluster analysis, the purpose is not to compress information but to
     recover the true cluster memberships. K-means differs from mixture models
     in that, for k-means, the cluster membership for each case is considered
     a separate parameter to be estimated, while mixture models estimate a
     posterior probability for each case based on the means, covariances, and
     sampling probabilities of each cluster. Balakrishnan, Cooper, Jacob, and
     Lewis (1994) found that k-means algorithms recovered cluster membership
     more accurately than Kohonen VQ. 

o SOM: Self-Organizing Map--competitive networks that provide a
     "topological" mapping from the input space to the clusters (Kohonen,
     1995). The SOM was inspired by the way in which various human sensory
     impressions are neurologically mapped into the brain such that spatial or
     other relations among stimuli correspond to spatial relations among the
     neurons. In a SOM, the neurons (clusters) are organized into a
     grid--usually two-dimensional, but sometimes one-dimensional or (rarely)
     three- or more-dimensional. The grid exists in a space that is separate
     from the input space; any number of inputs may be used as long as the
     number of inputs is greater than the dimensionality of the grid space. A
     SOM tries to find clusters such that any two clusters that are close to
     each other in the grid space have codebook vectors close to each other in
     the input space. But the converse does not hold: codebook vectors that
     are close to each other in the input space do not necessarily correspond
     to clusters that are close to each other in the grid. Another way to look
     at this is that a SOM tries to embed the grid in the input space such
     that every training case is close to some codebook vector, but the grid is
     bent or stretched as little as possible. Yet another way to look at it is
     that a SOM is a (discretely) smooth mapping between regions in the input
     space and points in the grid space. The best way to understand this is to
     look at the pictures in Kohonen (1995) or various other NN textbooks. 

     The Kohonen algorithm for SOMs is very similar to the Kohonen algorithm
     for AVQ. Let the codebook vectors be indexed by a subscript j, and let
     the index of the codebook vector nearest to the current training case be 
     n. The Kohonen SOM algorithm requires a kernel function K(j,n), where
     K(j,j)=1 and K(j,n) is usually a non-increasing function of the
     distance (in any metric) between codebook vectors j and n in the grid
     space (not the input space). Usually, K(j,n) is zero for codebook
     vectors that are far apart in the grid space. As each training case is
     processed, all the codebook vectors are updated as: 

           new_codebook_j = old_codebook_j * [1 - K(j,n) * learning_rate]

                                          + data * K(j,n) * learning_rate

     The kernel function does not necessarily remain constant during training.
     The neighborhood of a given codebook vector is the set of codebook
     vectors for which K(j,n)>0. To avoid poor results (akin to local
     minima), it is usually advisable to start with a large neighborhood, and
     let the neighborhood gradually shrink during training. If K(j,n)=0
     for j not equal to n, then the SOM update formula reduces to the formula
     for Kohonen vector quantization. In other words, if the neighborhood size
     (for example, the radius of the support of the kernel function K(j,n))
     is zero, the SOM algorithm degenerates into simple VQ. Hence it is
     important not to let the neighborhood size shrink all the way to zero
     during training. Indeed, the choice of the final neighborhood size is the
     most important tuning parameter for SOM training, as we will see shortly.
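
     A sketch of this update in Python, assuming a Gaussian kernel K(j,n) of
     the grid-space distance (the choice of kernel is illustrative):

import numpy as np

def som_step(codebooks, grid, case, learning_rate, radius):
    """One on-line SOM update: every unit moves toward the case, weighted by
    a kernel K(j,n) of its grid distance to the winning unit n.

    codebooks: (k, d) codebook vectors in the input space
    grid:      (k, g) fixed coordinates of the units in the grid space
    radius:    kernel width in grid units; shrink it (but not to zero) over time
    """
    # Winner n: the unit whose codebook vector is nearest the case in input space.
    n = np.argmin(np.sum((codebooks - case) ** 2, axis=1))
    # Kernel K(j,n): computed from distances in the GRID space, not the input space.
    grid_dist2 = np.sum((grid - grid[n]) ** 2, axis=1)
    kernel = np.exp(-grid_dist2 / (2.0 * radius ** 2))
    # Update all codebook vectors; units far from n in the grid barely move.
    codebooks += learning_rate * kernel[:, None] * (case - codebooks)
    return n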

     A SOM works by smoothing the codebook vectors in a manner similar to
     kernel estimation methods, but the smoothing is done in neighborhoods in
     the grid space rather than in the input space (Mulier and Cherkassky
     1995). This is easier to see in a batch algorithm for SOMs, which is
     similar to Forgy's algorithm for batch k-means, but incorporates an extra
     smoothing process: 

     1. Initialize the codebook vectors. 
     2. Repeat the following two steps until convergence or boredom:
           A. Read the data, assigning each case to the nearest (using Euclidean
           distance) codebook vector. While you are doing this, keep track of the
           mean and the number of cases for each cluster. 
           B. Do a nonparametric regression using K(j,n) as a kernel function,
           with the grid points as inputs, the cluster means as target values,
            and the number of cases in each cluster as a case weight. Replace
           each codebook vector with the output of the nonparametric regression
           function evaluated at its grid point. 
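
     A sketch of one such batch iteration, again assuming a Gaussian kernel
     over the grid (the function name and arguments are made up for
     illustration):

import numpy as np

def batch_som_iteration(codebooks, grid, data, radius):
    """One batch SOM pass: assignment (step A) then Nadaraya-Watson smoothing
    of the cluster means over the grid (step B)."""
    k, d = codebooks.shape
    # Step A: assign cases to the nearest codebook vector; collect sums/counts.
    dist2 = ((data[:, None, :] - codebooks[None, :, :]) ** 2).sum(axis=2)
    assign = dist2.argmin(axis=1)
    sums = np.zeros((k, d))
    counts = np.zeros(k)
    np.add.at(sums, assign, data)
    np.add.at(counts, assign, 1.0)
    means = sums / np.maximum(counts, 1.0)[:, None]
    # Step B: Nadaraya-Watson regression with the grid points as inputs, the
    # cluster means as targets, and the cluster sizes as case weights.
    grid_dist2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-grid_dist2 / (2.0 * radius ** 2))
    weights = kernel * counts[None, :]                  # (k, k)
    new_codebooks = (weights @ means) / weights.sum(axis=1, keepdims=True)
    return new_codebooks, assign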

     If the nonparametric regression method is Nadaraya-Watson kernel
     regression (see What is GRNN?), the batch SOM algorithm produces
     essentially the same results as Kohonen's algorithm, barring local
     minima. The main difference is that the batch algorithm often converges.
     Mulier and Cherkassky (1995) note that other nonparametric regression
     methods can be used to provide superior SOM algorithms. In particular,
     local-linear smoothing eliminates the notorious "border effect", whereby
     the codebook vectors near the border of the grid are compressed in the
     input space. The border effect is especially problematic when you try to
     use a high degree of smoothing in a Kohonen SOM, since all the codebook
     vectors will collapse into the center of the input space. The SOM border
     effect has the same mathematical cause as the "boundary effect" in kernel
     regression, which causes the estimated regression function to flatten out
     near the edges of the regression input space. There are various cures for
     the edge effect in nonparametric regression, of which local-linear
     smoothing is the simplest (Wand and Jones, 1995). Hence, local-linear
     smoothing also cures the border effect in SOMs. Furthermore, local-linear
     smoothing is much more general and reliable than the heuristic weighting
     rule proposed by Kohonen (1995, p. 129). 
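
     One way the Nadaraya-Watson smoother in step B might be replaced by a
     local-linear smoother (a sketch under the same assumptions as above,
     reusing the cluster means and counts from step A):

import numpy as np

def local_linear_smooth(grid, means, counts, radius):
    """Local-linear version of step B: at each grid point, fit a weighted
    linear regression of the cluster means on the grid coordinates and take
    its fitted value at that point; this reduces the border effect."""
    k, g = grid.shape
    grid_dist2 = ((grid[:, None, :] - grid[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-grid_dist2 / (2.0 * radius ** 2))
    new_codebooks = np.empty_like(means)
    for j in range(k):
        w = kernel[j] * counts                 # kernel times case-count weights
        # Design matrix: intercept plus grid coordinates centered at grid[j].
        X = np.hstack([np.ones((k, 1)), grid - grid[j]])
        XtW = X.T * w
        beta, *_ = np.linalg.lstsq(XtW @ X, XtW @ means, rcond=None)
        new_codebooks[j] = beta[0]             # fitted value at grid[j]
    return new_codebooks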

     Since nonparametric regression is used in the batch SOM algorithm,
     various properties of nonparametric regression extend to SOMs. In
     particular, it is well known that the shape of the kernel function is not
     a crucial matter in nonparametric regression, hence it is not crucial in
     SOMs. On the other hand, the amount of smoothing used for nonparametric
     regression is crucial, hence the choice of the final neighborhood size in
     a SOM is crucial. Unfortunately, I am not aware of any systematic studies
     of methods for choosing the final neighborhood size. 

     The batch SOM algorithm is very similar to the principal curve and
     surface algorithm proposed by Hastie and Stuetzle (1989), as pointed out
     by Ritter, Martinetz, and Schulten (1992) and Mulier and Cherkassky
     (1995). A principal curve is a nonlinear generalization of a principal
     component. Given the probability distribution of a population, a
     principal curve is defined by the following self-consistency condition: 
     1. If you choose any point on a principal curve, 
     2. then find the set of all the points in the input space that are closer
           to the chosen point than any other point on the curve, 
     3. and compute the expected value (mean) of that set of points with
           respect to the probability distribution, then 
     4. you end up with the same point on the curve that you chose originally.
     See  http://www.iro.umontreal.ca/~kegl/research/pcurves/ for more
     information about principal curves and surfaces. 
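
     In symbols, writing lambda_f(x) for the parameter of the point on the
     curve f closest to x, the self-consistency condition reads:

           f(lambda) = E[ X | lambda_f(X) = lambda ]

     that is, each point of the curve is the conditional mean of the cases
     that project onto it.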

     In a multivariate normal distribution, the principal curves are the same
     as the principal components. A principal surface is the obvious
     generalization from a curve to a surface. In a multivariate normal
     distribution, the principal surfaces are the subspaces spanned by any two
     principal components.

     A one-dimensional local-linear batch SOM can be used to estimate a
     principal curve by letting the number of codebook vectors approach
     infinity while choosing a kernel function of appropriate smoothness. A
     two-dimensional local-linear batch SOM can be used to estimate a
     principal surface by letting the number of both rows and columns in the
     grid approach infinity. This connection between SOMs and principal curves
     and surfaces shows that the choice of the number of codebook vectors is
     not crucial, provided the number is fairly large. 

     If the final neighborhood size in a local-linear batch SOM is large, the
     SOM approximates a subspace spanned by principal components--usually the
     first principal component if the SOM is one-dimensional, the first two
     principal components if the SOM is two-dimensional, and so on. This
     result does not depend on the data having a multivariate normal
     distribution. 

     Principal component analysis is a method of data compression, not a
     statistical model. However, there is a related method called "common
     factor analysis" that is often confused with principal component analysis
     but is indeed a statistical model. Common factor analysis posits that the
     relations among the observed variables can be explained by a smaller
     number of unobserved, "latent" variables. Tibshirani (1992) has proposed
     a latent-variable variant of principal curves, and latent-variable
     modifications of SOMs have been introduced by Utsugi (1996, 1997) and
     Bishop, Svensén, and Williams (1997).

     The choice of the number of codebook vectors is usually not critical as
     long as the number is fairly large. But results can be sensitive to the
     shape of the grid, e.g., square or an elongated rectangle. And the
     dimensionality of the grid is a crucial choice. It is difficult to guess
     the appropriate shape and dimensionality before analyzing the data.
     Determining the shape and dimensionality by trial and error can be quite
     tedious. Hence, a variety of methods have been tried for growing SOMs and
     related kinds of NNs during training. For more information on growing
     SOMs, see Bernd Fritzke's home page at 
      http://pikas.inf.tu-dresden.de/~fritzke/ 

     Using a 1-by-2 SOM is pointless. There is no "topological structure" in a
     1-by-2 grid. A 1-by-2 SOM is essentially the same as VQ with two
     clusters, except that the SOM clusters will be closer together than the
     VQ clusters if the final neighborhood size for the SOM is large. 

     In a Kohonen SOM, as in VQ, it is necessary to reduce the learning rate
     during training to obtain convergence. Greg Heath has commented in this
     regard: 

     I favor separate learning rates for each winning SOM node (or k-means
     cluster) in the form 1/(N_0i + N_i + 1), where N_i is the
     count of vectors that have caused node i to be a winner and N_0i
     is an initializing count that indicates the confidence in the initial
     weight vector assignment. The winning node expression is based on
     stochastic estimation convergence constraints and pseudo-Bayesian
     estimation of mean vectors. Kohonen derived a heuristic recursion
     relation for the "optimal" rate. To my surprise, when I solved the
     recursion relation I obtained the same above expression that I've
     been using for years. 

     In addition, I have had success using the similar form 
     (1/n)/(N_0j + N_j + (1/n)) for the n nodes in the
     shrinking updating-neighborhood. Before the final "winners-only"
     stage when neighbors are no longer updated, the number of updating
     neighbors eventually shrinks to n = 6 or 8 for hexagonal or
     rectangular neighborhoods, respectively. 

     Kohonen's neighbor-update formula is more precise, replacing my
     constant fraction (1/n) with a node-pair-specific h_ij (h_ij
     < 1). However, as long as the initial neighborhood is sufficiently
     large, the shrinking rate is sufficiently slow, and the final
     winner-only stage is sufficiently long, the results should be
     relatively insensitive to the exact form of h_ij.
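
A sketch of how such per-node counts might be folded into the earlier on-line
SOM update; it simplifies the quoted scheme by applying the 1/(N_0 + N + 1)
rate, scaled by the kernel, to every updated node instead of using the
separate neighbor form:

import numpy as np

def som_step_counted(codebooks, grid, case, win_counts, radius, prior_count=1.0):
    """On-line SOM update with a per-node learning rate 1/(N_0 + N + 1).
    A simplified illustration of the heuristic quoted above, not Kohonen's
    own recursion."""
    n = np.argmin(np.sum((codebooks - case) ** 2, axis=1))
    grid_dist2 = np.sum((grid - grid[n]) ** 2, axis=1)
    kernel = np.exp(-grid_dist2 / (2.0 * radius ** 2))
    # Each node's rate decays with the number of cases it has already won.
    rates = 1.0 / (prior_count + win_counts + 1.0)
    codebooks += (kernel * rates)[:, None] * (case - codebooks)
    win_counts[n] += 1
    return n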

Another advantage of batch SOMs is that there is no learning rate, so these
complications evaporate. 

Kohonen (1995, p. VII) says that SOMs are not intended for pattern
recognition but for clustering, visualization, and abstraction. Kohonen has
used a "supervised SOM" (1995, pp. 160-161) that is similar to
counterpropagation (Hecht-Nielsen 1990), but he seems to prefer LVQ (see
below) for supervised classification. Many people continue to use SOMs for
classification tasks, sometimes with surprisingly (I am tempted to say
"inexplicably") good results (Cho, 1997). 

o LVQ: Learning Vector Quantization--competitive networks for supervised
classification (Kohonen, 1988, 1995; Ripley, 1996). Each codebook vector is
assigned to one of the target classes. Each class may have one or more
codebook vectors. A case is classified by finding the nearest codebook
vector and assigning the case to the class corresponding to the codebook
vector. Hence LVQ is a kind of nearest-neighbor rule. 

Ordinary VQ methods, such as Kohonen's VQ or k-means, can easily be used for
supervised classification. Simply count the number of training cases from
each class assigned to each cluster, and divide by the total number of cases
in the cluster to get the posterior probability. For a given case, output
the class with the greatest posterior probability--i.e. the class that forms
a majority in the nearest cluster. Such methods can provide universally
consistent classifiers (Devroye et al., 1996) even when the codebook vectors
are obtained by unsupervised algorithms. LVQ tries to improve on this
approach by adapting the codebook vectors in a supervised way. There are
several variants of LVQ--called LVQ1, OLVQ1, LVQ2, and LVQ3--based on
heuristics. However, a smoothed version of LVQ can be trained as a
feedforward network using a NRBFEQ architecture (see "How do MLPs compare
with RBFs?") and optimizing any of the usual error functions; as the width
of the RBFs goes to zero, the NRBFEQ network approaches an optimized LVQ
network. 
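
As a sketch, the basic LVQ1 heuristic and the nearest-codebook classification
might look like this in Python (the other variants refine the update rule):

import numpy as np

def lvq1_step(codebooks, codebook_classes, case, case_class, learning_rate):
    """One LVQ1 update: attract the nearest codebook vector if its class
    matches the training case, repel it otherwise."""
    winner = np.argmin(np.sum((codebooks - case) ** 2, axis=1))
    sign = 1.0 if codebook_classes[winner] == case_class else -1.0
    codebooks[winner] += sign * learning_rate * (case - codebooks[winner])
    return winner

def lvq_classify(codebooks, codebook_classes, case):
    """Classify a case by the class of its nearest codebook vector."""
    return codebook_classes[np.argmin(np.sum((codebooks - case) ** 2, axis=1))]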

There are several other kinds of Kohonen networks described in Kohonen
(1995), including: 

o DEC--Dynamically Expanding Context 
o LSM--Learning Subspace Method 
o ASSOM--Adaptive Subspace SOM 
o FASSOM--Feedback-controlled Adaptive Subspace SOM 
o Supervised SOM 
o LVQ-SOM 

More information on the error functions (or absence thereof) used by Kohonen
VQ and SOM is provided under "What does unsupervised learning learn?" 

For more on-line information on Kohonen networks and other varieties of
SOMs, see: 

o The web page of The Neural Networks Research Centre, Helsinki University
     of Technology, at  http://www.cis.hut.fi/research/ 
o The SOM of articles from  comp.ai.neural-nets at 
      http://websom.hut.fi/websom/comp.ai.neural-nets-new/html/root.html 
o Akio Utsugi's web page on Bayesian SOMs at the National Institute of
     Bioscience and Human-Technology, Agency of Industrial Science and
     Technology, M.I.T.I., 1-1, Higashi, Tsukuba, Ibaraki, 305 Japan, at 
      http://www.aist.go.jp/NIBH/~b0616/Lab/index-e.html 
o The GTM (generative topographic mapping) home page at the Neural
     Computing Research Group, Aston University, Birmingham, UK, at 
      http://www.ncrg.aston.ac.uk/GTM/ 
o Nenet SOM software at  http://www.mbnet.fi/~phodju/nenet/nenet.html 
o Bernd Fritzke's home page at http://pikas.inf.tu-dresden.de/~fritzke/ has
     information on growing SOMs and other related types of NNs 

References: 

     Anderberg, M.R. (1973), Cluster Analysis for Applications, New York:
     Academic Press, Inc. 

     Balakrishnan, P.V., Cooper, M.C., Jacob, V.S., and Lewis, P.A. (1994) "A
     study of the classification capabilities of neural networks using
     unsupervised learning: A comparison with k-means clustering",
     Psychometrika, 59, 509-525. 

     Bishop, C.M., Svensén, M., and Williams, C.K.I. (1997), "GTM: A principled
     alternative to the self-organizing map," in Mozer, M.C., Jordan, M.I.,
     and Petsche, T., (eds.) Advances in Neural Information Processing
     Systems 9, Cambridge, MA: The MIT Press, pp. 354-360. Also see 
     http://www.ncrg.aston.ac.uk/GTM/ 

     Bottou, L., and Bengio, Y. (1995), "Convergence properties of the K-Means
     algorithms," in Tesauro, G., Touretzky, D., and Leen, T., (eds.) 
     Advances in Neural Information Processing Systems 7, Cambridge, MA: The
     MIT Press, pp. 585-592. 

     Cho, S.-B. (1997), "Self-organizing map with dynamical node-splitting:
     Application to handwritten digit recognition," Neural Computation, 9,
     1345-1355. 

     Desieno, D. (1988), "Adding a conscience to competitive learning," Proc.
     Int. Conf. on Neural Networks, I, 117-124, IEEE Press. 

     Devroye, L., Györfi, L., and Lugosi, G. (1996), A Probabilistic Theory of
     Pattern Recognition, NY: Springer.

     Forgy, E.W. (1965), "Cluster analysis of multivariate data: Efficiency
     versus interpretability," Biometric Society Meetings, Riverside, CA.
     Abstract in Biometrics, 21, 768.

     Gersho, A. and Gray, R.M. (1992), Vector Quantization and Signal
     Compression, Boston: Kluwer Academic Publishers. 

     Hartigan, J.A. (1975), Clustering Algorithms, NY: Wiley. 

     Hartigan, J.A., and Wong, M.A. (1979), "Algorithm AS136: A k-means
     clustering algorithm," Applied Statistics, 28, 100-108.

     Hastie, T., and Stuetzle, W. (1989), "Principal curves," Journal of the
     American Statistical Association, 84, 502-516. 

     Hecht-Nielsen, R. (1990), Neurocomputing, Reading, MA: Addison-Wesley. 

     Ismail, M.A., and Kamel, M.S. (1989), "Multidimensional data clustering
     utilizing hybrid search strategies," Pattern Recognition, 22, 75-89. 

     Kohonen, T. (1984), Self-Organization and Associative Memory, Berlin:
     Springer-Verlag. 

     Kohonen, T. (1988), "Learning Vector Quantization," Neural Networks, 1
     (suppl 1), 303. 

     Kohonen, T. (1995/1997), Self-Organizing Maps, Berlin: Springer-Verlag.
     First edition was 1995, second edition 1997. See 
      http://www.cis.hut.fi/nnrc/new_book.html for information on the second
     edition. 

     Kosko, B. (1992), Neural Networks and Fuzzy Systems, Englewood Cliffs,
     N.J.: Prentice-Hall. 

     Linde, Y., Buzo, A., and Gray, R. (1980), "An algorithm for vector
     quantizer design," IEEE Transactions on Communications, 28, 84-95. 

     Lloyd, S. (1982), "Least squares quantization in PCM," IEEE Transactions
     on Information Theory, 28, 129-137. 

     MacQueen, J.B. (1967), "Some Methods for Classification and Analysis of
     Multivariate Observations," Proceedings of the Fifth Berkeley Symposium on
     Mathematical Statistics and Probability, 1, 281-297. 

     Max, J. (1960), "Quantizing for minimum distortion," IEEE Transactions on
     Information Theory, 6, 7-12. 

     Mulier, F. and Cherkassky, V. (1995), "Self-Organization as an iterative
     kernel smoothing process," Neural Computation, 7, 1165-1177. 

     Ripley, B.D. (1996), Pattern Recognition and Neural Networks,
     Cambridge: Cambridge University Press. 

     Ritter, H., Martinetz, T., and Schulten, K. (1992), Neural Computation
     and Self-Organizing Maps: An Introduction, Reading, MA: Addison-Wesley. 

     SAS Institute (1989), SAS/STAT User's Guide, Version 6, 4th edition,
     Cary, NC: SAS Institute. 

     Symons, M.J. (1981), "Clustering Criteria and Multivariate Normal
     Mixtures," Biometrics, 37, 35-43. 

     Tibshirani, R. (1992), "Principal curves revisited," Statistics and
     Computing, 2, 183-190. 

     Utsugi, A. (1996), "Topology selection for self-organizing maps,"
     Network: Computation in Neural Systems, 7, 727-740, available on-line at 
     http://www.aist.go.jp/NIBH/~b0616/Lab/index-e.html 

     Utsugi, A. (1997), "Hyperparameter selection for self-organizing maps,"
     Neural Computation, 9, 623-635, available on-line at 
     http://www.aist.go.jp/NIBH/~b0616/Lab/index-e.html 

     Wand, M.P., and Jones, M.C. (1995), Kernel Smoothing, London: Chapman &
     Hall. 

     Zador, P.L. (1982), "Asymptotic quantization error of continuous signals
     and the quantization dimension," IEEE Transactions on Information Theory,
     28, 139-149. 

     Zeger, K., Vaisey, J., and Gersho, A. (1992), "Globally optimal vector
     quantizer design by stochastic relaxation," IEEE Transactions on Signal
     Processing, 40, 310-322.

Source: http://www.faqs.org/faqs/ai-faq/neural-nets/part1/section-11.html


