The development of ANNs followed a heuristic path, with applications and extensive experimentation preceding theory. In contrast, the development of SVMs involved sound theory first, then implementation and experiments. A significant advantage of SVMs is that whilst ANNs can suffer from multiple local minima, the solution to an SVM is global and unique. Two further advantages of SVMs are that they have a simple geometric interpretation and give a sparse solution. Unlike ANNs, the computational complexity of SVMs does not depend on the dimensionality of the input space. ANNs use empirical risk minimization, whilst SVMs use structural risk minimization. The reason that SVMs often outperform ANNs in practice is that they address the biggest weakness of ANNs: SVMs are less prone to overfitting.
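The contrast between the convex SVM training problem and the non-convex ANN objective can be shown with a small experiment. The sketch below is only illustrative (it assumes scikit-learn's SVC and MLPClassifier on a toy dataset; none of the names come from the quoted sources): the SVM returns the same sparse solution on every run, while an MLP started from different random initialisations can settle into different local minima.

# Hedged sketch: SVM convexity and sparsity vs. MLP local minima.
# Assumes scikit-learn is available; parameters are illustrative only.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# The SVM dual is a convex QP: every run yields the same global optimum,
# and only a subset of the training points (the support vectors) define it.
svm = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
print("support vectors:", svm.support_vectors_.shape[0], "of", X.shape[0])

# The MLP objective is non-convex: different random initialisations can
# converge to different locally optimal weights and training losses.
for seed in (0, 1, 2):
    mlp = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000,
                        random_state=seed).fit(X, y)
    print(f"seed {seed}: final training loss {mlp.loss_:.4f}")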
"They differ radically from comparable approaches such as neural
networks: SVM training always finds a global minimum, and their simple
geometric interpretation provides fertile ground for further
investigation."
Burges (1998)
"Most often Gaussian kernels are used, when the resulted SVM corresponds
to an RBF network with Gaussian radial basis functions. As the SVM
approach “automatically” solves the network complexity problem, the size
of the hidden layer is obtained as the result of the QP procedure.
Hidden neurons and support vectors correspond to each other, so the
center problems of the RBF network is also solved, as the support
vectors serve as the basis function centers."
Horváth (2003) in Suykens et al.
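A hedged illustration of this correspondence (assuming scikit-learn's SVC and rbf_kernel with an explicit gamma so the kernel width is known; the helper names are illustrative): the trained SVM's decision function is rebuilt by hand as a weighted sum of Gaussian basis functions centred on the support vectors, which is exactly an RBF network whose hidden-layer size was decided by the QP.

# Hedged sketch: a Gaussian-kernel SVM read as an RBF network whose
# basis function centres are the support vectors.
import numpy as np
from sklearn.datasets import make_moons
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
gamma = 0.5
svm = SVC(kernel="rbf", gamma=gamma, C=1.0).fit(X, y)

# "Hidden layer": one Gaussian basis function per support vector.
centres = svm.support_vectors_            # basis function centres
weights = svm.dual_coef_.ravel()          # alpha_i * y_i, one per centre
bias = svm.intercept_[0]

# Manual RBF-network output equals the SVM decision function.
Phi = rbf_kernel(X, centres, gamma=gamma)  # hidden-layer activations
manual = Phi @ weights + bias
print(np.allclose(manual, svm.decision_function(X)))   # True
print("hidden-layer size chosen by the QP:", centres.shape[0])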
"In problems when linear decision hyperplanes are no longer feasible
(section 2.4.3), an input space is mapped into a feature space (the
hidden layer in NN models), resulting in a nonlinear classifier."
Kecman (2001), p. 149
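In standard textbook notation (not a quotation from Kecman), the nonlinear classifier obtained after mapping the input space into the feature space via \Phi only needs inner products there, and these are supplied by the kernel:

f(\mathbf{x}) = \operatorname{sign}\!\Big( \sum_{i \in SV} \alpha_i \, y_i \, K(\mathbf{x}_i, \mathbf{x}) + b \Big),
\qquad K(\mathbf{x}_i, \mathbf{x}) = \Phi(\mathbf{x}_i)^{\top} \Phi(\mathbf{x}).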
"Interestingly, by choosing the three specific functions given in table
2.1, SVMs, after the learning stage, create the same type of decision
hypersurfaces as do some well-developed and popular NN classifiers.
Note that the training of these diverse models is different. However,
after the successful learning stage, the resulting decision surfaces are
identical."
Kecman (2001), p. 171
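A hedged sketch of that correspondence (the kernel names follow scikit-learn's SVC; the pairing with network types is the standard textbook one, not Kecman's table 2.1 reproduced verbatim): choosing the polynomial, Gaussian RBF, or sigmoid kernel gives decision hypersurfaces of the same type as a polynomial classifier, an RBF network, or a two-layer perceptron, respectively.

# Hedged sketch: kernel choices and the NN-type decision surfaces they mimic.
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

kernels = {
    "poly": "polynomial classifier",           # (gamma*<x,x'> + coef0)^degree
    "rbf": "RBF network",                      # exp(-gamma*||x - x'||^2)
    "sigmoid": "two-layer perceptron (tanh)",  # tanh(gamma*<x,x'> + coef0)
}
for name, analogue in kernels.items():
    clf = SVC(kernel=name, gamma="scale").fit(X, y)
    print(f"{name:8s} -> decision surface of a {analogue}; "
          f"{clf.support_vectors_.shape[0]} support vectors")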
"Unlike conventional statistical and neural network methods, the SVM approach does not attempt to control model complexity by keeping the number of features small.
"Classical learning systems like neural networks suffer from their theoretical weakness, e.g. back-propagation usually converges only to locally optimal solutions. Here SVMs can provide a significant improvement." Rychetsky (2001)
"In contrast to neural networks SVMs automatically select their model
size (by selecting the Support vectors)."
Rychetsky (2001)
"The absence of local minima from the above algorithms marks a major
departure from traditional systems such as neural networks,..."
Shawe-Taylor and Cristianini (2004)
"While the weight decay term is an important aspect for obtaining good
generalization in the context of neural networks for regression, the
margin plays a somewhat similar role in classification problems."
Suykens et al. (2002), page 29
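The analogy can be written out in standard notation (a textbook correspondence, not a formula taken from Suykens et al.): maximizing the margin 2/||w|| amounts to minimizing ||w||^2, which plays the same regularizing role as the weight-decay penalty added to a network's data-fit error:

\max_{\mathbf{w},b} \frac{2}{\lVert \mathbf{w} \rVert}
  \;\Longleftrightarrow\;
  \min_{\mathbf{w},b} \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2}
  \quad \text{s.t. } y_i(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1,
\qquad
\text{weight decay: } \min_{\theta} \; E_{\text{data}}(\theta) + \tfrac{\lambda}{2}\lVert \theta \rVert^{2}.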
"In comparison with traditional multilayer perceptron neural networks
that suffer from the existence of multiple local minima solutions,
convexity is an important and interesting property of nonlinear SVM
classifiers. [more]"
Suykens et al. (2002)
"SVMs have been developed in the reverse order to the development of
neural networks (NNs). SVMs evolved from the sound theory to the
implementation and experiments, while the NNs followed more heuristic
path, from applications and extensive experimentation to the theory."
Wang (2005)