Ridge Regression

Latest recommended article published 2024-07-16 23:47:04

weixin_30673611

CC 4.0 BY-SA license

Original link: http://www.cnblogs.com/ysjxw/archive/2008/05/21/1204117.html

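This article reviews the development of regularisation methods, from Tikhonov's mid-20th-century work on ill-posed problems to the ridge regression method proposed by Hoerl and Kennard. It describes how ridge regression solves ill-conditioned linear regression problems and shows that it is equivalent to weight decay. It also discusses the use of ridge regression in neural networks and presents a local ridge regression method.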

Around the middle of the 20th century the Russian theoretician Andrey Tikhonov was working on the solution of ill-posed problems. These are mathematical problems for which no unique solution exists because, in effect, there is not enough information specified in the problem. It is necessary to supply extra information (or assumptions), and the mathematical technique Tikhonov developed for this is known as regularisation.

Tikhonov's work only became widely known in the West after the publication in 1977 of his book [29]. Meanwhile, two American statisticians, Arthur Hoerl and Robert Kennard, published a paper in 1970 [11] on ridge regression, a method for solving badly conditioned linear regression problems. Bad conditioning means numerical difficulties in performing the matrix inverse necessary to obtain the variance matrix. It is also a symptom of an ill-posed regression problem in Tikhonov's sense, and Hoerl & Kennard's method was in fact a crude form of regularisation, known now as zero-order regularisation [25].
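To make bad conditioning concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than something taken from [11] or [25]: the four-row design matrix H with two almost identical columns, the penalty value 0.01, and the helper names gram and condition_number. It measures how close the matrix H^T H is to being singular, before and after a small constant is added to its diagonal, which is the zero-order (ridge) modification.

```python
import math

def gram(H, lam=0.0):
    """Entries (a, b, d) of the symmetric 2x2 matrix H^T H + lam * I."""
    a = sum(row[0] * row[0] for row in H) + lam
    b = sum(row[0] * row[1] for row in H)
    d = sum(row[1] * row[1] for row in H) + lam
    return a, b, d

def condition_number(a, b, d):
    """Largest eigenvalue divided by smallest, for [[a, b], [b, d]]."""
    mean = (a + d) / 2.0
    radius = math.hypot((a - d) / 2.0, b)
    largest = mean + radius
    # Smallest eigenvalue from the determinant, to limit cancellation error.
    smallest = (a * d - b * b) / largest
    return largest / smallest

# Hypothetical design matrix: the second column nearly duplicates the first,
# so the two basis functions carry almost the same information.
H = [[1.0, 1.001],
     [2.0, 1.999],
     [3.0, 3.001],
     [4.0, 3.999]]

cond_plain = condition_number(*gram(H))            # no regularisation
cond_ridge = condition_number(*gram(H, lam=0.01))  # small ridge penalty

print("condition number without penalty:", cond_plain)
print("condition number with penalty:   ", cond_ridge)
```

With this data the unpenalised matrix should come out several orders of magnitude worse conditioned than the penalised one, which is why the inverse becomes numerically safer once the penalty is added.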

In the 1980s, when neural networks became popular, weight decay was one of a number of techniques 'invented' to help prune unimportant network connections. However, it was soon recognised [8] that weight decay involves adding the same penalty term to the sum-squared-error as in ridge regression. Weight decay and ridge regression are equivalent.
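The equivalence can be checked numerically. The following is a rough sketch under stated assumptions, not the procedure of [8]: the toy data H and y, the penalty lam = 0.5, the learning rate, the step count and both function names are hypothetical. One function solves the ridge normal equations (H^T H + lam I) w = H^T y directly; the other runs plain gradient descent on the sum-squared-error plus lam times the sum of squared weights, where the extra term shrinks ("decays") each weight at every step.

```python
def ridge_closed_form(H, y, lam):
    """Solve (H^T H + lam * I) w = H^T y for two weights by Cramer's rule."""
    a = sum(r[0] * r[0] for r in H) + lam
    b = sum(r[0] * r[1] for r in H)
    d = sum(r[1] * r[1] for r in H) + lam
    g0 = sum(r[0] * t for r, t in zip(H, y))
    g1 = sum(r[1] * t for r, t in zip(H, y))
    det = a * d - b * b
    return [(d * g0 - b * g1) / det, (a * g1 - b * g0) / det]

def weight_decay_descent(H, y, lam, rate=0.02, steps=2000):
    """Gradient descent on sum-squared-error + lam * sum of squared weights."""
    w = [0.0, 0.0]
    for _ in range(steps):
        residuals = [r[0] * w[0] + r[1] * w[1] - t for r, t in zip(H, y)]
        grad = [2.0 * sum(e * r[j] for e, r in zip(residuals, H))
                for j in range(2)]
        # The 2 * lam * w[j] term is the weight-decay part of the update.
        w = [w[j] - rate * (grad[j] + 2.0 * lam * w[j]) for j in range(2)]
    return w

# Hypothetical toy problem: a constant basis function and a linear one.
H = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [1.1, 1.9, 3.2, 3.9]
lam = 0.5

w_ridge = ridge_closed_form(H, y, lam)
w_decay = weight_decay_descent(H, y, lam)

print("ridge regression weights:", w_ridge)
print("weight decay weights:    ", w_decay)
```

Both routes minimise the same penalised cost, so the two weight vectors should agree up to the convergence tolerance of the descent. The learning rate here was chosen small enough for this particular H; other data may need a different value.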

While it is admittedly crude, I like ridge regression because it is mathematically and computationally convenient, and consequently other forms of regularisation are largely ignored here. If the reader is interested in higher-order regularisation, I suggest looking at [25] for a general overview and [16] for a specific example (second-order regularisation in RBF networks).

We next describe ridge regression from the perspective of bias and variance and how it affects the equations for the optimal weight vector, the variance matrix and the projection matrix. A method to select a good value for the regularisation parameter, based on a re-estimation formula, is then presented. Next comes a generalisation of ridge regression which, if radial basis functions are used, can be justly called local ridge regression. It involves multiple regularisation parameters and we describe a method for their optimisation. Finally, we illustrate with a simple example.