The hard sigmoid activation function

This article describes a way to approximate the sigmoid function in order to speed up neural-network computation. By using a piecewise linear interpolation, computation becomes significantly faster while retaining sufficient accuracy. In particular, two approximations are discussed: the ultra-fast sigmoid and the hard sigmoid, along with how they behave in different application scenarios.


The standard sigmoid is slow to compute because it requires evaluating the exp() function, which is done via complex code (with some hardware assist if available in the CPU architecture). In many cases the high-precision exp() results aren't needed, and an approximation will suffice. Such is the case in many gradient-descent/optimization-based neural networks: the exact values aren't as important as the "ballpark" values, insofar as the results are comparable to within a small error.
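For reference, a minimal NumPy sketch of the exact sigmoid, which is the per-element exp() and division that the approximations below try to avoid:

import numpy as np

def sigmoid(x):
    # Exact logistic sigmoid: one exp() and one division per element.
    return 1.0 / (1.0 + np.exp(-x))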

Here's a plot of the sigmoid, the "ultra-fast" sigmoid and the "hard" sigmoid:

[Figure: the sigmoid (blue), the ultra-fast sigmoid (green) and the hard sigmoid (red)]
Note how the sigmoid (blue) is smooth, while the ultra-fast (green) and hard (red) sigmoids are piecewise linear. In fact, these approximations are computed as linear interpolations between pairs of cut points. Note the green line plot, which touches the blue one at a few points, forming a set of line segments. Computing this approximation is significantly faster than calling a routine that implements the sigmoid via exp() and division: all it requires is determining which linear segment x lies in and doing a simple interpolation. The approximation is just that: approximate, but the errors are low enough that many ANN algorithms run fine with it.
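A minimal sketch of such a piecewise-linear approximation is shown below. The cut points are chosen here purely for illustration (real implementations tune their number and placement to trade accuracy against speed); the sigmoid values at the cut points are computed once at setup time, so no exp() is needed when the approximation is evaluated.

import numpy as np

# Illustrative cut points; their sigmoid values are precomputed once.
cut_points = np.array([-6.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 6.0])
cut_values = 1.0 / (1.0 + np.exp(-cut_points))  # done only at setup time

def piecewise_sigmoid(x):
    # Linear interpolation between the precomputed cut points;
    # inputs outside the range clamp to the end values (~0 and ~1).
    return np.interp(x, cut_points, cut_values)

x = np.linspace(-8.0, 8.0, 5)
print(piecewise_sigmoid(x))  # close to sigmoid(x), with no exp() per call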

For the hard sigmoid there are fewer cut points: in fact there are only two, so only two comparisons are required to determine which segment the result lies in, and only one interpolation is needed (for the central segment), since the other two segments are constant 0 and constant 1. In other words, it's very fast. The error is larger than for the ultra-fast sigmoid, but depending on your particular case it might not change the numerical results significantly. In fact, for classification problems it rarely if ever causes errors (and when it does, some extra training tends to correct it; extra training you can afford to run because your training cycles are so much faster than with the standard sigmoid).
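As a sketch, here is one common hard-sigmoid parameterization: a line of slope 0.2 through (0, 0.5), clamped to [0, 1]. The exact slope and offset vary between implementations, so treat these constants as an assumption rather than the one true definition.

import numpy as np

def hard_sigmoid(x, slope=0.2, offset=0.5):
    # Central segment: a straight line through (0, 0.5).
    # Outside roughly [-2.5, 2.5] (for these constants) the output
    # saturates at the constant segments 0 and 1.
    return np.clip(slope * x + offset, 0.0, 1.0)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(hard_sigmoid(x))  # [0.  0.3 0.5 0.7 1. ]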

As an added detail, the piecewise interpolation used to compute the sigmoid also acts as a form of regularization, which in the right circumstances helps a lot with the creation of useful feature detectors. Just don't use the more extreme approximations (like the hard sigmoid) when your problem is function approximation: the error will decrease slowly and might plateau before reaching your goal. But, again, if you're doing classification it's usually quite OK.
