The hard sigmoid activation function

This article describes a way to approximate the sigmoid function in order to speed up neural-network computation. By using a piecewise linear interpolation, computation becomes significantly faster while retaining sufficient accuracy. In particular, two approximations are discussed: the ultra-fast sigmoid and the hard sigmoid, along with how they behave in different application scenarios.


The standard sigmoid is slow to compute because it requires evaluating the exp() function, which is done via complex code (with some hardware assist if available in the CPU architecture). In many cases the high-precision exp() results aren't needed, and an approximation will suffice. Such is the case in many gradient-descent/optimization-based neural networks: the exact values aren't as important as the "ballpark" values, insofar as the results are comparable to within a small error.
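For reference, a minimal NumPy sketch of the exact sigmoid, which is the per-element exp() and division that the approximations below try to avoid:

import numpy as np

def sigmoid(x):
    # Exact logistic sigmoid: one exp() and one division per element.
    return 1.0 / (1.0 + np.exp(-x))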

Here's a plot of the sigmoid, the "ultra-fast" sigmoid and the "hard" sigmoid:

[Figure: the sigmoid (blue), the ultra-fast sigmoid (green) and the hard sigmoid (red)]
Note how the sigmoid (blue) is smooth, while the ultra-fast (green) and hard (red) sigmoids are piecewise linear. In fact, these approximations are computed as linear interpolations between pairs of cut points. Note the green line plot, which touches the blue one at a few points, forming a set of line segments. Computing this approximation is significantly faster than calling a routine that implements the sigmoid via exp() and division: all it requires is determining which linear segment x lies in and doing a simple interpolation. The approximation is just that: approximate, but the errors are low enough that many ANN algorithms run fine with it.
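A minimal sketch of such a piecewise-linear approximation is shown below. The cut points are chosen here purely for illustration (real implementations tune their number and placement to trade accuracy against speed); the sigmoid values at the cut points are computed once at setup time, so no exp() is needed when the approximation is evaluated.

import numpy as np

# Illustrative cut points; their sigmoid values are precomputed once.
cut_points = np.array([-6.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 6.0])
cut_values = 1.0 / (1.0 + np.exp(-cut_points))  # done only at setup time

def piecewise_sigmoid(x):
    # Linear interpolation between the precomputed cut points;
    # inputs outside the range clamp to the end values (~0 and ~1).
    return np.interp(x, cut_points, cut_values)

x = np.linspace(-8.0, 8.0, 5)
print(piecewise_sigmoid(x))  # close to sigmoid(x), with no exp() per call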

For the hard sigmoid there are fewer cut points: in fact there are only two, so only two comparisons are required to determine which segment the result lies in, and only one interpolation is needed (for the central segment), since the other two segments are constant 0 and constant 1. In other words, it's very fast. The error is larger than for the ultra-fast sigmoid, but depending on your particular case it might not change the numerical results significantly. In fact, for classification problems it rarely if ever causes errors (and when it does, some extra training tends to correct it; extra training you can afford to run because your training cycles are so much faster than with the standard sigmoid).
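As a sketch, here is one common hard-sigmoid parameterization: a line of slope 0.2 through (0, 0.5), clamped to [0, 1]. The exact slope and offset vary between implementations, so treat these constants as an assumption rather than the one true definition.

import numpy as np

def hard_sigmoid(x, slope=0.2, offset=0.5):
    # Central segment: a straight line through (0, 0.5).
    # Outside roughly [-2.5, 2.5] (for these constants) the output
    # saturates at the constant segments 0 and 1.
    return np.clip(slope * x + offset, 0.0, 1.0)

x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
print(hard_sigmoid(x))  # [0.  0.3 0.5 0.7 1. ]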

As an added detail, the piecewise interpolation used to compute the sigmoid also acts as a form of regularization, which in the right circumstances helps a lot with the creation of useful feature detectors. Just don't use the more extreme approximations (like the hard sigmoid) when your problem is function approximation: the error will decrease slowly and might plateau before reaching your goal. But, again, if you're doing classification it's usually quite OK.
