Activation Functions: An Introduction to ReLU and softplus, with a C++ Implementation

The softplus function: ζ(x) = ln(1 + exp(x)).

The softplus function can be used to produce the β or σ parameter of a normal distribution because its range is (0, ∞). It also arises frequently when manipulating expressions involving the sigmoid function. The name "softplus" comes from the fact that it is a smoothed (or "softened") version of another function, x⁺ = max(0, x); in other words, softplus is an analytic function that smoothly approximates the ReLU.

The softplus function is designed as a smooth version of the positive part function x⁺ = max{0, x}. Its counterpart is the negative part function x⁻ = max{0, -x}. To obtain a smooth function analogous to the negative part, we can use ζ(-x). Just as x can be recovered from its positive and negative parts via x⁺ - x⁻ = x, ζ(x) and ζ(-x) can be combined in the same way: ζ(x) - ζ(-x) = x.
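As a quick numerical illustration of this identity, here is a minimal standalone sketch (independent of the test code later in this post; the helper name softplus is just for illustration):

#include <cmath>
#include <cstdio>

// Minimal standalone sketch: evaluate softplus and check that
// softplus(x) - softplus(-x) = x (up to floating-point rounding).
static double softplus(double x) { return std::log(1. + std::exp(x)); }

int main()
{
	for (double x : { -3.0, -0.5, 0.0, 0.7, 2.5 }) {
		double recovered = softplus(x) - softplus(-x);
		std::printf("x = %+.2f, softplus(x) = %.6f, softplus(x) - softplus(-x) = %+.6f\n",
			x, softplus(x), recovered);
	}
	return 0;
}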

Rectifier: In the context of artificial neural networks, the rectifier is an activation function defined as:

f(x)=max(0,x)

where x is the input to a neuron. This activation function was first introduced to a dynamical network by Hahnloser et al. in a 2000 paper in Nature. It has been used in convolutional networks more effectively than the widely used logistic sigmoid (which is inspired by probability theory; see logistic regression) and its more practical counterpart, the hyperbolic tangent. The rectifier is, as of 2015, the most popular activation function for deep neural networks.

A unit employing the rectifier is also called a rectified linear unit (ReLU).

A smooth approximation to the rectifier is the analytic function f(x) = ln(1 + e^x), which is called the softplus function. The derivative of softplus is f'(x) = e^x / (e^x + 1) = 1 / (1 + e^(-x)), i.e. the logistic function.
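This relation between the softplus derivative and the logistic function can be checked numerically. The following standalone sketch (names are illustrative) compares a central finite difference of softplus against the logistic function:

#include <cmath>
#include <cstdio>

// Minimal standalone sketch: the finite-difference slope of softplus
// should match the logistic function 1 / (1 + exp(-x)).
static double softplus(double x) { return std::log(1. + std::exp(x)); }
static double logistic(double x) { return 1. / (1. + std::exp(-x)); }

int main()
{
	const double h = 1e-6; // central finite-difference step
	for (double x = -4.; x <= 4.; x += 2.) {
		double numeric = (softplus(x + h) - softplus(x - h)) / (2. * h);
		std::printf("x = %+.1f, finite difference = %.6f, logistic(x) = %.6f\n",
			x, numeric, logistic(x));
	}
	return 0;
}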

Rectified linear units (ReLUs) find applications in computer vision and speech recognition using deep neural nets.

Noisy ReLUs: Rectified linear units can be extended to include Gaussian noise, making them noisy ReLUs, giving f(x) = max(0, x + Y), with Y ~ N(0, σ(x)). Noisy ReLUs have been used with some success in restricted Boltzmann machines for computer vision tasks.
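A minimal standalone sketch of a noisy ReLU is shown below; for simplicity it assumes a fixed noise standard deviation sigma, whereas in the formula above the noise scale σ(x) may depend on the input:

#include <algorithm>
#include <cstdio>
#include <random>

// Minimal standalone sketch of a noisy ReLU: f(x) = max(0, x + Y), Y ~ N(0, sigma).
// A fixed sigma is assumed here; in general it may depend on the input x.
static double noisy_relu(double x, double sigma, std::mt19937& gen)
{
	std::normal_distribution<double> noise(0., sigma);
	return std::max(0., x + noise(gen));
}

int main()
{
	std::mt19937 gen(42); // fixed seed so runs are reproducible
	for (double x : { -1.5, -0.2, 0.3, 2.0 })
		std::printf("x = %+.2f, noisy ReLU = %.6f\n", x, noisy_relu(x, 0.1, gen));
	return 0;
}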

Leaky ReLUs allow a small, non-zero gradient when the unit is not active: f(x) = x for x > 0; f(x) = 0.01x otherwise (0.01 is the commonly used leakage coefficient, and it is the value used in the code below).

Parametric ReLUs take this idea further by making the coefficient of leakage into a parameter that is learned along with the other neural network parameters: f(x) = x for x > 0; f(x) = ax otherwise.

Note that for a≤1, this is equivalent to: f(x)=max(x, ax), and thus has a relation to "maxout" networks.
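A parametric ReLU written in the same style as the C++ functions later in this post could look like the sketch below; this function is not part of the linked repository, and the leakage coefficient a is simply passed in by the caller, whereas in an actual network it would be learned during training:

// ==================== Activation Function: PReLU (sketch, not in the repo) ==
// Parametric ReLU: f(x) = x for x > 0; a * x otherwise, where a is a learned
// leakage coefficient (supplied by the caller in this sketch).
template<typename _Tp>
int activation_function_PReLU(const _Tp* src, _Tp* dst, int length, _Tp a)
{
	for (int i = 0; i < length; ++i) {
		dst[i] = src[i] > (_Tp)0. ? src[i] : a * src[i];
	}

	return 0;
}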

ELUs: Exponential linear units try to make the mean activations closer to zero, which speeds up learning. It has been shown that ELUs can obtain higher classification accuracy than ReLUs: f(x) = x for x > 0; f(x) = a(exp(x) - 1) otherwise.

a is a hyper-parameter to be tuned and a≥0 is a constraint.

The content above is excerpted from the Chinese edition of Deep Learning and from Wikipedia.

The C++ test code is below:

#include "funset.hpp"
#include <math.h>
#include <iostream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>
#include "common.hpp"

// ========================= Activation Function: ELUs ========================
// ELU: f(x) = x for x > 0; a * (exp(x) - 1) otherwise
template<typename _Tp>
int activation_function_ELUs(const _Tp* src, _Tp* dst, int length, _Tp a = 1.)
{
	if (a < 0) {
		fprintf(stderr, "a is a hyper-parameter to be tuned and a>=0 is a constraint\n");
		return -1;
	}

	for (int i = 0; i < length; ++i) {
		dst[i] = src[i] >= (_Tp)0. ? src[i] : (a * (exp(src[i]) - (_Tp)1.));
	}

	return 0;
}

// ========================= Activation Function: Leaky_ReLUs =================
// Leaky ReLU: f(x) = x for x > 0; 0.01 * x otherwise
template<typename _Tp>
int activation_function_Leaky_ReLUs(const _Tp* src, _Tp* dst, int length)
{
	for (int i = 0; i < length; ++i) {
		dst[i] = src[i] > (_Tp)0. ? src[i] : (_Tp)0.01 * src[i];
	}

	return 0;
}

// ========================= Activation Function: ReLU =======================
// ReLU: f(x) = max(0, x)
template<typename _Tp>
int activation_function_ReLU(const _Tp* src, _Tp* dst, int length)
{
	for (int i = 0; i < length; ++i) {
		dst[i] = std::max((_Tp)0., src[i]);
	}

	return 0;
}

// ========================= Activation Function: softplus ===================
// softplus: f(x) = ln(1 + exp(x))
template<typename _Tp>
int activation_function_softplus(const _Tp* src, _Tp* dst, int length)
{
	for (int i = 0; i < length; ++i) {
		dst[i] = log((_Tp)1. + exp(src[i]));
	}

	return 0;
}

int test_activation_function()
{
	std::vector<double> src{ 1.23, 4.14, -3.23, -1.23, 5.21, 0.234, -0.78, 6.23 };
	int length = src.size();
	std::vector<double> dst(length);

	fprintf(stderr, "source vector: \n");
	fbc::print_matrix(src);
	fprintf(stderr, "calculate activation function:\n");
	fprintf(stderr, "type: sigmoid result: \n");
	fbc::activation_function_sigmoid(src.data(), dst.data(), length);
	fbc::print_matrix(dst);
	fprintf(stderr, "type: sigmoid fast result: \n");
	fbc::activation_function_sigmoid_fast(src.data(), dst.data(), length);
	fbc::print_matrix(dst);
	fprintf(stderr, "type: softplus result: \n");
	fbc::activation_function_softplus(src.data(), dst.data(), length);
	fbc::print_matrix(dst);
	fprintf(stderr, "type: ReLU result: \n");
	fbc::activation_function_ReLU(src.data(), dst.data(), length);
	fbc::print_matrix(dst);
	fprintf(stderr, "type: Leaky ReLUs result: \n");
	fbc::activation_function_Leaky_ReLUs(src.data(), dst.data(), length);
	fbc::print_matrix(dst);
	fprintf(stderr, "type: Leaky ELUs result: \n");
	fbc::activation_function_ELUs(src.data(), dst.data(), length);
	fbc::print_matrix(dst);

	return 0;
}

GitHub: https://github.com/fengbingchun/NN_Test

### The ReLU Activation Function: Formula and Properties

The ReLU (Rectified Linear Unit) activation function is defined as:

$$ f(x) = \max(0, x) $$

It has the following properties:

- **Non-linearity**: despite its simple form, ReLU provides a non-linear transformation, so neural networks built from it can approximate arbitrarily complex functions.
- **Sparse activation**: every negative input is mapped to 0, which mimics the sparsity of biological neurons[^3].
- **Gradient preservation**: the derivative is constantly 1 for $x > 0$ and constantly 0 for $x < 0$, so the positive region does not suffer from vanishing gradients during backpropagation.
- **Computational efficiency**: ReLU is cheaper to evaluate than the sigmoid or tanh functions.

### The Sigmoid Activation Function: Formula and Properties

The sigmoid function is defined as:

$$ \sigma(x) = \frac{1}{1 + e^{-x}} $$

with derivative:

$$ \sigma'(x) = \sigma(x)(1 - \sigma(x)) $$

Its main properties are:

- **Output range**: values lie in $(0, 1)$, which suits a probabilistic interpretation in binary classification.
- **Smooth and differentiable**: convenient for gradient-based optimization such as gradient descent.
- **Vanishing gradients**: for very large or very small inputs the function saturates and its derivative approaches zero, so gradients shrink during backpropagation and deep networks become hard to train[^2].

### The Cross-Entropy Loss: Principle and Derivation

The cross-entropy loss measures the difference between two probability distributions and is widely used in classification tasks. For binary classification it is defined as:

$$ L = -\left[y \log(p) + (1 - y)\log(1 - p)\right] $$

where:

- $y$ is the true label (0 or 1)
- $p$ is the predicted probability (the sigmoid output)

For multi-class problems, where a softmax converts the outputs into a class probability distribution, the loss becomes:

$$ L = -\sum_{i=1}^{C} y_i \log(p_i) $$

where:

- $C$ is the number of classes
- $y_i$ is the one-hot encoded true label for class $i$
- $p_i$ is the predicted probability for class $i$

#### Derivation sketch (binary classification)

Let the true label of a sample be $y$, and let the model output, passed through the sigmoid, give the predicted probability $p$ (writing $p = \sigma(w \cdot x)$ for input $x$). The cross-entropy loss is:

$$ L = -\left[y \log(p) + (1 - y)\log(1 - p)\right] $$

Differentiating this loss with respect to the weights $w$ and applying the chain rule gives:

$$ \frac{\partial L}{\partial w} = (p - y) \cdot x $$

This shows that pairing the sigmoid activation with the cross-entropy loss yields a simple gradient and avoids multiplying by the sigmoid derivative directly, which mitigates the vanishing gradient problem[^2].

### Example Code: ReLU, Sigmoid, and Cross-Entropy Loss

```python
import numpy as np

# ReLU and its derivative
def relu(x):
    return np.maximum(0, x)

def relu_derivative(x):
    return (x > 0).astype(float)

# Sigmoid and its derivative
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1 - s)

# Binary cross-entropy loss
def binary_cross_entropy(y_true, y_pred):
    epsilon = 1e-15  # avoid log(0)
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred)).mean()

# Categorical (multi-class) cross-entropy loss
def categorical_cross_entropy(y_true, y_pred):
    epsilon = 1e-15
    y_pred = np.clip(y_pred, epsilon, 1 - epsilon)
    return -np.sum(y_true * np.log(y_pred)) / y_true.shape[0]
```
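As a quick sanity check of the gradient derived above, the following minimal standalone C++ sketch (all names are illustrative; it assumes a single sample with one scalar feature and no bias) compares (p - y)·x against a central finite-difference approximation of ∂L/∂w:

```cpp
#include <cmath>
#include <cstdio>

// Sigmoid and binary cross-entropy for one sample with a single scalar feature.
static double sigmoid(double z) { return 1. / (1. + std::exp(-z)); }

static double bce_loss(double w, double x, double y)
{
	double p = sigmoid(w * x);
	return -(y * std::log(p) + (1. - y) * std::log(1. - p));
}

int main()
{
	const double x = 0.8, y = 1., w = -0.3, h = 1e-6;
	double p = sigmoid(w * x);
	double analytic = (p - y) * x; // closed-form gradient (p - y) * x
	double numeric = (bce_loss(w + h, x, y) - bce_loss(w - h, x, y)) / (2. * h);
	std::printf("analytic gradient = %.6f, finite difference = %.6f\n", analytic, numeric);
	return 0;
}
```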