TensorFlow custom network layers and activation functions (self-defined layer)

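This article describes how to define a custom activation function in TensorFlow: wrap a numpy function as a TensorFlow op and define its gradient by hand. A thresholded ReLU is built and tested this way, and a custom "spiky" activation function and its derivative are then worked through step by step.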





# Heavily based on:
# http://stackoverflow.com/questions/39921607/tensorflow-how-to-make-a-custom-activation-function-with-only-python
# https://gist.github.com/harpone/3453185b41d8d985356cbe5e57d67342
# Note: this uses the TensorFlow 1.x graph-mode API (tf.py_func, tf.Session);
# in TensorFlow 2.x these functions are exposed under tf.compat.v1.


# Turning a numpy function into a TensorFlow function:
# we will use 1) tf.py_func(func, inp, Tout, stateful=stateful, name=name), https://www.tensorflow.org/api_docs/python/tf/py_func
# which wraps any numpy function as a TensorFlow op
# we will use 2) tf.RegisterGradient
# https://www.tensorflow.org/versions/r0.11/api_docs/python/framework/defining_new_operations#RegisterGradient
# https://www.tensorflow.org/versions/r0.11/api_docs/python/framework/#RegisterGradient
# we will use 3) tf.Graph.gradient_override_map
# https://www.tensorflow.org/versions/r0.11/api_docs/python/framework/
# https://www.tensorflow.org/versions/r0.11/api_docs/python/framework/core_graph_data_structures#Graph.gradient_override_map




import numpy as np
import tensorflow as tf
from tensorflow.python.framework import ops


# Define a custom thresholded ReLU and its derivative as plain scalar Python functions
def my_relu_def(x, threshold=0.05):
    if x<threshold:
        return 0.0
    else:
        return x

def my_relu_grad_def(x, threshold=0.05):
    if x<threshold:
        return 0.0
    else:
        return 1.0

# Vectorize the scalar functions so they accept numpy arrays
my_relu_np = np.vectorize(my_relu_def)
my_relu_grad_np = np.vectorize(my_relu_grad_def)
# np.vectorize may produce float64 (its output dtype follows whatever the first call returns),
# but the py_func ops below declare a float32 output, so cast explicitly
my_relu_np_32 = lambda x: my_relu_np(x).astype(np.float32)
my_relu_grad_np_32 = lambda x: my_relu_grad_np(x).astype(np.float32)



def my_relu_grad_tf(x, name=None):
    with ops.name_scope(name, "my_relu_grad_tf", [x]) as name:
        y = tf.py_func(my_relu_grad_np_32,
                       [x],
                       [tf.float32],
                       name=name,
                       stateful=False)
        return y[0]

def my_py_func(func, inp, Tout, stateful=False, name=None, my_grad_func=None):
    # Need to generate a unique name to avoid duplicates:
    random_name = 'PyFuncGrad' + str(np.random.randint(0, 1E+8))
    tf.RegisterGradient(random_name)(my_grad_func)  # see _my_relu_grad for grad example
    g = tf.get_default_graph()
    with g.gradient_override_map({"PyFunc": random_name, "PyFuncStateless": random_name}):
        return tf.py_func(func, inp, Tout, stateful=stateful, name=name)

# The gradient function passed to my_py_func must have a specific signature:
# it receives (the forward op, the gradient flowing back into the op's output)
# and returns the gradient with respect to the op's input(s), i.e. it applies the chain rule.
def _my_relu_grad(op, pre_grad):
    x = op.inputs[0]
    cur_grad = my_relu_grad_tf(x)
    next_grad = pre_grad * cur_grad
    return next_grad

def my_relu_tf(x, name=None):
    with ops.name_scope(name, "my_relu_tf", [x]) as name:
        y = my_py_func(my_relu_np_32,
                       [x],
                       [tf.float32],
                       stateful=False,
                       name=name,
                       my_grad_func=_my_relu_grad)  # <-- here's the call to the gradient
        return y[0]

with tf.Session() as sess:
    x = tf.constant([-0.3, 0.005, 0.08, 0.12])
    y = my_relu_tf(x)
    tf.global_variables_initializer().run()
    print(x.eval())
    print(y.eval())
    print(tf.gradients(y, [x])[0].eval())

# [-0.30000001  0.005       0.08        0.12      ]
# [ 0.    0.    0.08  0.12]
# [ 0.    0.    1.  1.]
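The forward and backward behaviour above can also be cross-checked without TensorFlow. The following is a minimal numpy-only sketch, not part of the original code: my_relu_fast and my_relu_grad_fast are hypothetical loop-free stand-ins for my_relu_np_32 and my_relu_grad_np_32 built on np.where, backward mirrors the rule in _my_relu_grad (upstream gradient times local derivative), and numerical_grad compares that rule with a central finite difference of sum(pre_grad * relu(x)). The upstream gradient values are made up for illustration.

```python
import numpy as np

THRESHOLD = 0.05  # same default threshold as my_relu_def above


def my_relu_fast(x, threshold=THRESHOLD):
    """Loop-free stand-in for my_relu_np_32: 0 below the threshold, identity above."""
    x = np.asarray(x, dtype=np.float32)
    return np.where(x < threshold, 0.0, x).astype(np.float32)


def my_relu_grad_fast(x, threshold=THRESHOLD):
    """Loop-free stand-in for my_relu_grad_np_32: 0 below the threshold, 1 above."""
    x = np.asarray(x, dtype=np.float32)
    return np.where(x < threshold, 0.0, 1.0).astype(np.float32)


def backward(x, pre_grad):
    """Same rule as _my_relu_grad: upstream gradient times the local derivative."""
    return pre_grad * my_relu_grad_fast(x)


def numerical_grad(x, pre_grad, eps=1e-4, threshold=THRESHOLD):
    """Central finite difference of L(x) = sum(pre_grad * relu(x)), in float64."""
    x = np.asarray(x, dtype=np.float64)

    def loss(v):
        return np.sum(pre_grad * np.where(v < threshold, 0.0, v))

    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = eps
        grad[i] = (loss(x + step) - loss(x - step)) / (2 * eps)
    return grad


x = np.array([-0.3, 0.005, 0.08, 0.12], dtype=np.float32)
pre_grad = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)  # hypothetical upstream gradient

print(my_relu_fast(x))              # forward pass, same rule as the py_func op
print(backward(x, pre_grad))        # analytic gradient via the chain rule
print(numerical_grad(x, pre_grad))  # finite-difference estimate for comparison
```

The numpy documentation notes that np.vectorize is essentially a for loop, so an array expression such as np.where is usually faster whenever the activation can be written that way.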






https://stackoverflow.com/questions/39921607/how-to-make-a-custom-activation-function-with-only-python-in-tensorflow



Yes, there is!

Credit: It was hard to find the information and get it working, but here is an example that follows the principles and code found here and here.

Requirements: Before we start, there are two requirements for this to succeed. First, you need to be able to write your activation as a function on numpy arrays. Second, you have to be able to write the derivative of that function either as a function in TensorFlow (easier) or, in the worst case, as a function on numpy arrays.

Writing the activation function:

As an example, take the following function, which we want to use as an activation function:

def spiky(x):
    r = x % 1
    if r <= 0.5:
        return r
    else:
        return 0

It looks as follows: [Figure: Spiky Activation]
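Since the plot did not survive, here is the same function written as a formula, read directly off the code above. With Python's %, x % 1 equals x - floor(x), which is mathematically in [0, 1) even for negative x:

```latex
\operatorname{spiky}(x) =
\begin{cases}
  r, & 0 \le r \le 0.5, \\
  0, & 0.5 < r < 1,
\end{cases}
\qquad r = x - \lfloor x \rfloor .
```

On each unit interval the function rises with slope 1 from 0 to 0.5, drops to 0, and stays at 0 until the next integer, which gives the spiky sawtooth shape.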

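The first step is to turn it into a numpy function, which is easy:

import numpy as np
# otypes pins the output dtype; without it np.vectorize infers the dtype from the
# first element, and spiky returns the integer 0 on its flat part
np_spiky = np.vectorize(spiky, otypes=[np.float64])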
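One caveat, documented in the numpy reference for np.vectorize: unless otypes is given, the output dtype is fixed by calling the function on the first input element. Because spiky returns the integer 0 on its flat part, an input whose first element lands there makes every output an integer, silently truncating the nonzero values. The short sketch below reproduces this with illustrative input values and shows the otypes fix.

```python
import numpy as np


def spiky(x):
    # same definition as above
    r = x % 1
    if r <= 0.5:
        return r
    else:
        return 0


x = np.array([0.7, 0.3, 1.25])

# Without otypes: spiky(0.7) returns the int 0, so the whole output becomes integer
# and 0.3 and 0.25 are truncated to 0.
np_spiky_int = np.vectorize(spiky)
print(np_spiky_int(x))

# With otypes the output dtype is fixed up front, independent of the first element.
np_spiky = np.vectorize(spiky, otypes=[np.float64])
print(np_spiky(x))
```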

Now we should write its derivative.

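Gradient of the activation: in our case this is easy. The derivative is 1 if x mod 1 <= 0.5 and 0 otherwise (at the jump where x mod 1 = 0.5, and at the kinks at integer x, the function is not differentiable, so the value used there is a convention). So: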

def d_spiky(x):
    r = x % 1
    if r <= 0.5:
        return 1
    else:
        return 0
np_d_spiky = np.vectorize(d_spiky)
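Before wiring d_spiky into TensorFlow, it is worth checking it against a numerical derivative of spiky. A minimal sketch follows; the sample points are hypothetical and deliberately avoid the integers and the points where x mod 1 = 0.5, since spiky is not differentiable there and a central difference would be meaningless. Both functions are repeated so the block runs on its own.

```python
import numpy as np


def spiky(x):
    r = x % 1
    if r <= 0.5:
        return r
    else:
        return 0


def d_spiky(x):
    r = x % 1
    if r <= 0.5:
        return 1
    else:
        return 0


np_spiky = np.vectorize(spiky, otypes=[np.float64])
np_d_spiky = np.vectorize(d_spiky, otypes=[np.float64])

# Sample points away from integers and from x mod 1 == 0.5.
x = np.array([-1.8, -0.3, 0.1, 0.4, 0.7, 2.2, 3.9])
eps = 1e-6
numeric = (np_spiky(x + eps) - np_spiky(x - eps)) / (2 * eps)

# Columns: x, hand-written derivative, finite-difference estimate.
print(np.column_stack([x, np_d_spiky(x), numeric]))
```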