Soft attention, hard attention, and local attention structures

This article takes a closer look at the different types of attention mechanism, including soft, hard, local, and self-attention, and explains how they are used in seq2seq models and in Transformer/BERT. It walks through the whole process, from the key and query producing the weights to the final attention value.

1. How to understand attention

 

    Understanding: the key and the query produce the weights α, and α together with the value produces the attention value (a minimal sketch follows after the notes below).

  •  Note: in TensorFlow's seq2seq + attention, the attention key and the value are the same tensor, both being the encoder outputs.
  •  In other architectures this is not necessarily the case, for example Transformer / BERT, where the keys and values come from separate learned projections.
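A minimal sketch of the key/query/value flow described above, assuming dot-product scoring and made-up shapes (the shapes, the scoring function, and the variable names are illustrative, not taken from any particular framework beyond standard TensorFlow ops):

```python
import tensorflow as tf

# Illustrative shapes: batch=2, source length=5, hidden size=8.
batch, seq_len, hidden = 2, 5, 8
query = tf.random.normal((batch, hidden))           # e.g. the decoder state at one step
keys = tf.random.normal((batch, seq_len, hidden))   # e.g. the encoder outputs
values = keys                                       # classic seq2seq attention: keys == values

# Step 1: key + query -> scores -> weights α (dot-product scoring assumed here).
scores = tf.einsum('bh,bsh->bs', query, keys)       # (batch, seq_len)
alpha = tf.nn.softmax(scores, axis=-1)              # weights sum to 1 over the source positions

# Step 2: α + value -> attention value (the context vector).
attention_value = tf.einsum('bs,bsh->bh', alpha, values)  # (batch, hidden)
print(alpha.shape, attention_value.shape)
```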

 

2. Soft attention and global attention

  •  Global attention and soft attention have exactly the same structure.

3. Hard attention

  •  In soft attention, every encoder hidden state is matched with a probability value, whereas hard attention directly picks one specific word, giving it probability 1 and all the others probability 0 (see the sketch below).
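A minimal sketch contrasting the two weight patterns on the same scores. The score numbers are made up, and the argmax below only illustrates the one-hot shape of hard attention weights; in practice the position is sampled and trained with variance-reduction or reinforcement-learning tricks:

```python
import tensorflow as tf

scores = tf.constant([[1.2, 0.3, 2.5, -0.7]])        # (batch=1, source_len=4), made-up values

# Soft attention: every encoder hidden state gets a probability.
soft_weights = tf.nn.softmax(scores, axis=-1)        # roughly [0.19, 0.08, 0.70, 0.03]

# Hard attention: exactly one position gets probability 1, the rest get 0.
hard_index = tf.argmax(scores, axis=-1)
hard_weights = tf.one_hot(hard_index, depth=scores.shape[-1])  # [0., 0., 1., 0.]

print(soft_weights.numpy())
print(hard_weights.numpy())
```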

 

4. Local attention

  •  Local attention is a compromise between soft (global) and hard attention: at each decoding step it only attends to a small window of source positions centered around an alignment position, rather than to the whole source sequence (a minimal sketch follows below).
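A minimal sketch of the windowing idea, assuming a fixed half-width D and a hand-picked center position p_t for one decoding step (in Luong-style local attention, p_t is either the current target position or predicted from the decoder state; both the shapes and p_t here are made up):

```python
import tensorflow as tf

batch, src_len, hidden, D = 1, 10, 8, 2
query = tf.random.normal((batch, hidden))
keys = tf.random.normal((batch, src_len, hidden))
values = keys

p_t = 6                                              # assumed center position for this step
scores = tf.einsum('bh,bsh->bs', query, keys)        # (batch, src_len)

# Keep only the window [p_t - D, p_t + D]; push everything else to -inf before softmax.
positions = tf.range(src_len)
window_mask = tf.logical_and(positions >= p_t - D, positions <= p_t + D)
scores = tf.where(window_mask[tf.newaxis, :], scores, tf.fill(tf.shape(scores), -1e9))

alpha = tf.nn.softmax(scores, axis=-1)               # weights are ~0 outside the window
context = tf.einsum('bs,bsh->bh', alpha, values)
print(alpha.numpy().round(3))
```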

5. Self attention (Transformer)

            https://blog.youkuaiyun.com/qq_16555103/article/details/100920480   ------------ Transformer and BERT networks (a minimal self-attention sketch follows below)
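A minimal sketch of single-head scaled dot-product self-attention as used in the Transformer; the input shapes and the choice of equal projection sizes are illustrative assumptions (see the linked post for the full multi-head architecture):

```python
import tensorflow as tf

batch, seq_len, d_model = 2, 6, 16
x = tf.random.normal((batch, seq_len, d_model))      # one input sequence per batch element

# Q, K, V are all learned projections of the same sequence, hence "self" attention.
wq = tf.keras.layers.Dense(d_model)
wk = tf.keras.layers.Dense(d_model)
wv = tf.keras.layers.Dense(d_model)
q, k, v = wq(x), wk(x), wv(x)                        # each (batch, seq_len, d_model)

scores = tf.matmul(q, k, transpose_b=True)           # (batch, seq_len, seq_len)
scores /= tf.sqrt(tf.cast(d_model, tf.float32))      # scale by sqrt(d_k)
weights = tf.nn.softmax(scores, axis=-1)             # every token attends to every token
output = tf.matmul(weights, v)                       # (batch, seq_len, d_model)
print(output.shape)
```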

                

 

 

6. Example: an attention mechanism in deep learning

Attention mechanisms allow models to focus on specific parts of the input data when making predictions or generating outputs. In machine translation, for instance, instead of encoding the entire sentence into a fixed-length vector, attention lets the model look back at different source words while translating each target word[^1]. This selective concentration significantly improves performance.

In a practical example from neural image captioning, hard attention selects only one region of the image per time step rather than processing all regions simultaneously as soft attention would[^3]. While this reduces the computational load at test time, since fewer areas are analyzed at once, it also introduces challenges: because the hard selection is not differentiable, training requires techniques such as variance reduction or reinforcement learning algorithms.

For implementation purposes, consider how global and local attention differ: global attention considers every position across the source sequence, whereas local attention restricts itself to a neighborhood around certain positions, based on predefined criteria or on parameters learned from previous layers' activations.

Below is a simplified code snippet demonstrating part of an attention layer (additive, Bahdanau-style scoring) using TensorFlow/Keras:

```python
import tensorflow as tf
from tensorflow.keras.layers import Layer, Dense


class SimpleAttention(Layer):
    def __init__(self, units):
        super(SimpleAttention, self).__init__()
        self.W1 = Dense(units)
        self.W2 = Dense(units)
        self.V = Dense(1)

    def call(self, query, values):
        # query: (batch_size, hidden_size); values: (batch_size, seq_len, hidden_size).
        # Add a time axis so the query broadcasts against every value position.
        query_with_time_axis = tf.expand_dims(query, 1)
        # Alignment scores between the query and the keys (the values double as keys here).
        score = self.V(tf.nn.tanh(
            self.W1(query_with_time_axis) + self.W2(values)))   # (batch, seq_len, 1)
        # Normalize over the time axis to get the attention weights.
        attention_weights = tf.nn.softmax(score, axis=1)
        # Weighted sum of the values gives the context vector.
        context_vector = attention_weights * values
        return tf.reduce_sum(context_vector, axis=1)


hidden_units = 64        # example size
query_example = ...      # query tensor, shape (batch_size, hidden_size)
value_examples = ...     # value tensor, shape (batch_size, sequence_length, hidden_size)
attention_layer = SimpleAttention(hidden_units)
output = attention_layer(query_example, value_examples)
```