[AI Notes] Section 38: Implementing the VITGAN Generative Adversarial Network in TF2, CoordinatesPositionalEmbedding Fourier Positional Encoding

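This article walks through the implementation of the CoordinatesPositionalEmbedding Fourier positional encoding in the VITGAN generative adversarial network. It covers both the code details and the underlying idea. The goal is to help readers understand how VITGAN generates positional information.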


Network architecture diagram
This section presents the code for the CoordinatesPositionalEmbedding Fourier positional encoding used in the VITGAN generative adversarial network.

Table of contents (links will be added once the articles are published):

Introduction to the CoordinatesPositionalEmbedding Fourier positional encoding

Generator
Excerpt from the original paper

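The CoordinatesPositionalEmbedding Fourier positional encoding generates two-dimensional position information for the pixels inside a patch. In the paper this is the $E_{fou}$ term, which corresponds to the Fourier Embedding block in the figure.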
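Read off the Dense + sin code in this post, the layer computes $E_{fou}(y, x) = \sin(W [y, x]^T + b)$. This is evaluated on a $P \times P$ grid of coordinates normalized to $[-1, 1]$.

Below is a minimal NumPy sketch of that computation. It assumes exactly this form and has not been checked against every detail of the paper. The function name `fourier_embedding` is hypothetical. `W` and `b` are random stand-ins for the learned Dense kernel and bias, not trained values.

```python
import numpy as np

def fourier_embedding(patch_size, emb_dim, seed=0):
    """Hypothetical sketch of E_fou: sin of an affine map of normalized (y, x) coordinates.

    W and b are random stand-ins for the learned Dense kernel/bias (assumption).
    """
    rng = np.random.default_rng(seed)
    # Normalized coordinates in [-1, 1], one value per pixel row/column
    coords = np.linspace(-1.0, 1.0, patch_size)
    # Default 'xy' indexing, same as tf.meshgrid's default: pos_x varies along columns
    pos_x, pos_y = np.meshgrid(coords, coords)
    # (P*P, 2) grid of (y, x) pairs in row-major pixel order
    grid = np.stack([pos_y, pos_x], axis=-1).reshape(-1, 2)
    W = rng.normal(size=(2, emb_dim))  # stand-in for the Dense kernel
    b = rng.normal(size=(emb_dim,))    # stand-in for the Dense bias
    # Each output channel is a sinusoid of a linear combination of y and x
    return np.sin(grid @ W + b)

emb = fourier_embedding(patch_size=16, emb_dim=768)
print(emb.shape)  # (256, 768)
```

Each channel is $\sin(w_1 y + w_2 x + b)$. The learned kernel therefore acts as a set of 2D frequencies, and the bias acts as phase offsets. This is why the encoding is described as a Fourier (feature) embedding.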

Note: this code may contain mistakes; corrections in the comments are welcome!

Code implementation

```python
import tensorflow as tf


class CoordinatesPositionalEmbedding(tf.keras.layers.Layer):
    """
    Fourier positional encoding: E_fou = sin(Dense([y, x])) over a P x P pixel grid.
    """

    def __init__(
        self,
        patch_size,
        emb_dim,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.patch_size = patch_size
        self.emb_dim = emb_dim
        # Learnable affine map from 2D coordinates to emb_dim channels (W.[y, x] + b)
        self.pos_emb = tf.keras.layers.Dense(emb_dim, use_bias=True)
        # Normalized pixel coordinates in [-1, 1]; float literals keep the grid float32
        pos_input = tf.linspace(-1.0, 1.0, patch_size)
        pos_x, pos_y = tf.meshgrid(pos_input, pos_input)
        # (P, P, 2): each pixel holds its (y, x) coordinate
        pos_input = tf.concat([pos_y[..., tf.newaxis], pos_x[..., tf.newaxis]], axis=-1)
        pos_input = tf.reshape(pos_input, [-1, 2])  # (P*P, 2)
        self.pos_input = pos_input

    def call(self, x):
        # Only the shape of x is used: one copy of the embedding table per patch
        x_shape = tf.shape(x)
        batch_size = x_shape[0] * x_shape[1]  # batch_size * num_patches
        emb = self.pos_emb(self.pos_input)  # (P*P, emb_dim)
        emb = tf.math.sin(emb)
        emb = tf.repeat(emb[tf.newaxis, ...], batch_size, axis=0)  # (B*L, P*P, emb_dim)
        return emb


if __name__ == "__main__":
    layer = CoordinatesPositionalEmbedding(
        patch_size=16,
        emb_dim=768,  # 16*16*3
    )
    x = tf.random.uniform([2, 196, 768], dtype=tf.float32)  # (batch_size, num_patches, emb_dim)
    o1 = layer(x)
    tf.print('o1:', tf.shape(o1))  # o1: [392 256 768]
```
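`call` never reads the values of `x`. It only uses `tf.shape(x)` to decide how many copies of the same (P*P, emb_dim) table to return.

For the test input, that is 2 * 196 = 392 copies, giving a (392, 256, 768) tensor. That tensor holds 77,070,336 float32 values, about 308 MB.

The NumPy sketch below uses small hypothetical sizes and a random stand-in table. It contrasts the copying `np.repeat`, which materializes copies like `tf.repeat`, with a copy-free `np.broadcast_to` view. The helper name `repeat_for_patches` is hypothetical.

```python
import numpy as np

def repeat_for_patches(table, x_shape):
    """Hypothetical mirror of call(): one copy of the (P*P, emb_dim) table per patch.

    x_shape is (batch_size, num_patches, emb_dim); only the first two dims matter.
    """
    n = x_shape[0] * x_shape[1]  # batch_size * num_patches
    # Materialized copies, analogous to tf.repeat(table[None], n, axis=0)
    copied = np.repeat(table[np.newaxis, ...], n, axis=0)
    # Read-only view with the same shape and values, without copying the data
    viewed = np.broadcast_to(table, (n,) + table.shape)
    return copied, viewed

# Stand-in E_fou table for patch_size=4, emb_dim=32 (random values, not learned)
table = np.sin(np.random.default_rng(0).normal(size=(16, 32)))
copied, viewed = repeat_for_patches(table, (2, 196, 32))
print(copied.shape)  # (392, 16, 32)
```

In TF, one option to avoid the copy would be to return the (1, P*P, emb_dim) table and rely on broadcasting in the consuming layers. This assumes those layers actually broadcast over the first axis. `tf.broadcast_to` and `tf.tile` are the explicit TF counterparts if a full-shaped tensor is required.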