TFF: Federated Reconstruction for Matrix Factorization (Annotated Version)

Article link: https://blog.youkuaiyun.com/qq_43693424/article/details/129744130

Contents

- Federated Reconstruction for MatrixFactorization
  - Defining the Model
  - Training and Evaluation

Federated Reconstruction for MatrixFactorization

An example from TFF: Federated Reconstruction for Matrix Factorization | TensorFlow Federated

Still can't follow it with this many comments? Come on, I've chewed this up and spoon-fed it to you!

Defining the Model

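import collections
import functools
from typing import Optional

import numpy as np
import tensorflow as tf
import tensorflow_federated as tff


class UserEmbedding(tf.keras.layers.Layer):
  """Keras layer representing an embedding for a single user, used below."""
  # build() creates one trainable variable, `embedding`, of shape
  # (1, num_latent_factors): the latent-factor vector of the single user whose
  # data lives on this client. num_latent_factors is the number of latent
  # factors in the model, and initializer='uniform' draws the initial values
  # from a uniform distribution.
  #
  # Because the variable is created by hand rather than through
  # tf.keras.layers.Embedding, self.add_weight() must be called explicitly to
  # register it with the layer; name='UserEmbeddingKernel' names the variable.
  #
  # super().build(input_shape) calls the build() of the parent Layer class,
  # which marks the layer as built. Custom layers normally end build() this way.
  #
  # call() ignores its input and always returns the embedding.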
  def __init__(self, num_latent_factors, **kwargs):
    super().__init__(**kwargs)
    self.num_latent_factors = num_latent_factors

  def build(self, input_shape):
    self.embedding = self.add_weight(
        shape=(1, self.num_latent_factors),
        initializer='uniform',
        dtype=tf.float32,
        name='UserEmbeddingKernel')
    super().build(input_shape)

  def call(self, inputs):
    return self.embedding

  def compute_output_shape(self):
    return (1, self.num_latent_factors)

# The function takes two integers: num_items is the total number of movies that can be recommended, and num_latent_factors is the number of latent factors used by the matrix factorization.

# The "-> tff.learning.reconstruction.Model" annotation says that the function returns a TFF reconstruction.Model object.
def get_matrix_factorization_model(
    num_items: int,
    num_latent_factors: int) -> tff.learning.reconstruction.Model:
  """Defines a Keras matrix factorization model."""
  # Layers with variables will be partitioned into global and local layers.
  # We'll pass this to `tff.learning.reconstruction.from_keras_model`.
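  # When the tff.learning.reconstruction.Model is created, global_layers and local_layers tell TFF which layers are aggregated globally on the server and which are trained only locally on each client.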
  global_layers = []
  local_layers = []
   
  # Extract the item embedding.
  # This is the item-embedding part of the model: it maps each movie ID to its latent-factor vector.
  # tf.keras.layers.Input() defines an input layer named 'Item' that accepts tensors of shape [batch_size, 1]; batch_size is the number of examples in the batch and can be any positive integer.
  item_input = tf.keras.layers.Input(shape=[1], name='Item')  # shape=[1]: each example is a single movie ID.
  # tf.keras.layers.Embedding() then defines the 'ItemEmbedding' layer, which maps each movie ID to a num_latent_factors-dimensional vector.
  item_embedding_layer = tf.keras.layers.Embedding(
      num_items,
      num_latent_factors,
      name='ItemEmbedding')
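  # The layer holds trainable variables, so it is appended to global_layers; its weights are aggregated and updated on the server.
  global_layers.append(item_embedding_layer)
  # tf.keras.layers.Flatten() reshapes the embedding lookup from [batch_size, 1, num_latent_factors] to [batch_size, num_latent_factors]. flat_item_vec is the latent-factor vector of the movie being rated; it is an intermediate tensor, not the model output.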
  flat_item_vec = tf.keras.layers.Flatten(name='FlattenItems')(
      item_embedding_layer(item_input))

  # Extract the user embedding.
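  # This is the user-embedding part of the model. Unlike the item embedding, it is not looked up by user ID: each client holds a single latent-factor vector for its own user.
  # user_embedding_layer is trained and updated only on the client; it takes no part in global aggregation.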
  user_embedding_layer = UserEmbedding(
      num_latent_factors,
      name='UserEmbedding')
  local_layers.append(user_embedding_layer)

  # The item_input never gets used by the user embedding layer,
  # but this allows the model to directly use the user embedding.
  # flat_user_vec is the latent-factor vector of the current user, with shape (1, num_latent_factors).
  flat_user_vec = user_embedding_layer(item_input)

  # Compute the dot product between the user embedding, and the item one.
  # The predicted rating is the dot product of the user vector and the movie vector.
  # tf.keras.layers.Dot(), here a layer named 'Dot', takes flat_user_vec and flat_item_vec as inputs and contracts them along axis 1. The result, pred, is the model output.
  pred = tf.keras.layers.Dot(
      1, normalize=False, name='Dot')([flat_user_vec, flat_item_vec])

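  # input_spec describes one batch of client data and has two entries, x and y.
  # x is an int64 tensor of shape [None, 1] holding movie IDs, and y is a float32 tensor of the same shape holding the corresponding ratings. None means that dimension (the batch size) can be any size.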
  input_spec = collections.OrderedDict(
      x=tf.TensorSpec(shape=[None, 1], dtype=tf.int64),
      y=tf.TensorSpec(shape=[None, 1], dtype=tf.float32))

  model = tf.keras.Model(inputs=item_input, outputs=pred)

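  # from_keras_model wraps the Keras model in a tff.learning.reconstruction.Model.
  # The wrapper keeps the variables of global_layers and local_layers apart: global variables are aggregated on the server, while local variables are reconstructed on each client and are never sent to the server.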
  return tff.learning.reconstruction.from_keras_model(
      keras_model=model,
      global_layers=global_layers,
      local_layers=local_layers,
      input_spec=input_spec)
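
The prediction this model computes can be sketched without TensorFlow. The snippet below is a minimal pure-Python illustration and not the TFF implementation: `predict_rating`, `item_table`, `user_vec` and the toy numbers are hypothetical names and values chosen for the example. It assumes, as in the code above, a shared table of item vectors and one user vector per client.

```python
from typing import List


def predict_rating(user_vec: List[float],
                   item_table: List[List[float]],
                   item_id: int) -> float:
    """Dot product of the client's user vector and one item vector."""
    item_vec = item_table[item_id]  # plays the role of the ItemEmbedding lookup
    return sum(u * v for u, v in zip(user_vec, item_vec))


# Toy sizes: 3 items and 2 latent factors (the post uses 3706 items, 50 factors).
item_table = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]  # global: shared by all clients
user_vec = [1.0, 2.0]                              # local: stays on the client

predictions = [predict_rating(user_vec, item_table, i) for i in range(3)]
```

The split between `item_table` and `user_vec` mirrors the split between global_layers and local_layers: only the former would ever be aggregated across clients.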

# This will be used to produce our training process.
# User and item embeddings will be 50-dimensional.
# Whenever a model is needed for training, calling model_fn() with no arguments builds one.
model_fn = functools.partial(
    get_matrix_factorization_model,
    num_items=3706,
    num_latent_factors=50)
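
To see why functools.partial is used here, the following sketch binds the arguments of a stand-in builder and calls the result with no arguments. `build_model` is a hypothetical placeholder that returns a plain dict; it is not the real model builder from this post.

```python
import functools


def build_model(num_items: int, num_latent_factors: int) -> dict:
    """Hypothetical stand-in for get_matrix_factorization_model."""
    return {'num_items': num_items, 'num_latent_factors': num_latent_factors}


# Bind the arguments now; the result takes no arguments when called later.
model_fn = functools.partial(build_model, num_items=3706, num_latent_factors=50)

first = model_fn()
second = model_fn()  # every call builds a new, independent object
```

Passing a factory instead of a finished model lets the framework build a fresh model wherever it needs one.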

# RatingAccuracy is a Keras metric that subclasses tf.keras.metrics.Mean. For each batch it marks every example as correct or incorrect, and the Mean base class averages those marks over all examples seen so far. update_state() feeds in a batch, and result() returns the running accuracy.
class RatingAccuracy(tf.keras.metrics.Mean):
  """Keras metric computing accuracy of reconstructed ratings."""

  # Call the constructor of the base class tf.keras.metrics.Mean, forwarding name and kwargs.
  def __init__(self,
               name: str = 'rating_accuracy',
               **kwargs):
    super().__init__(name=name, **kwargs)

  # Compute per-example correctness for one batch of predictions.
  def update_state(self,
                   y_true: tf.Tensor,
                   y_pred: tf.Tensor,
                   sample_weight: Optional[tf.Tensor] = None):  # Optional per-example weights.
    # The absolute difference between y_true and y_pred is each example's error.
    absolute_diffs = tf.abs(y_true - y_pred)
    # A [batch_size, 1] tf.bool tensor indicating correctness within the
    # threshold for each example in a batch. A 0.5 threshold corresponds
    # to correctness when predictions are rounded to the nearest whole
    # number.
    example_accuracies = tf.less_equal(absolute_diffs, 0.5)

    # Pass example_accuracies (and sample_weight, if given) to the update_state of tf.keras.metrics.Mean, which accumulates the running mean, that is, the accuracy over the whole dataset.
    super().update_state(example_accuracies, sample_weight=sample_weight)

# These two lambdas are factories: each call to loss_fn returns a new loss object, and each call to metrics_fn returns a new list of metrics.
loss_fn = lambda: tf.keras.losses.MeanSquaredError()
metrics_fn = lambda: [RatingAccuracy()]
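
The arithmetic behind the metric and the loss above can be checked by hand. The sketch below is a pure-Python illustration under the same definitions (a prediction is correct when it is within 0.5 of the true rating, and the loss is the mean squared error). The function names and the four toy ratings are made up for the example, and the weighting follows a plain weighted mean.

```python
from typing import List, Optional


def rating_accuracy(y_true: List[float],
                    y_pred: List[float],
                    sample_weight: Optional[List[float]] = None,
                    threshold: float = 0.5) -> float:
    """Weighted fraction of predictions within `threshold` of the true rating."""
    if sample_weight is None:
        sample_weight = [1.0] * len(y_true)
    correct = [1.0 if abs(t - p) <= threshold else 0.0
               for t, p in zip(y_true, y_pred)]
    total = sum(c * w for c, w in zip(correct, sample_weight))
    return total / sum(sample_weight)


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


y_true = [4.0, 3.0, 5.0, 1.0]
y_pred = [4.25, 3.75, 5.0, 1.5]  # errors: 0.25, 0.75, 0.0, 0.5

accuracy = rating_accuracy(y_true, y_pred)  # 3 of 4 within 0.5 -> 0.75
loss = mean_squared_error(y_true, y_pred)   # 0.875 / 4 -> 0.21875
```

Note that an error of exactly 0.5 counts as correct, because the comparison is "less than or equal".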

Training and Evaluation

# We'll use this by doing:
# state = training_process.initialize()
# state, metrics = training_process.next(state, federated_train_data)

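# This builds the federated training process, in which many client devices and a central server jointly train the matrix factorization model.
# Six arguments are passed to build_training_process here. model_fn builds the model to train; loss_fn builds the loss between predictions and true ratings; metrics_fn builds the metrics. The three optimizer factories act at different levels: server_optimizer_fn applies the aggregated update to the global variables on the server, client_optimizer_fn trains the global variables on each client, and reconstruction_optimizer_fn reconstructs the local variables on each client.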
training_process = tff.learning.reconstruction.build_training_process(
    model_fn=model_fn,
    loss_fn=loss_fn,
    metrics_fn=metrics_fn,
    server_optimizer_fn=lambda: tf.keras.optimizers.SGD(1.0),
    client_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.5),
    reconstruction_optimizer_fn=lambda: tf.keras.optimizers.SGD(0.1))
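
The roles of the three optimizers are easier to see in a toy round written out by hand. The following is a simplified pure-Python sketch, not TFF's implementation. It assumes plain SGD on squared error, a fixed starting value of 0.1 for the user vector, one pass over each client's ratings per phase, and an unweighted mean of client deltas on the server (TFF may weight clients differently). `client_update`, `server_update` and the toy data are hypothetical.

```python
from typing import List, Tuple

Table = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def client_update(item_table: Table,
                  ratings: List[Tuple[int, float]],
                  recon_lr: float = 0.1,
                  client_lr: float = 0.5) -> Table:
    """Returns the change this client proposes for the item table."""
    # Phase 1 (reconstruction optimizer): item vectors frozen, fit the user vector.
    user_vec = [0.1] * len(item_table[0])  # assumed initial value
    for item_id, rating in ratings:
        item_vec = item_table[item_id]
        err = dot(user_vec, item_vec) - rating
        user_vec = [u - recon_lr * 2 * err * v
                    for u, v in zip(user_vec, item_vec)]
    # Phase 2 (client optimizer): user vector frozen, train a copy of the items.
    local_items = [row[:] for row in item_table]
    for item_id, rating in ratings:
        item_vec = local_items[item_id]
        err = dot(user_vec, item_vec) - rating
        local_items[item_id] = [v - client_lr * 2 * err * u
                                for v, u in zip(item_vec, user_vec)]
    # Only the item delta leaves the client; user_vec is discarded.
    return [[new - old for new, old in zip(new_row, old_row)]
            for new_row, old_row in zip(local_items, item_table)]


def server_update(item_table: Table,
                  deltas: List[Table],
                  server_lr: float = 1.0) -> Table:
    """Server optimizer: apply the mean client delta to the global item table."""
    n = len(deltas)
    return [[value + server_lr * sum(d[i][j] for d in deltas) / n
             for j, value in enumerate(row)]
            for i, row in enumerate(item_table)]


item_table = [[0.1, 0.2], [0.3, 0.1]]  # 2 items, 2 latent factors
client_a = [(0, 4.0)]                  # (item_id, rating) pairs
client_b = [(0, 5.0), (1, 2.0)]
deltas = [client_update(item_table, data) for data in (client_a, client_b)]
new_item_table = server_update(item_table, deltas)
```

With a server learning rate of 1.0, the server simply adds the averaged delta, which is why SGD(1.0) is a natural default for the server optimizer.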

# We'll use this by doing:
# eval_metrics = evaluation_computation(state.model, tf_val_datasets)
# where `state` is the state from the training process above.

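# This builds the federated evaluation computation, which can be run periodically during training to evaluate the model.
# Four arguments are passed to build_federated_evaluation here. model_fn, loss_fn and metrics_fn play the same roles as in build_training_process. reconstruction_optimizer_fn builds the optimizer that reconstructs each client's local variables before evaluation; the global model is not updated during evaluation.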
evaluation_computation = tff.learning.reconstruction.build_federated_evaluation(
    model_fn,
    loss_fn=loss_fn,
    metrics_fn=metrics_fn,
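    # functools.partial(tf.keras.optimizers.SGD, 0.1) returns a no-argument callable that builds an SGD optimizer with learning rate 0.1, equivalent to the lambda form used for the training process. The optimizer reconstructs the local user embedding during evaluation; it does not change the global model.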
    reconstruction_optimizer_fn=functools.partial(
            tf.keras.optimizers.SGD, 0.1))

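# During evaluation, each client receives the global model, reconstructs its local variables from its own data, computes the loss and metrics locally, and returns them to the server for aggregation. The aggregated loss and metrics can then be used to monitor training progress.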
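
A single client's share of that evaluation might look like the sketch below. It is a pure-Python illustration under stated assumptions, not TFF's code: the user vector starts at 0.1, reconstruction is plain SGD on squared error, and the reconstruction and evaluation ratings are passed in separately for clarity (how TFF divides a client's data is not covered in this post). `evaluate_client` and the toy data are hypothetical.

```python
from typing import Dict, List, Tuple


def evaluate_client(item_table: List[List[float]],
                    recon_ratings: List[Tuple[int, float]],
                    eval_ratings: List[Tuple[int, float]],
                    recon_lr: float = 0.1) -> Dict[str, float]:
    """Reconstructs a user vector, then scores it; item_table is never changed."""

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    # Reconstruction: global item vectors are frozen, only user_vec moves.
    user_vec = [0.1] * len(item_table[0])  # assumed initial value
    for item_id, rating in recon_ratings:
        item_vec = item_table[item_id]
        err = dot(user_vec, item_vec) - rating
        user_vec = [u - recon_lr * 2 * err * v
                    for u, v in zip(user_vec, item_vec)]

    # Evaluation: loss and accuracy with the reconstructed user vector.
    errors = [dot(user_vec, item_table[i]) - r for i, r in eval_ratings]
    loss = sum(e * e for e in errors) / len(errors)
    accuracy = sum(1.0 for e in errors if abs(e) <= 0.5) / len(errors)
    return {'loss': loss, 'rating_accuracy': accuracy}


item_table = [[1.0, 0.0], [0.0, 1.0]]
one_step = evaluate_client(item_table, [(0, 4.0)], [(0, 4.0)])
many_steps = evaluate_client(item_table, [(0, 4.0)] * 20, [(0, 4.0)])
```

In this toy case one reconstruction step leaves the prediction far from the rating, while twenty steps bring it within the 0.5 threshold, which shows how much evaluation quality depends on the reconstruction phase.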

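# Initialize the state of the training process and inspect it.
# The output shows the model weights held in the server state and the shape of the first trainable variable, the item embedding matrix.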
state = training_process.initialize()
print(state.model)
print('Item variables shape:', state.model.trainable[0].shape)

# We shouldn't expect good evaluation results here, since we haven't trained yet!
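# tf_val_datasets is the list of per-client validation datasets. It and tf_train_datasets are built in the data-loading part of the original tutorial, which this post does not reproduce.
# The output shows the model's initial metrics on the validation set.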
eval_metrics = evaluation_computation(state.model, tf_val_datasets)
print('Initial Eval:', eval_metrics['eval'])

# Next we can try one round of training. To make things more realistic, each round samples 50 clients at random without replacement. We should still expect poor training metrics, since this is only one round.

# np.random.choice picks 50 client datasets at random from the tf_train_datasets list, and .tolist() converts the result to a Python list, federated_train_data.
federated_train_data = np.random.choice(tf_train_datasets, size=50, replace=False).tolist()
# training_process.next takes the current state and the 50 selected datasets and returns the updated state together with the training metrics.
state, metrics = training_process.next(state, federated_train_data)
# Print the 'train' entry of metrics.
print('Train metrics:', metrics['train'])

# Output: Train metrics: OrderedDict([('rating_accuracy', 0.0), ('loss', 14.317455)])
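
Sampling without replacement means no client can be picked twice in the same round. The sketch below shows the same idea with the standard library's random.sample, as an analogue of np.random.choice(..., replace=False). The pool of 200 client IDs and the seed are made up for the illustration.

```python
import random

# Hypothetical pool of client identifiers standing in for tf_train_datasets.
client_ids = [f'client_{i}' for i in range(200)]

rng = random.Random(0)  # seeded only so the sketch is reproducible
round_clients = rng.sample(client_ids, k=50)  # 50 distinct clients

# A set drops duplicates, so equal sizes confirm every pick is unique.
all_unique = len(set(round_clients)) == len(round_clients)
```

A new sample is drawn every round, so over many rounds the model sees many different clients even though each round touches only 50.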

NUM_ROUNDS = 20  # Train for 20 rounds.

train_losses = []
train_accs = []

state = training_process.initialize()

# This may take a couple minutes to run.
for i in range(NUM_ROUNDS):
  # Sample 50 client datasets from tf_train_datasets for this round and convert them to a Python list.
  federated_train_data = np.random.choice(tf_train_datasets, size=50, replace=False).tolist()
  # Run one round of federated training; this returns the updated state and this round's metrics.
  state, metrics = training_process.next(state, federated_train_data)
  print(f'Train round {i}:', metrics['train'])
  # Record this round's loss in train_losses and its rating_accuracy in train_accs.
  train_losses.append(metrics['train']['loss'])
  train_accs.append(metrics['train']['rating_accuracy'])

# Evaluate the model on the validation datasets (tf_val_datasets) to get its validation metrics.
eval_metrics = evaluation_computation(state.model, tf_val_datasets)
print('Final Eval:', eval_metrics['eval'])