FixedLenFeature, VarLenFeature, and FixedLenSequenceFeature in TFRecord Explained

Latest recommended article published on 2022-06-06 16:24:22

License: CC 4.0 BY-SA

Tags: #python #tensorflow #deep-learning

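This article introduces FixedLenFeature, VarLenFeature, and FixedLenSequenceFeature, which TensorFlow uses to parse serialized input examples. FixedLenFeature parses fixed-length features and takes three arguments: shape (the shape of the data), dtype (the data type), and default_value (the fallback value). VarLenFeature handles variable-length features. In a real project, for example when parsing records, these types are combined into a parsing dictionary so that every field is decoded correctly.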

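To parse each column of every input example, you define a parsing dictionary. TensorFlow provides three feature types for this: FixedLenFeature, VarLenFeature, and FixedLenSequenceFeature. They parse fixed-length features, variable-length features, and fixed-length sequence features (sequences whose individual steps have a fixed shape), respectively.
First:
FixedLenFeature() takes three arguments:
(1) shape: the shape of the input data.
(2) dtype: the data type of the input data.
(3) default_value: the value used when an example is missing this feature. It must be compatible with dtype and the specified shape.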
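The effect of default_value can be illustrated with a small sketch. It assumes TensorFlow 2.x with eager execution, where these classes live under tf.io; the feature names "label" and "weight" and the helper make_example are made up for this illustration and do not come from the original project.

```python
import tensorflow as tf

def make_example(with_weight):
    """Build a serialized tf.train.Example; 'weight' is optional (hypothetical feature)."""
    feature = {
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[1])),
    }
    if with_weight:
        feature["weight"] = tf.train.Feature(
            float_list=tf.train.FloatList(value=[0.5]))
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

# Parsing dictionary: 'weight' falls back to 1.0 when an example lacks it.
spec = {
    "label": tf.io.FixedLenFeature(shape=[], dtype=tf.int64),
    "weight": tf.io.FixedLenFeature(shape=[], dtype=tf.float32, default_value=1.0),
}

parsed_full = tf.io.parse_single_example(make_example(True), spec)
parsed_missing = tf.io.parse_single_example(make_example(False), spec)

print(parsed_full["weight"].numpy())     # value stored in the record
print(parsed_missing["weight"].numpy())  # value taken from default_value
```

Without default_value, parsing an example that lacks a FixedLenFeature key raises an error, so a default is mainly useful for optional columns.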

Notes:

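import tensorflow as tf  # tf.io.* is the TF 2.x location; older 1.x code wrote tf.FixedLenFeature / tf.parse_single_example

tf.io.FixedLenFeature(shape=[], dtype=tf.int64)   # parsed in a batch, the resulting tensor has shape (bs,)
tf.io.FixedLenFeature(shape=[k], dtype=tf.int64)  # parsed in a batch, the resulting tensor has shape (bs, k)
# So a fixed-length vector feature must declare its length in shape; otherwise it is parsed as a single value.
# For a label, the length is usually omitted.
features = tf.io.parse_single_example(serialized_example,
                                      features={
                                          'label': tf.io.FixedLenFeature([381], tf.int64)
                                      })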
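The shape comments above refer to batched parsing. As a minimal sketch, assuming TensorFlow 2.x with eager execution, the following compares tf.io.parse_single_example (no batch dimension) with tf.io.parse_example (leading batch dimension bs). The feature name "vec", the length K = 3, and the helper make_example are hypothetical.

```python
import tensorflow as tf

K = 3  # hypothetical fixed feature length

def make_example(label, vec):
    """Serialize one example with a scalar label and a length-K vector."""
    feature = {
        "label": tf.train.Feature(int64_list=tf.train.Int64List(value=[label])),
        "vec": tf.train.Feature(int64_list=tf.train.Int64List(value=vec)),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

serialized = [make_example(0, [1, 2, 3]), make_example(1, [4, 5, 6])]

spec = {
    "label": tf.io.FixedLenFeature(shape=[], dtype=tf.int64),
    "vec": tf.io.FixedLenFeature(shape=[K], dtype=tf.int64),
}

# One record at a time: shapes are exactly what the spec declares.
single = tf.io.parse_single_example(serialized[0], spec)
# A batch of records: every tensor gains a leading batch dimension.
batch = tf.io.parse_example(tf.constant(serialized), spec)

print(single["label"].shape, single["vec"].shape)
print(batch["label"].shape, batch["vec"].shape)
```

With two records in the batch, bs is 2, so the batched label has shape (2,) and the batched vector has shape (2, 3), while the single-record versions have shapes () and (3,).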
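# Example parsing function from a real project
# (fixed_onehot_len is defined elsewhere in the project; tf.io.* is the TF 2.x spelling of the 1.x tf.* names)
def _parse_record(example_proto):
    features = {
        "label": tf.io.FixedLenFeature([], tf.float32),
        "feat_ids": tf.io.FixedLenFeature([fixed_onehot_len], tf.int64),     # length is required and must equal the total length of the fixed one-hot part
        "feat_vals": tf.io.FixedLenFeature([fixed_onehot_len], tf.float32),  # length is required and must equal the total length of the fixed one-hot part
        "mul_feat_ids": tf.io.VarLenFeature(tf.int64),     # variable length, parsed as a SparseTensor
        "mul_feat_vals": tf.io.VarLenFeature(tf.float32)   # variable length, parsed as a SparseTensor
    }
    parsed_features = tf.io.parse_single_example(example_proto, features=features)
    return parsed_features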
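The parsing function above returns SparseTensor values for the VarLenFeature keys, and the introduction also mentions FixedLenSequenceFeature without showing it. The sketch below, assuming TensorFlow 2.x with eager execution, writes one record and reads the same variable-length column both ways. The values and the fixed length 2 are made up for illustration, and using FixedLenSequenceFeature with allow_missing=True here is one possible approach, not necessarily what the original project did.

```python
import tensorflow as tf

def int64_feature(values):
    """Wrap a list of ints as a tf.train.Feature."""
    return tf.train.Feature(int64_list=tf.train.Int64List(value=values))

# One record with a fixed-length column and a variable-length column.
serialized = tf.train.Example(features=tf.train.Features(feature={
    "feat_ids": int64_feature([3, 7]),
    "mul_feat_ids": int64_feature([11, 12, 13]),
})).SerializeToString()

# Option 1: VarLenFeature yields a SparseTensor; densify it when needed.
sparse_spec = {
    "feat_ids": tf.io.FixedLenFeature([2], tf.int64),
    "mul_feat_ids": tf.io.VarLenFeature(tf.int64),
}
parsed = tf.io.parse_single_example(serialized, sparse_spec)
mul_ids_dense = tf.sparse.to_dense(parsed["mul_feat_ids"])

# Option 2: FixedLenSequenceFeature with allow_missing=True yields a dense
# tensor whose first dimension is the (variable) number of steps.
seq_spec = {
    "mul_feat_ids": tf.io.FixedLenSequenceFeature(
        [], tf.int64, allow_missing=True),
}
mul_ids_seq = tf.io.parse_single_example(serialized, seq_spec)["mul_feat_ids"]

print(type(parsed["mul_feat_ids"]).__name__)
print(mul_ids_dense.numpy(), mul_ids_seq.numpy())
```

Both options recover the same values for a single record. The difference matters mostly when batching: sparse tensors batch directly, whereas dense variable-length tensors need padding (for example with padded_batch in tf.data) before they can be stacked.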