# Copyright (c) 2018-2019, Krzysztof Rusek
# All rights reserved.
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
# * Redistributions of source code must retain the above copyright notice, this
# list of conditions and the following disclaimer.
# * Redistributions in binary form must reproduce the above copyright notice,
# this list of conditions and the following disclaimer in the documentation
# and/or other materials provided with the distribution.
# * Neither the name of the copyright holder nor the names of its
# contributors may be used to endorse or promote products derived from
# this software without specific prior written permission.
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
# DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
# FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
# DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
# CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
# OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
# author: Krzysztof Rusek, AGH
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"  # force CPU-only execution
import tensorflow as tf
from tensorflow import keras
import numpy as np
import argparse
hparams = tf.contrib.training.HParams(
node_count=14,
link_state_dim=4,
path_state_dim=2,
T=3,
readout_units=8,
learning_rate=0.001,
batch_size=32,
dropout_rate=0.5,
l2=0.1,
l2_2=0.01,
learn_embedding=True, # If false, only the readout is trained
readout_layers=2, # number of hidden layers in readout model
)
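
# The defaults above can be overridden from the command line via the --hparams
# flag parsed in train() below, e.g. (hypothetical values):
#   --hparams="link_state_dim=8,T=4,learning_rate=0.0005"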
class RouteNet(tf.keras.Model):
    def __init__(self, hparams, output_units=1, final_activation=None):
super(RouteNet, self).__init__()
self.hparams = hparams
self.output_units = output_units
self.final_activation = final_activation
def build(self, input_shape=None):
del input_shape
self.edge_update = tf.keras.layers.GRUCell(self.hparams.link_state_dim, name="edge_update")
self.path_update = tf.keras.layers.GRUCell(self.hparams.path_state_dim, name="path_update")
self.readout = tf.keras.models.Sequential(name='readout')
for i in range(self.hparams.readout_layers):
self.readout.add(tf.keras.layers.Dense(self.hparams.readout_units,
activation=tf.nn.selu,
kernel_regularizer=tf.contrib.layers.l2_regularizer(self.hparams.l2)))
self.readout.add(tf.keras.layers.Dropout(rate=self.hparams.dropout_rate))
        self.final = keras.layers.Dense(self.output_units,
                                        kernel_regularizer=tf.contrib.layers.l2_regularizer(self.hparams.l2_2),
                                        activation=self.final_activation)

        self.edge_update.build(tf.TensorShape([None, self.hparams.path_state_dim]))
        self.path_update.build(tf.TensorShape([None, self.hparams.link_state_dim]))
        self.readout.build(input_shape=[None, self.hparams.path_state_dim])
        self.final.build(input_shape=[None, self.hparams.path_state_dim + self.hparams.readout_units])
self.built = True
def call(self, inputs, training=False):
        '''Runs T rounds of link/path message passing followed by the readout.

        Returns:
            The natural parameter of the output distribution.
        '''
f_ = inputs
        # Initial link state: capacity in the first component, zeros elsewhere.
        shape = tf.stack([f_['n_links'], self.hparams.link_state_dim - 1], axis=0)
        link_state = tf.concat([
            tf.expand_dims(f_['capacities'], axis=1),
            tf.zeros(shape)
        ], axis=1)
        # Initial path state: traffic demand in the first component, zeros elsewhere.
        shape = tf.stack([f_['n_paths'], self.hparams.path_state_dim - 1], axis=0)
        path_state = tf.concat([
            tf.expand_dims(f_['traffic'][0:f_['n_paths']], axis=1),
            tf.zeros(shape)
        ], axis=1)
        links = f_['links']
        paths = f_['paths']
        seqs = f_['sequences']
        for _ in range(self.hparams.T):
            # Gather the current state of every link along every path.
            h_ = tf.gather(link_state, links)

            #TODO move this to feature calculation
            ids = tf.stack([paths, seqs], axis=1)
            max_len = tf.reduce_max(seqs) + 1
            shape = tf.stack([f_['n_paths'], max_len, self.hparams.link_state_dim])
            lens = tf.segment_sum(data=tf.ones_like(paths),
                                  segment_ids=paths)

            # Build a [n_paths, max_len, link_state_dim] tensor of the link
            # states along each path, padded with zeros.
            link_inputs = tf.scatter_nd(ids, h_, shape)
            #TODO move to tf.keras.RNN
            outputs, path_state = tf.nn.dynamic_rnn(self.path_update,
                                                    link_inputs,
                                                    sequence_length=lens,
                                                    initial_state=path_state,
                                                    dtype=tf.float32)
            # Scatter the per-step RNN outputs back onto the corresponding
            # links and sum the messages arriving at each link.
            m = tf.gather_nd(outputs, ids)
            m = tf.unsorted_segment_sum(m, links, f_['n_links'])

            # Keras cells expect the state as a list.
            link_state, _ = self.edge_update(m, [link_state])
        if self.hparams.learn_embedding:
            r = self.readout(path_state, training=training)
            o = self.final(tf.concat([r, path_state], axis=1))
        else:
            # Freeze the message-passing embeddings and train only the readout.
            r = self.readout(tf.stop_gradient(path_state), training=training)
            o = self.final(tf.concat([r, tf.stop_gradient(path_state)], axis=1))
return o
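
# A minimal sketch (hypothetical toy instance) of the indexing scheme consumed
# by RouteNet.call above. Two paths over three links, where path 0 traverses
# links [0, 2] and path 1 traverses link [1]:
#   links     = [0, 2, 1]   # link id of each (path, position) pair
#   paths     = [0, 0, 1]   # path id of each pair
#   sequences = [0, 1, 0]   # position of the link within its path
# tf.scatter_nd then assembles a [n_paths, max_len, link_state_dim] tensor of
# link states per path, and tf.unsorted_segment_sum aggregates the RNN outputs
# back onto the links.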
def delay_model_fn(
        features,  # This is batch_features from input_fn
        labels,    # This is batch_labels from input_fn
        mode,      # An instance of tf.estimator.ModeKeys
        params):   # Additional configuration
model = RouteNet(params, output_units=2)
model.build()
predictions = model(features, training=mode==tf.estimator.ModeKeys.TRAIN)
    loc = predictions[...,0]
    # Bias chosen so that a zero logit maps to scale = softplus(c) = 0.098.
    c = np.log(np.expm1(np.float32(0.098)))
    scale = tf.math.softplus(c + predictions[...,1]) + np.float32(1e-9)
    delay_prediction = loc
    jitter_prediction = scale**2
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode,
predictions={'delay':delay_prediction, 'jitter':jitter_prediction}
)
with tf.name_scope('heteroscedastic_loss'):
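        # Where this loss comes from (a sketch): model the n delivered packets
        # on each path as i.i.d. N(loc, scale^2) samples whose per-path sample
        # mean is y['delay'] and sample variance is y['jitter']. The summed
        # Gaussian negative log-likelihood is then, up to an additive constant,
        #   n*jitter/(2*scale^2) + n*(delay - loc)^2/(2*scale^2) + n*log(scale),
        # which is exactly the nll term below.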
        x = features
        y = labels
        # Number of packets actually delivered on each path.
        n = x['packets'] - y['drops']
        _2sigma = np.float32(2.0)*scale**2
        nll = n*y['jitter']/_2sigma + n*tf.math.squared_difference(y['delay'], loc)/_2sigma + n*tf.math.log(scale)
        loss = tf.reduce_sum(nll)/np.float32(1e6)
regularization_loss = sum(model.losses)
total_loss = loss + regularization_loss
tf.summary.scalar('regularization_loss', regularization_loss)
if mode == tf.estimator.ModeKeys.EVAL:
return tf.estimator.EstimatorSpec(
mode,loss=loss,
eval_metric_ops={
'label/mean/delay':tf.metrics.mean(labels['delay']),
'label/mean/jitter':tf.metrics.mean(labels['jitter']),
'prediction/mean/delay': tf.metrics.mean(delay_prediction),
'prediction/mean/jitter': tf.metrics.mean(jitter_prediction),
'mae/delay':tf.metrics.mean_absolute_error(labels['delay'], delay_prediction),
'mae/jitter':tf.metrics.mean_absolute_error(labels['jitter'], jitter_prediction),
'rho/delay':tf.contrib.metrics.streaming_pearson_correlation(labels=labels['delay'],predictions=delay_prediction),
'rho/jitter':tf.contrib.metrics.streaming_pearson_correlation(labels=labels['jitter'],predictions=jitter_prediction)
}
)
assert mode == tf.estimator.ModeKeys.TRAIN
trainables = model.variables
grads = tf.gradients(total_loss, trainables)
grad_var_pairs = zip(grads, trainables)
summaries = [tf.summary.histogram(var.op.name, var) for var in trainables]
summaries += [tf.summary.histogram(g.op.name, g) for g in grads if g is not None]
decayed_lr = tf.train.exponential_decay(params.learning_rate,
tf.train.get_global_step(), 50000,
0.9, staircase=True)
    optimizer = tf.train.AdamOptimizer(decayed_lr)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = optimizer.apply_gradients(grad_var_pairs,
global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode,
loss=total_loss,
train_op=train_op,
)
def drop_model_fn(
        features,  # This is batch_features from input_fn
        labels,    # This is batch_labels from input_fn
        mode,      # An instance of tf.estimator.ModeKeys
        params):   # Additional configuration
model = RouteNet(params, output_units=1, final_activation=None)
model.build()
logits = model(features, training=mode==tf.estimator.ModeKeys.TRAIN)
logits = tf.squeeze(logits)
predictions = tf.math.sigmoid(logits)
if mode == tf.estimator.ModeKeys.PREDICT:
return tf.estimator.EstimatorSpec(mode,
predictions={'drops':predictions, 'logits':logits}
)
    with tf.name_scope('binomial_loss'):
        x = features
        y = labels
        loss_ratio = y['drops']/x['packets']
# Binomial negative Log-likelihood
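        # A sketch of the equivalence: with n = packets and empirical drop
        # ratio p_hat = drops/packets, the binomial NLL is, up to an additive
        # constant,
        #   -n*(p_hat*log(p) + (1 - p_hat)*log(1 - p)),  p = sigmoid(logits),
        # i.e. n times the sigmoid cross-entropy computed below.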
        loss = tf.reduce_sum(x['packets']*tf.nn.sigmoid_cross_entropy_with_logits(
            labels=loss_ratio,
            logits=logits
        ))/np.float32(1e5)
regularization_loss = sum(model.losses)
total_loss = loss + regularization_loss
tf.summary.scalar('regularization_loss', regularization_loss)
if mode == tf.estimator.ModeKeys.EVAL:
return tf.estimator.EstimatorSpec(
mode,loss=loss,
eval_metric_ops={
'label/mean/drops':tf.metrics.mean(loss_ratio),
'prediction/mean/drops': tf.metrics.mean(predictions),
'mae/drops':tf.metrics.mean_absolute_error(loss_ratio, predictions),
'rho/drops':tf.contrib.metrics.streaming_pearson_correlation(labels=loss_ratio,predictions=predictions)
}
)
assert mode == tf.estimator.ModeKeys.TRAIN
trainables = model.trainable_variables
grads = tf.gradients(total_loss, trainables)
grad_var_pairs = zip(grads, trainables)
summaries = [tf.summary.histogram(var.op.name, var) for var in trainables]
summaries += [tf.summary.histogram(g.op.name, g) for g in grads if g is not None]
    decayed_lr = tf.train.exponential_decay(params.learning_rate,
                                            tf.train.get_global_step(), 50000,
                                            0.9, staircase=True)
    optimizer = tf.train.AdamOptimizer(decayed_lr)
update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(update_ops):
train_op = optimizer.apply_gradients(grad_var_pairs,
global_step=tf.train.get_global_step())
return tf.estimator.EstimatorSpec(mode,
loss=total_loss,
train_op=train_op,
)
def scale_fn(k, val):
    '''Scales the given feature.
    Args:
        k: feature key
        val: tensor value
    '''
    if k == 'traffic':
        return (val - 0.18)/0.15
    if k == 'capacities':
        return val/10.0
    return val
def parse(serialized, target=None, normalize=True):
    '''Parses a serialized tf.train.Example into a feature dict.

    `target` is the name of the predicted variable (deprecated).
    '''
with tf.device("/cpu:0"):
with tf.name_scope('parse'):
#TODO add feature spec class
features = tf.io.parse_single_example(
serialized,
features={
'traffic':tf.VarLenFeature(tf.float32),
'delay':tf.VarLenFeature(tf.float32),
'logdelay':tf.VarLenFeature(tf.float32),
'jitter':tf.VarLenFeature(tf.float32),
'drops':tf.VarLenFeature(tf.float32),
'packets':tf.VarLenFeature(tf.float32),
'capacities':tf.VarLenFeature(tf.float32),
'links':tf.VarLenFeature(tf.int64),
'paths':tf.VarLenFeature(tf.int64),
'sequences':tf.VarLenFeature(tf.int64),
'n_links':tf.FixedLenFeature([],tf.int64),
'n_paths':tf.FixedLenFeature([],tf.int64),
'n_total':tf.FixedLenFeature([],tf.int64)
})
for k in ['traffic','delay','logdelay','jitter','drops','packets','capacities','links','paths','sequences']:
features[k] = tf.sparse.to_dense( features[k] )
if normalize:
features[k] = scale_fn(k, features[k])
return features
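
# A minimal sketch (hypothetical toy values) of writing a record that parse()
# can read; only some of the keys are shown and the helpers are illustrative:
#
#   def _floats(values):
#       return tf.train.Feature(float_list=tf.train.FloatList(value=values))
#   def _ints(values):
#       return tf.train.Feature(int64_list=tf.train.Int64List(value=values))
#
#   example = tf.train.Example(features=tf.train.Features(feature={
#       'traffic': _floats([0.2, 0.3]),
#       'capacities': _floats([10.0, 10.0, 10.0]),
#       'links': _ints([0, 2, 1]),
#       'paths': _ints([0, 0, 1]),
#       'sequences': _ints([0, 1, 0]),
#       'n_links': _ints([3]),
#       'n_paths': _ints([2]),
#       # ...plus delay, logdelay, jitter, drops, packets and n_total
#   }))
#   with tf.io.TFRecordWriter('toy.tfrecords') as writer:
#       writer.write(example.SerializeToString())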
def cummax(alist, extractor):
    with tf.name_scope('cummax'):
        maxes = [tf.reduce_max(extractor(v)) + 1 for v in alist]
        cummaxes = [tf.zeros_like(maxes[0])]
        for i in range(len(maxes) - 1):
            cummaxes.append(tf.math.add_n(maxes[0:i + 1]))
        return cummaxes
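
# Toy illustration (hypothetical values): for three samples whose largest link
# ids are 2, 4 and 1, maxes == [3, 5, 2] and cummax returns [0, 3, 8], the
# exclusive cumulative sum used below to shift each sample's indices into
# disjoint ranges when several graphs are batched together.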
def transformation_func(it, batch_size=32):
with tf.name_scope("transformation_func"):
vs = [it.get_next() for _ in range(batch_size)]
links_cummax = cummax(vs,lambda v:v['links'] )
paths_cummax = cummax(vs,lambda v:v['paths'] )
tensors = ({
'traffic':tf.concat([v['traffic'] for v in vs], axis=0),
'capacities': tf.concat([v['capacities'] for v in vs], axis=0),
'sequences':tf.concat([v['sequences'] for v in vs], axis=0),
'packets':tf.concat([v['packets'] for v in vs], axis=0),
'links':tf.concat([v['links'] + m for v,m in zip(vs, links_cummax) ], axis=0),
'paths':tf.concat([v['paths'] + m for v,m in zip(vs, paths_cummax) ], axis=0),
'n_links':tf.math.add_n([v['n_links'] for v in vs]),
'n_paths':tf.math.add_n([v['n_paths'] for v in vs]),
'n_total':tf.math.add_n([v['n_total'] for v in vs])
}, {
'delay' : tf.concat([v['delay'] for v in vs], axis=0),
'logdelay' : tf.concat([v['logdelay'] for v in vs], axis=0),
'drops' : tf.concat([v['drops'] for v in vs], axis=0),
'jitter' : tf.concat([v['jitter'] for v in vs], axis=0),
}
)
return tensors
def tfrecord_input_fn(filenames, hparams, shuffle_buf=1000, target='delay'):
    files = tf.data.Dataset.from_tensor_slices(filenames)
    files = files.shuffle(len(filenames))

    ds = files.apply(tf.data.experimental.parallel_interleave(
        tf.data.TFRecordDataset, cycle_length=4))

    if shuffle_buf:
        ds = ds.apply(tf.data.experimental.shuffle_and_repeat(shuffle_buf))
    else:
        # Sample roughly 10% of the records for evaluation, since evaluating
        # on everything is time-consuming.
        ds = ds.filter(lambda x: tf.random_uniform(shape=()) < 0.1)

    ds = ds.map(lambda buf: parse(buf, target),
                num_parallel_calls=2)
    ds = ds.prefetch(10)

    it = ds.make_one_shot_iterator()
    sample = transformation_func(it, hparams.batch_size)
    return sample
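
# Hypothetical smoke test (the filename is illustrative): materialize one
# batch in a TF 1.x session.
#   features, labels = tfrecord_input_fn(['train.tfrecords'], hparams, shuffle_buf=None)
#   with tf.Session() as sess:
#       f, l = sess.run([features, labels])
#       print(f['n_paths'], l['delay'].shape)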
def serving_input_receiver_fn():
"""
This is used to define inputs to serve the model.
returns: ServingInputReceiver
"""
receiver_tensors = {
'capacities': tf.placeholder(tf.float32, [None]),
'traffic': tf.placeholder(tf.float32, [None]),
'links': tf.placeholder(tf.int32, [None]),
'paths': tf.placeholder(tf.int32, [None]),
'sequences': tf.placeholder(tf.int32, [None]),
'n_links': tf.placeholder(tf.int32, []),
'n_paths':tf.placeholder(tf.int32, []),
}
    # Scale the given inputs to match the model's normalization.
    features = {k: scale_fn(k, v) for k, v in receiver_tensors.items()}
return tf.estimator.export.ServingInputReceiver(receiver_tensors=receiver_tensors,
features=features)
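
# A hedged usage sketch: once train() below has exported a SavedModel, it can
# be loaded with the TF 1.x contrib predictor (the export path and the inputs
# are illustrative):
#   predictor = tf.contrib.predictor.from_saved_model(
#       'model_dir/export/best_exporter/1546300800')
#   out = predictor({'capacities': [10.0, 10.0, 10.0], 'traffic': [0.2, 0.3],
#                    'links': [0, 2, 1], 'paths': [0, 0, 1],
#                    'sequences': [0, 1, 0], 'n_links': 3, 'n_paths': 2})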
def train(args):
print(args)
tf.logging.set_verbosity('INFO')
if args.hparams:
hparams.parse(args.hparams)
    model_fn = delay_model_fn if args.target == 'delay' else drop_model_fn
estimator = tf.estimator.Estimator(
model_fn = model_fn,
model_dir=args.model_dir,
params=hparams,
warm_start_from=args.warm
)
best_exporter = tf.estimator.BestExporter(
serving_input_receiver_fn=serving_input_receiver_fn,
exports_to_keep=2)
    latest_exporter = tf.estimator.LatestExporter(
        name="latest",
        serving_input_receiver_fn=serving_input_receiver_fn,
        exports_to_keep=5)
    train_spec = tf.estimator.TrainSpec(input_fn=lambda: tfrecord_input_fn(args.train, hparams, shuffle_buf=args.shuffle_buf, target=args.target),
                                        max_steps=args.train_steps)
    eval_spec = tf.estimator.EvalSpec(input_fn=lambda: tfrecord_input_fn(args.evaluation, hparams, shuffle_buf=None, target=args.target),
                                      steps=args.eval_steps,
                                      exporters=[best_exporter, latest_exporter],
                                      throttle_secs=600)
tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec)
def main():
parser = argparse.ArgumentParser(description='RouteNet script')
subparsers = parser.add_subparsers(help='sub-command help')
parser_train = subparsers.add_parser('train', help='Train options')
    parser_train.add_argument('--hparams', type=str,
                              help='Comma-separated list of "name=value" pairs.')
    parser_train.add_argument('--train', help='Train TFRecords files', type=str, nargs='+')
    parser_train.add_argument('--evaluation', help='Evaluation TFRecords files', type=str, nargs='+')
    parser_train.add_argument('--model_dir', help='Model directory', type=str)
    parser_train.add_argument('--train_steps', help='Training steps', type=int, default=100)
    parser_train.add_argument('--eval_steps', help='Evaluation steps, default None = all', type=int, default=None)
    parser_train.add_argument('--shuffle_buf', help='Buffer size for sample shuffling', type=int, default=10000)
    parser_train.add_argument('--target', help='Predicted variable', type=str, default='delay')
    parser_train.add_argument('--warm', help='Checkpoint to warm-start from', type=str, default=None)
parser_train.set_defaults(func=train)
args = parser.parse_args()
return args.func(args)
if __name__ == '__main__':
main()
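
# Example invocation (script name and paths are illustrative):
#   python routenet.py train --train train/*.tfrecords \
#       --evaluation eval/*.tfrecords --model_dir ./model \
#       --train_steps 100000 --target delay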