TensorFlow Learning Path (6): Understanding tf.strided_slice() and tf.cast()

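This article introduces two TensorFlow functions, tf.strided_slice() and tf.cast(). tf.cast() converts a tensor to a new type, for example from uint8 to int32 in the CIFAR10 code. tf.strided_slice() extracts a slice from the input tensor given start positions, end positions, and strides; a negative stride takes values in reverse order. Both functions are explained through examples.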

I have recently been reading the CIFAR10 code, and in cifar10_input.py I came across the following:

  • code1:
    # The first bytes represent the label, which we convert from uint8->int32
    result.label = tf.cast(
        tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32
    )

That led me to look into tf.strided_slice() and tf.cast(). My notes follow.

  • Understanding tf.cast()
def cast(x, dtype, name=None):

Official description: Casts a tensor to a new type.
Where:

x: a Tensor or SparseTensor
dtype: the target type
name: a name for the op (optional)

Example:

 # tensor `a` is [6.3, 7.4], dtype=tf.float32
 tf.cast(a, tf.int32) ==> [6, 7]  # dtype=tf.int32

So in code1, tf.cast() converts the label bytes extracted by tf.strided_slice() from uint8 to int32 before they are stored in result.label.
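The following is a minimal sketch of both points, assuming TensorFlow 2.x with eager execution. The record contents and the value `label_bytes = 1` are made up for illustration and are not taken from cifar10_input.py itself.

```python
import tensorflow as tf

# Float -> int32: the fractional part is dropped.
a = tf.constant([6.3, 7.4], dtype=tf.float32)
a_int = tf.cast(a, tf.int32)

# Toy stand-in for one raw record: a label byte followed by image bytes.
# Both the byte values and label_bytes are assumptions for this sketch.
label_bytes = 1
record_bytes = tf.constant([7, 255, 128, 0], dtype=tf.uint8)

# Same pattern as code1: slice out the label bytes, then cast uint8 -> int32.
label = tf.cast(tf.strided_slice(record_bytes, [0], [label_bytes]), tf.int32)

print(a_int.numpy().tolist())
print(label.dtype, label.numpy().tolist())
```

Note that the cast changes only the dtype; the sliced label keeps its shape of one element.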

  • Understanding tf.strided_slice()
def strided_slice(input_,
                  begin,
                  end,
                  strides=None,
                  begin_mask=0,
                  end_mask=0,
                  ellipsis_mask=0,
                  new_axis_mask=0,
                  shrink_axis_mask=0,
                  var=None,
                  name=None):

Official description:

To a first order, this operation extracts a slice of size `end - begin`
from a tensor `input`
starting at the location specified by `begin`. The slice continues by adding
`stride` to the `begin` index until all dimensions are not less than `end`.
Note that components of stride can be negative, which causes a reverse
slice.

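In short:
Starting at position `begin`, the operation takes elements from the tensor `input`, advancing the index by `stride` in each dimension and stopping once the index reaches or passes `end`, so `end` itself is excluded. With a stride of 1, the slice has size `end - begin`. If a component of stride is negative, the slice along that dimension runs in reverse order.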
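The index walk along a single dimension can be sketched in plain Python. This is an illustration only, not TensorFlow's implementation: the helper name `strided_indices` is hypothetical, and it assumes no mask arguments are set, so negative positions and out-of-range bounds are handled the way ordinary Python slicing handles them.

```python
def strided_indices(begin, end, stride, dim_size):
    """Return the indices visited along one dimension of size dim_size."""
    if stride == 0:
        raise ValueError("stride must be non-zero")

    # Negative positions count from the end of the dimension.
    if begin < 0:
        begin += dim_size
    if end < 0:
        end += dim_size

    # Clamp out-of-range bounds; the valid range depends on the direction.
    if stride > 0:
        low, high = 0, dim_size
    else:
        low, high = -1, dim_size - 1
    begin = min(max(begin, low), high)
    end = min(max(end, low), high)

    # Walk from begin in steps of stride; end itself is never included.
    indices = []
    i = begin
    while (i < end) if stride > 0 else (i > end):
        indices.append(i)
        i += stride
    return indices


print(strided_indices(0, 3, 1, 3))    # forward walk over a dimension of size 3
print(strided_indices(-1, -3, -1, 2)) # reverse walk over a dimension of size 2
```

The second call corresponds to a dimension of size 2 sliced with begin -1, end -3, and stride -1: it visits index 1 and then index 0, that is, both elements in reverse order.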

Example:

# 'input' is [[[1, 1, 1], [2, 2, 2]],
#            [[3, 3, 3], [4, 4, 4]],
#            [[5, 5, 5], [6, 6, 6]]]
tf.strided_slice(input, [1, 0, 0], [2, 1, 3], [1, 1, 1]) ==> [[[3, 3, 3]]]
tf.strided_slice(input, [1, 0, 0], [2, 2, 3], [1, 1, 1]) ==> [[[3, 3, 3],
                                                              [4, 4, 4]]]
tf.strided_slice(input, [1, -1, 0], [2, -3, 3], [1, -1, 1]) ==>[[[4, 4, 4],
                                                                [3, 3, 3]]]

In the third tf.strided_slice() call above,

begin = [1, -1, 0]
end = [2, -3, 3]
strides = [1, -1, 1]

其中,begin中的 -1 表示要从第二维最后一个元素开始,strides中的 -1 表示第二维中每次增长步长为-1,于是,取出的元素下标是-1, -2, -3,… ,且因为 strides中的第二维步长为负数,所以,第二维元素取出后是反方向,而end中的 -3 表示截至于第二维中倒数第二个元素(包括倒数第二个元素,下标为-2),所以,最终,输出结果为[[[4, 4, 4], [3, 3, 3]]]
