I'm trying to add an embedding layer to my sequence-to-sequence autoencoder, built with the Keras functional API.
The model code looks like this:
<span style="color:#393318"><code><span style="color:#858c93">#Encoder inputs</span><span style="color:#303336">
encoder_inputs </span><span style="color:#303336">=</span> <span style="color:#2b91af">Input</span><span style="color:#303336">(</span><span style="color:#303336">shape</span><span style="color:#303336">=(</span><span style="color:#101094">None</span><span style="color:#303336">,))</span>
<span style="color:#858c93">#Embedding</span><span style="color:#303336">
embedding_layer </span><span style="color:#303336">=</span> <span style="color:#2b91af">Embedding</span><span style="color:#303336">(</span><span style="color:#303336">input_dim</span><span style="color:#303336">=</span><span style="color:#303336">n_tokens</span><span style="color:#303336">,</span><span style="color:#303336"> output_dim</span><span style="color:#303336">=</span><span style="color:#7d2727">2</span><span style="color:#303336">)</span><span style="color:#303336">
encoder_embedded </span><span style="color:#303336">=</span><span style="color:#303336"> embedding_layer</span><span style="color:#303336">(</span><span style="color:#303336">encoder_inputs</span><span style="color:#303336">)</span>
<span style="color:#858c93">#Encoder LSTM</span><span style="color:#303336">
encoder_outputs</span><span style="color:#303336">,</span><span style="color:#303336"> state_h</span><span style="color:#303336">,</span><span style="color:#303336"> state_c </span><span style="color:#303336">=</span><span style="color:#303336"> LSTM</span><span style="color:#303336">(</span><span style="color:#303336">n_hidden</span><span style="color:#303336">,</span><span style="color:#303336"> return_state</span><span style="color:#303336">=</span><span style="color:#101094">True</span><span style="color:#303336">)(</span><span style="color:#303336">encoder_embedded</span><span style="color:#303336">)</span><span style="color:#303336">
lstm_states </span><span style="color:#303336">=</span> <span style="color:#303336">[</span><span style="color:#303336">state_h</span><span style="color:#303336">,</span><span style="color:#303336"> state_c</span><span style="color:#303336">]</span>
<span style="color:#858c93">#Decoder Inputs</span><span style="color:#303336">
decoder_inputs </span><span style="color:#303336">=</span> <span style="color:#2b91af">Input</span><span style="color:#303336">(</span><span style="color:#303336">shape</span><span style="color:#303336">=(</span><span style="color:#101094">None</span><span style="color:#303336">,))</span>
<span style="color:#858c93">#Embedding</span><span style="color:#303336">
decoder_embedded </span><span style="color:#303336">=</span><span style="color:#303336"> embedding_layer</span><span style="color:#303336">(</span><span style="color:#303336">decoder_inputs</span><span style="color:#303336">)</span>
<span style="color:#858c93">#Decoder LSTM</span><span style="color:#303336">
decoder_lstm </span><span style="color:#303336">=</span><span style="color:#303336"> LSTM</span><span style="color:#303336">(</span><span style="color:#303336">n_hidden</span><span style="color:#303336">,</span><span style="color:#303336"> return_sequences</span><span style="color:#303336">=</span><span style="color:#101094">True</span><span style="color:#303336">,</span><span style="color:#303336"> return_state</span><span style="color:#303336">=</span><span style="color:#101094">True</span><span style="color:#303336">,</span> <span style="color:#303336">)</span><span style="color:#303336">
decoder_outputs</span><span style="color:#303336">,</span><span style="color:#303336"> _</span><span style="color:#303336">,</span><span style="color:#303336"> _ </span><span style="color:#303336">=</span><span style="color:#303336"> decoder_lstm</span><span style="color:#303336">(</span><span style="color:#303336">decoder_embedded</span><span style="color:#303336">,</span><span style="color:#303336"> initial_state</span><span style="color:#303336">=</span><span style="color:#303336">lstm_states</span><span style="color:#303336">)</span>
<span style="color:#858c93">#Dense + Time</span><span style="color:#303336">
decoder_dense </span><span style="color:#303336">=</span> <span style="color:#2b91af">TimeDistributed</span><span style="color:#303336">(</span><span style="color:#2b91af">Dense</span><span style="color:#303336">(</span><span style="color:#303336">n_tokens</span><span style="color:#303336">,</span><span style="color:#303336"> activation</span><span style="color:#303336">=</span><span style="color:#7d2727">'softmax'</span><span style="color:#303336">),</span><span style="color:#303336"> input_shape</span><span style="color:#303336">=(</span><span style="color:#101094">None</span><span style="color:#303336">,</span> <span style="color:#101094">None</span><span style="color:#303336">,</span> <span style="color:#7d2727">256</span><span style="color:#303336">))</span>
<span style="color:#858c93">#decoder_dense = Dense(n_tokens, activation='softmax', )</span><span style="color:#303336">
decoder_outputs </span><span style="color:#303336">=</span><span style="color:#303336"> decoder_dense</span><span style="color:#303336">(</span><span style="color:#303336">decoder_outputs</span><span style="color:#303336">)</span><span style="color:#303336">
model </span><span style="color:#303336">=</span> <span style="color:#2b91af">Model</span><span style="color:#303336">([</span><span style="color:#303336">encoder_inputs</span><span style="color:#303336">,</span><span style="color:#303336"> decoder_inputs</span><span style="color:#303336">],</span><span style="color:#303336"> decoder_outputs</span><span style="color:#303336">)</span><span style="color:#303336">
model</span><span style="color:#303336">.</span><span style="color:#303336">compile</span><span style="color:#303336">(</span><span style="color:#303336">loss</span><span style="color:#303336">=</span><span style="color:#7d2727">'categorical_crossentropy'</span><span style="color:#303336">,</span><span style="color:#303336"> optimizer</span><span style="color:#303336">=</span><span style="color:#7d2727">'rmsprop'</span><span style="color:#303336">,</span><span style="color:#303336"> metrics</span><span style="color:#303336">=[</span><span style="color:#7d2727">'accuracy'</span><span style="color:#303336">])</span></code></span>
The model is trained like this:
<span style="color:#393318"><code><span style="color:#303336">model</span><span style="color:#303336">.</span><span style="color:#303336">fit</span><span style="color:#303336">([</span><span style="color:#303336">X</span><span style="color:#303336">,</span><span style="color:#303336"> y</span><span style="color:#303336">],</span><span style="color:#303336"> X</span><span style="color:#303336">,</span><span style="color:#303336"> epochs</span><span style="color:#303336">=</span><span style="color:#303336">n_epoch</span><span style="color:#303336">,</span><span style="color:#303336"> batch_size</span><span style="color:#303336">=</span><span style="color:#303336">n_batch</span><span style="color:#303336">)</span></code></span>
X and y both have shape (n_samples, n_seq_len).
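For concreteness, the inputs are integer-encoded token sequences padded to a fixed length; a toy stand-in with made-up sizes (the values of n_samples, n_seq_len and n_tokens below are just placeholders) would look like this:

import numpy as np

# Toy stand-in for the real data: each row is a sequence of integer token ids,
# padded to n_seq_len, so both arrays are 2D with shape (n_samples, n_seq_len).
n_samples, n_seq_len, n_tokens = 4, 5, 10
X = np.random.randint(0, n_tokens, size=(n_samples, n_seq_len))
y = np.random.randint(0, n_tokens, size=(n_samples, n_seq_len))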
Compiling the model works fine, but during training I always get:

ValueError: Error when checking target: expected time_distributed_1 to have 3 dimensions, but got array with shape (n_samples, n_seq_len)
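If I read the error correctly, the TimeDistributed softmax layer produces output of shape (n_samples, n_seq_len, n_tokens), so with categorical_crossentropy the target passed to fit apparently needs three dimensions as well, whereas my X only has two. Below is a sketch of what I think the expected target would look like (one-hot encoding via keras.utils.to_categorical; this is not something my current code does):

from keras.utils import to_categorical

# One-hot encode the integer targets so they match the 3D softmax output:
# (n_samples, n_seq_len) -> (n_samples, n_seq_len, n_tokens)
X_onehot = to_categorical(X, num_classes=n_tokens)
model.fit([X, y], X_onehot, epochs=n_epoch, batch_size=n_batch)

Is that the intended way to feed the target here, or is something wrong in the model definition itself?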