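Natural Language Processing with TF2/Keras in Practice
1. Data Preprocessing and Encoding
Preprocessing is an essential step in natural language processing: raw text has to be converted into integer sequences before a model can consume it. TF2 can handle this encoding through the Keras Tokenizer. The following example fits a tokenizer on a small training set and encodes its sentences: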
import tensorflow as tf

train_data = [
    "I love deep dish pizza.",
    "I also eat vegetarian food.",
    "I enjoy garlic every day.",
    "I will get coffee later.",
]
test_data = [
    "Enjoy coffee this morning.",
    "Long walks on the beach.",
    "Please add cream to my tea.",
]

num_words = 1000      # vocabulary size cap applied when encoding
oov_token = '<UNK>'   # placeholder for words not seen during fitting
pad_type = 'post'     # padding mode: add padding at the end of a sequence
trunc_type = 'post'   # truncation mode: cut from the end of a sequence

# Tokenize our training data
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=num_words, oov_token=oov_token)
tokenizer.fit_on_texts(train_data)

# Get our training data word index (word -> integer id)
word_index = tokenizer.word_index

# Encode training data sentences into sequences
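# Assumed completion: the source line is cut off after the '='
train_sequences = tokenizer.texts_to_sequences(train_data)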
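To make the tokenizer's behavior concrete, here is a minimal pure-Python sketch of the same idea. It is not the Keras implementation. The helper names `tokenize`, `build_word_index`, `encode`, and `pad_post` are hypothetical, the punctuation handling is simplified, and the padding step is an assumption about how `pad_type` and `trunc_type` are meant to be used, since the listing above stops before using them.

```python
import string
from collections import Counter

train_data = [
    "I love deep dish pizza.",
    "I also eat vegetarian food.",
    "I enjoy garlic every day.",
    "I will get coffee later.",
]
test_data = [
    "Enjoy coffee this morning.",
    "Long walks on the beach.",
    "Please add cream to my tea.",
]

# Map every punctuation character to a space (simplified filter).
_PUNCT_TABLE = str.maketrans({c: " " for c in string.punctuation})


def tokenize(text):
    """Lowercase, strip punctuation, and split on whitespace."""
    return text.lower().translate(_PUNCT_TABLE).split()


def build_word_index(texts, oov_token="<UNK>"):
    """Assign ids by descending frequency; the OOV token takes id 1."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    # most_common() keeps first-seen order among equally frequent words.
    ordered = [word for word, _ in counts.most_common()]
    return {word: i for i, word in enumerate([oov_token] + ordered, start=1)}


def encode(texts, word_index, num_words=1000, oov_token="<UNK>"):
    """Replace each word by its id; unknown or over-cap words become OOV."""
    oov_id = word_index[oov_token]
    sequences = []
    for text in texts:
        ids = []
        for word in tokenize(text):
            idx = word_index.get(word, oov_id)
            ids.append(idx if idx < num_words else oov_id)
        sequences.append(ids)
    return sequences


def pad_post(sequences, maxlen, value=0):
    """Truncate from the end, then pad at the end, to exactly maxlen."""
    padded = []
    for seq in sequences:
        seq = seq[:maxlen]
        padded.append(seq + [value] * (maxlen - len(seq)))
    return padded


word_index = build_word_index(train_data)
train_sequences = encode(train_data, word_index)
maxlen = max(len(seq) for seq in train_sequences)

test_sequences = encode(test_data, word_index)
test_padded = pad_post(test_sequences, maxlen)

print(word_index)
print(train_sequences)
print(test_padded)
```

Only "enjoy" and "coffee" from the test sentences appear in the training vocabulary, so every other test word is encoded as the `<UNK>` id. This shows why the vocabulary should be fitted on data that is representative of what the model will see later.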