Natural Language Processing: A Comprehensive Exploration from Grammar to Parsing

1. Language Models and Sample Selection

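When building a language model, how the training samples are chosen has a significant effect on model performance. Models trained on randomly selected samples often perform better than models trained on samples hand-picked by domain experts. In general, n-gram and other language models with a large number of features are better at generating text, because those features contribute the most to learning the language. To get a feel for a model's quality, we can draw some sample sentences from it; reading them gives a better sense of how the model makes its predictions.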
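To make the idea of reading sampled sentences concrete, here is a minimal sketch of a bigram (2-gram) model that is trained on a tiny corpus and then sampled from. The corpus, the function names `train_bigram` and `sample_sentence`, and the `<s>` / `</s>` boundary markers are all illustrative assumptions, not something taken from a particular library.

```python
import random
from collections import defaultdict

def train_bigram(sentences):
    """Count how often each word follows another; <s> and </s> mark sentence boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sample_sentence(counts, rng, max_len=20):
    """Sample one word at a time, in proportion to the bigram counts."""
    word, out = "<s>", []
    for _ in range(max_len):
        followers = counts[word]
        words = list(followers)
        weights = [followers[w] for w in words]
        word = rng.choices(words, weights=weights)[0]
        if word == "</s>":  # the model chose to end the sentence
            break
        out.append(word)
    return " ".join(out)

# Toy corpus, made up for illustration only.
corpus = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the mouse ate the cheese",
]
model = train_bigram(corpus)
rng = random.Random(0)  # fixed seed so repeated runs give the same samples
samples = [sample_sentence(model, rng) for _ in range(3)]
for s in samples:
    print(s)
```

Every sampled sentence is built only from word pairs seen in the corpus, so reading the output shows directly which transitions the model has learned and how little context a bigram model actually tracks.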

Take two transformer models, GPT-2 and CTRL, as examples. GPT-2 is a well-known transformer language model. CTRL offers more flexibility: when generating a product review, for instance, we can specify the rating:
- Rating 1.0: “I bought this for my son who is a huge fan of the show. He was so excited to get it and when he opened it, we were all very disappointed. The quality of the product is terrible. It looks like something you would buy at a dollar store.”
- Rating 4.0: “I bought this for my husband and he loves it. He has a small wrist so it is hard to find watches that fit him well. This one fits perfectly.”
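The idea of steering generation with a rating can be illustrated with a toy model. The sketch below is not CTRL itself, which is a large transformer; it is a small bigram model whose next-word counts are additionally conditioned on a control code, which is enough to show how the same generator produces different text for different ratings. The review snippets, the `"Rating: ..."` code strings, and the function names are hypothetical and made up for this example.

```python
import random
from collections import defaultdict

def train_conditional_bigram(labelled_texts):
    """Count next-word frequencies conditioned on (control code, previous word)."""
    counts = defaultdict(lambda: defaultdict(int))
    for code, text in labelled_texts:
        tokens = ["<s>"] + text.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[(code, prev)][nxt] += 1
    return counts

def generate(counts, code, rng, max_len=15):
    """Sample a sentence using only the statistics seen under the given control code."""
    word, out = "<s>", []
    for _ in range(max_len):
        followers = counts[(code, word)]
        words = list(followers)
        word = rng.choices(words, weights=[followers[w] for w in words])[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

# Made-up training snippets, each tagged with a control code (assumption for illustration).
reviews = [
    ("Rating: 1.0", "the quality is terrible"),
    ("Rating: 1.0", "the product is disappointing"),
    ("Rating: 4.0", "the fit is perfect"),
    ("Rating: 4.0", "the watch is great"),
]
model = train_conditional_bigram(reviews)
rng = random.Random(0)
low = generate(model, "Rating: 1.0", rng)
high = generate(model, "Rating: 4.0", rng)
print(low)
print(high)
```

Because every count is keyed by the control code, text generated under the 1.0 code can only end in the negative words seen with that code, and likewise for 4.0. A control code that never appeared in training would leave the sampler with no candidates, so a real system would need a fallback for that case.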