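Notes on the text cleaning steps for English text:
- Contraction expansion
- Spelling correction
- Punctuation handling
- Symbol replacement
- Whitespace removal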
import re


def clean_text(text):
    """
    Clean text
    :param text: the string of text
    :return: text string after cleaning
    """
    # expand contractions; specific forms such as "can't" must run before the generic n't rule
    text = re.sub(r"can\'t", "can not", text)
    text = re.sub(r"cannot", "can not ", text)
    text = re.sub(r"what\'s", "what is", text)
    text = re.sub(r"What\'s", "what is", text)
    text = re.sub(r"\'ve ", " have ", text)
    text = re.sub(r"n\'t", " not ", text)
    text = re.sub(r"i\'m", "i am ", text)
    text = re.sub(r"I\'m", "i am ", text)
    text = re.sub(r"\'re", " are ", text)
    # 'd is ambiguous ("would" or "had"); this rule always picks "would"
    text = re.sub(r"\'d", " would ", text)
    text = re.sub(r"\'
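The substitutions above can also be written in a table-driven form. The following is only a sketch, not the author's code: the `CONTRACTIONS` table, the `expand_contractions` name, the word-boundary anchors, and the extra `won't` entry (which the generic `n't` rule would otherwise turn into "wo not") are all assumptions added for illustration.

```python
import re

# Hypothetical lookup table: specific forms first, generic suffix rules last.
CONTRACTIONS = [
    (r"\bcan't\b", "can not"),
    (r"\bcannot\b", "can not"),
    (r"\bwon't\b", "will not"),   # assumption: not in the original snippet
    (r"\bwhat's\b", "what is"),
    (r"\bi'm\b", "i am"),
    (r"n't\b", " not"),
    (r"'ve\b", " have"),
    (r"'re\b", " are"),
    (r"'d\b", " would"),          # ambiguous: could also mean "had"
]


def expand_contractions(text):
    """Apply each rule in order, ignoring case in the pattern."""
    for pattern, replacement in CONTRACTIONS:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    return text


print(expand_contractions("I'm sure we can't go"))
```

Because the rules live in a list, their order is explicit and new entries can be added without touching the function body. Matching is case-insensitive, but the replacements are inserted as written, so "I'm" becomes "i am" just as in the snippet above.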
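English Text Preprocessing: Text Cleaning in Detail

This article walks through the key steps in preprocessing English text: contraction expansion, spelling correction, punctuation handling, special-symbol replacement, and removal of extra whitespace, with the aim of improving accuracy on NLP tasks.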
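The steps after contraction expansion (symbol replacement, punctuation handling, spelling correction, whitespace removal) might look roughly like the sketch below. Everything here is an assumption for illustration: the `finish_cleaning` name and the tiny `SPELLING_FIXES` and `SYMBOL_WORDS` tables are hypothetical, and a real pipeline would use far larger tables or a dedicated spell checker.

```python
import re

# Hypothetical tables, kept tiny for illustration only.
SPELLING_FIXES = {"teh": "the", "recieve": "receive"}
SYMBOL_WORDS = {"&": " and ", "@": " at ", "%": " percent ", "$": " dollar "}


def finish_cleaning(text):
    """Run the remaining cleaning steps on text whose contractions are already expanded."""
    # Symbol replacement: spell out symbols before punctuation is stripped.
    for symbol, word in SYMBOL_WORDS.items():
        text = text.replace(symbol, word)
    # Punctuation: replace anything that is not an ASCII letter, digit, or whitespace.
    text = re.sub(r"[^A-Za-z0-9\s]", " ", text)
    # Spelling correction: whole-word, case-insensitive lookup
    # (the corrected word comes back in lowercase).
    words = [SPELLING_FIXES.get(w.lower(), w) for w in text.split()]
    # Whitespace removal: split/join collapses runs and trims both ends.
    return " ".join(words)


print(finish_cleaning("  Teh cat & dog,   recieve 5% !! "))
```

The punctuation step also removes apostrophes, so it should run only after contractions have been expanded; otherwise "can't" would be split into "can" and "t".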