Adapting Phrase-based Machine Translation to Normalise Medical Terms in Social Media Messages
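Health reports in social media, such as DailyStrength and Twitter, have potential (data sources)
However, for machines to understand and make inferences about a user's health condition, they must be able to recognise when a colloquial expression refers to a specific medical concept (i.e. text normalisation).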
Moreover, we propose to combine the adapted phrase-based MT technique and the similarity between word vector representations to effectively map a Twitter phrase to a medical concept.
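We propose two techniques to tackle this problem. For the first technique, we rank the target concepts by the cosine similarity between the vector representation of phr_m and the vector representation of each concept's description desc_c: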
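A minimal sketch of this cosine-based ranking is given here for illustration. It assumes that a phrase vector is obtained by averaging the vectors of its words, which the text above does not specify. The toy embeddings in `WORD_VECTORS`, the concept identifiers, and the helper names `phrase_vector`, `cosine` and `rank_concepts` are all hypothetical and not taken from the original work.

```python
import math

# Hypothetical toy word vectors; a real system would load trained embeddings.
WORD_VECTORS = {
    "head": [0.9, 0.1, 0.0],
    "hurts": [0.7, 0.3, 0.1],
    "headache": [0.8, 0.2, 0.0],
    "tired": [0.1, 0.9, 0.1],
    "fatigue": [0.0, 1.0, 0.1],
}


def phrase_vector(text):
    """Average the vectors of the known words (an assumed composition method)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no known word, so no vector can be built
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_concepts(phr_m, descriptions):
    """Rank concepts by cosine(vector(phr_m), vector(desc_c)), best first.

    `descriptions` maps a concept identifier to its description text desc_c.
    """
    query = phrase_vector(phr_m)
    scored = []
    for concept, desc_c in descriptions.items():
        target = phrase_vector(desc_c)
        if query is None or target is None:
            score = 0.0  # fall back when either side has no known words
        else:
            score = cosine(query, target)
        scored.append((concept, score))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative call with made-up concept identifiers.
ranking = rank_concepts(
    "head hurts",
    {"C_headache": "headache", "C_fatigue": "fatigue"},
)
```

With these toy vectors, the concept whose description lies closest in direction to the phrase vector is placed first in `ranking`. Averaging is only one possible way to compose word vectors into a phrase vector.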
In practice, the second technique computes the similarity score as follows:
The MT is as follows:
The overall model