BERT
A language representation model based on the Transformer encoder, pre-trained with Masked LM + Next Sentence Prediction as its pre-training tasks.
Transformer

Self-attention
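The core operation is scaled dot-product attention: each token's query is matched against every key, and the resulting weights mix the values, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V. A minimal NumPy sketch (the sequence length, dimensions, and random weights are illustrative assumptions, not from the lecture):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    # Project the input sequence into queries, keys, and values.
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    scores = Q @ K.T / np.sqrt(d_k)
    return softmax(scores) @ V

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))  # 4 tokens, model dim 8 (toy sizes)
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)  # (4, 8)
```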

Multi-head Self-attention
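Multi-head attention runs several attention "heads" in parallel on slices of the model dimension, then concatenates the head outputs and applies an output projection, so different heads can attend to different kinds of relations. A sketch under the same toy assumptions as above:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads):
    n, d = X.shape
    d_head = d // num_heads
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # Split the model dimension into independent heads
    # (equivalent to giving each head its own smaller projection).
    heads = []
    for h in range(num_heads):
        s = slice(h * d_head, (h + 1) * d_head)
        scores = Q[:, s] @ K[:, s].T / np.sqrt(d_head)
        heads.append(softmax(scores) @ V[:, s])
    # Concatenate head outputs and mix them with the output projection.
    return np.concatenate(heads, axis=-1) @ Wo

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv, Wo = (rng.standard_normal((8, 8)) for _ in range(4))
print(multi_head_attention(X, Wq, Wk, Wv, Wo, num_heads=2).shape)  # (4, 8)
```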
Positional Encoding
Add a positional vector e^i to the input at each position i, so position information is injected (self-attention by itself is order-invariant).
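In the lecture, e^i is a hand-crafted (not learned) vector; the original Transformer paper realizes it as sinusoidal encodings. A sketch of that scheme (the toy shapes are assumptions):

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal encoding from "Attention Is All You Need":
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

X = np.zeros((4, 8))               # token embeddings a^i (toy values)
X = X + positional_encoding(4, 8)  # add e^i so each position is distinct
```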

Training of BERT
Masked LM
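In Masked LM, roughly 15% of input tokens are selected for prediction; of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged, and the model must recover the original tokens. A toy sketch of the corruption step (the vocabulary and whitespace tokenization are placeholders):

```python
import random

MASK = "[MASK]"
VOCAB = ["apple", "bank", "river", "deep", "learning"]  # toy vocab (assumption)

def mask_tokens(tokens, mask_prob=0.15):
    """BERT-style masking: pick ~15% of positions; replace 80% of them
    with [MASK], 10% with a random token, leave 10% unchanged.
    Returns the corrupted tokens and the prediction targets."""
    corrupted, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            targets[i] = tok  # the model must recover this token
            r = random.random()
            if r < 0.8:
                corrupted[i] = MASK
            elif r < 0.9:
                corrupted[i] = random.choice(VOCAB)
            # else: keep the original token
    return corrupted, targets

print(mask_tokens("deep learning is fun".split()))
```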

Next Sentence Prediction
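For Next Sentence Prediction, training pairs are built so that 50% of the time sentence B actually follows sentence A (label IsNext) and 50% of the time B is a random sentence from the corpus (NotNext); the [CLS] output is trained to classify the pair. A toy pair-construction sketch (the sample sentences are placeholders):

```python
import random

def make_nsp_example(doc, corpus, i):
    """Build one NSP pair: with probability 0.5 use the true next
    sentence (IsNext), otherwise a random corpus sentence (NotNext)."""
    a = doc[i]
    if random.random() < 0.5 and i + 1 < len(doc):
        b, label = doc[i + 1], "IsNext"
    else:
        b, label = random.choice(random.choice(corpus)), "NotNext"
    # BERT input format: [CLS] sentence A [SEP] sentence B [SEP]
    return f"[CLS] {a} [SEP] {b} [SEP]", label

doc = ["the man went to the store", "he bought a gallon of milk"]
corpus = [doc, ["penguins are flightless birds"]]
print(make_nsp_example(doc, corpus, 0))
```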

How to use BERT
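Per the BERT paper, the usual recipe is to fine-tune the whole pre-trained model end-to-end with a small task-specific head: sentence classification on the [CLS] output, per-token tagging, sentence-pair tasks such as NLI, and extractive QA. A minimal fine-tuning sketch for sentence classification using the Hugging Face transformers library (the model name, label, and example text are assumptions):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # classification head on [CLS]

inputs = tokenizer("this movie was great", return_tensors="pt")
labels = torch.tensor([1])
outputs = model(**inputs, labels=labels)  # returns loss and logits
outputs.loss.backward()                   # fine-tune all parameters end-to-end
print(outputs.logits.shape)               # (1, 2)
```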

Reference
http://www.camdemy.com