I first learned about probing as a direction from Voita's NLP with Friends talk, where I vaguely understood it as a kind of "neural network interpretation". That understanding isn't wrong, but it is only half the story. This week I read several papers along this stream and found that probing has a second purpose: to serve as an evaluation metric for representation learning. That is even more of a black art than NLG evaluation: at least NLG outputs can be graded by hand like essays, whereas whether a representation is learned well, nobody knows. This leads straight to the most uncomfortable question in probing: when a probe achieves high accuracy on a linguistic task using a representation, can we conclude that the representation encodes linguistic structure, or has the probe just learned the task?
Anyway, I still want to quote the definition of probing from [Pimental 2020 Pareto Probing], so that the next time probing comes up I'm not stuck with only a rough idea I can't quite articulate:
We define probing in this work as training a supervised classifier (known as a probe) on top of pretrained models' frozen representations. By analyzing the classifier's performance, one can assess how much 'knowledge' the representations contain about language.
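In code this is about as simple as it sounds. Here is a minimal sketch of my own (a toy example, not anything from the paper): freeze `bert-base-uncased`, take its [CLS] vectors, and fit a logistic-regression probe on a made-up "is this sentence a question?" property.

```python
# Minimal probe sketch: a linear classifier trained on frozen BERT
# representations. The model choice, layer choice, and the toy task
# below are all illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModel
from sklearn.linear_model import LogisticRegression

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()  # frozen: we never backprop into BERT

def embed(sentences):
    """Return one frozen [CLS] vector per sentence."""
    with torch.no_grad():
        batch = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors="pt")
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, dim)
    return hidden[:, 0, :].numpy()                 # [CLS] token per sentence

# Toy "linguistic property": is the sentence a question? (made-up labels)
train_sents = ["Where is the station ?", "The cat sat on the mat .",
               "Did you see the movie ?", "She reads a book every night ."]
train_labels = [1, 0, 1, 0]

probe = LogisticRegression(max_iter=1000)
probe.fit(embed(train_sents), train_labels)

test_sents = ["Is this a probe ?", "This is a probe ."]
print(probe.predict(embed(test_sents)))
# The probe's accuracy on held-out data is read as how much 'knowledge'
# about the property the frozen representations contain.
```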
From an information-theoretic view, [Pimental 2020 Info-theoretic Probing] sees probing as:
estimating the mutual information between a representation-valued random variable and a linguistic property-valued random variable.
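Spelled out (my paraphrase of the paper's setup), with $T$ the linguistic-property random variable and $R$ the representation random variable:

$$
I(T; R) = H(T) - H(T \mid R),
$$

and since the true $H(T \mid R)$ is unknown, the probe $q_\theta(t \mid r)$ provides an upper bound through its cross-entropy:

$$
H(T \mid R) \;\le\; H_{q_\theta}(T \mid R) = -\mathbb{E}_{(t,r)}\big[\log q_\theta(t \mid r)\big],
$$

so a better probe gives a lower cross-entropy and thus a tighter estimate of the mutual information.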
I'm painstakingly putting this thread together; since I didn't know the area before, the reading was slow and so was the write-up. Hoping that at some future point all the effort spent now turns out to be useful in some magical way!
Finally, [Rogers 2020 A Primer in BERTology] sums up what we have learnt about BERT from the numerous probing works. It has pointers to all kinds of probing / BERTology papers and is quite comprehensive~