Notes_Man2Programmer@Woman2Homemaker

The paper examines the gender bias inherent in English word embeddings and proposes a method for removing it. It finds that pretrained word embeddings such as word2vec exhibit clear gender bias, e.g., associating programmer more with men and nurse more with women. To address this, the authors identify the gender-bias direction in gender-neutral words and remove it.

Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings

Introduction
Bias, especially gender stereotypes, in word embeddings:

e.g., man − woman ≈ programmer − homemaker

Pretrained embeddings: word2vec / 300 dimensions / trained on Google News
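
As a quick illustration, the analogy above can be reproduced with gensim (a minimal sketch, assuming the standard `GoogleNews-vectors-negative300.bin` file from the word2vec project is available locally):

```python
from gensim.models import KeyedVectors

# Load the pretrained 300-dimensional Google News embeddings.
kv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

# Solve "man : computer_programmer :: woman : ?" via vector arithmetic:
# computer_programmer - man + woman, then take the nearest neighbors.
print(kv.most_similar(positive=["woman", "computer_programmer"],
                      negative=["man"], topn=3))
```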

Quantify bias:

Compare a word's vector to the vectors of a pair of gender-specific words. For example, nurse being close to woman is not bias by itself, since nurse is close to words for humans in general; but nurse being closer to woman than to man does suggest bias.
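
A small numpy sketch of this projection test (`vec` is assumed to map words to unit-length embedding vectors, e.g. the gensim `KeyedVectors` loaded above):

```python
import numpy as np

def gender_score(word, vec):
    """Projection of a word onto the normalized she-he direction.

    Positive values lean toward "she", negative toward "he";
    the magnitude indicates how gendered the word's position is.
    """
    direction = vec["she"] - vec["he"]
    direction = direction / np.linalg.norm(direction)
    return float(np.dot(vec[word], direction))

# e.g. gender_score("nurse", kv) tends to come out clearly positive,
# while gender_score("programmer", kv) tends to come out negative.
```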

Consider the distinction between gender-specific words, which are associated with a gender by definition (e.g., brother / sister) and for which closeness to a specific gender is not bias, and the remaining gender-neutral words (e.g., programmer / nurse).

We will use the gender-specific words to learn a gender subspace in the embedding (surprisingly, a low-dimensional subspace captures much of the gender bias), then remove the bias only from the gender-neutral words while leaving the gender-specific words intact.
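
A sketch of these two core steps: learn the subspace by PCA over centered difference vectors of definitional pairs, then neutralize gender-neutral words by removing their projection onto it. The pair list and helper names are illustrative, and the paper's full hard-debias additionally equalizes gender-specific pairs, which is omitted here:

```python
import numpy as np

DEFINITIONAL_PAIRS = [("she", "he"), ("her", "his"), ("woman", "man"),
                      ("daughter", "son"), ("mother", "father")]

def gender_subspace(vec, k=1):
    """PCA over centered definitional-pair vectors; the top-k
    components span the gender subspace (k=1: a single direction)."""
    diffs = []
    for f, m in DEFINITIONAL_PAIRS:
        center = (vec[f] + vec[m]) / 2
        diffs.append(vec[f] - center)
        diffs.append(vec[m] - center)
    # The stacked differences are zero-mean per pair, so SVD here
    # gives the principal components directly.
    _, _, vt = np.linalg.svd(np.stack(diffs), full_matrices=False)
    return vt[:k]                      # shape (k, d)

def neutralize(w, B):
    """Remove the component of w lying in the gender subspace B,
    then re-normalize; applied only to gender-neutral words."""
    w_b = B.T @ (B @ w)                # projection onto the subspace
    w = w - w_b
    return w / np.linalg.norm(w)
```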

Gender biases in English

Implicit Association Tests have uncovered gender-word biases that people do not self-report and may not even be aware of. Biases show up in morphology as well: while there are more words referring to males overall, many more words sexualize females than males.

Biases in algorithms

A number of online systems have been shown to exhibit various biases. Schmidt identified the bias present in word embeddings and proposed debiasing by entirely removing multiple gender dimensions, i.e., removing gender from the embedding altogether. At the same time, the difficulty of evaluating embedding quality (as compared to supervised learning) parallels the difficulty of defining bias in an embedding.

Word embeddings

Embeddings are of the form $w \in \mathbb{R}^d$ with $\|w\| = 1$. Assume a set of female–male pairs $P \subseteq \mathbb{R}^d \times \mathbb{R}^d$ and gender-neutral words $N \subseteq W$; similarity is cosine similarity:

$$\cos(u, v) = \frac{u \cdot v}{\|u\| \, \|v\|}$$

so, since the embeddings are normalized, the similarity between two embeddings is

$$\cos(w_1, w_2) = w_1 \cdot w_2 \tag{2}$$
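
A quick numeric check that, once embeddings are unit-normalized, cosine similarity reduces to the plain dot product of eq. (2); random vectors stand in for real embeddings:

```python
import numpy as np

rng = np.random.default_rng(0)
w1 = rng.normal(size=300); w1 /= np.linalg.norm(w1)
w2 = rng.normal(size=300); w2 /= np.linalg.norm(w2)

cos = (w1 @ w2) / (np.linalg.norm(w1) * np.linalg.norm(w2))
assert np.isclose(cos, w1 @ w2)   # identical once ||w|| = 1
```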
Crowd experiments
Geometry of Gender and Bias in Word Embeddings

Goal: understand the biases present in embeddings (i.e., which words are closer to he vs. she, etc.) and to what extent these biases agree with human notions of stereotypes.

Occupational stereotypes

Crowdworkers were asked to evaluate whether an occupation is considered female-stereotypic, male-stereotypic, or neutral. Spearman r = 0.51 (strongly correlated):

the geometric biases of the embedding vectors are aligned with crowd judgment.
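
For reference, such an agreement number can be computed with scipy; the scores below are hypothetical placeholders, not the paper's data:

```python
from scipy.stats import spearmanr

# Per-occupation projections onto the gender direction (embedding side)
# and the corresponding crowd stereotype ratings (human side).
embedding_scores = [0.21, -0.13, 0.02, 0.34]   # e.g. nurse, engineer, ...
crowd_scores     = [0.80, -0.50, 0.10, 0.90]

rho, pvalue = spearmanr(embedding_scores, crowd_scores)
print(f"Spearman r = {rho:.2f} (p = {pvalue:.3f})")
```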

Analogies exhibiting stereotypes

(To Be Continued…)
