[arXiv 2024] MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model

Paper: [2405.18812] MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model

Code: GitHub (the link currently returns "Page not found")

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read it with that in mind.

Table of Contents

1. Takeaways

2. Section-by-Section Reading

2.1. Abstract

2.2. Introduction

2.3. MindSemantix

2.3.1. Pre-training: Self-supervised Brain-Encoder-Decoder (BED)

2.3.2. Training: End-to-end Brain-Language Model (BLM)

2.3.3. MindSemantix for Visual Reconstruction

2.4. Experimental Results

2.4.1. Dataset and Setting

2.4.2. Evaluation Metric

2.4.3. Captioning Results

2.4.4. MindSemantix for Reconstruction

2.4.5. Ablation Study

2.5. Discussion

2.6. Conclusion

1. Takeaways

(1) The code still seems to be in a "coming soon" state and the link does not open (at least not for me); the arXiv page also says the authors will release it later.

(2) Who keeps pushing me!!! I'm just a carefree blogger who reads papers.

2. Section-by-Section Reading

2.1. Abstract

        ①Challenge: brain activity decoding

Decipher (v.): to decode or make out something hard to read or interpret; to understand something mysterious or obscure

2.2. Introduction

        ①"captioning could facilitate visual decoding by approximatively simulating the reverse process of perceiving stimuli" 字幕预测为什么是视觉感知刺激逆向过程?感觉在默认人看到图片就一定会内心中有语义的解答,而且有些ADHD的看了不和没看一样(虽然一般不收录这种

        ②"low-level visual processing and high-level semantic processing"为什么看图片刺激是低级视觉刺激?还有高级语义?为什么不认可人的语言功能自动包含了低级语义(比如人已经会语法了)

        ③Recent works utilize ridge regression to extract fMRI features (how did this trend even start!! Ridge regression really is everywhere)

        ④Pipeline of MindSemantix:

(If the authors only care about high-level semantics, wouldn't the output text on the right end up being just keywords?)

2.3. MindSemantix

        ①Specific pre-training and training processes:

(The upper half looks very similar to BrainChat, haha, though this may just be a common design by now.) (Why does pre-training encode the masked fMRI while the green box below encodes the raw signal directly? Wouldn't that mismatch cause problems?!)

2.3.1. Pre-training: Self-supervised Brain-Encoder-Decoder (BED)

        ①Reason for masking and reconstructing the fMRI signal: spatial blurring caused by the hemodynamic response and spatial smoothing

        ②L2 loss for brain signal (patch-wise) reconstruction:

\mathcal{L}_{BED}=\sum_{i=1}^{N_{all}}\mathcal{L}_{2}(\mathbf{x}_{i}^{\prime},\mathbf{x}_{i})=\sum_{i=1}^{N_{all}}\|\mathbf{x}_{i}-\mathbf{x}_{i}^{\prime}\|_{2}

where \mathbf{x}_{i} denotes the i-th real fMRI token, \mathbf{x}_{i}^{\prime} denotes the recovered one, and N_{all} is the total number of fMRI tokens (a minimal code sketch follows below)
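Below is a minimal, hypothetical PyTorch sketch of this patch-wise reconstruction objective. The module name `ToyBrainEncoderDecoder`, the layer sizes, and the omission of the random masking step are all assumptions for illustration; this is not the authors' implementation.

```python
import torch
import torch.nn as nn

# Toy BED-style encoder-decoder over 1D fMRI patch tokens (hypothetical sizes;
# the random masking of tokens before encoding is omitted for brevity).
class ToyBrainEncoderDecoder(nn.Module):
    def __init__(self, patch_size=16, dim=256):
        super().__init__()
        self.embed = nn.Linear(patch_size, dim)             # tokenize fMRI patches
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.decoder = nn.Linear(dim, patch_size)            # recover each patch

    def forward(self, x):                                    # x: (B, N_all, patch_size)
        return self.decoder(self.encoder(self.embed(x)))     # x': same shape as x

def bed_loss(x_rec, x):
    # L_BED = sum_i ||x_i - x'_i||_2 over all patch tokens, averaged over the batch
    return torch.linalg.vector_norm(x - x_rec, ord=2, dim=-1).sum(dim=1).mean()

fmri = torch.randn(2, 100, 16)                               # 2 scans, 100 patches of length 16
model = ToyBrainEncoderDecoder()
loss = bed_loss(model(fmri), fmri)
```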

2.3.2. Training: End-to-end Brain-Language Model (BLM)

        ①fMRI projector: a fully-connected (FC) layer and a 1D convolutional layer

        ②Text projector: the FC layer from BLIP-2, reusing the same weights

        ③Caption generation loss:

\mathcal{L}_{BLM}=\sum_{i=1}^{N}\mathcal{L}_{OPT}(\mathbf{c}_{i}^{\prime},\mathbf{c}_{i})=\sum_{i=1}^{N}\left[\sum_{j=1}^{M}\mathcal{L}_{OPT}(\mathrm{BLM}(\mathbf{x}_{i}),\mathbf{c}_{ij})\right]

where \mathbf{c}_{i} and \mathbf{c}_{i}^{\prime} are the real and predicted captions, \mathbf{c}_{ij} is the j-th ground-truth caption of image i, and M is the number of captions per image in the COCO dataset (M=5); a rough sketch of the projector and this loss follows below
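The sketch below assumes a small OPT checkpoint from Hugging Face ("facebook/opt-125m") and made-up dimensions; the fMRI prefix conditioning is omitted, so it only illustrates the shape of the computation, not the authors' actual BLM.

```python
import torch
import torch.nn as nn
from transformers import AutoTokenizer, OPTForCausalLM

# fMRI projector: an FC layer followed by a 1D convolution (dimensions are assumptions).
class FMRIProjector(nn.Module):
    def __init__(self, brain_dim=256, llm_dim=768):
        super().__init__()
        self.fc = nn.Linear(brain_dim, llm_dim)
        self.conv = nn.Conv1d(llm_dim, llm_dim, kernel_size=3, padding=1)

    def forward(self, tokens):                  # tokens: (B, N, brain_dim) from the BED encoder
        h = self.fc(tokens)                     # (B, N, llm_dim)
        h = self.conv(h.transpose(1, 2))        # mix along the token axis
        return h.transpose(1, 2)                # prefix embeddings for the frozen OPT

prefix = FMRIProjector()(torch.randn(1, 100, 256))    # (1, 100, 768)

# Caption loss summed over the M = 5 COCO captions of one image; the real model would
# prepend `prefix` to the caption embeddings, which is skipped here for brevity.
tok = AutoTokenizer.from_pretrained("facebook/opt-125m")
lm = OPTForCausalLM.from_pretrained("facebook/opt-125m")
captions = ["a dog runs on the beach"] * 5            # stand-in for the 5 ground-truth captions
loss = 0.0
for c in captions:
    batch = tok(c, return_tensors="pt")
    out = lm(**batch, labels=batch["input_ids"])      # causal LM loss plays the role of L_OPT
    loss = loss + out.loss
```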

2.3.3. MindSemantix for Visual Reconstruction

        ①SD decoder: a ridge regressor that maps fMRI to a sketch containing low-level visual information (an illustrative code snippet follows below)

        ②Visual reconstruction: generated with Stable Diffusion
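A minimal sketch of the ridge-regression step, assuming the regression target is a flattened Stable Diffusion VAE latent (the exact target, the shapes, and the alpha value below are my guesses, not the paper's settings):

```python
import numpy as np
from sklearn.linear_model import Ridge

n_train, n_vox = 800, 15000                       # hypothetical training size and ROI voxel count
latent_shape = (4, 64, 64)                        # assumed SD VAE latent shape

X_train = np.random.randn(n_train, n_vox)         # fMRI betas (stand-in data)
Z_train = np.random.randn(n_train, int(np.prod(latent_shape)))  # flattened latents of the stimuli

reg = Ridge(alpha=1e4)                            # heavy regularization, common in fMRI decoding
reg.fit(X_train, Z_train)

z_test = reg.predict(np.random.randn(1, n_vox))   # predicted latent for one test scan
z_test = z_test.reshape(1, *latent_shape)         # handed to Stable Diffusion as the low-level "sketch"
```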

2.4. Experimental Results

2.4.1. Dataset and Setting

        ①Dataset: NSD

        ②Subjects: 1/2/3/7 out of the 8 in total

        ③ROI: nsdgeneral

        ④Patch size of fMRI: 16

        ⑤There are a lot of pre-training and training details (epochs, warm-up schedules, etc.) that I really don't feel like copying over here
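For ④ above, a small sketch of how an nsdgeneral voxel vector could be split into patch tokens of size 16 (the voxel count is just an example, and zero-padding to a multiple of 16 is my assumption):

```python
import torch
import torch.nn.functional as F

patch_size = 16
voxels = torch.randn(15724)                  # e.g. one subject's nsdgeneral ROI signal
pad = (-voxels.shape[0]) % patch_size        # pad so the length divides evenly into patches
tokens = F.pad(voxels, (0, pad)).view(-1, patch_size)   # (N_all, 16) fMRI patch tokens
```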

2.4.2. Evaluation Metric

        ①Low level metrics: Meteor, Rouge, CIDEr

        ②High level metrics: SPICE, CLIP, Sentence
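As an illustration of the embedding-based "Sentence" metric, here is a sentence-transformers sketch (the specific checkpoint is my choice, not necessarily the one used by the paper):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")          # assumed checkpoint
pred = "a man riding a surfboard on a wave"              # predicted caption (example)
gt = "a surfer rides a large wave in the ocean"          # ground-truth caption (example)

emb = model.encode([pred, gt], convert_to_tensor=True)
score = util.cos_sim(emb[0], emb[1]).item()              # higher = closer in meaning
print(f"Sentence similarity: {score:.3f}")
```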

2.4.3. Captioning Results

        ①Performance of MindSemantix when Gaussian noise is added:

        ②Visualization of caption generation:

        ③Quantitative results:

2.4.4. MindSemantix for Reconstruction

        ①Example of image reconstruction:

        ②Comparison of image reconstruction:

        ③Comparison table of quantitative metrics:

2.4.5. Ablation Study

        ①Module ablation on subject 1:

        ②How caption assists image reconstruction:

2.5. Discussion

        ①The generated captions could be refined further, and better pre-trained models could also be used

2.6. Conclusion

        ~
