[ICLR 2025] Toward Generalizing Visual Brain Decoding to Unseen Subjects

Paper: Toward Generalizing Visual Brain Decoding to Unseen Subjects | OpenReview

Code: https://github.com/Xiangtaokong/TGBD

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar errors are hard to avoid; corrections are welcome in the comments. This post is written as notes, so read with that in mind.

Table of Contents

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Methods

2.4.1. Dataset consolidation

2.4.2. Learning paradigm

2.4.3. Generalization performance vs. subject similarity

2.5. Experiments

2.5.1. Implementation details

2.5.2. Main results on generalization performance

2.5.3. The generalization performance with different backbones

2.5.4. Generalization vs. subject similarity

2.5.5. Experimental results on the NSD dataset

2.6. Discussion: the Source of Generalization Ability

2.7. Conclusion

1. Takeaways

(1) The paper involves a large number of subjects and a substantial amount of work

2. Section-by-section Reading

2.1. Abstract

        ①Challenge of fMRI decoding: generalization to unseen subjects

        ②Solution: extensive training on a large dataset (the authors still believe that training on all subjects together is what yields strong generalization)

2.2. Introduction

        ①Limitations: subject-specific models and scarce training data

        ②The authors batch-process brain activity by uniformly upsampling voxels and encoding images with CLIP

2.3. Related Work

        ①Lists: a) fMRI decoding models for image reconstruction; b) cross-subject training and fine-tuning; c) feature alignment

2.4. Methods

2.4.1. Dataset consolidation

        ①Due to the limited number of subjects in image-viewing tasks, they utilize the much larger pool of subjects from the movie-watching task

        ②Commonly used fMRI decoding datasets:

They utilize HCP with 177 subjects and 3127 stimulus frames per subject (the authors also acknowledge that this is audiovisual data, which seems like a drawback: brain responses to movie frames accompanied by audio are not necessarily the same as those to static images, and language processing is involved as well)

        ③Activity capturing:

- stimuli selection: the last frame i_t of each second.

- original response voxel: v'_t

- response: v_t=\frac{v'_t+v'_{t+1}+\cdots+v'_{t+4}}{5}, averaging over a 5-second window to account for the ~4-second hemodynamic delay
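The temporal averaging above can be sketched with NumPy; the (time × voxels) array layout is an assumption for illustration, not the repo's actual format.

```python
import numpy as np

def delayed_response(voxels, t, delay=4):
    """v_t = (v'_t + ... + v'_{t+delay}) / (delay + 1): average the per-second
    voxel vectors over a 5-second window to absorb the ~4 s hemodynamic delay.
    `voxels` is a (T, V) array of per-second fMRI voxel vectors (assumed layout)."""
    return voxels[t : t + delay + 1].mean(axis=0)

# toy example: 10 time points, 3 voxels
v_prime = np.arange(30, dtype=float).reshape(10, 3)
v_0 = delayed_response(v_prime, 0)  # mean of rows 0..4 -> [6., 7., 8.]
```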

2.4.2. Learning paradigm

        ①NSD general.nii:

        ②Pipeline of model:

where image encoder is CLIP-ViT-L/14

        ③Contrastive loss:

\mathcal{L}=\frac{1}{2N}\left(\sum_{i=1}^{N}-\log\frac{\exp(\mathrm{sim}(F_{I}^{i},F_{V}^{i})/\tau)}{\sum_{j=1}^{N}\exp(\mathrm{sim}(F_{I}^{i},F_{V}^{j})/\tau)}+\sum_{i=1}^{N}-\log\frac{\exp(\mathrm{sim}(F_{V}^{i},F_{I}^{i})/\tau)}{\sum_{j=1}^{N}\exp(\mathrm{sim}(F_{V}^{i},F_{I}^{j})/\tau)}\right)

where F_{I}^{i} and F_{V}^{i} are the embeddings of the i-th image and its fMRI voxels, \tau denotes the temperature parameter, and \mathrm{sim}(x,y) denotes the cosine similarity between x and y:

\mathrm{sim}(x,y)=\frac{x\cdot y}{\|x\|\|y\|}

        ④Brain decoding network: MLP
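A minimal PyTorch sketch of ③ and ④: an MLP voxel encoder trained against CLIP image embeddings with the symmetric contrastive loss above. The hidden width, embedding dimension, and layer count are placeholders, not the paper's actual values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VoxelMLP(nn.Module):
    """Sketch of the brain decoding network: an MLP mapping upsampled
    fMRI voxels into the CLIP embedding space (sizes are assumptions)."""
    def __init__(self, n_voxels, embed_dim=768, hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_voxels, hidden), nn.GELU(),
            nn.Linear(hidden, embed_dim),
        )

    def forward(self, v):
        return self.net(v)

def symmetric_contrastive_loss(f_img, f_vox, tau=0.07):
    """CLIP-style symmetric InfoNCE matching the loss L above: normalizing
    both embeddings makes the dot product equal to cosine similarity, and
    cross_entropy averages over N, so 0.5*(CE + CE^T) gives the 1/(2N) factor."""
    f_img = F.normalize(f_img, dim=-1)
    f_vox = F.normalize(f_vox, dim=-1)
    logits = f_img @ f_vox.t() / tau            # sim(F_I^i, F_V^j) / tau
    labels = torch.arange(len(f_img))           # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))
```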

2.4.3. Generalization performance vs. subject similarity

        ①They calculate embedding similarity to obtain the ten given subjects most similar to the unseen subject:

\mathrm{Rank\_Credit}(S_{t},S_{n})=\sum_{i=1}^{I}\mathbb{1}\left(S_{n}\in\mathrm{top\_10\_rank}(\mathrm{Sim\_score}_{i,S_{t},S_{j}}\mathrm{~for~}j=1,2,\ldots,N)\right)
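The Rank_Credit counting can be sketched as follows, assuming sim_scores is laid out as (I test images × target subjects × N given subjects); the array layout and names are illustrative, not from the released code.

```python
import numpy as np

def rank_credit(sim_scores, target_idx, candidate_idx, k=10):
    """Rank_Credit(S_t, S_n): over I test images, count how often candidate
    subject S_n appears among the top-k given subjects most similar to the
    unseen target subject S_t. sim_scores[i, t, j] is the embedding
    similarity between target t and given subject j on image i (assumed)."""
    credit = 0
    for i in range(sim_scores.shape[0]):
        top_k = np.argsort(sim_scores[i, target_idx])[::-1][:k]  # highest first
        credit += int(candidate_idx in top_k)
    return credit
```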

2.5. Experiments

2.5.1. Implementation details

        ①Optimizer: AdamW with β1=0.9,β2=0.999

        ②Batch size: 300

        ③Learning strategy: OneCycleLR with max learning rate 1e-4

        ④Data split on HCP: the same 100 frames are held out for testing (evaluated on 1–10 unseen subjects), and the remaining (177−10)×3127−100 samples from all other subjects are used for training.
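The reported training settings (①–③) can be wired up as below; the model, input sizes, and total step count are stand-ins, since the post only specifies the AdamW betas, batch size 300, and OneCycleLR with max learning rate 1e-4.

```python
import torch

# Placeholder model; the real network is the voxel-to-CLIP MLP.
model = torch.nn.Linear(128, 768)

# ①AdamW with beta1=0.9, beta2=0.999 (PyTorch defaults, stated explicitly)
optimizer = torch.optim.AdamW(model.parameters(), betas=(0.9, 0.999))

# ③OneCycleLR with max_lr=1e-4; total_steps is an assumed placeholder
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-4, total_steps=1000)

for step in range(3):                # stand-in for the real training loop
    batch = torch.randn(300, 128)    # ②batch size 300
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # dummy loss for illustration
    loss.backward()
    optimizer.step()
    scheduler.step()
```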

2.5.2. Main results on generalization performance

        ①Results on HCP:

        ②Accuracy curve with increasing training samples:

2.5.3. The generalization performance with different backbones

        ①Backbone ablation:

2.5.4. Generalization vs. subject similarity

        ①Due to sex differences, they test subjects of each sex separately:

2.5.5. Experimental results on the NSD dataset

        ①Training on NSD dataset:

2.6. Discussion: the Source of Generalization Ability

        Argues for the benefits of training with a large number of subjects

2.7. Conclusion

        ~
