[ICLR 2025] MindSimulator: Exploring Brain Concept Localization via Synthetic FMRI

Paper link: [2503.02351] MindSimulator: Exploring Brain Concept Localization via Synthetic FMRI

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; if you spot any, corrections in the comments are welcome! This post reads more like personal notes, so take it with a grain of salt.

Contents

1. Takeaways

2. Section-by-Section Reading

2.1. Abstract

2.2. Introduction

2.3. Related Works

2.4. Method

2.4.1. Motivation and Overview

2.4.2. fMRI Autoencoder

2.4.3. Diffusion Estimator

2.4.4. Inference Sampler

2.5. Experiments Setup

2.5.1. Datasets

2.5.2. Implementation Details

2.5.3. Evaluation Metrics

2.6. Results

2.6.1. Evaluation for Synthetic fMRI

2.6.2. Out-of-Distribution Generalization

2.6.3. Ablation

2.7. Localizing Concept-Selective Regions

2.7.1. Predict Empirical Regions

2.7.2. Exploring Novel Regions

2.8. Conclusion

1. Takeaways

(1) A paper that looks simple but is actually quite involved. Funny enough, it grows on you, like dry bread that tastes better the longer you chew.

(2) Recent papers on brain-signal encoding and decoding really each have their own strengths.

2. Section-by-Section Reading

2.1. Abstract

        ①The brain's response to a specific object stimulus is subjective

        ②They propose MindSimulator to localize concept-selective regions

2.2. Introduction

        ①Specific concepts (such as places, bodies, faces, words, colors, and foods) activate distinct corresponding cortical regions

        ②Limitations of prior work: limited data, bias from artificially selected stimuli, and isolated objects in unnatural scenes (this seems to be the opposite of the previous paper I read, which held that one should view a single object in isolation and perceive only that object, whereas this one argues objects should not exist independently of a scene; probably just different contextual assumptions)

2.3. Related Works

        ①Reviews fMRI encoding/decoding methods and generative models

2.4. Method

2.4.1. Motivation and Overview

        ①⭐The same stimulus yields different fMRI recordings across trials, so one stimulus corresponds to multiple fMRI recordings. A regression model can only map a stimulus to a single fMRI recording, whereas a generative model is stochastic; this is why the authors adopt a generative model

        ②Overview of the model:

2.4.2. fMRI Autoencoder

        ①Paired data \left(x,y\right) is drawn from the sample set \mathcal{S}, where x\in\mathbb{R}^{l} is the preprocessed BOLD signal and y is the corresponding visual stimulus (coming back here from much later in the paper: x is one-dimensional because the voxels are flattened)

        ②The voxel encoder \mathcal{E}(\cdot) embeds x into a higher-dimensional representation \mathcal{X}=\mathcal{E}(x)\in\mathbb{R}^{m\times d}

        ③The voxel decoder \mathcal{D}(\cdot) decodes \mathcal{X} back to the fMRI voxels \hat{x}=\mathcal{D}(\mathcal{X})\in\mathbb{R}^{l}

        ④Reconstruction loss of the autoencoder:

\mathcal{L}_{\mathrm{mse}}=\mathbb{E}_{x\sim\mathcal{S}}\|x-\hat{x}\|_{2}^{2}

        ⑤They use a pre-trained CLIP-ViT \mathcal{V}(\cdot) to embed the stimulus, \mathcal{Y}=\mathcal{V}(y)\in\mathbb{R}^{m\times d}, and align the fMRI embedding with it

        ⑥Alignment (SoftCLIP) loss:

\begin{gathered} \mathcal{L}_{\mathrm{softclip}}=-\frac{1}{|\mathcal{S}|}\sum_{i=1}^{|\mathcal{S}|}\sum_{j=1}^{|\mathcal{S}|}\left[\frac{\exp(\mathcal{X}_{i}\cdot\mathcal{X}_{j}/\tau)}{\sum_{k=1}^{|\mathcal{S}|}\exp(\mathcal{X}_{i}\cdot\mathcal{X}_{k}/\tau)}\cdot\log\left(\frac{\exp(\mathcal{Y}_{i}\cdot\mathcal{X}_{j}/\tau)}{\sum_{k=1}^{|\mathcal{S}|}\exp(\mathcal{Y}_{i}\cdot\mathcal{X}_{k}/\tau)}\right)\right] \\ -\frac{1}{|\mathcal{S}|}\sum_{i=1}^{|\mathcal{S}|}\sum_{j=1}^{|\mathcal{S}|}\left[\frac{\exp(\mathcal{Y}_{i}\cdot\mathcal{Y}_{j}/\tau)}{\sum_{k=1}^{|\mathcal{S}|}\exp(\mathcal{Y}_{i}\cdot\mathcal{Y}_{k}/\tau)}\cdot\log\left(\frac{\exp(\mathcal{X}_{i}\cdot\mathcal{Y}_{j}/\tau)}{\sum_{k=1}^{|\mathcal{S}|}\exp(\mathcal{X}_{i}\cdot\mathcal{Y}_{k}/\tau)}\right)\right]. \end{gathered}

        ⑦Total autoencoder loss:

\mathcal{L}_{\text{autoencoder}}=\mathcal{L}_{\mathrm{mse}}+\mathcal{L}_{\mathrm{softclip}}
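To make the two terms concrete, here is a minimal PyTorch sketch of the autoencoder objective. It assumes the m×d token embeddings are flattened and L2-normalized before computing the batch similarity matrices, and the temperature tau=0.05 is a placeholder; all function and variable names are mine, not the paper's.

```python
import torch
import torch.nn.functional as F

def soft_clip_loss(X, Y, tau=0.05):
    """Bidirectional SoftCLIP between fMRI embeddings X and CLIP image
    embeddings Y, both (B, m, d). Soft targets come from the intra-modal
    similarity matrices; flattening/normalization are my assumptions."""
    B = X.shape[0]
    Xf = F.normalize(X.reshape(B, -1), dim=-1)
    Yf = F.normalize(Y.reshape(B, -1), dim=-1)
    target_x = F.softmax(Xf @ Xf.T / tau, dim=-1)     # softmax_k(X_i·X_k)
    target_y = F.softmax(Yf @ Yf.T / tau, dim=-1)     # softmax_k(Y_i·Y_k)
    logp_yx = F.log_softmax(Yf @ Xf.T / tau, dim=-1)  # log softmax_k(Y_i·X_k)
    logp_xy = F.log_softmax(Xf @ Yf.T / tau, dim=-1)  # log softmax_k(X_i·Y_k)
    return -(target_x * logp_yx).sum(-1).mean() - (target_y * logp_xy).sum(-1).mean()

def autoencoder_loss(x, x_hat, X, Y):
    """L_autoencoder = L_mse + L_softclip."""
    return F.mse_loss(x_hat, x) + soft_clip_loss(X, Y)
```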

        ⑧The diffusion estimator \mathcal{P}(\cdot) is a Transformer with cross-attention

2.4.3. Diffusion Estimator

        ①They design a diffusion estimator \mathcal{P}(\cdot) with T time steps and obtain the noised fMRI representation:

\mathcal{Z}_{t}^{\mathcal{X}}=\sqrt{\bar{\alpha}_{t}}\cdot\mathcal{X}+\sqrt{1-\bar{\alpha}_{t}}\cdot\epsilon,\quad\bar{\alpha}_{t}=\prod_{m=1}^{t}\alpha_{m},\quad t\sim[1,2,\cdots,T]

where \alpha_m denotes the noise-schedule hyperparameters and \epsilon\sim\mathcal{N}(0,1) denotes Gaussian noise
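This forward noising is the standard DDPM q-sample. A minimal sketch, assuming a precomputed table of the cumulative products \bar{\alpha}_t (names are mine):

```python
import torch

def q_sample(X, t, alphas_cumprod):
    """Noise the clean fMRI representation X (B, m, d) to step t.
    alphas_cumprod is the length-T table of cumulative products of the
    noise schedule; t is a (B,) tensor of zero-based time indices."""
    a_bar = alphas_cumprod[t].view(-1, 1, 1)             # ᾱ_t per sample
    eps = torch.randn_like(X)                            # ε ~ N(0, 1)
    return a_bar.sqrt() * X + (1.0 - a_bar).sqrt() * eps, eps
```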

        ②Learning objective:

\mathcal{L}_{\mathrm{diffusion}}=\mathbb{E}_{\epsilon,t,(x,y)\sim\mathcal{S}}[\|\mathcal{P}(\mathcal{Z}_{t}^{\mathcal{X}},\mathcal{Y},\mathcal{T}_{t})-\mathcal{X}\|_{2}^{2}]
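Note that the estimator regresses the clean representation \mathcal{X} itself (x0-prediction) rather than the added noise. A sketch of one training step, reusing q_sample from above; the cosine-schedule construction follows the common Nichol–Dhariwal form and may differ in detail from the paper's, and estimator/time_embed are assumed module names:

```python
import torch
import torch.nn.functional as F

T = 100
# Cosine noise schedule (a common construction; the paper's exact variant may differ).
s = 0.008
steps = torch.arange(T + 1) / T
f = torch.cos((steps + s) / (1 + s) * torch.pi / 2) ** 2
alphas_cumprod = (f[1:] / f[0]).clamp(1e-5, 1.0)

def diffusion_loss(estimator, time_embed, X, Y):
    """L_diffusion: MSE between P(Z_t, Y, T_t) and the clean X."""
    B = X.shape[0]
    t = torch.randint(0, T, (B,))                        # t sampled uniformly
    Z_t, _ = q_sample(X, t, alphas_cumprod)
    X_pred = estimator(Z_t, Y, time_embed(t))            # x0-prediction
    return F.mse_loss(X_pred, X)
```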

2.4.4. Inference Sampler

        ①Predicted fMRI representation:

\hat{\mathcal{Z}}_{t-1}^{\mathcal{X}}=\mathcal{P}(\hat{\mathcal{Z}}_{t}^{\mathcal{X}},\mathcal{Y},\mathcal{T}_{t}),\quad\hat{\mathcal{Z}}_{T}^{\mathcal{X}}\sim\mathcal{N}(0,1),\quad\hat{\mathcal{X}}=\hat{\mathcal{Z}}_{0}^{\mathcal{X}}
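Read literally, the recursion starts from pure Gaussian noise and feeds the estimator's output back in until step 0. A sketch under that literal reading (standard samplers such as DDIM would re-noise the x0-prediction between steps; the note does not spell this out):

```python
import torch

@torch.no_grad()
def sample_fmri_repr(estimator, time_embed, Y, shape, T=100):
    """Ẑ_T ~ N(0,1); Ẑ_{t-1} = P(Ẑ_t, Y, T_t); X̂ = Ẑ_0."""
    Z = torch.randn(shape)                               # Ẑ_T ~ N(0, 1)
    for t in reversed(range(T)):
        t_batch = torch.full((shape[0],), t, dtype=torch.long)
        Z = estimator(Z, Y, time_embed(t_batch))         # one reverse step
    return Z                                             # X̂
```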

        ②They generate N fMRI signals from N noise samples and take the average of the N results

        ③Noise generation: randomly sample two independent noises \epsilon_{1}\sim\mathcal{N}(0,1) and \epsilon_{2}\sim\mathcal{N}(0,1), then generate the others by interpolation:

\epsilon_n=\sqrt{\beta_n}\cdot\epsilon_1+\sqrt{1-\beta_n}\cdot\epsilon_2,n\in[1,2,\cdots,N]
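Because every \epsilon_n is a combination of the same two draws (and \beta_n+(1-\beta_n)=1, so each \epsilon_n is still marginally \mathcal{N}(0,1)), the N synthesized samples stay correlated, which stabilizes the average. A minimal sketch:

```python
import torch

def correlated_noises(shape, N):
    """ε_n = √β_n·ε_1 + √(1-β_n)·ε_2 with shared ε_1, ε_2 and β_n ~ U(0,1)."""
    eps1, eps2 = torch.randn(shape), torch.randn(shape)
    betas = torch.rand(N)
    return [b.sqrt() * eps1 + (1 - b).sqrt() * eps2 for b in betas]
```

Each \epsilon_n seeds one run of the sampler above; the N resulting representations are decoded by \mathcal{D}(\cdot) and averaged into the final synthetic fMRI.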

2.5. Experiments Setup

2.5.1. Datasets

        ①Dataset: Natural Scenes Dataset (NSD)

        ②Subjects: 8

        ③Image/stimuli set: MSCOCO

        ④Repetitions: each of the 10,000 images is presented 3 times, yielding 30,000 fMRI recordings per subject

        ⑤Selected subjects: 1, 2, 5, and 7, who completed the full experiment

        ⑥Data split: 9000 for training and 1000 for testing

        ⑦⭐During training, the authors treat the three presentations for one subject as three separate samples; at test time, they combine the three fMRI recordings corresponding to one stimulus and take their average as the result

        ⑧The authors use the GLMSingle tool to compute per-voxel beta activations, which reflect the strength of the brain's response to a specific stimulus, and normalize these activations

        ⑨Brain atlas: the automatic segmentation released with the dataset, not an existing template

2.5.2. Implementation Details

        ①Image extractor: CLIP ViT-L/14, yielding 257×768-dimensional embeddings

        ②Voxel encoder: MLPs with residual connections; the voxel decoder mirrors it (a sketch follows this list)

        ③Optimizer: AdamW

        ④Epochs: 300 for the fMRI autoencoder and 150 for the diffusion estimator

        ⑤Cyclic learning rate, starting from 3e-4

        ⑥Diffusion estimator: T=100 time steps with a cosine noise schedule and a 0.2 condition-dropout rate

        ⑦Diffusion network: 6 Transformer blocks; each input sequence contains 257 image tokens, 257 noised fMRI tokens, and 1 time embedding

        ⑧\beta_n is randomly sampled from (0,1)
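Based on ② above (MLPs with residual connections mapping the l flattened voxels to an m×d token grid), a plausible voxel-encoder sketch follows; the hidden width, block count, and layer ordering are my guesses, not the paper's exact architecture. m=257 and d=768 would match the CLIP ViT-L/14 space used for alignment.

```python
import torch.nn as nn

class VoxelEncoder(nn.Module):
    """Flattened voxels (B, l) -> token embeddings (B, m, d); a guessed
    residual-MLP structure (the decoder would mirror it)."""
    def __init__(self, l, m=257, d=768, hidden=4096, n_blocks=4):
        super().__init__()
        self.proj_in = nn.Linear(l, hidden)
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.LayerNorm(hidden), nn.Linear(hidden, hidden),
                          nn.GELU(), nn.Dropout(0.1))
            for _ in range(n_blocks)
        ])
        self.proj_out = nn.Linear(hidden, m * d)   # large projection to token grid
        self.m, self.d = m, d

    def forward(self, x):
        h = self.proj_in(x)
        for blk in self.blocks:
            h = h + blk(h)                         # residual connection
        return self.proj_out(h).view(-1, self.m, self.d)
```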

2.5.3. Evaluation Metrics

        ①⭐Pearson correlation, voxel-wise mean square error (MSE), and R-squared cannot reflect performance accurately:

        ②Evaluation method:

2.6. Results

2.6.1. Evaluation for Synthetic fMRI

        ①Performance table:

        ②Image reconstruction:

2.6.2. Out-of-Distribution Generalization

        ①Semantic-level performance when MindSimulator generalizes to other image-only datasets:

        ②Image reconstruction performance:

2.6.3. Ablation

        ①Module ablation:

2.7. Localizing Concept-Selective Regions

2.7.1. Predict Empirical Regions

        ①The empirical findings of faces-, bodies-, places-, and words-selective regions in NSD fLoc:

        ②Subset of the top-100 image categories selected by the pre-trained CLIP model:

        ③Localization evaluation of places- and bodies-selective regions:

2.7.2. Exploring Novel Regions

        ①Localized concept-selective regions according to synthetic fMRI:

Lower visual cortex is selective for colors and shapes, while higher visual cortex is selective for specific concepts

        ②Reconstruction after masking:

2.8. Conclusion

        ~
