[ECCV 2024] NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation

Paper: NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation | SpringerLink

Code: NeuroPictor

The English here is typed entirely by hand! It is my summarizing and paraphrasing of the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This post leans toward personal notes, so read with caution.

Table of Contents

1. Thoughts

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Method

2.4.1. fMRI Encoder

2.4.2. High-Level Semantic Feature Learning

2.4.3. Low-Level Manipulation Network

2.4.4. Training and Inference

2.5. Experiments

2.5.1. Experimental Setup

2.5.2. Main Results

2.5.3. Ablation Study

2.6. Conclusions

1. Thoughts

(1) Nighttime again, and back to my favorite channel: ECCV

(2) The more I read, the more I want to submit to ECCV; the writing style really stands out, almost like a stand-up routine. It feels as though every other conference is locked in fierce battle while ECCV sits off to the side sipping tea

(3) For this one ECCV paper, Fudan burned through more GPUs than some entire labs even own (just saying, no offense meant)

2. 论文逐段精读

2.1. Abstract

        ①The fMRI-to-image pipeline still lacks accuracy in reconstructing fine details

2.2. Introduction

        ①The process of cognition: first seeing (watching), then describing (understanding). Reconstruction runs this backward, decoding the recorded (fMRI) signal back into an image

        ②Reconstruction steps of NeuroPictor:

        ③Hahahaha, the late-night fun channel delivers, thumbs up. The authors show they can stitch the high-level features of one fMRI sample onto the low-level features of another:

yielding a zebra doing the splits and a contorted human

2.3. Related Work

        ①Traditional diffusion models ignore detail reconstruction

        ②They instead use a conditional diffusion model to control the generation of these details

        ③The authors improve single-subject decoding via multi-individual pretraining. Hmm, rather counterintuitive; my only quibble is that the training time seems a bit long?

2.4. Method

        ①Pipeline of NeuroPictor:

2.4.1. fMRI Encoder

        ①Pre-processing: inspired by fMRI-PTE, the fMRI signal is transferred onto a 2D brain activation surface map of size 256×256 (a flat map like the one below; the figure is taken from the fMRI-PTE paper):

        ②Convert the fMRI surface map S to a latent representation with the encoder \mathrm{E}(\cdot):

S^r=\operatorname{E}(S)

where S^{r}\in\mathbb{R}^{L_{r}\times d_{r}}, L_r denotes the number of tokens, and d_r denotes the feature dimension
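The tokenization step above can be sketched as a simple patch embedding. This is a hypothetical stand-in: the actual encoder \mathrm{E}(\cdot) is a learned network following fMRI-PTE, and the patch size here is an assumption made only to illustrate the shapes.

```python
def patchify(surface_map, patch=16):
    """Toy tokenizer for the 256x256 fMRI surface map S: split it into
    non-overlapping patches and flatten each into one token, yielding
    S^r with L_r = (256/patch)^2 tokens of dimension d_r = patch*patch.
    The real E(.) is learned; this only illustrates the shapes."""
    n = len(surface_map)
    tokens = []
    for i in range(0, n, patch):
        for j in range(0, n, patch):
            tokens.append([surface_map[i + di][j + dj]
                           for di in range(patch) for dj in range(patch)])
    return tokens
```

With a 256×256 map and patch 16 this produces L_r = 256 tokens of dimension d_r = 256.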

2.4.2. High-Level Semantic Feature Learning

        ①The pipeline:

        ②The fMRI-to-text encoder consists of two 1D convolutions that down-sample the original input:

\tilde{F}^{txt}=\mathrm{E}_{txt}(\boldsymbol{S}^{r})

where \tilde{F}^{txt}\in\mathbb{R}^{L_{T}\times d_{T}}

        ③Caption feature loss:

\mathcal{L}_{sem}=\frac{1}{L_T}\sum_{i=1}^{L_T}\|\boldsymbol{F}_{(i)}^{txt}-\tilde{\boldsymbol{F}}_{(i)}^{txt}\|_2^2
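A minimal plain-Python sketch of this loss, with lists of token vectors standing in for the L_T × d_T feature tensors:

```python
def semantic_loss(f_txt, f_txt_pred):
    """L_sem: mean over the L_T tokens of the squared L2 distance
    between ground-truth caption features F^txt and the fMRI-predicted
    features F~^txt. Each argument is a list of L_T token vectors."""
    assert len(f_txt) == len(f_txt_pred)
    total = 0.0
    for gt, pred in zip(f_txt, f_txt_pred):
        total += sum((a - b) ** 2 for a, b in zip(gt, pred))
    return total / len(f_txt)
```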

        ④The auxiliary encoder consists of a 1D zero convolution

        ⑤The final output:

F^{sem}=\tilde{F}^{txt}+F^{au}

2.4.3. Low-Level Manipulation Network

        ①Pipeline:

with two Conv1D layers, an MLP, and a U-Net:

\boldsymbol{F}_0^l=\mathrm{E}_{ft}(\boldsymbol{S}^r)

\boldsymbol{F}^l=\operatorname{E}_{\mathrm{U}}(\boldsymbol{F}_0^l)

        ②Residual addition:

\tilde{\boldsymbol{F}}^l=\boldsymbol{F}_{sd}+\alpha\mathcal{Z}(\boldsymbol{F}^l)

where \boldsymbol{F}_{sd} is the output of the Stable Diffusion block and \mathcal{Z}(\cdot) denotes a zero convolution
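The zero-convolution residual can be illustrated with a single scalar "conv" weight (a simplification; the actual \mathcal{Z}(\cdot) is a zero-initialized convolution layer). Since the weight starts at 0, the Stable Diffusion features pass through untouched at the beginning of training:

```python
def modulate(f_sd, f_low, z_weight=0.0, alpha=1.0):
    """F~^l = F_sd + alpha * Z(F^l), with Z sketched as multiplication
    by a single zero-initialized weight. With z_weight == 0 the output
    equals F_sd, so fMRI control is injected gradually as Z is trained."""
    return [x + alpha * z_weight * y for x, y in zip(f_sd, f_low)]
```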

2.4.4. Training and Inference

        ①Input image size: \mathbf{X}\in \mathbb{R}^{512\times 512}, compressed to a latent z_0\in\mathbb{R}^{64\times64\times4}

        ②Diffusion objective:

\mathcal{L}_{dif}=\mathbb{E}_{\boldsymbol{z}_{0},\boldsymbol{t},\boldsymbol{S}^{r},\epsilon\sim\mathcal{N}(0,1)}\left[\|\epsilon-\epsilon_{\theta}(\boldsymbol{z}_{t},\boldsymbol{t},\boldsymbol{S}^{r},\boldsymbol{F}^{sem})\|_{2}^{2}\right]

where \epsilon_\theta denotes the Low-Level Manipulation Network (LLMN)

        ③Final loss:

\mathcal{L}=\mathcal{L}_{dif}+\lambda\mathcal{L}_{sem}
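Putting the two objectives together in a plain-Python sketch (the diffusion term written as a squared error on the predicted noise; lists stand in for tensors):

```python
def diffusion_loss(eps, eps_pred):
    """Squared L2 between the sampled Gaussian noise eps and the noise
    predicted by eps_theta(z_t, t, S^r, F^sem)."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred))

def total_loss(l_dif, l_sem, lam=1.0):
    """Joint objective L = L_dif + lambda * L_sem, where lambda is the
    weighting hyper-parameter balancing the semantic loss."""
    return l_dif + lam * l_sem
```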

        ④Dataset: NSD

        ⑤They train on roughly 67,000 fMRI images pooled across individuals??? For 100k iterations?? And then another 60k iterations per single subject?? That is a bit extravagant

2.5. Experiments

2.5.1. Experimental Setup

        ①GPUs: 6 NVIDIA RTX A6000

        ②Batch size: 96

        ③Fine-tuning GPUs: 2 NVIDIA RTX A6000

2.5.2. Main Results

        ①Performance:

        ②Reconstruction result:

2.5.3. Ablation Study

        ①Module ablation:

        ②Trade-off between high- and low-level semantic features:

2.6. Conclusions

        ~
