[ECCV 2024] NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation

Paper: NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation | SpringerLink

Code: NeuroPictor

The English here is typed entirely by hand! It is my summarizing and paraphrasing of the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, feel free to point them out in the comments. This post leans toward personal notes, so read with caution.

Table of Contents

1. Thoughts

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.4. Method

2.4.1. fMRI Encoder

2.4.2. High-Level Semantic Feature Learning

2.4.3. Low-Level Manipulation Network

2.4.4. Training and Inference

2.5. Experiments

2.5.1. Experimental Setup

2.5.2. Main Results

2.5.3. Ablation Study

2.6. Conclusions

1. Thoughts

(1) Nighttime again, and back to my favorite channel: ECCV

(2) The more I read, the more I want to submit to ECCV; the writing style really stands out, almost like a stand-up routine. It feels as though every other conference is locked in fierce battle while ECCV sits off to the side sipping tea

(3) For this one ECCV paper, Fudan burned through more GPUs than some entire labs even own (just saying, no offense meant)

2. 论文逐段精读

2.1. Abstract

        ①The fMRI-to-image pipeline still lacks accuracy in reconstructing fine details

2.2. Introduction

        ①The process of cognition: first seeing (watching), then describing (understanding). Reconstruction runs this backward, decoding the recorded (fMRI) signal back into an image

        ②Reconstruction steps of NeuroPictor:

        ③Hahahaha, the late-night fun channel delivers, thumbs up. The authors show they can stitch the high-level features of one fMRI sample onto the low-level features of another:

yielding a zebra doing the splits and a contorted human

2.3. Related Work

        ①Traditional diffusion models ignore detail reconstruction

        ②They instead use a conditional diffusion model to control the generation of these details

        ③The authors improve single-subject decoding via multi-individual pretraining. Hmm, rather counterintuitive; my only quibble is that the training time seems a bit long?

2.4. Method

        ①Pipeline of NeuroPictor:

2.4.1. fMRI Encoder

        ①Pre-processing: inspired by fMRI-PTE, the fMRI signal is transferred onto a 2D brain activation surface map of size 256×256 (a flat map like the one below; the figure is taken from the fMRI-PTE paper):

        ②Convert the fMRI surface map S to a latent representation with the encoder \mathrm{E}(\cdot):

S^r=\operatorname{E}(S)

where S^{r}\in\mathbb{R}^{L_{r}\times d_{r}}, L_r denotes the number of tokens, and d_r denotes the feature dimension
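The tokenization step above can be sketched as a simple patch embedding. This is a hypothetical stand-in: the actual encoder \mathrm{E}(\cdot) is a learned network following fMRI-PTE, and the patch size here is an assumption made only to illustrate the shapes.

```python
def patchify(surface_map, patch=16):
    """Toy tokenizer for the 256x256 fMRI surface map S: split it into
    non-overlapping patches and flatten each into one token, yielding
    S^r with L_r = (256/patch)^2 tokens of dimension d_r = patch*patch.
    The real E(.) is learned; this only illustrates the shapes."""
    n = len(surface_map)
    tokens = []
    for i in range(0, n, patch):
        for j in range(0, n, patch):
            tokens.append([surface_map[i + di][j + dj]
                           for di in range(patch) for dj in range(patch)])
    return tokens
```

With a 256×256 map and patch 16 this produces L_r = 256 tokens of dimension d_r = 256.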

2.4.2. High-Level Semantic Feature Learning

        ①The pipeline:

        ②The fMRI-to-text encoder consists of two 1D convolutions that down-sample the original input:

\tilde{F}^{txt}=\mathrm{E}_{txt}(\boldsymbol{S}^{r})

where \tilde{F}^{txt}\in\mathbb{R}^{L_{T}\times d_{T}}

        ③Caption feature loss:

\mathcal{L}_{sem}=\frac{1}{L_T}\sum_{i=1}^{L_T}\|\boldsymbol{F}_{(i)}^{txt}-\tilde{\boldsymbol{F}}_{(i)}^{txt}\|_2^2
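A minimal plain-Python sketch of this loss, with lists of token vectors standing in for the L_T × d_T feature tensors:

```python
def semantic_loss(f_txt, f_txt_pred):
    """L_sem: mean over the L_T tokens of the squared L2 distance
    between ground-truth caption features F^txt and the fMRI-predicted
    features F~^txt. Each argument is a list of L_T token vectors."""
    assert len(f_txt) == len(f_txt_pred)
    total = 0.0
    for gt, pred in zip(f_txt, f_txt_pred):
        total += sum((a - b) ** 2 for a, b in zip(gt, pred))
    return total / len(f_txt)
```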

        ④The auxiliary encoder consists of a 1D zero convolution

        ⑤The final output:

F^{sem}=\tilde{F}^{txt}+F^{au}

2.4.3. Low-Level Manipulation Network

        ①Pipeline:

with two Conv1D layers, an MLP, and a U-Net:

\boldsymbol{F}_0^l=\mathrm{E}_{ft}(\boldsymbol{S}^r)

\boldsymbol{F}^l=\operatorname{E}_{\mathrm{U}}(\boldsymbol{F}_0^l)

        ②Residual addition:

\tilde{\boldsymbol{F}}^l=\boldsymbol{F}_{sd}+\alpha\mathcal{Z}(\boldsymbol{F}^l)

where \boldsymbol{F}_{sd} is the output of the Stable Diffusion block and \mathcal{Z}(\cdot) denotes a zero convolution
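The zero-convolution residual can be illustrated with a single scalar "conv" weight (a simplification; the actual \mathcal{Z}(\cdot) is a zero-initialized convolution layer). Since the weight starts at 0, the Stable Diffusion features pass through untouched at the beginning of training:

```python
def modulate(f_sd, f_low, z_weight=0.0, alpha=1.0):
    """F~^l = F_sd + alpha * Z(F^l), with Z sketched as multiplication
    by a single zero-initialized weight. With z_weight == 0 the output
    equals F_sd, so fMRI control is injected gradually as Z is trained."""
    return [x + alpha * z_weight * y for x, y in zip(f_sd, f_low)]
```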

2.4.4. Training and Inference

        ①Input image size: \mathbf{X}\in \mathbb{R}^{512\times 512}, compressed to a latent z_0\in\mathbb{R}^{64\times64\times4}

        ②Diffusion objective:

\mathcal{L}_{dif}=\mathbb{E}_{\boldsymbol{z}_{0},\boldsymbol{t},\boldsymbol{S}^{r},\epsilon\sim\mathcal{N}(0,1)}\left[\|\epsilon-\epsilon_{\theta}(\boldsymbol{z}_{t},\boldsymbol{t},\boldsymbol{S}^{r},\boldsymbol{F}^{sem})\|_{2}^{2}\right]

where \epsilon_\theta denotes the Low-Level Manipulation Network (LLMN)

        ③Final loss:

\mathcal{L}=\mathcal{L}_{dif}+\lambda\mathcal{L}_{sem}
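Putting the two objectives together in a plain-Python sketch (the diffusion term written as a squared error on the predicted noise; lists stand in for tensors):

```python
def diffusion_loss(eps, eps_pred):
    """Squared L2 between the sampled Gaussian noise eps and the noise
    predicted by eps_theta(z_t, t, S^r, F^sem)."""
    return sum((a - b) ** 2 for a, b in zip(eps, eps_pred))

def total_loss(l_dif, l_sem, lam=1.0):
    """Joint objective L = L_dif + lambda * L_sem, where lambda is the
    weighting hyper-parameter balancing the semantic loss."""
    return l_dif + lam * l_sem
```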

        ④Dataset: NSD

        ⑤They train on roughly 67,000 fMRI images pooled across individuals??? For 100k iterations?? And then another 60k iterations per single subject?? That is a bit extravagant

2.5. Experiments

2.5.1. Experimental Setup

        ①GPUs: 6 NVIDIA RTX A6000

        ②Batch size: 96

        ③Fine-tuning GPUs: 2 NVIDIA RTX A6000

2.5.2. Main Results

        ①Performance:

        ②Reconstruction result:

2.5.3. Ablation Study

        ①Module ablation:

        ②Trade-off between high- and low-level semantic features:

2.6. Conclusions

        ~
