[NeurIPS 2023]Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generativ

论文网址:[2306.03089] Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models

论文项目和代码:Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models

英文是纯手打的!论文原文的summarizing and paraphrasing。可能会出现难以避免的拼写错误和语法错误,若有发现欢迎评论指正!文章偏向于笔记,谨慎食用

目录

1. 心得

2. 论文逐段精读

2.1. Abstract

2.2. Introduction

2.3. Related work

2.4. Methods

2.4.1. Background on Diffusion Models

2.4.2. Brain-Encoding Model Construction

2.4.3. Brain-Guided Diffusion Model

2.5. Results

2.5.1. Setup

2.5.2. Broad Category-Selective Networks

2.5.3. Individual ROIs

2.5.4. Semantic Divisions within ROIs

2.6. Discussion

3. Reference

1. 心得

(1)是oral捏(看完了,名副其实,是很别致的论文,从实验中看出了作者的大脑在运转)

(2)Intro写的好复杂TUT

(3)哥们儿主图画得有点潦草

(4)非要给线性层取名线性适应层是吧我也能线性感知层线性提取层线性连接层线性萃取层归一线性层

(5)现在的img-brain signal数据集真是被玩出花了,真每篇任务都不一样

(6)实验做的很有新意,但我这里没有实验的详解,感兴趣的可以看看原文

2. 论文逐段精读

2.1. Abstract

        ①They proposed Brain Diffusion for Visual Exploration (BrainDiVE), which generates images by brain signals and proves ROI will perform differently in different visual tasks

2.2. Introduction

        ①Inference that higher visual cortex preferentially process complex semantic categories is come from manully stimuli and specific scene

        ②Maximum voxels of corresponding function of identify different categories. All these images are generated by BrainDiVE:

2.3. Related work

        ①Designed stimuli is different from natural stimuli

        ②Lists deep generative models such as variational autoencoders, generative adversarial networks, flows and score/energy/diffusion models. ⭐But all of these are for reconstruction rather than predicting stimuli(还没往下看,不过预测出会刺激大脑某个地方的图片该怎么去验证真的能刺激呢)

        ③They focus on complex scenes

macaque  n.猕猴

2.4. Methods

        ①Steps of synthesizing images that activate a target brain region:

they only show the scene-selective regions (RSC/PPA/OPA) on the right hemisphere

2.4.1. Background on Diffusion Models

        ①Sampling from data distribution p\left ( x \right ), and generate {x_{T}}\sim\mathcal{N}(0,\mathbb{I})x_{T-1},x_{T-2},x_{T-3}\ldots x_0 step by step, where x_{t}=\sqrt{\alpha_{t}}x_{0}+\epsilon\sqrt{1-\alpha_{t}}

        ②Mean-squared error (MSE) is the loss of autoencoder network \epsilon_{\theta}(x_{t},t)

        ③Autoencoder D_{\Omega}(E_{\Phi}(\mathcal{I})) consists of encoder E_\Phi and decoder D_\Omega

        ④Diffusion model: pretrained latent diffusion model (LDM)

2.4.2. Brain-Encoding Model Construction

        ①Mapping learning:

M_{\theta}(\mathcal{I})\Rightarrow B

where \mathcal{I}\in\mathbb{R}^{3\times H\times W} is image, M_\theta denotes voxel-wise brain encoding model, B\in\mathbb{R}^{N} is the fMRI \beta values with N elements

        ②Components of encoder: the first one is CLIP trained image encoder, which outputs K dimensional vector as latent embedding; the second one is a euclidean normalization and linear adaptation layer W\in\mathcal{R}^{N\times K},b\in\mathcal{R}^{N}

        ③The function of the whole encoding operations:

B\approx M_\theta(\mathcal{I})=W\times\frac{\mathrm{CLIP}_{\mathrm{img}}(\mathcal{I})}{\|\mathrm{CLIP}_{\mathrm{img}}(\mathcal{I})\|_2}+b

2.4.3. Brain-Guided Diffusion Model

        ①Conditioning is done in one of two ways in conventional text-conditioned diffusion models:

  • The first approach modifies the function \epsilon _\theta to further accept a conditioning vector c, resulting in \epsilon_{\theta}(x_{t},t,c).
  • The second approach uses a contrastive trained image-to-concept encoder, and seeks to maximize a similarity measure with a text-to-concept encoder.

        ②Maximizing the average activation of S predicted by M_\theta to condition the diffusion model:

\epsilon_{theta}^{\prime}=\epsilon_{theta}-\sqrt{1-\alpha_{t}}\nabla_{x_{t}}(\frac{\gamma}{|S|}\sum_{i\in S}M_{\theta}(D_{\Omega}(x_{t}^{\prime}))_{i})

where \gamma is a scale, S\subseteq N are the set of voxels used for guidance(为什么是先解码再编码了?不太了解扩散的改进

        ③Employing euler approximation to get images with low noise:

\begin{aligned} & \hat{x}_{0}=\frac{1}{\sqrt{\alpha}}(x_{t}-\sqrt{1-\alpha}\epsilon_{t}) \\ & x_{t}^{\prime}=(\sqrt{1-\alpha})\hat{x}_{0}+(1-\sqrt{1-\alpha})x_{t} \end{aligned}

2.5. Results

2.5.1. Setup

        ①Dataset: the Natural Scenes Dataset (NSD)

        ②Subject: they choose 4 of total 7, cuz S1, S2, S5, and S7 have the watched 10000 natural scene images repeated three times in their entirety

        ③Images in NSD: MS COCO

        ④Feature of fMRI: \beta value calculated by GLMSingle

        ⑤Voxel norm: \mu=0 and \sigma=1 per session

        ⑥Applying average on repeat session of fMRI signal

        ⑦Data split: 9:1 for tr/test

        ⑧啥???每个人在V100上训练了1500个小时???62天???

        ⑨Diffusion base: stable-diffusion-2-1-base, which produces images of 512 × 512 resolution using ϵ-prediction

        ⑩Using multi-step 2nd order DPM-Solver++ with 50 steps and apply 0.75 SAG(什么玩意这是)

        ⑪Step size hyperparameter: \gamma =130.0

        ⑫Brain encoder: ViT-B/16 with 224×224 output size

        ⑬Prompt: null prompt ""

        ⑭CLIP probes: CoCa ViT-L/14 (LAION-2B)

2.5.2. Broad Category-Selective Networks

        ①Category selective voxels:

        ②The top-5 images which have the highest average activation:

        ③Semantic specificity of images generated by BrainDiVE and natrual image:

2.5.3. Individual ROIs

        ①Generation ability of OFA and FFA:

        ②Performance:

2.5.4. Semantic Divisions within ROIs

        ①Clustering within the food ROI and within OPA:

        ②Visualization of generated image of sub class S1 and S2:

        ③Subsets of OPA:

        ④Performance:(表的title很清楚了我就不描述了)

2.6. Discussion

        ~

3. Reference

@article{luo2023brain,
  title={Brain Diffusion for Visual Exploration: Cortical Discovery using Large Scale Generative Models},
  author={Luo, Andrew F and Henderson, Margaret M and Wehbe, Leila and Tarr, Michael J},
  journal={arXiv preprint arXiv:2306.03089},
  year={2023}
}
}
评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包
实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值