[ICASSP 2025] BrainChat: Interactive Semantic Information Decoding from fMRI Using Large-Scale Vision-Language Pretrained Models

Paper: BrainChat: Interactive Semantic Information Decoding from fMRI Using Large-Scale Vision-Language Pretrained Models | IEEE Conference Publication | IEEE Xplore

Code: GitHub - HuangWanqiu/BrainChat-Code: BrainChat: Decoding Semantic Information from fMRI using Vision-language Pretrained Models

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome! This post reads more like personal notes, so take it with a grain of salt.

Contents

1. Thoughts

2. Section-by-Section Reading

2.1. Abstract

2.2. Introduction

2.3. Method

2.4. Experiment

2.4.1. Implementation

2.4.2. fMRI Captioning Evaluation

2.4.3. fQA Evaluation

2.4.4. Adapting BrainChat without Image Data

2.5. Conclusion

3. Reference

1. Thoughts

(1) Are you the paper that was destined for me? I can't conclude that yet; I have real doubts about whether the repo's "key code" actually contains what I'm looking for.

2. Section-by-Section Reading

2.1. Abstract

        ①Task: decode semantic information from fMRI

2.2. Introduction

        ①Task of BrainChat:

        ②Integrated techniques: Contrastive Captioner (CoCa) and Masked Brain Modeling (MBM)

        ③Training mode: encoder/decoder pretraining and regression decoding

aphasia  n. loss of the ability to produce or understand speech

2.3. Method

        ①Framework of BrainChat:

        ②Pre-training stage: pretrain the encoder f_\theta and decoder with MBM. Patchify the fMRI data and set masked patches to 0, then reconstruct the masked patches under a mean squared error (MSE) loss.
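The MBM step above (patchify, mask 75% of the patches, MSE on the masked patches only) can be sketched as follows. This is a minimal illustration, not the authors' implementation; all function names are made up, and the encoder/decoder themselves are omitted.

```python
import numpy as np

def patchify(fmri, patch_size):
    """Split a 1-D voxel vector into non-overlapping patches of length patch_size."""
    n = len(fmri) // patch_size
    return fmri[: n * patch_size].reshape(n, patch_size)

def mask_patches(patches, mask_ratio=0.75, rng=None):
    """Zero out a random mask_ratio fraction of patches; return the masked copy and the mask."""
    rng = rng or np.random.default_rng(0)
    n = patches.shape[0]
    mask = np.zeros(n, dtype=bool)
    mask[rng.permutation(n)[: int(n * mask_ratio)]] = True
    masked = patches.copy()
    masked[mask] = 0.0  # the paper sets masked patches to 0
    return masked, mask

def mbm_loss(reconstruction, target, mask):
    """MSE computed only on the masked patches, MAE-style."""
    return float(((reconstruction[mask] - target[mask]) ** 2).mean())
```

The masked patches would be fed to the encoder/decoder, and `mbm_loss` compares the decoder's output against the original patches at the masked positions only.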

        ③The encoded fMRI features are projected to align with the image/text embeddings extracted by the frozen image encoder g_\theta and text encoder q_\theta from CoCa

        ④fMRI-image contrastive loss L_{fi}, fMRI-text contrastive loss L_{ft}:

L_{fi}=-\left(\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(p_{\theta}(f_{\theta}(b_{i}))^{T}g_{\theta}(v_{i})/\sigma)}{\sum_{j=1}^{N}\exp(p_{\theta}(f_{\theta}(b_{i}))^{T}g_{\theta}(v_{j})/\sigma)}}_{\text{fMRI-to-image}}+\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(g_{\theta}(v_{i})^{T}p_{\theta}(f_{\theta}(b_{i}))/\sigma)}{\sum_{j=1}^{N}\exp(g_{\theta}(v_{i})^{T}p_{\theta}(f_{\theta}(b_{j}))/\sigma)}}_{\text{image-to-fMRI}}\right)

L_{ft}=-\left(\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(p_{\theta}(f_{\theta}(b_{i}))^{T}q_{\theta}(t_{i})/\sigma)}{\sum_{j=1}^{N}\exp(p_{\theta}(f_{\theta}(b_{i}))^{T}q_{\theta}(t_{j})/\sigma)}}_{\text{fMRI-to-text}}+\underbrace{\frac{1}{N}\sum_{i=1}^{N}\log\frac{\exp(q_{\theta}(t_{i})^{T}p_{\theta}(f_{\theta}(b_{i}))/\sigma)}{\sum_{j=1}^{N}\exp(q_{\theta}(t_{i})^{T}p_{\theta}(f_{\theta}(b_{j}))/\sigma)}}_{\text{text-to-fMRI}}\right)

where (b_{i},v_{i},t_{i}) denotes a paired fMRI–image–text triple; p_{\theta}(f_{\theta}(b_{i})), g_{\theta}(v_{i}), and q_{\theta}(t_{i}) are the embeddings of the fMRI, image, and text in the i-th pair; N is the batch size; and \sigma is the temperature parameter

        ⑤Caption loss L_{cap}:

L_{cap}=-\sum_{k=1}^{T}\log P_{\theta}\left(t_{k}\mid t_{<k},b\right)

where P_{\theta}(t_{k}\mid t_{<k},b) is the probability of generating token t_k conditioned on the tokens from previous time steps t_{<k} and the fMRI data b
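Given per-step vocabulary logits from the decoder, L_{cap} is the summed negative log-likelihood of the target tokens. A minimal sketch (the function name and array layout are illustrative assumptions):

```python
import numpy as np

def caption_loss(logits, tokens):
    """Autoregressive caption loss: sum of -log P(t_k | t_<k, b).

    logits: (T, V) decoder scores at each step, already conditioned on
            the previous tokens and the fMRI data.
    tokens: (T,) target token ids.
    """
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    logp = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-logp[np.arange(len(tokens)), tokens].sum())
```

The total loss L_{BrainChat} then combines this with the two contrastive losses via the \lambda weights.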

        ⑥Total loss:

L_{BrainChat}=\lambda_{fi}L_{fi}+\lambda_{ft}L_{ft}+\lambda_{\mathrm{Cap}}L_{cap}

where the \lambda terms are weighting coefficients

        ⑦fMRI captioning task: feed the first k words of the caption and predict the word at time step k+1
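At inference time, this next-word scheme amounts to autoregressive decoding: feed the tokens generated so far, take the most likely next token, and repeat. A minimal greedy-decoding sketch (the `step_fn` callback standing in for the fMRI-conditioned decoder is a hypothetical placeholder):

```python
import numpy as np

def greedy_decode(step_fn, bos, eos, max_len=20):
    """Greedy autoregressive decoding.

    step_fn(tokens) should return a vocabulary score vector for the next
    token given the tokens generated so far (conditioned on the fMRI data
    inside the real model). Decoding stops at eos or after max_len steps.
    """
    tokens = [bos]
    for _ in range(max_len):
        nxt = int(np.argmax(step_fn(tokens)))
        tokens.append(nxt)
        if nxt == eos:
            break
    return tokens
```

Beam search or sampling could replace the argmax, but greedy decoding is the simplest instance of the "predict word k+1 from the first k words" loop.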

        ⑧fQA task: the question is fed to the text encoder as a prompt, e.g. "Question: What color is the water? Answer:"

        ⑨Caption generation (the image encoder and its corresponding loss are dropped):

where green text marks the generated captions and red text marks grammatical errors

2.4. Experiment

2.4.1. Implementation

        ①Dataset 1: subject 1 of the Natural Scenes Dataset (NSD), with 15,724 voxels and captions from COCO for fMRI captioning. NSD and VQA datasets are combined for the fQA task

        ②Dataset 2: HCP, used to predict masked fMRI in the pre-training stage

        ③Encoder and decoder: ViT

        ④Mask ratio: 0.75

        ⑤Hyperparameters during pre-training: learning rate 5e-10 with 0.05 weight decay; AdamW with \beta_1=0.9 and \beta_2=0.95, with NativeScaler gradient scaling

        ⑥Hyperparameters during brain decoder training: learning rate 1e-4 with 0.1 weight decay

        ⑦Loss weights: 20 for the caption loss and 1 for the contrastive losses

2.4.2. fMRI Captioning Evaluation

        ①Captioning performance:

        ②Quantitative evaluation of fMRI captioning:

2.4.3. fQA Evaluation

        ①fQA performance:

        ②Results of fQA:

where green text marks the answers

2.4.4. Adapting BrainChat without Image Data

        ①Performance remains acceptable even without image data

2.5. Conclusion

        ~

3. Reference

@INPROCEEDINGS{10889434,
  author={Huang, Wanqiu and Ma, Ke and Xie, Tingyu and Wang, Hongwei},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)}, 
  title={BrainChat: Interactive Semantic Information Decoding from fMRI Using Large-Scale Vision-Language Pretrained Models}, 
  year={2025},
  volume={},
  number={},
  pages={1-5},
  keywords={Training;Semantics;Functional magnetic resonance imaging;Signal processing;Brain modeling;Question answering (information retrieval);Decoding;Data mining;Speech processing;Software development management;fMRI question answering;fMRI captioning;fMRI decoding;large-scale vision-language model;human-computer interaction},
  doi={10.1109/ICASSP49660.2025.10889434}}
