[Paper Notes] Preserving specificity in federated graph learning for fMRI-based neurological disorder identification

Original title: Preserving specificity in federated graph learning for fMRI-based neurological disorder identification

Paper code: ZJH123333/SFGL: source code of the SFGL framework (github.com)

Paper link: Preserving specificity in federated graph learning for fMRI-based neurological disorder identification - ScienceDirect

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; corrections in the comments are welcome. This post leans toward personal notes, so read with discretion.

Table of contents

1. TL;DR

1.1. Takeaways

2. Paper section-by-section notes

2.1. Abstract

2.2. Introduction

2.3. Related work

2.3.1. Graph learning for functional MRI analysis

2.3.2. Federated learning for brain disease analysis

2.4. Materials and data preprocessing

2.4.1. Materials

2.4.2. Data preprocessing

2.5. Methodology

2.5.1. Shared branch at client side

2.5.2. Personalized branch at client side

2.5.3. Federated aggregation at server side

2.5.4. Implementation details

2.6. Experiments

2.6.1. Experimental settings

2.6.2. Methods for comparison

2.6.3. Experimental results

2.6.4. Statistical significance analysis

2.7. Discussion

2.7.1. Ablation study

2.7.2. Influence of balancing coefficient

2.7.3. Influence of local training epoch

2.7.4. Influence of different backbones in shared branch

2.7.5. Influence of feature extractor

2.7.6. Interpretable biomarker analysis

2.7.7. Convergence analysis

2.7.8. Model scalability analysis

2.7.9. Computation cost analysis

2.7.10. Limitations and future work

2.8. Conclusion

3. Reference


1. TL;DR

1.1. Takeaways

(1) Honestly, from the opening the model itself does not seem to have much novelty, and the federated-learning improvement only amounts to adding some non-imaging factors

(2) The scanning parameters are presumably from the official dataset documentation? Meaning they were copied in directly?

(3) The data at each individual site is still a bit scarce

2. Paper section-by-section notes

2.1. Abstract

        ①Cross-site fMRI analysis faces data privacy and security problems as well as storage burdens

        ②Previous FL methods in brain disease classification ignored site-specificity

2.2. Introduction

        ①There are privacy hazards in multi-site imaging data analysis

        ②They propose a specificity-aware federated graph learning (SFGL) framework for diagnosing neurological disorders from fMRI:

        ③Datasets: ABIDE and REST-meta-MDD

2.3. Related work

2.3.1. Graph learning for functional MRI analysis

        ①Lists spatial and temporal graph methods applied to mental disease diagnosis

        ②They further point out the privacy problems

2.3.2. Federated learning for brain disease analysis

        ①Lists examples of federated learning applied to brain disease diagnosis

        ②But the authors argue that those methods did not take demographic factors (i.e., age, gender, and education) into account...

2.4. Materials and data preprocessing

2.4.1. Materials

        ①Demographic data on two datasets:

2.4.2. Data preprocessing

(1)ABIDE I

        ①Sites: the New York University (NYU), the University of California, Los Angeles (UCLA), and the University of Michigan (UM) (the largest 3)

        ②Samples: (74+98)+(48+37)+(47+73)=377

        ③Scanning parameters: 

NYU: 3T Allegra scanner; TR = 2000 ms, TE = 15 ms, voxel size = 3.0×3.0×4.0 mm^3, flip angle = 90°, FOV = 192×240 mm^2, 33 slices with a thickness of 4 mm.
UCLA: 3T Trio scanner; TR = 3000 ms, TE = 28 ms, voxel size = 3.0×3.0×4.0 mm^3, flip angle = 90°, FOV = 192×192 mm^2, 34 slices with a thickness of 4 mm.
UM: 3T GE Signa scanner; TR = 2000 ms, TE = 30 ms, voxel size = 3.438×3.438×3.000 mm^3, flip angle = 90°, 40 slices with a thickness of 3 mm.

        ④Tool: DPARSF

        ⑤Preprocessing steps: discard the first five volumes, perform head motion correction, spatial smoothing and normalization, bandpass filtering (0.01–0.10 Hz) of BOLD time series, nuisance signal regression, and spatial standardization to Montreal Neurological Institute (MNI) space

        ⑥Atlas: AAL 116

(2)REST-meta-MDD

        ①Sites: Site 20, Site 21, and Site 25 (the largest 3)

        ②Samples: (282+251)+(86+70)+(89+63)=841

        ③Scanning parameters: 

Site 20: 3T Trio scanner with a 12-channel receiver coil; TR = 2000 ms, TE = 30 ms, voxel size = 3.44×3.44×4.00 mm^3, flip angle = 90°, gap = 1.0 mm, FOV = 220×220 mm^2, 32 slices with a thickness of 3 mm.
Site 21: 3T Trio scanner with a 32-channel receiver coil; TR = 2000 ms, TE = 30 ms, voxel size = 3.12×3.12×4.20 mm^3, flip angle = 90°, gap = 0.7 mm, FOV = 200×200 mm^2, 33 slices with a thickness of 3.5 mm.
Site 25: 3T Verio scanner with a 12-channel receiver coil; TR = 2000 ms, TE = 25 ms, voxel size = 3.75×3.75×4.00 mm^3, flip angle = 90°, gap = 0.0 mm, FOV = 240×240 mm^2, 36 slices with a thickness of 4 mm.

        ④Tool: DPARSF

        ⑤Preprocessing steps: discard the first ten volumes, and employ the same head motion correction, spatial smoothing and normalization, bandpass filtering (0.01–0.10 Hz), nuisance signal regression, and spatial standardization.

        ⑥Atlas: AAL 116

2.5. Methodology

2.5.1. Shared branch at client side

(1)Dynamic graph sequence construction

        ①A graph is denoted by G=(V,E), where V is the node (ROI) set and E is the edge (connection) set

        ②BOLD signal: B=(b_1,\ldots,b_N)^\top\in R^{N\times D},b_i\in\mathbb{R}^D with N ROI and D time points

        ③They employ a sliding-window technique: for window length \Gamma and stride s, the number of segments is T=\left\lfloor\frac{D-\Gamma}{s}\right\rfloor+1

        ④The BOLD signal i in the t-th segment is b_{i} \left(t\right) \in \mathbb{R}^{\Gamma} \left(t=1,2, \ldots,T \right)

        ⑤The Pearson correlation between the i-th ROI and the j-th ROI in the t-th segment is p_{ij}\left(t\right)=\frac{\mathrm{Cov}(b_i(t),b_j(t))}{\sigma(b_i(t))\sigma(b_j(t))}, where \mathrm{Cov}\left(\cdot,\cdot\right) denotes the covariance and \sigma\left(\cdot\right) the standard deviation. Each FCN is denoted by P\left(t\right)=\left(p_{ij}\left(t\right)\right)\in\mathbb{R}^{N\times N}\left(t=1,2,\ldots,T\right)

        ⑥Node feature matrix: X\left ( t \right )=P\left ( t \right )

        ⑦Sparsification: only the top 30% of correlation connections in each FCN are retained and set to 1, using a threshold \delta\left(t\right):

A\left(t\right)=\left(\mathbb{I}\left\{p_{ij}\left(t\right)\geq\delta\left(t\right)\right\}\right)\in\{0,1\}^{N\times N}

        ⑧They construct a graph for each time window of each subject:

{\mathcal{G}}=\{G\left(t\right)\}_{t=1}^{T} 

G\left(t\right)=\left(X\left(t\right),A\left(t\right)\right)
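The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the function name `dynamic_graphs` and the toy inputs are made up, and the threshold \delta(t) is taken as the 70th percentile so that roughly the top 30% of correlations survive:

```python
import numpy as np

def dynamic_graphs(bold, win, stride, keep=0.3):
    """Build a dynamic graph sequence from a BOLD matrix.

    bold   : (N, D) array, N ROIs x D time points (toy stand-in for fMRI).
    win    : sliding-window length Gamma; stride : step s.
    keep   : fraction of strongest correlations kept when sparsifying.
    """
    N, D = bold.shape
    T = (D - win) // stride + 1            # number of segments
    graphs = []
    for t in range(T):
        seg = bold[:, t * stride : t * stride + win]
        P = np.corrcoef(seg)               # Pearson FCN P(t), shape (N, N)
        X = P                              # node features X(t) = P(t)
        delta = np.quantile(P, 1.0 - keep) # threshold delta(t): keep top 30%
        A = (P >= delta).astype(int)       # binary adjacency A(t)
        graphs.append((X, A))
    return graphs
```

Each element of the returned list is one G(t) = (X(t), A(t)); real pipelines would use the preprocessed AAL-parcellated BOLD signals instead of a random matrix.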

(2)Dynamic graph representation learning

        ①They choose STAGIN as the backbone (for background, see the author's earlier notes on Learning Dynamic Graph Representation of Brain Connectome with Spatio-Temporal Attention)

        ②Layer of GIN: 2

        ③SERO (a module of STAGIN) as depicted by the authors:

        ④Pseudo code of SFGL:

2.5.2. Personalized branch at client side

        ①Imaging features: this branch is very simple. The FC matrix is computed from the entire BOLD series, its upper (or lower) triangle is flattened, and the result is passed through an MLP.

        ②Non-imaging features: gender, age, and education are concatenated and then passed through an MLP

        ③Personal feature: concatenate image features and non-image features:

f_p=f'\oplus p'

        ④Finally, the authors combine the STAGIN output and this personal feature as a weighted sum. Note that it is addition rather than concatenation (possible because the final MLP outputs have the same dimension):

f_o=\gamma f_p+(1-\gamma) f_s
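A tiny sketch of this weighted fusion; `fuse` is an illustrative name, and the sketch assumes both MLP heads have already produced vectors of the same dimension:

```python
import numpy as np

def fuse(f_p, f_s, gamma=0.5):
    """f_o = gamma * f_p + (1 - gamma) * f_s: element-wise weighted sum of
    the personalized feature f_p and the shared (STAGIN) feature f_s."""
    f_p, f_s = np.asarray(f_p, float), np.asarray(f_s, float)
    # addition (not concatenation) only works with matching dimensions
    assert f_p.shape == f_s.shape, "fusion needs equal dimensions"
    return gamma * f_p + (1.0 - gamma) * f_s
```

For example, `fuse([1.0, 2.0], [3.0, 4.0], gamma=0.25)` weights the shared feature three times as heavily as the personalized one.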

2.5.3. Federated aggregation at server side

        ①Output of client:

z^m=\Phi_{\phi_m}\left(\{X^m\left(t\right)\}_{t=1}^T,\{A^m\left(t\right)\}_{t=1}^T\right)

        ②Output of client further updated by demographic information p^m and imaging features f^m:

f_o^m=\Theta_{\theta_m}(f^m,p^m,z^m)

        ③Cross entropy loss in each branch:

L_{c}^{m}=-\sum_{i\in Y^{m}}\left(y_{i}^{m}\log\left(g_{i}^{m}\right)+\left(1-y_{i}^{m}\right)\log\left(1-g_{i}^{m}\right)\right)
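This is the standard binary cross-entropy; a minimal sketch (the helper name `bce_loss` and the clipping constant are illustrative, not from the paper):

```python
import numpy as np

def bce_loss(y, g, eps=1e-12):
    """Binary cross-entropy over a client's labelled set Y^m:
    L_c^m = -sum_i [ y_i log(g_i) + (1 - y_i) log(1 - g_i) ]."""
    y = np.asarray(y, float)
    g = np.clip(np.asarray(g, float), eps, 1.0 - eps)  # numerical safety
    return -np.sum(y * np.log(g) + (1.0 - y) * np.log(1.0 - g))
```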

        ④To prevent gradient explosion or vanishing, they introduce an orthogonal constraint loss:

L_{ortho}^{m}=\left\|\frac{1}{\mu_m}{X'^{m}}^{\top}X'^{m}-I\right\|_2

where \mu_{m}=\max\left({X'^{m}}^{\top}X'^{m}\right)
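A sketch of this constraint, under the assumptions that X'^m is a client's feature matrix and that \|\cdot\|_2 denotes the spectral (matrix 2-) norm; `ortho_loss` is an illustrative name:

```python
import numpy as np

def ortho_loss(Xp):
    """|| (1/mu) Xp^T Xp - I ||_2 with mu = max(Xp^T Xp); pushes the
    feature columns of Xp toward mutual orthogonality."""
    G = Xp.T @ Xp
    mu = G.max()
    # ord=2 on a matrix gives the largest singular value (spectral norm)
    return np.linalg.norm(G / mu - np.eye(G.shape[0]), 2)
```

The penalty is zero exactly when the scaled Gram matrix equals the identity, i.e. when the columns are orthogonal with equal norm.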

        ⑤Total loss can be defined as:

L^m=L_{c}^m+\lambda L_{ortho}^m

        ⑥Parameter update in client-server communication:

\begin{aligned}\phi_m^{r+1}&\leftarrow\phi_m^r-\eta\nabla L^m\Big(\{X^m (t)\}_{t=1}^T,\{A^m (t)\}_{t=1}^T,f^m,p^m,Y^m\Big)\\\\\theta_m^{r+1}&\leftarrow\theta_m^r-\eta\nabla L^m\Big(\{X^m (t)\}_{t=1}^T,\{A^m (t)\}_{t=1}^T,f^m ,p^m ,Y^m\Big)\end{aligned}

where \eta denotes the learning rate
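The per-client update above is plain gradient descent; a minimal sketch with an illustrative dict-of-floats parameter layout (`sgd_step` is a made-up helper name):

```python
def sgd_step(params, grads, eta):
    """One local update: phi^{r+1} <- phi^r - eta * grad(L^m),
    applied independently to each named parameter."""
    return {name: params[name] - eta * grads[name] for name in params}
```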

        ⑦Parameters in server:

\phi^{r+1}=\sum_{m=1}^{M}\frac{n_{m}}{n}\phi_{m}^{r+1}
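This sample-size-weighted aggregation is standard FedAvg applied to the shared-branch parameters \phi only (the personalized branch stays local). A sketch with illustrative names:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average phi^{r+1} = sum_m (n_m / n) phi_m^{r+1}.

    client_params: list of dicts {name: ndarray}, one per client m,
                   holding only the shared-branch parameters.
    client_sizes : local sample counts n_m; weights are n_m / n.
    """
    n = float(sum(client_sizes))
    agg = {}
    for name in client_params[0]:
        agg[name] = sum((n_m / n) * p[name]
                        for p, n_m in zip(client_params, client_sizes))
    return agg
```

A client with more local samples thus pulls the global shared model proportionally harder.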

2.5.4. Implementation details

        ①Batch size: 4

        ②Dropout rate: 0.5

2.6. Experiments

2.6.1. Experimental settings

        ①Cross validation: 5 fold

        ②Hyperparameters:

2.6.2. Methods for comparison

        ①Compared with non-FL and FL strategies respectively.

non-FL methods:
cross (tr_<site>): one site used for training, the other sites for testing
single: each dataset trained and tested separately with 5-fold cross validation
mix: data from all sites are mixed together

FL methods:
FedAvg: baseline federated aggregation, transferring the averaged parameters
FedProx: averages server parameters with an additional L2 proximal term
MOON: maximizes the cosine similarity between local and global representations, and minimizes the similarity between the current and previous communication rounds
pFedMe: personalization based on Moreau envelopes
LGFed: only sends the parameters of the last fully connected layer

holistic  adj. of the whole; comprehensive; functionally integrated

2.6.3. Experimental results

        ①Comparison table on ABIDE dataset:

        ②Comparison table on REST-meta-MDD dataset:

        ③AUROC curves:

2.6.4. Statistical significance analysis

        ①Predicted probability distribution between different models:

(Comparing one's own model against the others this way feels... not particularly convincing...)

2.7. Discussion

2.7.1. Ablation study

        ①Considering the following variants:

SFGLw/oPB: with the shared branch but without the personalized branch, sending all local parameters
SFGLw/oPC: only demographic information used in the personalized branch
SFGLw/oDI: only imaging information used in the personalized branch

the ablation study results are:

2.7.2. Influence of balancing coefficient

        ①Grid search on \gamma =\left \{ 0.1,0.2,...,0.9 \right \}:

2.7.3. Influence of local training epoch

        ①Grid search of local epoch:

The authors argue that a local epoch that is too small leads to overly frequent updates, while one that is too large causes parameter drift (a mismatch between the local and global optima)

2.7.4. Influence of different backbones in shared branch

        ①Ablation study with different backbone:

2.7.5. Influence of feature extractor

        ①Ablation study on different feature extractor:

2.7.6. Interpretable biomarker analysis

        ①Guided back-propagation gradient formula for interpretability:

g_k^c=\operatorname{ReLU}\left(\frac{\partial y^c}{\partial x_k}\right)
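A toy illustration of this ReLU-gated gradient as a per-ROI relevance score, then ranking regions by it; `relevance_scores` and its inputs are hypothetical:

```python
import numpy as np

def relevance_scores(grad_per_region):
    """g_k^c = ReLU(dy^c / dx_k): keep only gradient components that
    positively support the class score y^c, then rank ROIs by them."""
    g = np.maximum(np.asarray(grad_per_region, float), 0.0)
    order = np.argsort(g)[::-1]   # most discriminative region first
    return g, order
```

Taking the top entries of `order` is one way to read off the most discriminative brain regions.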

        ②The top 10 discriminative brain regions:

2.7.7. Convergence analysis

        ①Total number of communication rounds: 10

        ②Local epoch: 5

        ③Record of loss:

2.7.8. Model scalability analysis

        ①They add other sites to evaluate the scalability of their model:

assimilation  n. absorption; acceptance; assimilation

2.7.9. Computation cost analysis

        ①Computational cost (including the number of model parameters (ParaN), the size of model parameters (ParaS), GPU memory usage during training (GPUMe), floating point operations per second (FLOPs), and the training time for each communication round (TimeC)): 

2.7.10. Limitations and future work

        ①The relationship between subjects is not taken into account

        ②Personalized branch for each site needed

        ③More advanced feature fusion strategies required

        ④Semi-supervised or weakly supervised models may suit real-world situations better

        ⑤Weaknesses remain in stability and communication efficiency

2.8. Conclusion

        Novel in information extraction and federated aggregation approach

3. Reference

Zhang, J. et al. (2024) 'Preserving specificity in federated graph learning for fMRI-based neurological disorder identification', Neural Networks, 169: 584-596.
