[Paper Notes] Preserving specificity in federated graph learning for fMRI-based neurological disorder identification

Original title: Preserving specificity in federated graph learning for fMRI-based neurological disorder identification

Paper code: ZJH123333/SFGL: source code of the SFGL framework (github.com)

Paper link: Preserving specificity in federated graph learning for fMRI-based neurological disorder identification - ScienceDirect

The English here is typed entirely by hand, summarizing and paraphrasing the original paper. Unavoidable spelling and grammar mistakes may appear; corrections in the comments are welcome. This post leans toward personal notes, so read with discretion.

Table of contents

1. TL;DR

1.1. Takeaways

2. Paper section-by-section notes

2.1. Abstract

2.2. Introduction

2.3. Related work

2.3.1. Graph learning for functional MRI analysis

2.3.2. Federated learning for brain disease analysis

2.4. Materials and data preprocessing

2.4.1. Materials

2.4.2. Data preprocessing

2.5. Methodology

2.5.1. Shared branch at client side

2.5.2. Personalized branch at client side

2.5.3. Federated aggregation at server side

2.5.4. Implementation details

2.6. Experiments

2.6.1. Experimental settings

2.6.2. Methods for comparison

2.6.3. Experimental results

2.6.4. Statistical significance analysis

2.7. Discussion

2.7.1. Ablation study

2.7.2. Influence of balancing coefficient

2.7.3. Influence of local training epoch

2.7.4. Influence of different backbones in shared branch

2.7.5. Influence of feature extractor

2.7.6. Interpretable biomarker analysis

2.7.7. Convergence analysis

2.7.8. Model scalability analysis

2.7.9. Computation cost analysis

2.7.10. Limitations and future work

2.8. Conclusion

3. Reference


1. TL;DR

1.1. Takeaways

(1) Honestly, from the opening the model itself does not seem to have much novelty, and the federated-learning improvement only amounts to adding some non-imaging factors

(2) The scanning parameters are presumably from the official dataset documentation? Meaning they were copied in directly?

(3) The data at each individual site is still a bit scarce

2. Paper section-by-section notes

2.1. Abstract

        ①Cross-site fMRI analysis faces data privacy and security problems as well as storage burdens

        ②Previous FL methods in brain disease classification ignored site-specificity

2.2. Introduction

        ①There are privacy hazards in multi-site imaging data analysis

        ②They propose a specificity-aware federated graph learning (SFGL) framework for diagnosing neurological disorders from fMRI:

        ③Datasets: ABIDE and REST-meta-MDD

2.3. Related work

2.3.1. Graph learning for functional MRI analysis

        ①Lists spatial and temporal graph methods applied to mental disease diagnosis

        ②They further point out the privacy problems

2.3.2. Federated learning for brain disease analysis

        ①Lists examples of federated learning applied to brain disease diagnosis

        ②But the authors argue that those methods did not take demographic factors (i.e., age, gender, and education) into account...

2.4. Materials and data preprocessing

2.4.1. Materials

        ①Demographic data on two datasets:

2.4.2. Data preprocessing

(1)ABIDE I

        ①Sites: the New York University (NYU), the University of California, Los Angeles (UCLA), and the University of Michigan (UM) (the largest 3)

        ②Samples: (74+98)+(48+37)+(47+73)=377

        ③Scanning parameters: 

NYU: 3T Allegra scanner; TR = 2000 ms, TE = 15 ms, voxel size = 3.0×3.0×4.0 mm^3, flip angle = 90°, FOV = 192×240 mm^2, 33 slices with a thickness of 4 mm.
UCLA: 3T Trio scanner; TR = 3000 ms, TE = 28 ms, voxel size = 3.0×3.0×4.0 mm^3, flip angle = 90°, FOV = 192×192 mm^2, 34 slices with a thickness of 4 mm.
UM: 3T GE Signa scanner; TR = 2000 ms, TE = 30 ms, voxel size = 3.438×3.438×3.000 mm^3, flip angle = 90°, 40 slices with a thickness of 3 mm.

        ④Tool: DPARSF

        ⑤Preprocessing steps: discard the first five volumes, perform head motion correction, spatial smoothing and normalization, bandpass filtering (0.01–0.10 Hz) of BOLD time series, nuisance signal regression, and spatial standardization to Montreal Neurological Institute (MNI) space

        ⑥Atlas: AAL 116

(2)REST-meta-MDD

        ①Sites: Site 20, Site 21, and Site 25 (the largest 3)

        ②Samples: (282+251)+(86+70)+(89+63)=841

        ③Scanning parameters: 

Site 20: 3T Trio scanner with a 12-channel receiver coil; TR = 2000 ms, TE = 30 ms, voxel size = 3.44×3.44×4.00 mm^3, flip angle = 90°, gap = 1.0 mm, FOV = 220×220 mm^2, 32 slices with a thickness of 3 mm.
Site 21: 3T Trio scanner with a 32-channel receiver coil; TR = 2000 ms, TE = 30 ms, voxel size = 3.12×3.12×4.20 mm^3, flip angle = 90°, gap = 0.7 mm, FOV = 200×200 mm^2, 33 slices with a thickness of 3.5 mm.
Site 25: 3T Verio scanner with a 12-channel receiver coil; TR = 2000 ms, TE = 25 ms, voxel size = 3.75×3.75×4.00 mm^3, flip angle = 90°, gap = 0.0 mm, FOV = 240×240 mm^2, 36 slices with a thickness of 4 mm.

        ④Tool: DPARSF

        ⑤Preprocessing steps: discard the first ten volumes, and employ the same head motion correction, spatial smoothing and normalization, bandpass filtering (0.01–0.10 Hz), nuisance signal regression, and spatial standardization.

        ⑥Atlas: AAL 116

2.5. Methodology

2.5.1. Shared branch at client side

(1)Dynamic graph sequence construction

        ①A graph is denoted by G=(V,E), where V is the node (ROI) set and E is the edge (connection) set

        ②BOLD signal: B=(b_1,\ldots,b_N)^\top\in R^{N\times D},b_i\in\mathbb{R}^D with N ROI and D time points

        ③They employ a sliding-window technique: for window length \Gamma and stride s, the number of segments is T=\left\lfloor\frac{D-\Gamma}{s}\right\rfloor+1

        ④The BOLD signal i in the t-th segment is b_{i} \left(t\right) \in \mathbb{R}^{\Gamma} \left(t=1,2, \ldots,T \right)

        ⑤The Pearson correlation between the i-th ROI and the j-th ROI in the t-th segment is p_{ij}\left(t\right)=\frac{\mathrm{Cov}(b_i(t),b_j(t))}{\sigma(b_i(t))\sigma(b_j(t))}, where \mathrm{Cov}\left(\cdot,\cdot\right) denotes the covariance and \sigma\left(\cdot\right) the standard deviation. Each FCN is denoted by P\left(t\right)=\left(p_{ij}\left(t\right)\right)\in\mathbb{R}^{N\times N}\left(t=1,2,\ldots,T\right)

        ⑥Node feature matrix: X\left ( t \right )=P\left ( t \right )

        ⑦Sparsification: only the top 30% of correlation connections in each FCN are retained and set to 1, using a threshold \delta\left(t\right):

A\left(t\right)=\left(\mathbb{I}\left\{p_{ij}\left(t\right)\geq\delta\left(t\right)\right\}\right)\in\{0,1\}^{N\times N}

        ⑧They construct a graph for each time window of each subject:

{\mathcal{G}}=\{G\left(t\right)\}_{t=1}^{T} 

G\left(t\right)=\left(X\left(t\right),A\left(t\right)\right)
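The construction above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code; the function name `dynamic_graphs` and the toy inputs are made up, and the threshold \delta(t) is taken as the 70th percentile so that roughly the top 30% of correlations survive:

```python
import numpy as np

def dynamic_graphs(bold, win, stride, keep=0.3):
    """Build a dynamic graph sequence from a BOLD matrix.

    bold   : (N, D) array, N ROIs x D time points (toy stand-in for fMRI).
    win    : sliding-window length Gamma; stride : step s.
    keep   : fraction of strongest correlations kept when sparsifying.
    """
    N, D = bold.shape
    T = (D - win) // stride + 1            # number of segments
    graphs = []
    for t in range(T):
        seg = bold[:, t * stride : t * stride + win]
        P = np.corrcoef(seg)               # Pearson FCN P(t), shape (N, N)
        X = P                              # node features X(t) = P(t)
        delta = np.quantile(P, 1.0 - keep) # threshold delta(t): keep top 30%
        A = (P >= delta).astype(int)       # binary adjacency A(t)
        graphs.append((X, A))
    return graphs
```

Each element of the returned list is one G(t) = (X(t), A(t)); real pipelines would use the preprocessed AAL-parcellated BOLD signals instead of a random matrix.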

(2)Dynamic graph representation learning

        ①They choose STAGIN as the backbone (for background, see the author's earlier notes on Learning Dynamic Graph Representation of Brain Connectome with Spatio-Temporal Attention)

        ②Layer of GIN: 2

        ③SERO (a module of STAGIN) as depicted by the authors:

        ④Pseudo code of SFGL:

2.5.2. Personalized branch at client side

        ①Imaging features: this branch is very simple. The FC matrix is computed from the entire BOLD series, its upper (or lower) triangle is flattened, and the result is passed through an MLP.

        ②Non-imaging features: gender, age, and education are concatenated and then passed through an MLP

        ③Personal feature: concatenate image features and non-image features:

f_p=f'\oplus p'

        ④Finally, the authors combine the STAGIN output and this personal feature as a weighted sum. Note that it is addition rather than concatenation (possible because the final MLP outputs have the same dimension):

f_o=\gamma f_p+(1-\gamma) f_s
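A tiny sketch of this weighted fusion; `fuse` is an illustrative name, and the sketch assumes both MLP heads have already produced vectors of the same dimension:

```python
import numpy as np

def fuse(f_p, f_s, gamma=0.5):
    """f_o = gamma * f_p + (1 - gamma) * f_s: element-wise weighted sum of
    the personalized feature f_p and the shared (STAGIN) feature f_s."""
    f_p, f_s = np.asarray(f_p, float), np.asarray(f_s, float)
    # addition (not concatenation) only works with matching dimensions
    assert f_p.shape == f_s.shape, "fusion needs equal dimensions"
    return gamma * f_p + (1.0 - gamma) * f_s
```

For example, `fuse([1.0, 2.0], [3.0, 4.0], gamma=0.25)` weights the shared feature three times as heavily as the personalized one.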

2.5.3. Federated aggregation at server side

        ①Output of client:

z^m=\Phi_{\phi_m}\left(\{X^m\left(t\right)\}_{t=1}^T,\{A^m\left(t\right)\}_{t=1}^T\right)

        ②Output of client further updated by demographic information p^m and imaging features f^m:

f_o^m=\Theta_{\theta_m}(f^m,p^m,z^m)

        ③Cross entropy loss in each branch:

L_{c}^{m}=-\sum_{i\in Y^{m}}\left(y_{i}^{m}\log\left(g_{i}^{m}\right)+\left(1-y_{i}^{m}\right)\log\left(1-g_{i}^{m}\right)\right)
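This is the standard binary cross-entropy; a minimal sketch (the helper name `bce_loss` and the clipping constant are illustrative, not from the paper):

```python
import numpy as np

def bce_loss(y, g, eps=1e-12):
    """Binary cross-entropy over a client's labelled set Y^m:
    L_c^m = -sum_i [ y_i log(g_i) + (1 - y_i) log(1 - g_i) ]."""
    y = np.asarray(y, float)
    g = np.clip(np.asarray(g, float), eps, 1.0 - eps)  # numerical safety
    return -np.sum(y * np.log(g) + (1.0 - y) * np.log(1.0 - g))
```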

        ④To prevent gradient explosion or vanishing, they introduce an orthogonal constraint loss:

L_{ortho}^{m}=\left\|\frac{1}{\mu_m}{X'^{m}}^{\top}X'^{m}-I\right\|_2

where \mu_{m}=\max\left({X'^{m}}^{\top}X'^{m}\right)
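A sketch of this constraint, under the assumptions that X'^m is a client's feature matrix and that \|\cdot\|_2 denotes the spectral (matrix 2-) norm; `ortho_loss` is an illustrative name:

```python
import numpy as np

def ortho_loss(Xp):
    """|| (1/mu) Xp^T Xp - I ||_2 with mu = max(Xp^T Xp); pushes the
    feature columns of Xp toward mutual orthogonality."""
    G = Xp.T @ Xp
    mu = G.max()
    # ord=2 on a matrix gives the largest singular value (spectral norm)
    return np.linalg.norm(G / mu - np.eye(G.shape[0]), 2)
```

The penalty is zero exactly when the scaled Gram matrix equals the identity, i.e. when the columns are orthogonal with equal norm.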

        ⑤Total loss can be defined as:

L^m=L_{c}^m+\lambda L_{ortho}^m

        ⑥Parameter update in client-server communication:

\begin{aligned}\phi_m^{r+1}&\leftarrow\phi_m^r-\eta\nabla L^m\Big(\{X^m (t)\}_{t=1}^T,\{A^m (t)\}_{t=1}^T,f^m,p^m,Y^m\Big)\\\\\theta_m^{r+1}&\leftarrow\theta_m^r-\eta\nabla L^m\Big(\{X^m (t)\}_{t=1}^T,\{A^m (t)\}_{t=1}^T,f^m ,p^m ,Y^m\Big)\end{aligned}

where \eta denotes the learning rate
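The per-client update above is plain gradient descent; a minimal sketch with an illustrative dict-of-floats parameter layout (`sgd_step` is a made-up helper name):

```python
def sgd_step(params, grads, eta):
    """One local update: phi^{r+1} <- phi^r - eta * grad(L^m),
    applied independently to each named parameter."""
    return {name: params[name] - eta * grads[name] for name in params}
```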

        ⑦Parameters in server:

\phi^{r+1}=\sum_{m=1}^{M}\frac{n_{m}}{n}\phi_{m}^{r+1}
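This sample-size-weighted aggregation is standard FedAvg applied to the shared-branch parameters \phi only (the personalized branch stays local). A sketch with illustrative names:

```python
import numpy as np

def fedavg(client_params, client_sizes):
    """Weighted average phi^{r+1} = sum_m (n_m / n) phi_m^{r+1}.

    client_params: list of dicts {name: ndarray}, one per client m,
                   holding only the shared-branch parameters.
    client_sizes : local sample counts n_m; weights are n_m / n.
    """
    n = float(sum(client_sizes))
    agg = {}
    for name in client_params[0]:
        agg[name] = sum((n_m / n) * p[name]
                        for p, n_m in zip(client_params, client_sizes))
    return agg
```

A client with more local samples thus pulls the global shared model proportionally harder.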

2.5.4. Implementation details

        ①Batch size: 4

        ②Dropout rate: 0.5

2.6. Experiments

2.6.1. Experimental settings

        ①Cross validation: 5 fold

        ②Hyperparameters:

2.6.2. Methods for comparison

        ①Compared with non-FL and FL strategies respectively.

non-FL methods:
cross (tr_<site>): one site used for training, the other sites for testing
single: each dataset trained and tested separately with 5-fold cross validation
mix: data from all sites are mixed together

FL methods:
FedAvg: baseline federated aggregation, transferring the averaged parameters
FedProx: averages server parameters with an additional L2 proximal term
MOON: maximizes the cosine similarity between local and global representations, and minimizes the similarity between the current and previous communication rounds
pFedMe: personalization based on Moreau envelopes
LGFed: only sends the parameters of the last fully connected layer

holistic  adj. of the whole; comprehensive; functionally integrated

2.6.3. Experimental results

        ①Comparison table on ABIDE dataset:

        ②Comparison table on REST-meta-MDD dataset:

        ③AUROC curves:

2.6.4. Statistical significance analysis

        ①Predicted probability distribution between different models:

(Comparing one's own model against the others this way feels... not particularly convincing...)

2.7. Discussion

2.7.1. Ablation study

        ①Considering the following variants:

SFGLw/oPB: with the shared branch but without the personalized branch, sending all local parameters
SFGLw/oPC: only demographic information used in the personalized branch
SFGLw/oDI: only imaging information used in the personalized branch

the ablation study results are:

2.7.2. Influence of balancing coefficient

        ①Grid search on \gamma =\left \{ 0.1,0.2,...,0.9 \right \}:

2.7.3. Influence of local training epoch

        ①Grid search of local epoch:

The authors argue that a local epoch that is too small leads to overly frequent updates, while one that is too large causes parameter drift (a mismatch between the local and global optima)

2.7.4. Influence of different backbones in shared branch

        ①Ablation study with different backbone:

2.7.5. Influence of feature extractor

        ①Ablation study on different feature extractor:

2.7.6. Interpretable biomarker analysis

        ①Guided back-propagation gradient formula for interpretability:

g_k^c=\operatorname{ReLU}\left(\frac{\partial y^c}{\partial x_k}\right)
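A toy illustration of this ReLU-gated gradient as a per-ROI relevance score, then ranking regions by it; `relevance_scores` and its inputs are hypothetical:

```python
import numpy as np

def relevance_scores(grad_per_region):
    """g_k^c = ReLU(dy^c / dx_k): keep only gradient components that
    positively support the class score y^c, then rank ROIs by them."""
    g = np.maximum(np.asarray(grad_per_region, float), 0.0)
    order = np.argsort(g)[::-1]   # most discriminative region first
    return g, order
```

Taking the top entries of `order` is one way to read off the most discriminative brain regions.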

        ②The top 10 discriminative brain regions:

2.7.7. Convergence analysis

        ①Total number of communication rounds: 10

        ②Local epoch: 5

        ③Record of loss:

2.7.8. Model scalability analysis

        ①They add other sites to evaluate the scalability of their model:

assimilation  n. absorption; acceptance; assimilation

2.7.9. Computation cost analysis

        ①Computational cost (including the number of model parameters (ParaN), the size of model parameters (ParaS), GPU memory usage during training (GPUMe), floating point operations per second (FLOPs), and the training time for each communication round (TimeC)): 

2.7.10. Limitations and future work

        ①The relationship between subjects is not taken into account

        ②Personalized branch for each site needed

        ③More advanced feature fusion strategies required

        ④Semi-supervised or weakly supervised models may suit real-world situations better

        ⑤Weaknesses remain in stability and communication efficiency

2.8. Conclusion

        Novel in information extraction and federated aggregation approach

3. Reference

Zhang, J. et al. (2024) 'Preserving specificity in federated graph learning for fMRI-based neurological disorder identification', Neural Networks, 169: 584-596.
