[TMI 2024] Disentangle Then Calibrate With Gradient Guidance: A Unified Framework for Common and Rare Disease Diagnosis

Paper link: Disentangle Then Calibrate With Gradient Guidance: A Unified Framework for Common and Rare Disease Diagnosis | IEEE Journals & Magazine | IEEE Xplore

The English is typed entirely by hand, summarizing and paraphrasing the original paper. Spelling and grammar mistakes are hard to avoid; if you spot any, corrections in the comments are welcome. This post leans toward personal notes, so read with caution.

Table of Contents

1. Thoughts

2. Section-by-Section Reading

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. FSL Techniques

2.3.2. Rare Disease Diagnosis

2.3.3. Relationship With Multi-Task Learning

2.4. Method

2.4.1. Overview

2.4.2. GND Module

2.4.3. GFC Module

2.4.4. Summary

2.5. Experiments

2.5.1. Dataset

2.5.2. Implementation

2.5.3. Comparing to Existing Methods

2.5.4. Ablation Study

2.6. Discussion

2.6.1. Hyper-Parameters

2.6.2. Different Strategies for Feature Calibration

2.6.3. Visualization of Feature Embedding

2.6.4. Rare Disease Diagnosis With Different K Values

2.6.5. Rare Disease Simulation and Data Limitation

2.6.6. Future Work

2.7. Conclusion

1. Thoughts

(1) A quick look at few-shot learning.

(2) The figures are drawn in a very math/signal-processing style and look great; maximalists will rejoice.

(3) No comment on the novelty, since this is not directly my sub-field.

(4) At first I expected it to be hard to get into: at a glance it is formula-heavy with a complex main figure. In fact the writing is remarkably clear. Anyone with a bit of background in matrices/transport algorithms can get the idea; even without knowing exactly how each algorithm is implemented, you still know clearly what the authors did with which algorithm, and the reasoning looks very sound. The figures are also drawn very clearly.

(5) This is an extension of the authors' earlier MICCAI work, and the writing is understandable even without reading that earlier work.

2. Section-by-Section Reading

2.1. Abstract

        ①Limitations: a) scarce data for rare diseases; b) few-shot learning (FSL) only improves performance on rare diseases but cannot perform well on both rare and common diseases

        ②Therefore, they propose the Disentangle then Calibrate with Gradient Guidance (DCGG) framework

2.2. Introduction

        ①Existing FSL methods: self-supervised learning (SSL), meta-learning, or metric-learning techniques

        ②⭐The authors argue that the three existing approaches make the model more sensitive to rare diseases: e.g., SSL is trained on common diseases but fine-tuned on rare ones; meta-learning likewise learns from the few-shot rare diseases; metric learning classifies by comparing metrics (I have not studied this one; it feels a lot like prototype learning, learning exemplars?). The likely consequence: in a dataset with, say, 1000 healthy subjects, 1000 common-disease patients, and 50 rare-disease patients, the model ends up accurate on the rare diseases but not on the common ones? That is actually a very interesting research question. Could we augment the rare disease by copy-pasting it up to 1000 samples, hahaha, but that would be a bit too fake

2.3. Related Work

2.3.1. FSL Techniques

        ①Model-driven methods, i.e., meta-learning and metric learning, rely on the training strategy rather than on the data itself

        ②Data-driven methods, such as data augmentation, transfer features from majority classes to minority classes

2.3.2. Rare Disease Diagnosis

        ①Related methods:

2.3.3. Relationship With Multi-Task Learning

        ①Unlike multi-task learning (MTL), the authors focus on knowledge transfer within a single task

2.4. Method

2.4.1. Overview

        ①The overall framework of DCGG:

        ②The dataset consists of N image/label pairs \mathcal{D}=\{\mathbf{x}_i,\mathbf{y}_i\}_{i=1}^N, where \mathbf{x}_i is a medical image and \mathbf{y}_i its one-hot label

        ③The common-disease subset is denoted by \mathcal{D}_c and the rare-disease subset by \mathcal{D}_r

2.4.2. GND Module

        ①Train on a mini-batch of each common disease and obtain the per-disease gradient g_c^i=\nabla_\theta\mathcal{L}_c^i(\theta) (cross-entropy loss).

        ②Compute an average gradient over all the common diseases: g_c^\star=\nabla_\theta\mathcal{L}_c^\star(\theta)

        ③Project each disease's gradient onto the average gradient, channel by channel:

\begin{aligned} PL[j] & =\sum_{i=1}^{N_c}PL_{g_c^i[j]\to g_c^\star[j]} \\ & =\sum_{i=1}^{N_c}|g_c^i[j]|\cos\langle g_c^i[j],g_c^\star[j]\rangle \end{aligned}

where g_c^i[j] denotes the j-th channel of g_c^i. Channels with higher PL values indicate consistency across diseases, so the top-M channels of PL are defined as the disease-shared channels C^{sh} and the rest as the disease-specific channels C^{sp} (is this perspective so novel only because I rarely read papers in this area?)
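
As a concrete illustration, here is a minimal numpy sketch of this channel scoring and split (my own reading, not the authors' code; the gradient shapes and the name `split_channels` are assumptions):

```python
import numpy as np

def split_channels(grads, g_avg, M):
    """Score each channel by the summed projection length of every
    common-disease gradient onto the average gradient, then take the
    top-M channels as disease-shared and the rest as disease-specific.

    grads: (N_c, n_channels, d) per-disease gradients
    g_avg: (n_channels, d) average gradient over all common diseases
    M:     number of disease-shared channels to keep
    """
    N_c, n_channels, _ = grads.shape
    PL = np.zeros(n_channels)
    for j in range(n_channels):
        for i in range(N_c):
            g_ij, g_avg_j = grads[i, j], g_avg[j]
            # |g_c^i[j]| * cos<g_c^i[j], g_c*[j]>  ==  projection length
            cos = g_ij @ g_avg_j / (np.linalg.norm(g_ij) * np.linalg.norm(g_avg_j) + 1e-12)
            PL[j] += np.linalg.norm(g_ij) * cos
    order = np.argsort(PL)
    return order[-M:], order[:-M]   # C^{sh} (top-M), C^{sp} (the rest)
```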

        ④The gradients over the common-disease and rare-disease sets, g_c=\nabla_\theta\mathcal{L}_c(\theta,\mathcal{D}_c) and g_r=\nabla_\theta\mathcal{L}_r(\theta,\mathcal{D}_r), can be further decomposed into:

g_c=\{g_c^{sh},g_c^{sp}\},\quad g_r=\{g_r^{sh},g_r^{sp}\}

        ⑤The authors further optimize these two gradients to steer the decision: ⭐on the shared channels, they only want rare-disease detection to improve → reduce the rare-disease loss → without affecting common diseases. Here g_r^{sh}, the rare-disease gradient on the shared channels, needs to be optimized, and the solved gradient w^{sh} should point in the same direction as g_r^{sh}:

\min_{w^{sh}}\|w^{sh}-g_{r}^{sh}\|_{2}^{2},\quad\mathrm{s.t.}\ g_{c}^{T}w^{sh}>0
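
This constrained least-squares problem has a simple closed form under the KKT conditions; a minimal sketch (my own illustration, in the spirit of PCGrad-style conflict projection, with the strict inequality relaxed to the boundary case):

```python
import numpy as np

def shared_channel_gradient(g_r_sh, g_c_sh):
    """Solve min ||w - g_r_sh||^2  s.t.  g_c_sh @ w > 0 (closed form).

    If g_r_sh already has a positive inner product with the common-disease
    gradient, it is feasible and optimal as-is; otherwise project it onto
    the constraint boundary g_c_sh @ w = 0.
    """
    dot = g_c_sh @ g_r_sh
    if dot > 0:                  # constraint inactive: keep g_r^{sh}
        return g_r_sh
    # active constraint: remove the conflicting component along g_c_sh
    return g_r_sh - dot / (g_c_sh @ g_c_sh) * g_c_sh
```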

        ⑥The disease-specific gradients of the common diseases: g_c^{sp}=\{g_c^{sp\cdot1},\cdots,g_c^{sp\cdot N_c}\}. ⭐On the specific channels, optimizing the rare disease must not be influenced by the common diseases, so the update is made orthogonal to the common-disease specific gradients:

\min_{w^{sp}}\|w^{sp}-g_r^{sp}\|_2^2,\quad\mathrm{s.t.}\ {G^{sp}}^{T}w^{sp}=0

Both problems are solved via the Gram-Schmidt process and the Karush-Kuhn-Tucker (KKT) conditions (never studied these, but after looking them up they seem to be extensions of the linear algebra I learned before)
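
For the orthogonality constraint, the optimum is the projection of the gradient onto the orthogonal complement of the common-disease specific gradients. A minimal Gram-Schmidt sketch (my own illustration, assuming gradients flattened into float vectors):

```python
import numpy as np

def specific_channel_gradient(g_r_sp, G_sp):
    """Solve min ||w - g_r_sp||^2  s.t.  G_sp^T w = 0.

    G_sp: (d, N_c) common-disease specific gradients as columns.
    Gram-Schmidt builds an orthonormal basis of their span; subtracting
    all components in that span leaves the orthogonal projection.
    """
    basis = []
    for k in range(G_sp.shape[1]):        # Gram-Schmidt on the columns
        q = G_sp[:, k].astype(float).copy()
        for b in basis:
            q -= (b @ q) * b
        n = np.linalg.norm(q)
        if n > 1e-12:                     # skip (near-)dependent columns
            basis.append(q / n)
    w = g_r_sp.astype(float).copy()
    for b in basis:                       # remove components in the span
        w -= (b @ w) * b
    return w
```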

2.4.3. GFC Module

        ①They model P and Q as discrete uniform distributions over the N_c common diseases and N_r rare diseases (so this distribution is hand-picked? Why not a Gaussian or something?):

P=\sum_{i=1}^{N_c}\frac{1}{N_c}\delta_{M_c^i},\quad Q=\sum_{j=1}^{N_r}\frac{1}{N_r}\delta_{M_r^j},

        ②With M_c^i and M_r^j denoting the features of the i-th common disease and the j-th rare disease at the l-th disease-shared channel, the transfer from common to rare diseases can be cast as an optimal transport (OT) problem:

OT(P,Q)=\min_{T}\langle C,T\rangle,\quad\mathrm{s.t.}\ T\mathbf{1}=P,\; T^{T}\mathbf{1}=Q

where T\in\mathbb{R}_{\geq0}^{N_c\times N_r} is the transport plan to be solved, and C\in\mathbb{R}_{\geq0}^{N_c\times N_r} is the cost matrix specifying the cost of linking a common disease to a rare one

        ③They measure the cost by the squared Euclidean distance between gradients:

C_{ij}=\left\|g_c^i-g_r^j\right\|_2^2

        ④Assuming that the common-disease feature in each channel follows a Gaussian distribution, they utilize Sinkhorn to solve the OT problem and update the mean and standard deviation by:

\mu_{r}^{j^{\prime}}=\frac{N_{c}\sum_{i=1}^{N_{c}}T_{ij}\mu_{c}^{i}+\mu_{r}^{j}}{N_{c}+1},\quad\sigma_{r}^{j^{\prime}}=\frac{N_{c}\sum_{i=1}^{N_{c}}T_{ij}\sigma_{c}^{i}+\sigma_{r}^{j}}{N_{c}+1}
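
Putting ②–④ together, a minimal sketch of the cost matrix, a standard entropic Sinkhorn solver, and the calibration update (the regularization weight `eps` and iteration count are my assumptions; the paper only states that Sinkhorn is used):

```python
import numpy as np

def cost_matrix(g_c, g_r):
    """C_ij = ||g_c^i - g_r^j||_2^2;  g_c: (N_c, d), g_r: (N_r, d)."""
    return np.sum((g_c[:, None, :] - g_r[None, :, :]) ** 2, axis=-1)

def sinkhorn(C, p, q, eps=0.1, n_iters=200):
    """Entropic-regularized OT via Sinkhorn scaling; returns plan T (N_c, N_r)."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(n_iters):             # alternately match the two marginals
        v = q / (K.T @ u)
        u = p / (K @ v)
    return u[:, None] * K * v[None, :]

def calibrate(T, mu_c, sigma_c, mu_r, sigma_r):
    """Update rare-disease statistics with transported common-disease ones.
    mu_c/sigma_c: (N_c, d); mu_r/sigma_r: (N_r, d); T: (N_c, N_r)."""
    N_c = mu_c.shape[0]
    mu_new = (N_c * T.T @ mu_c + mu_r) / (N_c + 1)
    sigma_new = (N_c * T.T @ sigma_c + sigma_r) / (N_c + 1)
    return mu_new, sigma_new
```

Here the marginals from ① would be `p = np.full(N_c, 1 / N_c)` and `q = np.full(N_r, 1 / N_r)`.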

        ⑤The calibrated feature is then sampled as:

M_r^{j^{\prime}}=\mu_r^{j^{\prime}}+\sigma_r^{j^{\prime}}\cdot\varepsilon_t,\quad\varepsilon_t\sim\mathcal{N}(0,1)
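
Continuing the sketch above, the sampling step is just a reparameterized Gaussian draw from the calibrated statistics returned by `calibrate`:

```python
# Draw calibrated rare-disease features: mean plus noise scaled by the std
eps_t = np.random.randn(*mu_new.shape)    # epsilon_t ~ N(0, 1)
M_r_new = mu_new + sigma_new * eps_t
```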

2.4.4. Summary

        ①The algorithm:

2.5. Experiments

2.5.1. Dataset

        ①Statistics of datasets:

2.5.2. Implementation

        ①Backbone \mathcal{F}(\theta): WideResNet, with details:

        ②Optimizer: Adam with a learning rate of 0.001

        ③Input image size: 112 \times 112

        ④Batch size: 8

        ⑤Cross-validation: 5-fold for common-disease training, with 20% of the training samples randomly held out for validation. For rare diseases, K=1 or 5 samples were randomly selected for training and another 20% for validation; the remaining rare-disease samples form the test set. Sampling was repeated 4 times without repetition for each fold

        ⑥Epoch: 300

2.5.3. Comparing to Existing Methods

        ①Comparison table on 3 datasets:

2.5.4. Ablation Study

        ①Module ablation:

where I denotes training on common diseases and fine-tuning on rare diseases, II is GND only, and III is GFC only

2.6. Discussion

2.6.1. Hyper-Parameters

        ①Ablation of shared channel M:

2.6.2. Different Strategies for Feature Calibration

        ①Other transfer strategies (this chart can't have been drawn in Excel, can it? Mine look exactly like this, yikes):

2.6.3. Visualization of Feature Embedding

        ①t-SNE visualization:

2.6.4. Rare Disease Diagnosis With Different K Values

        ①Ablation on the number K of rare-disease training samples:

2.6.5. Rare Disease Simulation and Data Limitation

        ①Hmm, it really does train with only a handful of samples and can be tested on many rare diseases. Quite impressive

2.6.6. Future Work

        ①Enhance generalization

        ②Zero-shot scenario

2.7. Conclusion

        ~
