WS-DAN: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification

This paper proposes a Weakly Supervised Data Augmentation Network (WS-DAN) that generates attention maps to represent an object's discriminative parts and uses them to guide data augmentation, including attention cropping and attention dropping, improving both the efficiency and accuracy of fine-grained visual classification.

See Better Before Looking Closer: Weakly Supervised Data Augmentation Network for Fine-Grained Visual Classification
Paper PDF

Abstract

In practice, random data augmentation, such as random image cropping, is low-efficiency and might introduce many uncontrolled background noises. In this paper, the authors propose Weakly Supervised Data Augmentation Network (WS-DAN) to explore the potential of data augmentation. Specifically, for each training image, they first generate attention maps that represent the object's discriminative parts by weakly supervised learning. Next, they augment the image guided by these attention maps, including attention cropping and attention dropping. WS-DAN improves classification accuracy in two folds. In the first stage, images can be "seen" better, since features of more discriminative parts are extracted. In the second stage, the attention regions provide an accurate location of the object, which lets the model look at the object closer and further improves performance.

In summary, the main contributions of this work are:

  1. They propose Weakly Supervised Attention Learning to generate attention maps that represent the spatial distribution of the object's discriminative parts, and use a Bilinear Attention Pooling (BAP) module to obtain the whole-object feature by accumulating the part features.
  2. Based on the attention maps, they propose attention-guided data augmentation to improve the efficiency of data augmentation, including attention cropping and attention dropping. Attention cropping randomly crops and resizes one of the attention parts to enhance the local feature representation. Attention dropping randomly erases one of the attention regions from the image to encourage the model to extract features from multiple discriminative parts.
  3. They utilize the attention maps to accurately locate the whole object and enlarge it, further improving classification accuracy.

Innovation

  1. Bilinear Attention Pooling (BAP)
  2. Attention Regularization
  3. Attention-guided Data Augmentation

Pipeline

[Figure: overall WS-DAN pipeline]
The training process can be divided into two parts: Weakly Supervised Attention Learning and Attention-guided Data Augmentation:

Weakly Supervised Attention Learning

Spatial Representation

Attention maps $A$ are obtained from the feature maps $F$ by a convolutional function $f(\cdot)$, as in Eq. 1. Each attention map $A_k$ represents one object part or visual pattern, such as the head of a bird, the wheel of a car, or the wing of an aircraft. The attention maps are then used to augment the training data.

$$A = f(F) = \bigcup_{k=1}^{M} A_k \tag{1}$$
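As a minimal sketch of this step, assuming $f(\cdot)$ is a 1×1 convolution (a common choice; the channel counts and random weights here are purely illustrative):

```python
import numpy as np

def attention_maps(F, W):
    """Generate M attention maps A from feature maps F with a 1x1 conv.

    F: feature maps, shape (C, H, W_s)
    W: 1x1 convolution weights, shape (M, C)
    Returns A, shape (M, H, W_s); ReLU keeps activations non-negative.
    """
    A = np.einsum('mc,chw->mhw', W, F)  # 1x1 conv == per-pixel channel mixing
    return np.maximum(A, 0.0)

# Illustrative shapes: 32 attention maps from 768-channel feature maps
F = np.random.rand(768, 26, 26)
W = np.random.rand(32, 768)
A = attention_maps(F, W)
```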

Bilinear Attention Pooling (BAP)

[Figure: Bilinear Attention Pooling]
They propose Bilinear Attention Pooling (BAP) to extract features from the parts represented by the attention maps. The feature maps $F$ are element-wise multiplied by each attention map $A_k$ to generate $M$ part feature maps $F_k$, as shown in Eq. 2.

$$F_k = A_k \odot F \quad (k = 1, 2, \ldots, M) \tag{2}$$

Then they extract a discriminative local feature with an additional feature extraction function $g(\cdot)$, such as Global Average Pooling (GAP), Global Maximum Pooling (GMP), or convolution, obtaining the $k$-th attention feature $f_k$.

$$f_k = g(F_k) \tag{3}$$

The object's feature is represented by the part feature matrix $P$, which stacks the part features $f_k$:

$$P = \begin{pmatrix} g(A_1 \odot F) \\ g(A_2 \odot F) \\ \vdots \\ g(A_M \odot F) \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_M \end{pmatrix} \tag{4}$$
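Equations 2–4 can be sketched together, taking $g(\cdot)$ to be GAP (one of the options named above); the shapes are illustrative:

```python
import numpy as np

def bilinear_attention_pooling(F, A):
    """BAP sketch: F_k = A_k ⊙ F (Eq. 2), f_k = g(F_k) with g = GAP
    (Eq. 3), stacked into the part feature matrix P (Eq. 4).

    F: feature maps, shape (C, H, W)
    A: attention maps, shape (M, H, W)
    Returns P, shape (M, C).
    """
    parts = A[:, None, :, :] * F[None, :, :, :]  # (M, C, H, W) via broadcasting
    return parts.mean(axis=(2, 3))               # GAP over the spatial dims

F = np.ones((4, 2, 2))
A = np.ones((3, 2, 2))
P = bilinear_attention_pooling(F, A)  # one row per attention part
```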

Attention Regularization

For each fine-grained category, they expect the attention map $A_k$ to consistently represent the same $k$-th object part. They penalize the variance of features belonging to the same object part, so that the part feature $f_k$ stays close to a global feature center $c_k$ and the attention map $A_k$ is activated on the same $k$-th object part. The loss is $L_A$ in Eq. 5.

$$L_A = \sum_{k=1}^{M} \left\| f_k - c_k \right\|_2^2 \tag{5}$$

$c_k$ is initialized to zero and updated by Eq. 6.

$$c_k \leftarrow c_k + \beta (f_k - c_k) \tag{6}$$
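A sketch of Eqs. 5–6 for a single image; $\beta = 0.05$ is an illustrative value, since the paper treats it as a hyperparameter:

```python
import numpy as np

def attention_regularization(P, centers, beta=0.05):
    """Attention regularization: L_A (Eq. 5) plus the moving-average
    update of the part-feature centers c_k (Eq. 6).

    P:       part feature matrix of one image, shape (M, C)
    centers: global centers c_k, shape (M, C), initialized to zeros
    Returns (L_A, updated centers).
    """
    diff = P - centers
    loss = np.sum(diff ** 2)         # Eq. 5: squared L2 distance to each center
    centers = centers + beta * diff  # Eq. 6: pull centers toward the features
    return loss, centers

P = np.ones((2, 3))
loss, centers = attention_regularization(P, np.zeros((2, 3)))
```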

Attention-guided Data Augmentation

Random image cropping is low-efficiency: a high percentage of random crops contain background noise, which lowers training efficiency, degrades the quality of the extracted features, and can cancel out the benefits of augmentation. Using attention maps as a guideline, the cropped images focus more on the target object.

Augmentation Map

For each training image, they randomly choose one of its attention maps $A_k$ to guide the augmentation process and normalize it into the $k$-th augmentation map $A_k^*$:

$$A_k^* = \frac{A_k - \min(A_k)}{\max(A_k) - \min(A_k)} \tag{7}$$

Attention Cropping

The crop mask $C_k$ is obtained from $A_k^*$ by setting each element $A_k^*(i,j)$ greater than the threshold $\theta_c$ to 1 and all others to 0, as in Eq. 8.

$$C_k(i,j) = \begin{cases} 1, & \text{if } A_k^*(i,j) > \theta_c \\ 0, & \text{otherwise} \end{cases} \tag{8}$$

They then find a bounding box $B_k$ that covers the whole positive region of $C_k$, crop this region from the raw image, and enlarge it as the augmented input data.
[Figure: attention cropping]
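Eqs. 7–8 and the bounding-box step can be sketched as follows, assuming $A_k$ has already been upsampled to the image resolution; $\theta_c = 0.5$ is an illustrative threshold:

```python
import numpy as np

def attention_crop(image, A_k, theta_c=0.5):
    """Attention cropping: normalize (Eq. 7), threshold (Eq. 8),
    then crop the bounding box of the positive region.

    image: (H, W, 3); A_k: attention map upsampled to (H, W).
    """
    A_star = (A_k - A_k.min()) / (A_k.max() - A_k.min() + 1e-12)  # Eq. 7
    C_k = A_star > theta_c                                        # Eq. 8
    ys, xs = np.nonzero(C_k)
    crop = image[ys.min():ys.max() + 1, xs.min():xs.max() + 1]
    # In training, this crop is resized back to the input resolution.
    return crop

image = np.ones((8, 8, 3))
A_k = np.zeros((8, 8)); A_k[2:5, 3:6] = 1.0
crop = attention_crop(image, A_k)  # the 3x3 attended region
```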

Attention Dropping

To encourage the attention maps to represent multiple discriminative object parts, they propose attention dropping. Specifically, they obtain the drop mask $D_k$ by setting each element $A_k^*(i,j)$ greater than the threshold $\theta_d$ to 0 and all others to 1 (the opposite of the crop mask, so the most-attended region is erased), as shown in Eq. 9.

$$D_k(i,j) = \begin{cases} 0, & \text{if } A_k^*(i,j) > \theta_d \\ 1, & \text{otherwise} \end{cases} \tag{9}$$
[Figure: attention dropping]
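The complementary drop operation (Eq. 9) as a sketch; $\theta_d = 0.5$ is again an illustrative threshold:

```python
import numpy as np

def attention_drop(image, A_k, theta_d=0.5):
    """Attention dropping: erase the most-attended region (Eq. 9),
    forcing the network to rely on other discriminative parts.

    image: (H, W, 3); A_k: attention map upsampled to (H, W).
    """
    A_star = (A_k - A_k.min()) / (A_k.max() - A_k.min() + 1e-12)  # Eq. 7
    D_k = (A_star <= theta_d).astype(image.dtype)  # 0 where attention is high
    return image * D_k[:, :, None]

image = np.ones((8, 8, 3))
A_k = np.zeros((8, 8)); A_k[2:5, 3:6] = 1.0
dropped = attention_drop(image, A_k)  # attended block is zeroed out
```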

Object Localization and Refinement

In the testing process, after the model outputs the coarse-stage classification result and the corresponding attention maps for the raw image, the whole object region can be predicted and enlarged, and the same network predicts the fine-grained result. The object map $A_m$, which indicates the location of the object, is computed by Eq. 10.

$$A_m = \frac{1}{M} \sum_{k=1}^{M} A_k \tag{10}$$
The final classification result is the average of the coarse-grained and fine-grained predictions. The detailed Coarse-to-Fine prediction procedure is described in the algorithm below:
[Algorithm: Coarse-to-Fine prediction]
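The localization step (Eq. 10 plus the bounding box used for the fine-grained pass) can be sketched as below; the relative threshold on $A_m$ is an illustrative choice, and the final prediction would average the logits of the coarse and fine passes:

```python
import numpy as np

def object_map(A):
    """Eq. 10: average the M attention maps into one object map A_m."""
    return A.mean(axis=0)

def object_bbox(A_m, theta=0.1):
    """Bounding box of the region where A_m exceeds theta * max(A_m)
    (the relative threshold is an illustrative choice)."""
    ys, xs = np.nonzero(A_m > theta * A_m.max())
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

A = np.zeros((2, 6, 6)); A[:, 1:4, 2:5] = 1.0
A_m = object_map(A)
y0, y1, x0, x1 = object_bbox(A_m)  # region is then enlarged and re-classified
```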

Experiments

Ablation:

[Table: ablation results]

Comparison with random data augmentation

[Tables: comparison with random data augmentation]

[Figure: further comparison results]

Comparison with State-of-the-Art Methods

[Table: comparison with state-of-the-art methods]

[Figures: additional comparison results]
