abstract
This paper studies the problem of learning image semantic segmentation networks using only image-level labels as supervision. Recent state-of-the-art methods on this problem first infer the sparse, discriminative regions for each object class using a deep classification network, then train a semantic segmentation network using these discriminative regions as supervision. Inspired by the traditional seeded region growing approach to image segmentation, we propose to train a semantic segmentation network starting from the discriminative regions and to progressively expand the pixel-level supervision via seeded region growing. The seeded region growing module is integrated into a deep segmentation network and can benefit from deep features. Unlike conventional deep networks trained with fixed/static labels, the proposed weakly-supervised network generates new labels using the contextual information within an image. The proposed method significantly outperforms weakly-supervised semantic segmentation methods that use static labels, and obtains state-of-the-art performance: 63.2% mIoU on the PASCAL VOC 2012 test set and 26.0% mIoU on the COCO dataset.
approach
Seed generation with classification network
Use CAMs to obtain the locations of the initial seeds: modify conv7 of VGG-16, replace the fully-connected layers with global average pooling, and finally apply a hard threshold to the heatmaps produced by the CAMs to obtain the object regions.
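A minimal sketch of this thresholding step, assuming the CAMs have already been computed by the classification network; the function name and the threshold value of 0.2 are illustrative assumptions, not values from the paper.

```python
import torch

def foreground_seeds_from_cams(cams, image_labels, threshold=0.2):
    """Derive foreground seed masks from class activation maps.

    cams:         (C, H, W) tensor of class activation maps from the classification
                  network (VGG-16 with global average pooling after conv7).
    image_labels: iterable of class indices present in the image-level annotation.
    threshold:    hard threshold applied to each normalized heatmap (assumed value).

    Returns a (C, H, W) boolean tensor; seeds[c, u] is True where location u
    is taken as a seed for class c.
    """
    seeds = torch.zeros_like(cams, dtype=torch.bool)
    for c in image_labels:
        heatmap = cams[c]
        # Normalize the heatmap to [0, 1] before applying the hard threshold.
        heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min() + 1e-8)
        seeds[c] = heatmap > threshold
    return seeds
```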
Besides the seed cues in the foreground, we can also find seed cues in the background. For background localization, we use the saliency detection technique from the referenced paper and select regions with low values in the normalized saliency map as background. The seed cues from the foreground and the background are merged into a single-channel segmentation mask.
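A sketch of merging the foreground and background cues into one single-channel mask, assuming a normalized saliency map is available; the background threshold, the label layout (0 for background, c+1 for foreground class c), and the ignore value 255 are illustrative assumptions.

```python
import torch

def build_seed_mask(fg_seeds, saliency, num_classes, bg_threshold=0.06, ignore_index=255):
    """Merge foreground and background seed cues into a single-channel mask.

    fg_seeds: (C, H, W) boolean tensor of foreground seeds (from the CAM step).
    saliency: (H, W) saliency map normalized to [0, 1]; low values are treated
              as background.

    Returns an (H, W) long tensor: 0 = background seed, c+1 = seed of foreground
    class c, ignore_index = unlabeled pixels that receive no supervision.
    """
    H, W = saliency.shape
    mask = torch.full((H, W), ignore_index, dtype=torch.long)
    mask[saliency < bg_threshold] = 0          # background seeds from low saliency
    for c in range(num_classes):
        mask[fg_seeds[c]] = c + 1              # foreground class seeds overwrite
    return mask
```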
seeding loss
$$
\ell_{\text{seed}} = -\frac{1}{\sum_{c \in \mathcal{C}} |S_c|} \sum_{c \in \mathcal{C}} \sum_{u \in S_c} \log H_{u,c} \;-\; \frac{1}{\sum_{c \in \overline{\mathcal{C}}} |S_c|} \sum_{c \in \overline{\mathcal{C}}} \sum_{u \in S_c} \log H_{u,c}
$$

where $\mathcal{C}$ is the set of foreground classes present in the image, $\overline{\mathcal{C}}$ is the background class, $S_c$ is the set of seed locations labeled with class $c$, and $H_{u,c}$ is the probability of class $c$ predicted by the segmentation network at location $u$.
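A possible implementation of this loss, assuming `H_probs` is the softmax output of the segmentation network with channel 0 as background and `seed_mask` is the single-channel seed mask from the previous step (0 = background seed, c+1 = foreground class c, 255 = no seed); these conventions are assumptions for the sketch.

```python
import torch

def seeding_loss(H_probs, seed_mask, ignore_index=255):
    """Seeding loss: average -log H_{u,c} separately over foreground and
    background seed locations, matching the two normalized terms above.

    H_probs:   (C, H, W) softmax probabilities of the segmentation network.
    seed_mask: (H, W) long tensor of seed labels (ignore_index = no seed).
    """
    eps = 1e-8
    log_probs = torch.log(H_probs + eps)

    fg = (seed_mask != ignore_index) & (seed_mask > 0)   # foreground seed locations
    bg = (seed_mask == 0)                                 # background seed locations

    loss = H_probs.new_tensor(0.0)
    if fg.any():
        fg_labels = seed_mask[fg]                         # class channel per fg seed
        fg_logp = log_probs.permute(1, 2, 0)[fg]          # (N_fg, C) log-probabilities
        loss = loss - fg_logp.gather(1, fg_labels.unsqueeze(1)).mean()
    if bg.any():
        loss = loss - log_probs[0][bg].mean()
    return loss
```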