[TPAMI 2025] Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation


Paper link: Hyper-YOLO: When Visual Object Detection Meets Hypergraph Computation | IEEE Journals & Magazine | IEEE Xplore

The English here is typed entirely by hand, as a summary and paraphrase of the original paper. Some spelling and grammar mistakes are unavoidable; if you spot any, corrections in the comments are welcome! This post leans toward personal notes, so read with that in mind.

Contents

1. Takeaways

2. Section-by-Section Reading Notes

2.1. Abstract

2.2. Introduction

2.3. Related Work

2.3.1. YOLO Series Object Detectors

2.3.2. DETR Series Object Detectors

2.3.3. Hypergraph Learning Methods

2.4. Hypergraph Computation Empowered Semantic Collecting and Scattering Framework

2.5. Methods

2.5.1. Preliminaries

2.5.2. Hyper-YOLO Overview

2.5.3. Mixed Aggregation Network

2.5.4. Hypergraph-Based Cross-Level and Cross-Position Representation Network

2.5.5. Comparison and Analysis

2.6. Experiments

2.6.1. Experimental Setup

2.6.2. Results and Discussions

2.6.3. Ablation Studies on Backbone

2.6.4. Ablation Studies on Neck

2.6.5. More Ablation Studies

2.6.6. More Evaluation on Instance Segmentation Task

2.6.7. Visualization of High-Order Learning in Object Detection

2.7. Conclusion

1. Takeaways

(1) TL;DR: this one is from a Tsinghua group

(2) Please stop putting the important figures in the supplementary material where we can't see them!!!

I really want to see them lol

2. Section-by-Section Reading Notes

2.1. Abstract

        ①Limitations of traditional YOLO: the neck cannot efficiently aggregate cross-level features or exploit the correlations among high-order features

        ②To address this, the authors propose the Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework

2.2. Introduction

        ①Most existing works fail to explore the high-order relationships among features

        ②Performance:

2.3. Related Work

2.3.1. YOLO Series Object Detectors

        ①Lists the different versions of YOLO and mentions that the authors propose an improved one

2.3.2. DETR Series Object Detectors

        ①DETR, which is based on the Transformer, is faster and more accurate than YOLO; however, it has many parameters and performs worse on small-object detection

        ②The Transformer is similar to a graph (this seems to be a bit of a small consensus by now, somehow)

        ③Hypergraphs can address these limitations of the Transformer

2.3.3. Hypergraph Learning Methods

        ①Hypergraphs capture high-order relationships, and yet hypergraph learning is still underexplored in computer vision haha

2.4. Hypergraph Computation Empowered Semantic Collecting and Scattering Framework

        ①For a feature map \mathbf{X}, a hypergraph is constructed via f:\boldsymbol{X}\to\mathcal{G}, yielding the hyper feature map \mathbf{X}_{hyper}. Then \mathbf{X} and \mathbf{X}_{hyper} are fused to obtain the hybrid feature map \mathbf{X}'

        ②Hypergraph Computation Empowered Semantic Collecting and Scattering (HGC-SCS) framework:

\begin{cases} \boldsymbol{X}_{mixed}\xleftarrow{\text{Collecting}}\{\boldsymbol{X}_{1},\boldsymbol{X}_{2},\ldots\} \\ \boldsymbol{X}_{hyper}=\text{HyperComputation}(\boldsymbol{X}_{mixed}) & \text{(high-order learning)} \\ \{\boldsymbol{X}_{1}^{\prime},\boldsymbol{X}_{2}^{\prime},\ldots\}\xleftarrow{\text{Scattering}}\{\phi(\boldsymbol{X}_{hyper},\boldsymbol{X}_{1}),\phi(\boldsymbol{X}_{hyper},\boldsymbol{X}_{2}),\ldots\} \end{cases}

where \phi \left ( \cdot \right ) denotes the feature fusion function
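To make the three-phase flow concrete, here is a minimal, framework-agnostic Python sketch. The names collect, hyper_computation, and phi are hypothetical placeholders standing in for the three phases; a concrete instance (HyperC2Net) appears in Section 2.5.4:

```python
def hgc_scs(feature_maps, collect, hyper_computation, phi):
    # Phase 1 -- semantic collecting: merge multi-level features into X_mixed
    x_mixed = collect(feature_maps)
    # Phase 2 -- high-order learning: hypergraph computation on the mixed features
    x_hyper = hyper_computation(x_mixed)
    # Phase 3 -- semantic scattering: fuse X_hyper back into each original feature map
    return [phi(x_hyper, x) for x in feature_maps]
```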

2.5. Methods

2.5.1. Preliminaries

        ①The neck produces outputs at three scales, \{N_3,N_4,N_5\}: the small-scale, medium-scale, and large-scale feature maps

        ②The backbone has five stages, \{B_1,B_2,B_3,B_4,B_5\}; a higher index denotes a deeper layer with higher-level semantic features

2.5.2. Hyper-YOLO Overview

        ①This mostly restates the previous subsection, describing where in the network the features are extracted

2.5.3. Mixed Aggregation Network

        ①The schematic of Mixed Aggregation Network (MANet):

where c in the figure denotes the number of channels

        ②The processes in MANet:

        ③The final output is obtained by fusing all of these features:

\boldsymbol{X}_{out}=\mathrm{Conv}_o(\boldsymbol{X}_1||\boldsymbol{X}_2||\ldots||\boldsymbol{X}_{4+n})
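Since the schematic isn't reproduced here, below is a hedged PyTorch sketch of a block with this output structure. The specific branch types (a 1×1 conv bypass, a depthwise-separable conv, and a C2f-style split followed by n simplified bottleneck blocks) are assumptions for illustration, not taken from the text above; only the final concatenation-and-Conv_o fusion mirrors the formula:

```python
import torch
import torch.nn as nn

def conv_bn_act(c_in, c_out, k=1, g=1):
    # Conv-BN-SiLU unit, the usual YOLO-style building block
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, k, 1, k // 2, groups=g, bias=False),
        nn.BatchNorm2d(c_out),
        nn.SiLU(),
    )

class MANetSketch(nn.Module):
    def __init__(self, c_in, c_out, n=2):
        super().__init__()
        c = c_out // 2                                    # hidden width (assumption)
        self.branch1 = conv_bn_act(c_in, c, 1)            # X_1: 1x1 conv bypass
        self.branch2 = nn.Sequential(                     # X_2: depthwise-separable conv
            conv_bn_act(c_in, c_in, 3, g=c_in),
            conv_bn_act(c_in, c, 1),
        )
        self.split = conv_bn_act(c_in, 2 * c, 1)          # produces X_3 || X_4 (C2f-style split)
        self.blocks = nn.ModuleList(                      # X_5 ... X_{4+n}, simplified bottlenecks
            conv_bn_act(c, c, 3) for _ in range(n)
        )
        self.conv_o = conv_bn_act((4 + n) * c, c_out, 1)  # final fusion Conv_o

    def forward(self, x):
        feats = [self.branch1(x), self.branch2(x)]
        feats += list(self.split(x).chunk(2, dim=1))      # X_3, X_4
        for blk in self.blocks:
            feats.append(blk(feats[-1]))                  # each block feeds on the previous output
        return self.conv_o(torch.cat(feats, dim=1))       # Conv_o(X_1 || X_2 || ... || X_{4+n})
```

For example, MANetSketch(256, 256, n=2) keeps a (1, 256, 80, 80) tensor at the same shape while mixing 4 + 2 = 6 intermediate features.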

prowess: n. mastery; superb skill; exceptional ability (vocabulary note)

2.5.4. Hypergraph-Based Cross-Level and Cross-Position Representation Network

        ①Pipeline of proposed Hypergraph-Based Cross-Level and Cross-Position Representation Network (HyperC2Net):

(1)Hypergraph Construction

        ①For a hypergraph \mathcal{G}=\{\mathcal{V},\mathcal{E}\}, \mathcal{V} denotes the vertex set and \mathcal{E} the hyperedge set

        ②How to build hypergraph:

        ③Hyperedges are generated by taking an \epsilon-ball around each feature point:

\mathcal{E}=\{ball(v,\epsilon)\mid v\in\mathcal{V}\}

where ball(v,\epsilon)=\{u\mid||\boldsymbol{x}_u-\boldsymbol{x}_v||_d<\epsilon,u\in\mathcal{V}\}
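As a concrete reading of this construction, here is a minimal sketch that builds the incidence matrix \boldsymbol{H} from flattened feature points; the (N, C) input layout and the dense output format are my assumptions:

```python
import torch

def build_epsilon_ball_hypergraph(x, eps, d=2.0):
    # x: (N, C) features of the N vertices (flattened feature-map positions)
    # Each vertex v spawns one hyperedge ball(v, eps) = {u : ||x_u - x_v||_d < eps}
    dist = torch.cdist(x, x, p=d)   # (N, N) pairwise distances under the d-norm
    H = (dist < eps).float()        # incidence matrix: H[u, e] = 1 iff vertex u is in hyperedge e
    return H
```

H is (N, N) because there is exactly one hyperedge per vertex; column e is the \epsilon-ball of vertex e (which always contains e itself, since the self-distance is 0).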

(2)Hypergraph Convolution

        ①The hypergraph convolution is a spatial-domain hypergraph convolution with a residual connection:

\begin{cases} \boldsymbol{x}_e=\frac{1}{|\mathcal{N}_v(e)|}\sum_{v\in\mathcal{N}_v(e)}\boldsymbol{x}_v\boldsymbol{\Theta} \\ \boldsymbol{x}_v^{\prime}=\boldsymbol{x}_v+\frac{1}{|\mathcal{N}_e(v)|}\sum_{e\in\mathcal{N}_e(v)}\boldsymbol{x}_e \end{cases}

where \mathcal{N}_v(e)=\{v\mid v\in e,v\in\mathcal{V}\}, \mathcal{N}_e(v)=\{e\mid v\in e,e\in\mathcal{E}\}, and \boldsymbol{\Theta} is a trainable parameter matrix
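Read literally, the two equations are a vertex→hyperedge aggregation followed by a hyperedge→vertex aggregation with a residual. A direct, unoptimized transcription (the list-of-index-lists edge format is an assumption):

```python
import torch

def hyperconv_spatial(x, edges, theta):
    # x: (N, C) vertex features; edges: list of vertex-index lists; theta: (C, C) matrix Theta
    out = x.clone()
    # Stage 1: each hyperedge e averages the Theta-transformed features of its member vertices
    edge_feats = [(x[idx] @ theta).mean(dim=0) for idx in edges]
    # Stage 2: each vertex v averages the features of its incident hyperedges, plus a residual
    for v in range(x.shape[0]):
        incident = [f for f, idx in zip(edge_feats, edges) if v in idx]
        if incident:
            out[v] = x[v] + torch.stack(incident).mean(dim=0)
    return out
```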

        ②The matrix formulation of the hypergraph convolution:

\mathrm{HyperConv}(\boldsymbol{X},\boldsymbol{H})=\boldsymbol{X}+\boldsymbol{D}_v^{-1}\boldsymbol{H}\boldsymbol{D}_e^{-1}\boldsymbol{H}^\top\boldsymbol{X}\boldsymbol{\Theta}

where \boldsymbol{D}_v and \boldsymbol{D}_e denote the diagonal degree matrices of the vertices and hyperedges, respectively
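The matrix form is just the vectorized version of the two-stage scheme above. A minimal PyTorch module following the formula (the clamp guarding against division by zero is my addition):

```python
import torch
import torch.nn as nn

class HyperConv(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.theta = nn.Linear(channels, channels, bias=False)  # trainable Theta

    def forward(self, x, H):
        # x: (N, C) vertex features; H: (N, E) incidence matrix
        d_v = H.sum(dim=1, keepdim=True).clamp(min=1.0)  # (N, 1) vertex degrees D_v
        d_e = H.sum(dim=0, keepdim=True).clamp(min=1.0)  # (1, E) hyperedge degrees D_e
        edge = (H / d_e).t() @ self.theta(x)             # D_e^{-1} H^T X Theta: vertices -> edges
        return x + (H @ edge) / d_v                      # X + D_v^{-1} H (edge feats): edges -> vertices
```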

(3)An Instance of HGC-SCS Framework

        ①Hypergraph-based cross-level and cross-position representation network (HyperC2Net):

\begin{cases} \boldsymbol{X}_{mixed}=\boldsymbol{B}_{1}\parallel\boldsymbol{B}_{2}\parallel\boldsymbol{B}_{3}\parallel\boldsymbol{B}_{4}\parallel\boldsymbol{B}_{5} \\ \boldsymbol{X}_{hyper}=\mathrm{HyperConv}(\boldsymbol{X}_{mixed},\boldsymbol{H}) \\ \boldsymbol{N}_{3},\boldsymbol{N}_{4},\boldsymbol{N}_{5}=\phi(\boldsymbol{X}_{hyper},\boldsymbol{B}_{3}),\phi(\boldsymbol{X}_{hyper},\boldsymbol{B}_{4}),\phi(\boldsymbol{X}_{hyper},\boldsymbol{B}_{5}) \end{cases}

where \parallel denotes concatenation and \phi denotes the fusion function
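Putting the pieces together, here is a hedged end-to-end sketch reusing the two helpers above. The choice of B_3's resolution as the common scale, the dense \epsilon-ball over all spatial positions, and the resize-and-add fusion standing in for \phi are all my assumptions (the paper's \phi is likely a learned fusion block):

```python
import torch
import torch.nn.functional as F

def hyperc2net_sketch(B, hyperconv, eps=1.0):
    # B: list [B1..B5] of backbone maps, each (1, C, H_i, W_i) with a shared C (simplification)
    size = B[2].shape[-2:]                                # use B3's resolution as the common scale
    x_mixed = torch.cat([F.interpolate(b, size=size) for b in B], dim=1)  # B1 || ... || B5
    c = x_mixed.shape[1]                                  # 5C mixed channels
    verts = x_mixed.flatten(2).squeeze(0).t()             # (N, 5C): vertices = spatial positions
    H = build_epsilon_ball_hypergraph(verts, eps)         # dense N x N, fine for a sketch
    x_hyper = hyperconv(verts, H)                         # high-order learning
    x_hyper = x_hyper.t().reshape(1, c, *size)
    outs = []
    for b in B[2:]:                                       # scatter into N3, N4, N5
        h = F.interpolate(x_hyper, size=b.shape[-2:])
        outs.append(b + h[:, : b.shape[1]])               # crude phi: resize + channel-slice + add
    return outs
```

Here hyperconv would be a HyperConv(5 * C) instance from the sketch above, so that the residual dimensions match.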

2.5.5. Comparison and Analysis

        ①They replace the PANet / gather-and-distribute neck with HyperC2Net

2.6. Experiments

2.6.1. Experimental Setup

        ①Performance on the Microsoft COCO dataset:

where different numbers of convolutional layers and feature dimensions yield the different model sizes: -T (the last C2f in the bottom-up stage is replaced by a 1×1 Conv), -N, -S, -M, and -L

        ②For a fair comparison, no pretraining or self-distillation strategies are used for any method

        ③All models take inputs of 640×640 pixels

2.6.2. Results and Discussions

        ①Good performance with fewer parameters; the improvement is especially significant for the smaller models

2.6.3. Ablation Studies on Backbone

        ①Ablation studies on backbone:

        ②Ablation studies on kernel size:

2.6.4. Ablation Studies on Neck

        ①Replacing the hypergraph with a traditional GCN:

        ②Ablation on the feature maps:

        ③Ablation on the distance threshold \epsilon:

        ④Ablation on the distance metric:

2.6.5. More Ablation Studies

        ①Model scale ablation:

2.6.6. More Evaluation on Instance Segmentation Task

        ①Performance on instance segmentation:

2.6.7. Visualization of High-Order Learning in Object Detection

        ①Visualization of how the attention changes:

2.7. Conclusion

        ~
