Image classification, object detection, semantic segmentation, instance segmentation, and related concepts

Image Classification

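The task of object classification requires binary labels indicating whether objects are present in an image.[1] For example, given 1000 object classes, each class is either present in a given image or absent from it. In practice: given a test image, the output is the set of candidate object classes for that image.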
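The "binary label per class" idea can be sketched as follows. This is a minimal illustration, assuming a hypothetical classifier has already produced one score in [0, 1] per class; the class names, the scores, and the 0.5 threshold are all made up for the example.

```python
# Hypothetical per-class scores for one image (illustrative values only).
CLASS_NAMES = ["person", "sheep", "dog", "car"]
scores = [0.91, 0.78, 0.66, 0.03]

def presence_labels(scores, threshold=0.5):
    """Turn per-class scores into binary present/absent labels."""
    return [1 if s >= threshold else 0 for s in scores]

def candidate_classes(scores, class_names, threshold=0.5):
    """Return the names of the classes judged present in the image."""
    labels = presence_labels(scores, threshold)
    return [name for name, flag in zip(class_names, labels) if flag]

labels = presence_labels(scores)
candidates = candidate_classes(scores, CLASS_NAMES)
print(labels)      # [1, 1, 1, 0]
print(candidates)  # ['person', 'sheep', 'dog']
```

Note that the output says nothing about where the objects are or how many of each there are; it only lists which classes appear.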

Object Detection

Detecting an object entails both stating that an object belonging to a specified class is present, and localizing it in the image. The location of an object is typically represented by a bounding box. In practice: given a test image, the output is the class and location of each detected object.
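To make the "class plus bounding box" output concrete, here is a small sketch. It assumes the pixels belonging to one object are already known (in a real detector the box is predicted directly); the pixel coordinates, the `sheep` label, and the `(x_min, y_min, x_max, y_max)` box convention are illustrative choices, not something the text above prescribes.

```python
def bounding_box(pixels):
    """Tightest axis-aligned box around a collection of (x, y) pixels.

    Returns (x_min, y_min, x_max, y_max) with inclusive coordinates.
    """
    xs = [x for x, _ in pixels]
    ys = [y for _, y in pixels]
    return (min(xs), min(ys), max(xs), max(ys))

# Hypothetical object pixels, as if marked by an annotator.
sheep_pixels = [(2, 3), (3, 3), (3, 4), (4, 4), (4, 5)]

# One detection = a class label plus a location.
detection = {"label": "sheep", "box": bounding_box(sheep_pixels)}
print(detection)  # {'label': 'sheep', 'box': (2, 3, 4, 5)}
```

The box covers pixels that are not part of the object (for instance the corner at (2, 5)), which is exactly the coarseness that segmentation tasks remove.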

Semantic Segmentation (Semantic Scene Labeling)

The task of labeling semantic objects in a scene requires that each pixel of an image be labeled as belonging to a category, such as sky, chair, floor, street, etc. In contrast to the detection task, individual instances of objects do not need to be segmented. For the figure below, the labels are person, sheep, dog, and grass, rather than sheep 1, sheep 2, sheep 3, sheep 4, and sheep 5.
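A semantic segmentation result can be pictured as a label map with one class id per pixel. The sketch below uses a tiny hand-written 4x6 map; the class ids and the layout are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical class ids for a tiny 4x6 "image" (illustrative only).
CLASS_NAMES = {0: "grass", 1: "sheep", 2: "person"}

label_map = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
]

def pixels_per_class(label_map, class_names):
    """Count how many pixels carry each class label."""
    counts = Counter(label for row in label_map for label in row)
    return {class_names[k]: v for k, v in sorted(counts.items())}

counts = pixels_per_class(label_map, CLASS_NAMES)
print(counts)  # {'grass': 14, 'sheep': 8, 'person': 2}
```

Both sheep-shaped blobs carry the same id `1`, so the map records that 8 pixels are "sheep" but not that they belong to two separate animals.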

Instance Segmentation

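Instance segmentation combines object detection and semantic segmentation. Compared with the bounding boxes of object detection, instance segmentation is accurate down to the outline of the object; compared with semantic segmentation, it labels the different individuals of the same class in the image (sheep 1, sheep 2, sheep 3, ...).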
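One way to see the relationship between the tasks is to split a semantic label map into instances. The sketch below does this with a plain 4-connected flood fill, which is a deliberate simplification: it only works when instances of the same class do not touch, and it is not how learned instance segmentation models operate. The label map and class id are hypothetical.

```python
from collections import deque

SHEEP = 1  # hypothetical class id

label_map = [
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
    [2, 2, 0, 0, 0, 0],
]

def split_instances(label_map, class_id):
    """Group pixels of one class into 4-connected components.

    Each component is treated as one instance, which only holds
    when instances of the class do not touch each other.
    """
    height, width = len(label_map), len(label_map[0])
    seen = set()
    instances = []
    for r in range(height):
        for c in range(width):
            if label_map[r][c] != class_id or (r, c) in seen:
                continue
            # Breadth-first flood fill from an unvisited pixel of the class.
            component, queue = [], deque([(r, c)])
            seen.add((r, c))
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and label_map[ny][nx] == class_id
                            and (ny, nx) not in seen):
                        seen.add((ny, nx))
                        queue.append((ny, nx))
            instances.append(component)
    return instances

def bounding_box(component):
    """Box (row_min, col_min, row_max, col_max) around one instance mask."""
    rows = [y for y, _ in component]
    cols = [x for _, x in component]
    return (min(rows), min(cols), max(rows), max(cols))

sheep = split_instances(label_map, SHEEP)
boxes = [bounding_box(inst) for inst in sheep]
print(len(sheep))  # 2
print(boxes)       # [(0, 1, 1, 2), (0, 4, 1, 5)]
```

Each instance keeps its exact pixel mask (the segmentation side), and a detection-style box can be derived from that mask, but not the other way around.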
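[Figure: dataset annotation examples for the four tasks]
The annotations become progressively more complex from one task to the next, but the results become progressively more useful.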

Object Segmentation

One reason this term has fallen out of favor in the research community is that it is problematically vague. Object segmentation used to simply mean finding a single object or a small number of objects in an image and drawing a boundary around them, and for most purposes you can still assume it means this. However, it also began to be used to mean segmentation of blobs that might be objects, segmentation of objects from the background (now more commonly called background subtraction, background segmentation, or foreground detection), and in some cases it was even used interchangeably with object recognition using bounding boxes. That last usage quickly stopped with the advent of deep neural network approaches to object recognition; before then, object recognition could also mean simply labeling an entire image with the object in it.

What makes “segmentation” “semantic”?

Simply put, each segment, or in the case of deep methods each pixel, is given a class label based on a category. Segmentation in general is just the division of the image by some rule. Mean shift segmentation, for example, at a very high level divides the data according to the changes in the energy of the image. Graph cut based segmentation is similarly not learned but derived directly from the properties of each image, separately from all other images. More recent (neural network based) methods use labeled pixels to learn to identify the local features associated with specific classes, and then classify each pixel according to which class has the highest confidence for that pixel. In this sense, “pixel labeling” is actually a more honest name for the task, and the “segmentation” component is emergent.
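The "highest confidence per pixel" rule amounts to an argmax over classes at every pixel. A minimal sketch, assuming a hypothetical network has already produced one confidence map per class for a 2x3 image (the class names and values are invented for the example):

```python
# Hypothetical per-class confidence maps for a 2x3 image (illustrative values).
CLASS_NAMES = ["sky", "grass", "sheep"]
confidence = {
    "sky":   [[0.8, 0.7, 0.6], [0.1, 0.1, 0.2]],
    "grass": [[0.1, 0.2, 0.3], [0.7, 0.2, 0.6]],
    "sheep": [[0.1, 0.1, 0.1], [0.2, 0.7, 0.2]],
}

def label_pixels(confidence, class_names):
    """Assign each pixel the class whose confidence is highest there."""
    height = len(confidence[class_names[0]])
    width = len(confidence[class_names[0]][0])
    return [
        [max(class_names, key=lambda name: confidence[name][r][c])
         for c in range(width)]
        for r in range(height)
    ]

labels = label_pixels(confidence, CLASS_NAMES)
for row in labels:
    print(row)
# ['sky', 'sky', 'sky']
# ['grass', 'sheep', 'grass']
```

No segment boundaries are computed anywhere; the "segments" are simply the groups of neighboring pixels that end up with the same label.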

Instance Segmentation

Arguably the most difficult, relevant, and original meaning of Object Segmentation, “instance segmentation” means the segmentation of the individual objects within a scene, regardless of whether they are the same type. However, one of the reasons this is so difficult is that, from a vision perspective (and in some ways a philosophical one), what makes an “object” instance is not entirely clear. Are body parts objects? Should such “part-objects” be segmented at all by an instance segmentation algorithm? Should they be segmented only if they are seen separate from the whole? What about compound objects: should two things that are clearly joined but separable be one object or two (is a rock glued to the top of a stick an ax, a hammer, or just a stick and a rock unless properly made)? It also isn’t clear how to distinguish instances. Is a wall a separate instance from the other walls it is attached to? In what order should instances be counted? As they appear? By proximity to the viewpoint? In spite of these difficulties, segmentation of objects is still a big deal, because as humans we interact with objects all the time regardless of their “class label” (using random objects around us as paperweights, sitting on things that are not chairs). Some datasets do attempt to get at this problem, but the main reason it has not received much attention yet is that it isn’t well enough defined.

Scene Parsing/Scene labeling

Scene parsing is the strictly segmentation-based approach to scene labeling, which has some vagueness problems of its own. Historically, scene labeling meant dividing the entire “scene” (image) into segments and giving each of them a class label. However, it was also used to mean giving class labels to areas of the image without explicitly segmenting them. With respect to segmentation, “semantic segmentation” does not imply dividing the entire scene. For semantic segmentation, the algorithm is intended to segment only the objects it knows, and it will be penalized by its loss function for labeling pixels that don’t have any label. For example, the MS-COCO dataset is a dataset for semantic segmentation in which only some objects are segmented.

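### Differences and connections between computer vision tasks

#### Image classification

Image classification aims to assign an entire image to one of a set of predefined categories. The task is concerned with global understanding, that is, deciding which category the image as a whole represents[^1].

```python
import torch.nn as nn

class ImageClassifier(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        # Placeholder network: the original leaves the architecture open,
        # so these layers are an illustrative assumption.
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(16, num_classes),
        )

    def forward(self, x):
        # x: (N, 3, H, W) image batch -> (N, num_classes) class logits
        return self.net(x)
```

#### Object detection

Object detection goes beyond giving a single label for the whole image: it locates and identifies multiple objects within the image, reporting the position and category of each. This is usually done with bounding boxes, where each box corresponds to one object instance and carries the category that object belongs to.

#### Object tracking

Object tracking focuses on following the position of a specified target over a time sequence (usually a video stream). The process relies on information from previous frames to keep the identity of the same target consistent across consecutive frames and to update its coordinates[^2].

#### Semantic segmentation

The task of semantic segmentation is to assign a category label to every pixel of the input image, producing a complete parse of the scene. The emphasis is on describing the spatial layout precisely, not on telling individual entities apart[^3].

#### Instance segmentation

Instance segmentation can be seen as a more advanced form that combines the characteristics of object detection and semantic segmentation. Like the former, it must locate each individual object; like the latter, it must also delineate each detected object at the pixel level, so that every instance gets its own mask.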