
Motion-based Segmentation and Recognition Dataset
(this is a draft version of this page)

Please cite:
  (1) Segmentation and Recognition Using Structure from Motion Point Clouds, ECCV 2008 (pdf)
      Brostow, Shotton, Fauqueur, Cipolla (bibtex)
  (2) Semantic Object Classes in Video: A High-Definition Ground Truth Database (pdf)
      Pattern Recognition Letters (to appear)
      Brostow, Fauqueur, Cipolla (bibtex)
    
Description: The Cambridge-driving Labeled Video Database (CamVid) is the first collection of videos with object class semantic labels, complete with metadata. The database provides ground truth labels that associate each pixel with one of 32 semantic classes.

The database addresses the need for experimental data to quantitatively evaluate emerging algorithms. While most videos are filmed with fixed-position CCTV-style cameras, our data was captured from the perspective of a driving automobile. The driving scenario increases the number and heterogeneity of the observed object classes.

Over ten minutes of high quality 30Hz footage is being provided, with corresponding semantically labeled images at 1Hz and, in part, 15Hz. The CamVid Database offers four contributions that are relevant to object analysis researchers. First, the per-pixel semantic segmentation of over 700 images was specified manually, and was then inspected and confirmed by a second person for accuracy. Second, the high-quality and large resolution color video images in the database represent valuable extended-duration digitized footage for those interested in driving scenarios or ego-motion. Third, we filmed calibration sequences for the camera color response and intrinsics, and computed a 3D camera pose for each frame in the sequences. Finally, in support of expanding this or other databases, we offer custom-made labeling software for assisting users who wish to paint precise class labels for other images and videos. We evaluated the relevance of the database by measuring the performance of an algorithm from each of three distinct domains: multi-class object recognition, pedestrian detection, and label propagation.
    
 Overview Video:  
Avi, 30 Mb, xVid compressed (playback tips, or get the free Mac/Windows player).
or
Mpg, 11 Mb, mpeg-1 compressed (more compatible, but lower quality)


 

CamVid Database

(just samples shown. For all the videos, see below)



Original Video Sequences: Link to FTP server with video files (very big!)
Link to codecs + utility for extracting frames from those big files

(read the inventory.txt)
 
Labeled Images
(701 so far)

Link to zip file with painted class labels for stills from the video sequences.
Txt file listing classes and label colors as RGB triples (sorted); a short usage sketch follows below.
(Note: the corresponding raw input images - at 1Hz only, already extracted from the respective videos - are here temporarily (556Mb).)
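As a minimal sketch of how the painted stills might be turned into integer class maps with the class-listing txt above, the snippet below assumes one "R G B ClassName" entry per line and lossless RGB label images; the file names are placeholders, not the actual download names.

```python
# Minimal sketch: convert a CamVid painted-label image into an integer class map.
# Assumptions (not stated verbatim on this page): the class-listing txt holds one
# "R G B ClassName" entry per line, and the label images are lossless RGB files.
import numpy as np
from PIL import Image

def load_class_colors(txt_path):
    """Return (N x 3 uint8 colour array, list of class names) from the listing file."""
    colors, names = [], []
    with open(txt_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) < 4:
                continue  # skip blank or malformed lines
            r, g, b = map(int, parts[:3])
            colors.append((r, g, b))
            names.append(" ".join(parts[3:]))
    return np.array(colors, dtype=np.uint8), names

def rgb_label_to_index(label_path, colors):
    """Map each RGB pixel to the index of the matching class colour (-1 if no match)."""
    rgb = np.array(Image.open(label_path).convert("RGB"))
    index = np.full(rgb.shape[:2], -1, dtype=np.int16)
    for i, c in enumerate(colors):
        index[np.all(rgb == c, axis=-1)] = i
    return index

colors, names = load_class_colors("label_colors.txt")       # hypothetical file name
class_map = rgb_label_to_index("example_label.png", colors)  # hypothetical file name
```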
 
Camera extrinsics
  Link to files and code (if link breaks someday, go here)
The relevant line that you care about, to get the projection matrix of one camera, is in MotBoostEvalOneFrame.m (see how LoadBoujou_2Dtrax_3dBans_Misc.m calls it):
curC = Cs( frameNum-offsetForFrameNums, 1:3);
Example camera pose trajectory, stored in Boujou Animation Format:
each line containing "AddDecompCameraKey" has a K and R matrix and a t vector,
so that P = K * R * [I -t]
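As a small illustration of the projection equation above, the sketch below composes P = K * R * [I | -t] and projects a 3-D point; it assumes K, R and t have already been parsed from an "AddDecompCameraKey" line (parsing not shown), and the numeric values are purely hypothetical.

```python
# Minimal sketch: build the projection matrix P = K * R * [I | -t] stated above
# and project a 3-D point into the image. K, R, t are assumed to come from an
# "AddDecompCameraKey" line of the Boujou export (parsing not shown here).
import numpy as np

def projection_matrix(K, R, t):
    """P = K R [I | -t], with K, R 3x3 and t a 3-vector, as stated on this page."""
    K = np.asarray(K, dtype=float).reshape(3, 3)
    R = np.asarray(R, dtype=float).reshape(3, 3)
    t = np.asarray(t, dtype=float).reshape(3, 1)
    Rt = np.hstack([np.eye(3), -t])   # [I | -t], shape 3x4
    return K @ R @ Rt                 # 3x4 projection matrix

def project(P, X_world):
    """Project a 3-D point (x, y, z) to pixel coordinates (u, v)."""
    X_h = np.append(np.asarray(X_world, dtype=float), 1.0)  # homogeneous 4-vector
    x = P @ X_h
    return x[:2] / x[2]               # perspective divide

# Hypothetical intrinsics and pose, just to show the call pattern:
K = [[500.0, 0.0, 480.0],
     [0.0, 500.0, 360.0],
     [0.0, 0.0, 1.0]]
P = projection_matrix(K, np.eye(3), [0.0, 0.0, 0.0])
u, v = project(P, [1.0, 2.0, 10.0])
```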
 


   seq06R0

Description: 3030 frames at 30Hz == 1:41 min
Sample Frame           
Video File in MXF format *
   
seq16E5

Description: 6120 frames at 30Hz == 3:24 min
Sample Frame      
Video Files 1 and 2 in MXF format * (note: these are 2 halves of 1 zip file)



seq16E5_15Hz
(see also CamSeq01)

Description: 202 frames at 30Hz == 0:06 min
Sample Frame
Video Files 1 and 2 in MXF format * (note: same files as above, but use a different script)

   
seq05VD

Description: 5130 frames at 30Hz == 2:51 min
Sample Frame
Video File in MXF format *
   seq01TP

Description: 3720 frames at 30Hz == 2:04 min
Sample Frame 
Video File in MXF format *

    
   
Listing of (RGB)-Class assignments (alphabetical)      Listing in color-order used by MSRC (with "XX")
  
Moving objects
    Animal
    Pedestrian
    Child
    Rolling cart/luggage/pram
    Bicyclist
    Motorcycle/scooter
    Car (sedan/wagon)
    SUV / pickup truck
    Truck / bus
    Train
    Misc
Road
    Road == drivable surface
    Shoulder
    Lane markings drivable
    Non-Drivable
Ceiling
    Sky
    Tunnel
    Archway
Fixed objects
    Building
    Wall
    Tree
    Vegetation misc.
    Fence
    Sidewalk
    Parking block
    Column/pole
    Traffic cone
    Bridge
    Sign / symbol
    Misc text
    Traffic light
    Other





Hand-Labeled Frames:


seq06R0

Description: 101 frames at 1Hz == 1:41 min
Sample Frame       Preview Video




seq16E5

Description: 204 frames at 1Hz == 3:24 min
Sample Frame       Preview Video

seq16E5_15Hz
(see also CamSeq01)

Description: 101 frames at 15Hz == 0:06 min
Sample Frame       Preview Video




seq05VD

Description: 171 frames at 1Hz == 2:51 min
Sample Frame       Preview Video




seq01TP

Description: 124 frames at 1Hz == 2:04 min
Sample Frame       Preview Video










Paint-Stroke Logs of Manual Labeling:

Example log file, where each of the user's mouse-strokes was recorded to include:
the class label being applied, size and type of brush or pre-segmentation used, location of each click point and drag-path, and duration for each stroke.
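The exact on-disk syntax of the log file is only linked, not reproduced here; as a hypothetical illustration, one record per stroke could be held in a structure like the following, mirroring the fields listed above.

```python
# Hypothetical record type mirroring the per-stroke fields the page describes;
# the actual syntax of the example log file is not specified on this page.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class PaintStroke:
    class_label: str                          # semantic class being painted
    brush_type: str                           # brush or pre-segmentation tool used
    brush_size: int                           # brush size in pixels
    path: List[Tuple[int, int]] = field(default_factory=list)  # click point + drag path
    duration_ms: float = 0.0                  # time spent on this stroke

stroke = PaintStroke("Pedestrian", "circle_brush", 12, [(103, 240), (110, 244)], 850.0)
```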





InteractLabeler Software:

InteractLabeler.zip for Windows (3.4Mb)
InteractLabeler Documentation
InteractLabeler instructions, as given to volunteers






*MXF format:

This format is like Avi or Quicktime in that it is a wrapper for multimedia files. In our case, just the video channel has data, and it is in HD format. To decode, use this utility (link) along with the scripts provided.
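The linked utility and scripts are the intended way to decode these files. As an alternative sketch only, a local ffmpeg build that understands the MXF wrapper and its HD video codec can usually dump frames directly; the input file name and output pattern below are hypothetical.

```python
# Alternative sketch only: the page's own codec utility + scripts are the intended route.
# If a local ffmpeg build can decode the MXF file, frames can be dumped like this.
import os
import subprocess

os.makedirs("frames", exist_ok=True)
subprocess.run(
    [
        "ffmpeg",
        "-i", "seq05VD.mxf",           # hypothetical MXF file name
        "-vsync", "0",                 # one output image per decoded frame
        "frames/seq05VD_%05d.png",     # zero-padded frame index
    ],
    check=True,
)
```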



   
   


Topographic data measurement is a fundamental aspect of many geomorphological research applications, especially those involving terrain monitoring and the study of topographic change. However, most survey techniques require relatively expensive technology or specialist user supervision. Structure from Motion (SfM) photogrammetry reduces both these constraints by allowing the use of consumer grade digital cameras and highly automated data processing, which can be free to use. SfM photogrammetry therefore offers the possibility of fast, automated and low-cost acquisition of 3-D data, which has inevitably created great interest amongst the geomorphological community. In this contribution, the basic concepts of SfM photogrammetry are presented, whilst recognising its heritage. A few examples are employed to illustrate the potential of SfM applications for geomorphological research. In particular, SfM photogrammetry offers to geomorphologists a tool for high-resolution characterisation of 3-D forms at a range of scales and for change detection purposes. The high level of automation of SfM data processing creates both opportunities and threats, particularly because user control tends to focus upon visualisation of the final product rather than upon inherent data quality. Accordingly, this contribution seeks to guide potential new users in successfully applying SfM for a range of geomorphic studies. Keywords: structure from motion, close-range photogrammetry, smartphone technology, survey systems, surface morphology.
### Energy-based methods for image segmentation

In computer vision, image segmentation by energy minimisation is a widely used family of techniques. These methods define an energy function that expresses the difference between the target object and the background, and then look for the labelling that drives this energy to a minimum.

#### Building the energy model

For a given input image \( I \), suppose it is to be split into a foreground region \( F \) and a background region \( B \). Two main energy terms are usually designed:

- **Data term**: describes how likely each pixel is to belong to a given class; a suitable likelihood \( P(I|F) \) or \( P(I|B) \) is designed to measure whether a particular pixel is more plausibly part of the foreground or of the background.
- **Smoothness term**: encourages neighbouring pixels to take the same label, i.e. if two neighbouring positions have similar colours they most likely belong to the same class:

\[
E_{smooth}(L_i, L_j)=\begin{cases} 0 & L_i = L_j \\ w(i,j)\cdot M & L_i \neq L_j \end{cases}
\]

where \( L_i \) is the label of node (pixel) \( i \), \( w(i,j) \) is a weight that increases as the colour distance \( d(I_i, I_j) \) between the two neighbours decreases, and \( M \) is a large positive number, so that assigning different classes to similar, connected pixels is heavily penalised.

#### Graph cut algorithm

A common way to solve the resulting optimisation problem is the graph cut algorithm. It recasts the whole task as finding an optimal partition of a weighted graph, and the max-flow/min-cut theorem then lets the globally optimal solution of the binary labelling problem be found efficiently.

Concretely, an auxiliary network \( G=(V,E) \) with a source \( s \) and a sink \( t \) is built, and every pixel of the original image is mapped to a vertex \( v \in V \). Edges \( e \in E \) and their capacities \( c(e) \) are then set according to fixed rules:

- every pair of neighbouring pixels (self-loops excluded) is connected by an edge in both directions;
- if a pixel is predicted to be foreground, a high-capacity edge is added from \( s \) to the corresponding node, and symmetrically for background pixels and \( t \).

Finally, running Ford-Fulkerson or any other efficient max-flow routine yields the result; a minimal graph-cut sketch is given after the code below.

```python
import numpy as np
from PIL import Image
from skimage.segmentation import slic
from skimage.graph import rag_mean_color   # in older releases: skimage.future.graph
from sklearn.cluster import KMeans


def energy_based_segmentation(image):
    """
    Segment an image by clustering superpixel mean colours into two groups.

    Parameters:
        image : ndarray
            Input colour image.

    Returns:
        labels : ndarray of int
            Integer array where each unique value represents a different segment.
    """
    # Over-segment the image into superpixels with SLIC.
    segments_slic = slic(image, n_segments=250, compactness=10, sigma=1, start_label=1)

    # Region adjacency graph: every node stores the mean colour of its superpixel.
    g = rag_mean_color(image, segments_slic, mode='similarity')

    # Cluster the superpixel mean colours into two groups.
    nodes = sorted(g.nodes())
    features = np.array([g.nodes[n]['mean color'] for n in nodes])
    cluster_labels = KMeans(n_clusters=2, n_init=10).fit_predict(features) + 1

    # Map every superpixel id (hence every pixel) to its cluster label.
    lut = np.zeros(max(nodes) + 1, dtype=int)
    lut[nodes] = cluster_labels
    return lut[segments_slic]


if __name__ == "__main__":
    im = np.array(Image.open('example.jpg'))
    segmented_image = energy_based_segmentation(im)
```
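As a companion to the superpixel-clustering code above, here is a minimal sketch of the s/t graph construction described in the graph-cut section, using the third-party PyMaxflow package (an added assumption, not part of the original text), with a simple intensity-based data term and a constant Potts smoothness penalty; the reference intensities and weights are illustrative only.

```python
# Minimal s/t graph-cut sketch with PyMaxflow (pip install PyMaxflow).
# Data term: squared distance of each pixel's intensity to two assumed
# foreground/background reference intensities. Smoothness term: a constant
# Potts penalty on 4-connected neighbours.
import numpy as np
import maxflow


def graph_cut_segmentation(gray, fg_mean=0.8, bg_mean=0.2, smoothness=0.3):
    """Binary segmentation of a grayscale image with values in [0, 1] via min-cut."""
    g = maxflow.Graph[float]()
    nodeids = g.add_grid_nodes(gray.shape)

    # Pairwise (smoothness) term: constant cost for separating neighbouring pixels.
    g.add_grid_edges(nodeids, smoothness)

    # Unary (data) term: each pixel lands on whichever side of the cut is cheaper,
    # so pixels close to fg_mean end up on the sink side, pixels close to bg_mean
    # on the source side.
    cost_fg = (gray - fg_mean) ** 2
    cost_bg = (gray - bg_mean) ** 2
    g.add_grid_tedges(nodeids, cost_fg, cost_bg)

    g.maxflow()                              # solve the max-flow / min-cut problem
    return g.get_grid_segments(nodeids)      # True on the sink (foreground-like) side


# Toy usage: a bright square on a dark, slightly noisy background.
rng = np.random.default_rng(0)
img = np.full((64, 64), 0.2)
img[16:48, 16:48] = 0.8
img += 0.05 * rng.standard_normal(img.shape)
mask = graph_cut_segmentation(img)           # boolean foreground mask
```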