Scene Recognition

This coursework requires teamwork: implementing 15-class scene image classification using k-nearest-neighbour, bag-of-visual-words, and advanced features such as GIST and dense SIFT. Participants must submit training code, a report, and prediction files; the evaluation metric is average precision. The project emphasises code structure, report quality, and the application of innovative methods.


Coursework 2 (Group) – Scene Recognition
Brief
This is a group coursework: please work in teams of four people.
Due date: Wednesday 10th January, 16:00.
Development data download: training.zip in the coursework (CW) folder
Testing data download: testing.zip in the CW folder
Required files: report.pdf; code.zip; run1.txt; run2.txt; run3.txt
Credit: 25% of overall module mark
Overview
The goal of this project is to introduce you to image recognition. Specifically, we will examine the
task of scene recognition starting with very simple methods -- tiny images and nearest neighbour
classification -- and then move on to techniques that resemble the state-of-the-art.
This coursework will run following the methodology used in many current scientific benchmarking
competitions/evaluations. You will be provided with a set of labelled development images from
which you are allowed to develop and tune your classifiers. You will also be provided with a set of
unlabelled images for which you will be asked to produce predictions of the correct class.
Details
You will need to write software that classifies scenes into one of 15 categories. We want you to
implement three different classifiers as described below. You will then need to run each classifier
against all the test images and provide a prediction of the class for each image.
Data
The training data consists of 100 images for each of the 15 scene classes. These are arranged in
directories named according to the class name. The test data consists of 2985 images. All the
images are provided in JPEG format. All the images are grey-scale, so you don't need to consider
colour.
Objective measure
The key classification performance indicator for this task is average precision: the proportion of
correct classifications out of the total number of predictions (i.e. 2985).
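For concreteness, here is a minimal sketch of how this measure could be computed, assuming a hypothetical ground_truth dictionary mapping image file names to their true classes (the actual test labels are not released):

```python
# A minimal scoring sketch; `ground_truth` is a hypothetical dict such as
# {"0.jpg": "tallbuilding", "1.jpg": "forest", ...}.
def score_run(run_path, ground_truth):
    correct = total = 0
    with open(run_path) as f:
        for line in f:
            name, pred = line.split()
            correct += (ground_truth[name] == pred)
            total += 1
    return correct / total  # proportion correct over all predictions
```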
Run conditions
As mentioned above, you need to develop and run three different classifiers. We'll refer to the
application of a classifier to the test data as a "run".
Run #1: You should develop a simple k-nearest-neighbour classifier using the "tiny image" feature.
The "tiny image" feature is one of the simplest possible image representations. One simply crops
each image to a square about the centre, and then resizes it to a small, fixed resolution (we
recommend 16x16). The pixel values can be packed into a vector by concatenating each image
row. It tends to work slightly better if the tiny image is made to have zero mean and unit length.
You can choose the optimal k-value for the classifier.
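A minimal sketch of this run-1 pipeline, assuming the training images sit in per-class directories under training/ and the test images under testing/ (per the data description above), and using scikit-learn's k-NN as one possible implementation; the 16x16 size is the recommended starting point and k should be tuned on the training set:

```python
import glob, os
import numpy as np
from PIL import Image
from sklearn.neighbors import KNeighborsClassifier

def tiny_image(path, size=16):
    """Centre-crop to a square, resize to size x size, flatten,
    then zero-mean and unit-length normalise."""
    img = Image.open(path).convert("L")
    w, h = img.size
    s = min(w, h)
    left, top = (w - s) // 2, (h - s) // 2
    img = img.crop((left, top, left + s, top + s)).resize((size, size))
    v = np.asarray(img, dtype=np.float64).ravel()
    v -= v.mean()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

# Build the training set from the class-named directories.
X_train, y_train = [], []
for class_dir in sorted(glob.glob("training/*")):
    label = os.path.basename(class_dir)
    for path in glob.glob(os.path.join(class_dir, "*.jpg")):
        X_train.append(tiny_image(path))
        y_train.append(label)

# k is a free parameter; tune it by cross-validation on the training data.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(np.array(X_train), y_train)

# Predict every test image and write run1.txt in the required format.
with open("run1.txt", "w") as f:
    for path in sorted(glob.glob("testing/*.jpg")):
        pred = knn.predict([tiny_image(path)])[0]
        f.write(f"{os.path.basename(path)} {pred}\n")
```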
Run #2: You should develop a set of linear classifiers (an ensemble of 15 one-vs-all classifiers)
using a bag-of-visual-words feature based on fixed size densely-sampled pixel patches. We
recommend that you start with 8x8 patches, sampled every 4 pixels in the x and y directions. A
sample of these should be clustered using K-Means to learn a vocabulary (try ~500 clusters to
start). You might want to consider mean-centring and normalising each patch before
clustering/quantisation. Note: we're not asking you to use SIFT features here - just take the pixels
from the patches and flatten them into a vector & then use vector quantisation to map each patch
to a visual word.
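A minimal sketch of this run-2 pipeline, under the same directory-layout assumptions as the run-1 sketch: 8x8 patches on a 4-pixel stride, mean-centred and unit-normalised, quantised against a 500-word K-Means vocabulary, with scikit-learn's LinearSVC (which trains one-vs-rest linear classifiers by default) standing in for the ensemble:

```python
import glob, os
import numpy as np
from PIL import Image
from sklearn.cluster import MiniBatchKMeans
from sklearn.svm import LinearSVC

def dense_patches(path, size=8, stride=4):
    """Densely sample size x size patches every `stride` pixels."""
    img = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    patches = []
    for y in range(0, img.shape[0] - size + 1, stride):
        for x in range(0, img.shape[1] - size + 1, stride):
            p = img[y:y + size, x:x + size].ravel()
            p -= p.mean()                           # mean-centre
            n = np.linalg.norm(p)
            patches.append(p / n if n > 0 else p)   # unit-normalise
    return np.array(patches)

train_paths, y_train = [], []
for class_dir in sorted(glob.glob("training/*")):
    for path in glob.glob(os.path.join(class_dir, "*.jpg")):
        train_paths.append(path)
        y_train.append(os.path.basename(class_dir))

# Learn the vocabulary from patches sampled off a subset of training images.
rng = np.random.default_rng(0)
sample = np.vstack([dense_patches(p)
                    for p in rng.choice(train_paths, 100, replace=False)])
kmeans = MiniBatchKMeans(n_clusters=500, random_state=0).fit(sample)

def bovw_histogram(path):
    """Quantise each patch to its nearest visual word; return a normalised histogram."""
    words = kmeans.predict(dense_patches(path))
    hist = np.bincount(words, minlength=500).astype(np.float64)
    return hist / hist.sum()

X_train = np.array([bovw_histogram(p) for p in train_paths])
clf = LinearSVC().fit(X_train, y_train)  # 15 one-vs-rest linear classifiers

with open("run2.txt", "w") as f:
    for path in sorted(glob.glob("testing/*.jpg")):
        pred = clf.predict([bovw_histogram(path)])[0]
        f.write(f"{os.path.basename(path)} {pred}\n")
```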
Run #3: You should try to develop the best classifiers you can! You can choose whatever feature,
encoding and classifier you like. Potential features: the GIST feature; Dense SIFT; Dense SIFT in a
Gaussian Pyramid; Dense SIFT with spatial pooling (commonly known as PHOW - Pyramid
Histogram of Words), etc. Potential classifiers: Naive Bayes; a non-linear SVM (perhaps using a linear
classifier with a Homogeneous Kernel Map), ...
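As one illustration of the homogeneous-kernel-map idea, the sketch below uses scikit-learn's AdditiveChi2Sampler, an approximation of the additive chi-squared kernel map in the spirit of Vedaldi and Zisserman, so that a fast linear SVM can mimic a non-linear chi-squared SVM. The feature matrix here is dummy data standing in for whatever histogram features (e.g. dense SIFT BoVW) you build for run 3:

```python
import numpy as np
from sklearn.kernel_approximation import AdditiveChi2Sampler
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder histogram features and labels for illustration only;
# AdditiveChi2Sampler requires non-negative inputs, which histograms satisfy.
X_train = np.abs(np.random.rand(100, 500))
y_train = np.random.choice(["forest", "street"], size=100)

# Map features through the approximate chi2 kernel, then train a linear SVM.
model = make_pipeline(AdditiveChi2Sampler(sample_steps=2), LinearSVC())
model.fit(X_train, y_train)
```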
Run prediction format
The predictions for each run must be written to a text file named runX.txt (where X is the run
number) with the following format:
<image_name> <predicted_class>
<image_name> <predicted_class>
<image_name> <predicted_class>
...
For example:
0.jpg tallbuilding
1.jpg forest
2.jpg mountain
3.jpg store
4.jpg store
5.jpg bedroom
...
Restrictions
• You are not allowed to use the testing images for anything other than producing the final
predictions. They must not be used for either training or for learning feature encodings.
The report
The report must be no longer than 4 sides of A4 in the given LaTeX format for CW2, and must be
submitted electronically as a PDF. The report must include:
• The names and ECS user IDs of the team members
• A description of the implementation of the classifiers for the three runs, including information on
how they were trained and tuned, and the specific parameters used for configuring the feature
extractors and classifiers. We expect that your "run 3" section will be considerably longer than the
descriptions of runs 1 & 2.
• A short statement detailing the individual contributions of the team members to the coursework.
What to hand in
You need to submit to ECS Handin the following items:
• The group report (as a PDF document in the given CVPR-style LaTeX format for CW2; max 4 A4
sides, no appendix)
• Your code enclosed in a zip file (including everything required to build/run your software and to
train and use your classifiers; please don't include binaries or any of the images!)
• The run prediction files for your three runs (named "run1.txt", "run2.txt" and "run3.txt").
• A plain text file listing the user ids (e.g. xx1g20) of the members of your team; one per line.
Marking and feedback
Marks will be awarded for:
• Successful completion of the task.
• Well structured and commented code.
• Evidence of professionalism in implementation and reporting.
• Quality and contents of the report.
• The quality/soundness/complexity of approach used for run 3.
Marks will not be based on the actual performance of your approach (although you can expect to
lose marks if runs 1 and 2 are way off our expectations or you fail to follow the submission
instructions). We will publish the performance rankings for run 3.
Standard ECS late submission penalties apply.
Individual feedback will be given to each team covering the above points.
 

Scene recognition is a computer-vision technique for identifying the scene or environment in an image or video. OpenCV is a widely used open-source computer-vision library for developing vision algorithms and applications. Scene recognition matters in computer vision because it has many practical applications. OpenCV provides a number of functions and algorithms useful for scene recognition, including:
1. Feature extraction and description: OpenCV offers classic feature extraction and description algorithms such as SIFT (Scale-Invariant Feature Transform), SURF (Speeded-Up Robust Features), and ORB (Oriented FAST and Rotated BRIEF). These extract keypoints and feature descriptors from images for comparison and matching.
2. Bag of Visual Words (BoVW): a common approach to scene recognition. The image is divided into local regions, features are extracted from each region, and clustering assigns those features to visual words. The resulting visual-word histogram then serves as the image's feature vector for recognition.
3. Support Vector Machines (SVM): a widely used machine-learning algorithm for classification and regression. In scene recognition, an SVM can classify images: for each scene class one can train a binary classifier as that class's recogniser and then apply it to unseen images.
Using these OpenCV functions and algorithms, accurate and efficient scene recognition can be achieved, which is useful in many practical applications: automatic image categorisation, image search engines, or anomaly detection in security and surveillance, among others.
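As a small illustration of the feature-extraction step described above, the following sketch runs OpenCV's SIFT and ORB detectors on a single image. It assumes opencv-python 4.4 or later (where SIFT_create lives in the main module) and a hypothetical input file scene.jpg:

```python
import cv2

img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

# SIFT: scale-invariant keypoints with 128-dimensional descriptors.
sift = cv2.SIFT_create()
kp_sift, desc_sift = sift.detectAndCompute(img, None)

# ORB: a fast binary alternative (Oriented FAST and Rotated BRIEF).
orb = cv2.ORB_create(nfeatures=500)
kp_orb, desc_orb = orb.detectAndCompute(img, None)

# Descriptors are None if no keypoints were found; otherwise they are
# (num_keypoints x 128) float arrays for SIFT and (num_keypoints x 32)
# binary arrays for ORB, ready for matching or BoVW quantisation.
print(len(kp_sift), None if desc_sift is None else desc_sift.shape)
print(len(kp_orb), None if desc_orb is None else desc_orb.shape)
```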