Introduction to SIFT (Scale-Invariant Feature Transform)

This article describes the principles of the SIFT algorithm and its implementation in OpenCV. SIFT, proposed by D. Lowe in 2004, extracts scale-invariant keypoints from an image and computes their descriptors. The article outlines the four main steps of the algorithm (scale-space extrema detection, keypoint localization, orientation assignment, and keypoint descriptor creation) and explains the keypoint matching process.

Notes

The main content of this page comes from the OpenCV-Python Tutorials.
I converted the code from Python to C++ and added comments on the usage of the related functions.

Theory

In the last couple of chapters, we saw some corner detectors like Harris etc. They are rotation-invariant, which means that even if the image is rotated, we can find the same corners. That is obvious, because corners remain corners in the rotated image as well. But what about scaling? A corner may not be a corner if the image is scaled. For example, check the simple image below. A corner in a small image, seen within a small window, looks flat when it is zoomed in and viewed through the same window. So the Harris corner detector is not scale-invariant.


sift_scale_invariant.jpg

So, in 2004, D. Lowe of the University of British Columbia came up with a new algorithm, Scale-Invariant Feature Transform (SIFT), in his paper Distinctive Image Features from Scale-Invariant Keypoints, which extracts keypoints and computes their descriptors. *(This paper is easy to understand and is considered the best material available on SIFT, so this explanation is just a short summary of it.)*

There are mainly four steps involved in the SIFT algorithm. We will see them one by one.

1. Scale-space Extrema Detection

From the image above, it is obvious that we can't use the same window to detect keypoints at different scales. A small window is fine for a small corner, but to detect larger corners we need larger windows. For this, scale-space filtering is used. In it, the Laplacian of Gaussian (LoG) is computed for the image at various values of the scale σ.

But the LoG is a little costly to compute, so the SIFT algorithm uses the Difference of Gaussians (DoG), which is an approximation of the LoG. A Difference of Gaussians image is obtained as the difference between two Gaussian blurrings of an image with different σ. This process is done for the different octaves of the image in a Gaussian pyramid. It is represented in the image below:


sift_dog.jpg
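
To make the idea concrete, here is a minimal sketch of computing a single DoG image as the difference of two Gaussian blurs. The function name differenceOfGaussians is illustrative, not OpenCV's internal SIFT code:

#include "opencv2/imgproc.hpp"

// Sketch: one DoG image from two blurs with sigma and k * sigma.
cv::Mat differenceOfGaussians( const cv::Mat& gray, double sigma, double k )
{
  cv::Mat grayF, g1, g2, dog;
  gray.convertTo( grayF, CV_32F );                        // float so negative values survive the subtraction
  cv::GaussianBlur( grayF, g1, cv::Size(), sigma );       // blur with sigma
  cv::GaussianBlur( grayF, g2, cv::Size(), k * sigma );   // blur with k * sigma
  cv::subtract( g2, g1, dog );                            // DoG = G(k*sigma) - G(sigma)
  return dog;
}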

Once these DoG images are found, they are searched for local extrema over scale and space. For example, one pixel in an image is compared with its 8 neighbours as well as the 9 pixels in the next scale and the 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint. It basically means that the keypoint is best represented at that scale. This is shown in the image below:


sift_local_extrema.jpg
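
As a rough sketch of this 26-neighbour comparison (assuming three adjacent single-channel CV_32F DoG images prev, cur and next of the same size, and a non-border pixel (r, c); the real implementation also interpolates the extremum and applies thresholds), the test could look like:

#include "opencv2/core.hpp"

// Sketch: is the pixel at (r, c) of 'cur' an extremum among its 26 neighbours
// in the previous, current and next DoG levels?
bool isLocalExtremum( const cv::Mat& prev, const cv::Mat& cur, const cv::Mat& next, int r, int c )
{
  float v = cur.at<float>( r, c );
  bool isMax = true, isMin = true;
  const cv::Mat* levels[3] = { &prev, &cur, &next };
  for( int i = 0; i < 3; i++ )
    for( int dr = -1; dr <= 1; dr++ )
      for( int dc = -1; dc <= 1; dc++ )
      {
        if( levels[i] == &cur && dr == 0 && dc == 0 )
          continue;                                        // skip the pixel itself
        float n = levels[i]->at<float>( r + dr, c + dc );
        if( n >= v ) isMax = false;                        // a neighbour is as large or larger
        if( n <= v ) isMin = false;                        // a neighbour is as small or smaller
      }
  return isMax || isMin;
}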

Regarding the different parameters, the paper gives some empirical data, which can be summarized as: number of octaves = 4, number of scale levels = 5, initial σ = 1.6, etc., as optimal values.
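
A minimal sketch of building such a Gaussian/DoG pyramid with those parameters might look like the following (buildDoGPyramid and the σ step k are illustrative choices, not OpenCV's internal code):

#include <cmath>
#include <vector>
#include "opencv2/imgproc.hpp"

// Sketch: DoG pyramid with 4 octaves, 5 blur levels per octave, initial sigma = 1.6.
std::vector<std::vector<cv::Mat>> buildDoGPyramid( const cv::Mat& gray )
{
  const int nOctaves = 4, nScales = 5;
  const double sigma0 = 1.6;
  const double k = std::sqrt( 2.0 );                       // multiplicative sigma step (illustrative)
  std::vector<std::vector<cv::Mat>> dogPyr( nOctaves );
  cv::Mat base;
  gray.convertTo( base, CV_32F );
  for( int o = 0; o < nOctaves; o++ )
  {
    std::vector<cv::Mat> gauss( nScales );
    for( int s = 0; s < nScales; s++ )
      cv::GaussianBlur( base, gauss[s], cv::Size(), sigma0 * std::pow( k, s ) );
    for( int s = 1; s < nScales; s++ )
      dogPyr[o].push_back( gauss[s] - gauss[s - 1] );      // DoG = difference of adjacent blur levels
    cv::Mat half;
    cv::resize( base, half, cv::Size(), 0.5, 0.5, cv::INTER_NEAREST );  // next octave: half resolution
    base = half;
  }
  return dogPyr;
}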

2. Keypoint Localization

Once potential keypoint locations are found, they have to be refined to get more accurate results. The authors used a Taylor series expansion of the scale space to get a more accurate location of each extremum, and if the intensity at this extremum is less than a threshold value (0.03 as per the paper), it is rejected. This threshold is called contrastThreshold in OpenCV.

DoG has a higher response for edges, so edges also need to be removed. For this, a concept similar to the Harris corner detector is used. The authors used a 2x2 Hessian matrix (H) to compute the principal curvatures. We know from the Harris corner detector that for edges, one eigenvalue is much larger than the other. So here they used a simple function: the ratio Tr(H)^2 / Det(H), which depends only on the ratio of the two eigenvalues.

If this ratio is greater than a threshold, called edgeThreshold in OpenCV, that keypoint is discarded. The threshold is given as 10 in the paper.

So this step eliminates any low-contrast keypoints and edge keypoints, and what remains are strong interest points.
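
As a sketch of the edge test, following the formula in the paper (the helper below is hypothetical; dxx, dyy and dxy are second derivatives of the DoG image at the candidate point, and edgeThreshold plays the role of the allowed curvature ratio r = 10):

// Sketch: reject edge-like keypoints using the 2x2 Hessian of the DoG image.
bool passesEdgeTest( float dxx, float dyy, float dxy, float edgeThreshold = 10.f )
{
  float tr  = dxx + dyy;                                   // trace of H
  float det = dxx * dyy - dxy * dxy;                       // determinant of H
  if( det <= 0.f )                                         // curvatures of opposite sign: reject
    return false;
  float r = edgeThreshold;
  // keep the point only if Tr(H)^2 / Det(H) < (r + 1)^2 / r
  return tr * tr * r < ( r + 1.f ) * ( r + 1.f ) * det;
}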

3. Orientation Assignment

Now an orientation is assigned to each keypoint to achieve invariance to image rotation. A neighbourhood is taken around the keypoint location depending on the scale, and the gradient magnitude and direction are calculated in that region. An orientation histogram with 36 bins covering 360 degrees is created, weighted by the gradient magnitude and by a Gaussian-weighted circular window with σ equal to 1.5 times the scale of the keypoint. The highest peak in the histogram is taken, and any peak above 80% of it is also considered when calculating the orientation. This can create keypoints with the same location and scale but different directions, which contributes to the stability of matching.
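
A rough sketch of such a histogram for one keypoint is shown below (a hypothetical helper; img is the Gaussian-blurred CV_32F image at the keypoint's scale, (x, y) its location and sigma = 1.5 times the keypoint scale; OpenCV's real implementation also smooths the histogram and interpolates the peaks):

#include <cmath>
#include <vector>
#include "opencv2/core.hpp"

// Sketch: 36-bin orientation histogram around one keypoint at (x, y).
std::vector<float> orientationHistogram( const cv::Mat& img, int x, int y, float sigma )
{
  std::vector<float> hist( 36, 0.f );
  int radius = cvRound( 3 * sigma );                       // neighbourhood radius grows with scale
  for( int dy = -radius; dy <= radius; dy++ )
    for( int dx = -radius; dx <= radius; dx++ )
    {
      int r = y + dy, c = x + dx;
      if( r < 1 || c < 1 || r >= img.rows - 1 || c >= img.cols - 1 )
        continue;                                          // skip samples outside the image
      float gx = img.at<float>( r, c + 1 ) - img.at<float>( r, c - 1 );   // horizontal gradient
      float gy = img.at<float>( r - 1, c ) - img.at<float>( r + 1, c );   // vertical gradient
      float mag = std::sqrt( gx * gx + gy * gy );          // gradient magnitude
      float ang = std::atan2( gy, gx ) * 180.f / (float)CV_PI;
      if( ang < 0.f ) ang += 360.f;                        // map to [0, 360)
      float w = std::exp( -( dx * dx + dy * dy ) / ( 2.f * sigma * sigma ) );  // Gaussian weight
      int bin = cvRound( ang / 10.f ) % 36;                // 36 bins of 10 degrees each
      hist[bin] += w * mag;
    }
  return hist;
}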

4. Keypoint Descriptor

Now the keypoint descriptor is created. A 16x16 neighbourhood around the keypoint is taken and divided into 16 sub-blocks of 4x4 size. For each sub-block, an 8-bin orientation histogram is created, so a total of 128 bin values are available. They are represented as a vector to form the keypoint descriptor. In addition, several measures are taken to achieve robustness against illumination changes, rotation, etc.
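
In OpenCV this layout shows up directly in the descriptor size. With the xfeatures2d SIFT used later in this post, a quick check (sketch) would be:

#include <iostream>
#include "opencv2/xfeatures2d.hpp"

int main()
{
  // Sketch: each SIFT descriptor is a 128-element float vector (4x4 sub-blocks x 8 bins)
  cv::Ptr<cv::xfeatures2d::SIFT> sift = cv::xfeatures2d::SIFT::create();
  std::cout << sift->descriptorSize() << std::endl;                // prints 128
  std::cout << ( sift->descriptorType() == CV_32F ) << std::endl;  // prints 1 (descriptors are CV_32F)
  return 0;
}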

5. Keypoint Matching

Keypoints between two images are matched by identifying their nearest neighbours. But in some cases, the second-closest match may be very near to the first. This may happen due to noise or other reasons. In that case, the ratio of the closest distance to the second-closest distance is taken. If it is greater than 0.8, the match is rejected. This eliminates around 90% of the false matches while discarding only 5% of the correct matches, as per the paper.
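
A minimal sketch of this ratio test with OpenCV's brute-force matcher (assuming descriptors_1 and descriptors_2 are SIFT descriptor matrices computed beforehand, e.g. with detectAndCompute; the 0.8 threshold follows the paper):

#include <vector>
#include "opencv2/features2d.hpp"

// Sketch: ratio-test matching between two sets of SIFT descriptors.
std::vector<cv::DMatch> ratioTestMatch( const cv::Mat& descriptors_1, const cv::Mat& descriptors_2 )
{
  cv::BFMatcher matcher( cv::NORM_L2 );                      // L2 norm suits SIFT's float descriptors
  std::vector<std::vector<cv::DMatch>> knn;
  matcher.knnMatch( descriptors_1, descriptors_2, knn, 2 );  // two nearest neighbours per query
  std::vector<cv::DMatch> good;
  for( size_t i = 0; i < knn.size(); i++ )
    if( knn[i].size() == 2 && knn[i][0].distance < 0.8f * knn[i][1].distance )
      good.push_back( knn[i][0] );                           // keep only clearly-best matches
  return good;
}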

So this is a summary of the SIFT algorithm. For more details and a deeper understanding, reading the original paper is highly recommended. Remember one thing: this algorithm is patented, so it is included in the opencv_contrib repo.

SIFT in OpenCV

So now let's see the SIFT functionalities available in OpenCV. Let's start with keypoint detection and drawing the detected keypoints. First we have to construct a SIFT object. We can pass different optional parameters to it, which are well explained in the docs.

#include <stdio.h>
#include <iostream>
#include "opencv2/core.hpp"
#include "opencv2/features2d.hpp"
#include "opencv2/xfeatures2d.hpp"
#include "opencv2/highgui.hpp"

using namespace std;
using namespace cv;
using namespace cv::xfeatures2d;

void readme();

/* @function main */
int main( int argc, char** argv )
{
  if( argc != 3 )
  { readme(); return -1; }
  Mat img_1 = imread( argv[1], IMREAD_GRAYSCALE );
  Mat img_2 = imread( argv[2], IMREAD_GRAYSCALE );
  if( !img_1.data || !img_2.data )
  { cout<< " --(!) Error reading images " << std::endl; return -1; }
  //-- Step 1: Detect the keypoints using SIFT Detector
  Ptr<SIFT> detector = SIFT::create();
  vector<KeyPoint> keypoints_1, keypoints_2;
  detector->detect( img_1, keypoints_1 );
  detector->detect( img_2, keypoints_2 );
  //-- Draw keypoints
  Mat img_keypoints_1; Mat img_keypoints_2;
  drawKeypoints( img_1, keypoints_1, img_keypoints_1, Scalar::all(-1), DrawMatchesFlags::DEFAULT );
  drawKeypoints( img_2, keypoints_2, img_keypoints_2, Scalar::all(-1), DrawMatchesFlags::DEFAULT );
  //-- Show detected (drawn) keypoints
  imshow("Keypoints 1", img_keypoints_1 );
  imshow("Keypoints 2", img_keypoints_2 );
  cout << "Keypoints 1 numnber: " << keypoints_1.size() << endl;
  cout << "Keypoints 2 numnber: " << keypoints_2.size() << endl;
  waitKey(0);
  return 0;
  }

  /* @function readme */
  void readme()
  { std::cout << " Usage: ./SIFT_detector <img1> <img2>" << std::endl; }

Build and run:
./SIFT_detector box.png box_in_scene.png
We will get the keypoints marked on the two images above.


box.png (keypoints drawn)


box_in_scene.png (keypoints drawn)

We can also get the number of keypoints in the two images:
Keypoints 1 number: 604
Keypoints 2 number: 969

We first create a SIFT detector and then use it to detect the keypoints of the two images. This gives us keypoints_1 and keypoints_2, which we then draw using drawKeypoints. Finally, we show the keypoint images and print the number of keypoints.
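
If you also need the descriptors (for the matching step described earlier), the same detector can compute them in one call; a sketch continuing the example above:

  //-- Optionally: compute descriptors together with the keypoints
  Mat descriptors_1, descriptors_2;
  detector->detectAndCompute( img_1, noArray(), keypoints_1, descriptors_1 );
  detector->detectAndCompute( img_2, noArray(), keypoints_2, descriptors_2 );
  // each row of descriptors_1 / descriptors_2 is one 128-element CV_32F descriptor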

Additional Resources

Lowe: Distinctive image features from scale-invariant keypoints
OpenCV: SIFT Class Reference
OpenCV: Drawing Function of Keypoints and Matches
