2.Outline
What is SIFT
Algorithm overview
Object Detection
Summary
3.Overview
1999
Generates image features, “keypoints”
invariant to image scaling and rotation
partially invariant to change in illumination and 3D camera viewpoint
many can be extracted from typical images
highly distinctive
4.Algorithm overview
Scale-space extrema detection
Uses difference-of-Gaussian function
Keypoint localization
Sub-pixel location and scale fit to a model
Orientation assignment
1 or more for each keypoint
Keypoint descriptor
Created from local image gradients
5.Scale space
Definition:
where
6.Scale space
Keypoints are detected using scale-space extrema in difference-of-Gaussian function D
D definition:
Efficient to compute
7.Relationship of D to
Close approximation to scale-normalized Laplacian of Gaussian,
Diffusion equation:
Approximate ∂G/∂σ:
giving,
When D has scales differing by a constant factor it already incorporates the σ2 scale normalization required for scale-invariance
8.Scale space construction
2k2σ
2kσ
2σ
kσ
σ
2kσ
2σ
kσ
σ
9.Scale space images
…
first octave
…
…
second octave
…
…
third octave
…
fourth octave
…
…
10.Difference-of-Gaussian images
…
first octave
…
…
second octave
…
…
third octave
…
fourth octave
…
…
11.Frequency of sampling
There is no minimum
Best frequency determined experimentally
12.Prior smoothing for each octave
Increasing σ increases robustness, but costs
σ = 1.6 a good tradeoff
Doubling the image initially increases number of keypoints
13.Finding extrema
Sample point is selected only if it is a minimum or a maximum of these points
DoG scale space
Extrema in this image
14.Localization
3D quadratic function is fit to the local sample points
Start with Taylor expansion with sample point as the origin
where
Take the derivative with respect to X, and set it to 0, giving
is the location of the keypoint
This is a 3x3 linear system
15.Localization
Derivatives approximated by finite differences,
example:
If X is > 0.5 in any dimension, process repeated
16.Filtering
Contrast (use prev. equation):
If | D(X) | < 0.03, throw it out
Edge-iness:
Use ratio of principal curvatures to throw out poorly defined peaks
Curvatures come from Hessian:
Ratio of Trace(H)2 and Determinant(H)
If ratio > (r+1)2/(r), throw it out (SIFT uses r=10)
17.Orientation assignment
Descriptor computed relative to keypoint’s orientation achieves rotation invariance
Precomputed along with mag. for all levels (useful in descriptor computation)
Multiple orientations assigned to keypoints from an orientation histogram
Significantly improve stability of matching
18.Keypoint images
19.Descriptor
Descriptor has 3 dimensions (x,y,θ)
Orientation histogram of gradient magnitudes
Position and orientation of each gradient sample rotated relative to keypoint orientation
20.Descriptor
Weight magnitude of each sample point by Gaussian weighting function
Distribute each sample to adjacent bins by trilinear interpolation (avoids boundary effects)
21.Descriptor
Best results achieved with 4x4x8 = 128 descriptor size
Normalize to unit length
Reduces effect of illumination change
Cap each element to 0.2, normalize again
Reduces non-linear illumination changes
0.2 determined experimentally
22.Object Detection
Create a database of keypoints from training images
Match keypoints to a database
Nearest neighbor search
23.PCA-SIFT
Different descriptor (same keypoints)
Apply PCA to the gradient patch
Descriptor size is 20 (instead of 128)
More robust, faster
24.Summary
Scale space
Difference-of-Gaussian
Localization
Filtering
Orientation assignment
Descriptor, 128 elements