Curvature histogram features for retrieval of images of smooth 3D objects

We consider image features based on histograms of oriented gradients (HOG) augmented with a contour curvature histogram (HOG-CH), and compare them with the well-known scale-invariant feature transform (SIFT) approach for the retrieval of images of smooth 3D objects.


Introduction
Image retrieval by text-based indexing becomes significantly harder when one deals with images of smooth 3D objects. A typical example of such an object is a modern abstract sculpture. A user who would like to know the name and the author of a sculpture will most likely struggle to describe the shape of the object seen in the image. Search by text description is therefore hardly applicable here, and image retrieval should be fully automatic, without prior information about the observed objects. In this case we deal with content-based image retrieval alone, i.e. we need to build an image descriptor automatically, without any user interaction, and provide a matching algorithm that finds the corresponding object in the image base.
The direction of the intensity gradient vector is a strongly discriminative feature for images of smooth 3D objects, and it can be used for image retrieval. Our research was mainly inspired by the work of Arandjelovic and Zisserman [1], who presented a bag of boundaries (BoB) approach based on histograms of oriented gradients (HOG) [2], the gPb contour detector [3] and prior segmentation. The last two steps are highly time-consuming, so here we tested a less sophisticated technique with a simpler contour detection algorithm based on the classical Deriche filter [4] and without prior segmentation, since segmentation can produce incorrect results for complicated scenes. The main motivation was the future implementation of the retrieval system on a mobile platform. In addition, we investigate a variation of HOG features that combines histograms of oriented gradients with a curvature histogram (HOG-CH). Finally, descriptors based on the scale-invariant feature transform (SIFT) approach [5] were also examined for comparison.

Image features evaluation
An image descriptor consists of a set of feature vectors with N elements each, which encode low-level features such as shape, color and texture. This is a convenient representation for image storage and matching. The image retrieval task can thus be formalized as a search for the closest N-element feature vector with respect to a specified metric, for example the Euclidean distance. The more matched features two images share, the more similar they are considered to be. Here we evaluate features based on histograms of gradient orientations. The first step is Deriche filtering of the grayscale image, followed by edge detection and tracking. The result is a set of edges with known coordinates for every edge pixel, along with the gradient magnitude and orientation at those coordinates. After selecting edges of appropriate length, we sample key points at regular intervals and compute a feature vector for each of them. Following [1], feature vectors are computed at multiple scales in order to represent boundary information locally. The area of the region of interest around a selected edge point is set to 1/30, 2/15 and 8/15 of the image area. Every patch is scaled to 32×32 elements, so HOG is computed on a 4×4 array of cells, each containing 8×8 elements. The HOG feature vector has 324 elements, obtained by spatial binning into 9 gradient orientations over 9 overlapping blocks of 2×2 cells within the scaled patch (9 blocks × 4 cells × 9 bins = 324). Our approach differs from HOG in that histograms are computed only for points on the edge, which makes the features more stable under background variation.
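The dimensionality of the HOG part can be checked with a minimal sketch in Python with NumPy. It assumes unsigned gradient orientations, magnitude-weighted votes and L2 block normalization, none of which are specified in the text. The function name hog_patch and its parameters are hypothetical, and the sketch covers a single scaled patch only, without Deriche filtering or the selection of edge points.

```python
import numpy as np

def hog_patch(patch, n_bins=9, cell=8, block=2, eps=1e-6):
    """Illustrative HOG descriptor for one square grayscale patch (e.g. 32x32)."""
    patch = np.asarray(patch, dtype=np.float64)
    gy, gx = np.gradient(patch)               # derivatives along rows and columns
    mag = np.hypot(gx, gy)                    # gradient magnitude
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # assumed: unsigned orientation in [0, pi)
    bins = np.minimum((ang / np.pi * n_bins).astype(int), n_bins - 1)

    n_cells = patch.shape[0] // cell          # 4 cells per side for a 32x32 patch
    cells = np.zeros((n_cells, n_cells, n_bins))
    for i in range(n_cells):
        for j in range(n_cells):
            rows = slice(i * cell, (i + 1) * cell)
            cols = slice(j * cell, (j + 1) * cell)
            # Magnitude-weighted orientation histogram of one 8x8 cell
            cells[i, j] = np.bincount(bins[rows, cols].ravel(),
                                      weights=mag[rows, cols].ravel(),
                                      minlength=n_bins)

    blocks = []
    for i in range(n_cells - block + 1):      # 3 block positions per axis
        for j in range(n_cells - block + 1):
            v = cells[i:i + block, j:j + block].ravel()   # 2x2 cells x 9 bins = 36
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))  # assumed L2 norm
    return np.concatenate(blocks)
```

For a 32×32 patch the sliding 2×2 block takes 3 × 3 = 9 positions over the 4×4 cell array, so the returned vector has 9 × 36 = 324 elements.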
The second part of the HOG-CH feature vector encodes the curvature of the edge at the selected point. The curvature parameter C is computed using formula (1), where |L| is the length of the line L connecting the neighboring pixels to the left and right of the current pixel along the edge, and Sl and Sr are the areas of the geometric shapes formed by the edge and the line L (see Figure 1). The line L can be regarded as a rectangle with sides 1 and |L|. For a curved edge, the sum of the areas Sl and Sr is larger than the area of that rectangle, i.e. the length of L, so C is less than 1. If C equals 1, the contour is a straight line. The final HOG-CH feature vector is the concatenation of the HOG part and a histogram of C values.
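Since formula (1) itself is not shown here, the following is only a hedged reading of it. A ratio of this form is consistent with the properties stated above (C = 1 for a straight edge, C < 1 once the sum of Sl and Sr exceeds |L|), but it is an assumption and not necessarily the authors' exact expression:

```latex
% Assumed form, inferred from the stated properties; not taken from the source.
C = \frac{|L|}{S_l + S_r}, \qquad 0 < C \le 1
```

Under this reading, a straight edge gives S_l + S_r = |L| and hence C = 1, while stronger bending enlarges the enclosed areas and pushes C toward 0.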

Experimental results
For experimental evaluation we used Oxford's "Sculptures 6k" dataset, which contains more than 6000 images and is available at [6]. HOG-CH, HOG and SIFT features were computed for every image in the dataset to form a bank of image features. During retrieval, we extracted features from the query image and matched them against the features in the bank using a nearest-neighbor scheme with additional constraints based on epipolar geometry. If the number of matched features exceeded a defined threshold, the images were considered similar. The retrieval procedure thus consisted of searching for the corresponding image with the maximal number of correctly matched features. Figure 2 shows an example of retrieval results for HOG and HOG-CH features, and
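The matching step described above can be illustrated with a minimal sketch in Python with NumPy. It uses brute-force nearest-neighbor search under the Euclidean distance and accepts a match when the distance is below a cutoff. The cutoff max_dist, the threshold min_matches and the function names are hypothetical, and the epipolar geometry constraints are deliberately left out, so this is a simplified stand-in rather than the procedure used in the experiments.

```python
import numpy as np

def count_matches(query, candidate, max_dist=0.5):
    """Count query features whose nearest candidate feature lies within max_dist."""
    q = np.asarray(query, dtype=np.float64)
    c = np.asarray(candidate, dtype=np.float64)
    if q.size == 0 or c.size == 0:
        return 0
    # Pairwise Euclidean distances, shape (n_query, n_candidate)
    d = np.linalg.norm(q[:, None, :] - c[None, :, :], axis=2)
    return int(np.sum(d.min(axis=1) <= max_dist))

def retrieve(query, bank, min_matches=3, max_dist=0.5):
    """Return (image name, match count) for the bank image with the most matches.

    The name is None when the best count does not exceed min_matches.
    """
    best_name, best_count = None, 0
    for name, feats in bank.items():
        n = count_matches(query, feats, max_dist)
        if n > best_count:
            best_name, best_count = name, n
    if best_count > min_matches:
        return best_name, best_count
    return None, best_count
```

Here bank is assumed to be a dictionary that maps an image name to an array with one feature vector per row. For a collection of several thousand images, an approximate nearest-neighbor index would likely replace the brute-force distance matrix.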

Conclusion
We have considered HOG-CH features for the retrieval of images of smooth 3D objects. The features are a simplified modification of HOG with an additional curvature histogram. The experiments showed that the proposed features yield a lower percentage of correctly retrieved images than HOG features, but significantly outperform SIFT features. HOG features give good results for images with textured backgrounds, because they take into account gradient orientations in the areas around contour pixels, while HOG-CH features concentrate on the contour itself and perform better for images with homogeneous backgrounds. The presented approach is therefore suitable for content-based retrieval of images of smooth 3D objects, but requires further enhancement to improve the quality of the image retrieval system.