RELEVANCE FEEDBACK AND IMAGE FEATURE EXTRACTION

Researchers: Sharadh Ramaswamy, Dr. Jelena Tesic, Dr. Sitaram Bhagavathy, Prof.  B. S. Manjunath, Prof. Kenneth Rose

Description: It is in general impractical to store large amounts of extracted feature vectors in a random access memory (RAM). Since input/output (I/O) operations for hard storage devices are slow, the time spent accessing the feature vectors overwhelmingly dominates the time complexity of the search.  However, what is often ignored is the observation that pre-defined feature vectors and standard distance measures are poor approximators of human perception, which could widely vary from one user to another. The first thrust of this work is on the extraction of better features to model images.

     In developing image/video search techniques, ``relevance feedback'' has been widely used in associating user's concept with the descriptors. Given a set of representative features, relevance feedback aims to learn the distance function with the user feedback. An initial set of nearest neighbors (based on some naive distance function) are retrieved, and these results are classified by the user as being relevant or not, and this is used to estimate the "correct" similarity metric, based upon which a new set of nearest neighbors are retrieved. During this learning process, the algorithm needs to access the descriptors associated with image/video objects. The time complexity problem is further exacerbated since the search is to be performed multiple times until user is satisfied with the retrieved results. So far, the impact of using learning on the indexing structure has generally been ignored. The feasibility of learning algorithms has not been evaluated for handling large amount of image/video data in relevance feedback scenarios. We believe that this is one of the first attempts to address the issue of efficient nearest neighbor search in relevance feedback in the literature.

Results:  Our results can be classified into two categories. The first result deals with feature extraction and distance function formulation for improved perceptual quality in image retrieval. The other set of results focus on optimizing indexing for relevance feedback in image databases.

Distance Formulation and Feature Extraction

     We have been able to successfully model a broad variety of compound geo-spatial objects such as harbors, golf courses, and airports, using Gabor-filter based texture features (see publication no. 12). On the other hand for image databases (see publication no. 8), we considered inter-image distances that involve geometric mappings subject to global coherence constraints (implemented by imposing hidden Markov models). Such distances enable robust clustering and indexing techniques, which were then tested on various face image databases and was seen to outperform the Bayesian intra/extrapersonal classifier.

Indexing for Relevance Feedback

     Nearest neighbor computation in each iteration is quadratic with number of dimensions and linear with number of items. In one proposed scheme (see publication no. 4),  the use of the user's feedback enbles not only to improve the effectiveness of the similarity retrieval, but also its efficiency in an interactive content based image retrieval system. It uses rectangular approximations for nearest neighbor search under quadratic distance metric and exploits correlations between two consecutive nearest neighbor sets. This adaptive approach significantly reduces the overall search complexity by reducing the number of assessed data on the hard storage devices up to 90 times.

    As an alternative to the VA-File style indexing for relevance feedback, we consider VQ schemes. VQ is optimal in compression and hence can produce more compact representations of the data-set. We derive an invariance property of point-to-hyperplane distance for a general Mahalanobis distance. This property allows us to apply the hyperplane bound (see publication  no. 16) for cluster distance bounding and selection, under a changing Mahalnobis distance matrix. Enormous gains in I/O, storage and computational complexity were noticed (see publication no. 22).

 

Fig. 1: IO Performance Comparison for MARS

         We describe results of experiments on one such data-set. We select images from the CalTech 101 data-set, which have strong texture features. The 11 classes of images chosen were - ``accordion", ``barrel", ``bass", ``brain", ``beaver", ``cougar body", ``Leopards", ``wild cat", ``hedgehog", ``platypus", and ``soccer ball". These images classes are a-priori unknown to the system and the images are used as queries for performance evaluation. We embed these images within the CORTINA-48 (texture) data-set.

     In one set of experiments, the weight matrix for image retrieval was evaluated according to relevance feedback principles established in MARS, here each class has a separate weight matrix. In another, a random weight matrix was used for retrieval. For each query, the 10 nearest neighbors (10NN) were mined. We also assumed a page size of 8kB. In both cases, the data-set was clustered with the squared Euclidean distance i.e. without knowledge of the Mahalanobis weight matrix for retrieval. The hyperplane based cluster retrieval has huge gains over the VA-File variant used for relevance feedback indexing (see Figures 1 and 2).

Fig. 2: IO Performance Comparison for Random Weight Matrix