Robust Image Representation for Classification and Retrieval
Representing visual information well is crucial for enabling computers to analyze and understand massive image collections. The objective of my research is to construct a robust image representation for efficient and effective visual object recognition and retrieval.
The Bag-of-Words (BoW) model has shown its superiority over many conventional global features in image classification and retrieval systems. However, the large quantization error incurred when each local feature is mapped to a single visual word may degrade the effectiveness of the BoW representation. To address this problem, several soft-quantization-based methods have been proposed in the literature. Nevertheless, these methods are not efficient enough when applied to large datasets, and their effectiveness is still unsatisfactory. We propose a new image representation model based on a multi-layer codebook. In this method, we first construct a multi-layer codebook by explicitly reducing the quantization error in a global or local manner. We then use parallel or hierarchically connected visual codebooks to quantize each local feature into multiple visual words. This yields a more precise description of the distribution of local features in the visual feature space.
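To illustrate how a single local feature can contribute several visual words, here is a minimal sketch of the parallel case only. It assumes the codebooks have already been learned (for example by k-means), uses hard nearest-word assignment within each codebook, and L1-normalizes each per-codebook histogram; the function names and the normalization are illustrative assumptions, not the codebook construction described above.

```python
import numpy as np

def quantize(features, codebook):
    """Return the index of the nearest visual word (Euclidean) for each feature."""
    # Squared distances between every feature (n x d) and every word (k x d).
    d2 = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def multi_codebook_bow(features, codebooks):
    """Concatenate one L1-normalized BoW histogram per codebook.

    Each local feature is assigned one word in every codebook, so it
    contributes to len(codebooks) visual words in total.
    """
    histograms = []
    for codebook in codebooks:
        words = quantize(features, codebook)
        hist = np.bincount(words, minlength=len(codebook)).astype(float)
        histograms.append(hist / max(hist.sum(), 1.0))
    return np.concatenate(histograms)

# Example: 50 random 8-D descriptors and two hypothetical codebooks of 16 and 32 words.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
codebooks = [rng.normal(size=(16, 8)), rng.normal(size=(32, 8))]
vector = multi_codebook_bow(features, codebooks)
```

The resulting vector has 16 + 32 entries. Because every codebook partitions the feature space differently, a descriptor that falls near a cell boundary in one codebook is likely to be better represented in another, which is the intuition behind assigning multiple words per feature.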
The above representation disregards the spatial configuration of visual features. Spatial Pyramid Matching (SPM) has been proposed to extend the BoW model for object classification. By encoding the global image positions of local features, it makes image matching more accurate. However, for unaligned images, where the object is rotated, flipped or translated, SPM loses its discriminative power. We propose new spatial pooling strategies to handle such transformations. The spatial configurations of visual features in both local and global regions are taken into account to generate a more robust image representation.
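To make the alignment problem concrete, the following is a minimal sketch of standard SPM pooling (not one of the proposed pooling strategies), assuming keypoint positions in pixels, precomputed visual-word indices, 2^l x 2^l grids at each level l, and the level weights of the original SPM formulation. The function name and the toy keypoints are hypothetical.

```python
import numpy as np

def spm_vector(xy, words, width, height, vocab_size, levels=2):
    """Standard spatial pyramid: weighted BoW histograms over 2**l x 2**l grids.

    xy: (n, 2) keypoint positions in pixels; words: (n,) visual-word indices.
    """
    parts = []
    for level in range(levels + 1):
        cells = 2 ** level
        # Grid cell of each keypoint; clip so points on the right/bottom edge stay in range.
        cx = np.minimum((xy[:, 0] / width * cells).astype(int), cells - 1)
        cy = np.minimum((xy[:, 1] / height * cells).astype(int), cells - 1)
        cell_id = cy * cells + cx
        # Level weights from the original SPM formulation: finer levels count more.
        weight = 1.0 / 2 ** levels if level == 0 else 1.0 / 2 ** (levels - level + 1)
        for c in range(cells * cells):
            hist = np.bincount(words[cell_id == c], minlength=vocab_size)
            parts.append(weight * hist)
    return np.concatenate(parts)

# Three keypoints with distinct visual words in a 100 x 100 image.
xy = np.array([[10.0, 10.0], [90.0, 20.0], [30.0, 80.0]])
words = np.array([0, 1, 2])
flipped = xy.copy()
flipped[:, 0] = 100.0 - flipped[:, 0]  # horizontal flip

v = spm_vector(xy, words, 100, 100, vocab_size=3)
v_flip = spm_vector(flipped, words, 100, 100, vocab_size=3)
print(np.allclose(v[:3], v_flip[:3]))  # True: level 0 (plain BoW) is unchanged
print(np.allclose(v, v_flip))          # False: finer levels differ
```

The flipped image contains exactly the same visual words, yet its finer-level histograms land in different grid cells, so the two SPM vectors no longer match. This is the loss of discriminative power on unaligned images that motivates transformation-aware pooling.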
Furthermore, given a single image representation, it is hard to retrieve all similar images from a large-scale image database. To enrich the query example, query expansion methods have been used in image retrieval systems. These methods take either whole images or matching regions as new queries, which can unavoidably add irrelevant features to the query. To minimize the irrelevant visual features introduced by query expansion, we propose a new query expansion approach. The representation it produces can retrieve more relevant images while remaining consistent with the original representation.
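For reference, here is a minimal sketch of average query expansion, a common baseline of the whole-image kind mentioned above; it is not the proposed approach. It assumes BoW vectors compared by cosine similarity, and the function names and the `top_k` parameter are illustrative assumptions.

```python
import numpy as np

def l2_normalize(x):
    """Scale vectors to unit L2 norm along the last axis (zero vectors stay zero)."""
    norms = np.linalg.norm(x, axis=-1, keepdims=True)
    return x / np.maximum(norms, 1e-12)

def search(query, database, top_k):
    """Indices of the top_k database vectors ranked by cosine similarity."""
    scores = l2_normalize(database) @ l2_normalize(query)
    return np.argsort(-scores)[:top_k]

def average_query_expansion(query, database, top_k=3):
    """Baseline AQE: average the query with its top_k results, then re-rank everything."""
    top = search(query, database, top_k)
    # Every retrieved vector is merged in whole, including any irrelevant words it holds.
    expanded = np.vstack([l2_normalize(query), l2_normalize(database[top])]).mean(axis=0)
    return search(expanded, database, len(database))

# Toy database of four 3-word BoW vectors.
database = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
query = np.array([1.0, 0.05, 0.0])
ranking = average_query_expansion(query, database, top_k=2)
```

Because each top-ranked vector is averaged in wholesale, any background or clutter words it contains also enter the expanded query. That is the source of the irrelevant features the proposed approach aims to minimize.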
We conduct extensive experiments on well-known benchmark datasets for image classification and retrieval tasks. Experimental results demonstrate that our methods further improve performance compared with state-of-the-art methods. In addition, the proposed image representation is compact and consistent with the BoW model, which makes it applicable to image retrieval and other related tasks in computer vision as well.