Robust Representation Learning for Person Re-Identification

dc.contributor.advisor: Ruan, Jianhua
dc.contributor.advisor: Tian, Qi
dc.contributor.author: Zhang, Hengheng
dc.contributor.committeeMember: Robbins, Kay
dc.contributor.committeeMember: Korkmaz, Turgay
dc.contributor.committeeMember: Zhang, Weining
dc.contributor.committeeMember: Huang, Yufei
dc.date.accessioned: 2024-03-08T17:40:54Z
dc.date.available: 2024-03-08T17:40:54Z
dc.date.issued: 2020
dc.description: This item is available only to currently enrolled UTSA students, faculty or staff. To download, navigate to Log In in the top right-hand corner of this screen, then select Log in with my UTSA ID.
dc.description.abstract: Within the scope of visual content retrieval, Person Re-Identification (ReID) aims to recognize a probe person appearing under multiple cameras as "identified before," regardless of the person's current position in an image or the amount of time elapsed since the last known position. Due to its important applications in public security, e.g., cross-camera pedestrian search, tracking, and event detection, person ReID has attracted considerable attention from both the academic and industrial communities. Benefiting greatly from the power of deep neural networks, ReID research has witnessed remarkable improvements over the past few years. However, its application in real-world scenarios is still limited and remains a challenging problem. This dissertation investigates the crucial factors that hinder the generalization ability of current ReID research and attempts to narrow the gap between research settings and open-world applications from innovative perspectives. Specifically, several frameworks and methods are proposed to address the following issues.

A practical ReID system aims to find occurrences of a query person ID in a video sequence, where human detection and recognition across a camera network should provide critical insights. Yet these two problems have generally been studied in isolation within computer vision. State-of-the-art ReID datasets and methods start from predefined bounding boxes, either hand-drawn or automatically detected. On the other hand, several pedestrian detectors achieve remarkable performance on benchmark datasets, but little analysis is available on how they can be used for person ReID. To fill this void, we propose a large-scale benchmark dataset and baselines for practical person ReID in the wild, which move beyond the sequential application of detection and recognition. In particular, we study three aspects of the problem that have not been considered in prior work. First, we analyze the effect of combining various detection and recognition methods on person ReID accuracy. Second, we study whether detection can help improve ReID accuracy and outline methods to do so. Third, we study choices of detectors that allow for maximal gains in ReID accuracy. Such a benchmark dataset also facilitates the design of an end-to-end ReID system that combines the detection and recognition processes.

Another critical impeding issue is that ReID can be regarded as a zero-shot learning problem, where "zero-shot" means the query person identity is unseen and never covered by the training set. However, the disparity among different cameras and datasets makes it hard for current algorithms to generalize from one domain to another, especially when the target domain is unknown. To address this issue, we develop a 3D-guided adversarial transform (3D-GAT) network that explores the transferability of source training data to facilitate learning domain-independent knowledge. Aware of a 3D model and human poses, 3D-GAT uses image-to-image translation to synthesize person images under different conditions while preserving identity-relevant features as much as possible. With these augmented training data, ReID approaches can more easily perceive how a person may appear under varying viewpoints and poses, most of which are not seen in the training data, and thus achieve higher ReID accuracy, especially in an unknown domain.

Furthermore, a robust retrieval framework for ReID should never depend on a single type of similarity, which either cannot fully reveal the intrinsic relationships between images or is vulnerable to noise and even malicious attacks. As a result, great effort has been devoted to similarity (or metric) fusion; meanwhile, the diffusion process has shown its ability to substantially improve performance in retrieval tasks. This has stimulated great research interest in considering similarity fusion within the framework of the diffusion process (i.e., fusion with diffusion) for robust retrieval. We therefore first revisit representative fusion-with-diffusion methods and provide new insights that were overlooked by previous researchers. Then, observing that existing algorithms are susceptible to noisy similarities, we design a weight-learning scheme that suppresses the negative impact of noisy similarities. In particular, we propose a multi-similarity diffusion framework on a tensor graph, in which the diffusion process, the weight-learning scheme, and the similarity fusion are tightly unified. Finally, we integrate several recently proposed similarities into the framework to demonstrate its robustness.

Last but not least, retrieval efficiency is as important as accuracy, not only for ReID but for all visual content retrieval tasks. Learning compact representations is always helpful for handling large-scale data; among such approaches, hashing methods that map data from high-dimensional features to compact binary codes can effectively address this issue. However, the quantization step of hashing incurs unavoidable information loss, and so far hashing accuracy is still not comparable with that of regular methods. Recent works usually focus on learning deep features and hashing functions simultaneously to preserve the similarity between images, while the similarity metric is fixed. In this work, we propose a Rank-embedded Hashing (ReHash) algorithm in which the ranking lists of binary codes are optimized together with feedback from supervised deep feature learning. Specifically, ReHash jointly conducts metric learning and hashing-code optimization in an end-to-end model. Thus, the similarity between images is enhanced by the ranking process; meanwhile, the ranking results serve as additional supervision for learning the hashing function.
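The fusion-with-diffusion idea summarized above can be illustrated with a minimal sketch. This is not the dissertation's tensor-graph method: the uniform `weights`, the damping factor `alpha`, and the simple iterated smoothing below are illustrative assumptions showing only the general pattern of fusing several affinity matrices and then diffusing the result over a graph.

```python
import numpy as np

def row_normalize(W):
    """Turn a non-negative affinity matrix into a row-stochastic transition matrix."""
    return W / W.sum(axis=1, keepdims=True)

def fuse_with_diffusion(similarities, weights=None, alpha=0.9, iters=30):
    """Fuse several similarity matrices, then smooth the result by diffusion.

    similarities: list of (n, n) non-negative affinity matrices.
    weights: optional fusion weights (uniform by default).
    alpha: damping factor balancing diffusion against the fused input.
    """
    k = len(similarities)
    if weights is None:
        weights = np.full(k, 1.0 / k)
    # Step 1: weighted fusion of the input similarities.
    W0 = sum(w * S for w, S in zip(weights, similarities))
    # Step 2: diffusion on the graph defined by the fused similarity;
    # each step propagates similarity mass along graph edges while
    # anchoring the result to the original fused matrix.
    P = row_normalize(W0)
    W = W0.copy()
    for _ in range(iters):
        W = alpha * P @ W @ P.T + (1 - alpha) * W0
    return W
```

A learned, rather than uniform, `weights` vector is the natural place where a weight-learning scheme such as the one proposed in the dissertation would plug in, down-weighting noisy similarities before diffusion.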
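The efficiency argument for hashing, mapping high-dimensional features to compact binary codes and ranking by Hamming distance, can likewise be sketched. The random projection and the {-1, +1} code convention here are illustrative assumptions; ReHash itself learns the codes and the ranking jointly, which this sketch does not attempt.

```python
import numpy as np

def binarize(features, projection):
    """Quantize real-valued features into {-1, +1} binary codes via a linear projection."""
    return np.where(features @ projection >= 0, 1, -1)

def hamming_rank(query_code, db_codes):
    """Rank database codes by Hamming distance to the query.

    On {-1, +1} codes, Hamming distance = (bits - dot product) / 2,
    so the whole ranking reduces to one matrix-vector product.
    """
    bits = db_codes.shape[1]
    dists = (bits - db_codes @ query_code) // 2
    return np.argsort(dists, kind="stable"), dists
```

The quantization in `binarize` is where the information loss mentioned in the abstract occurs: two features on the same side of every projection hyperplane become indistinguishable, which is why jointly optimizing the codes with ranking feedback matters.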
dc.description.department: Computer Science
dc.format.extent: 149 pages
dc.format.mimetype: application/pdf
dc.identifier.isbn: 9798557049863
dc.identifier.uri: https://hdl.handle.net/20.500.12588/6202
dc.language: en
dc.subject: Image retrieval
dc.subject: Person re-identification
dc.subject: Deep neural networks
dc.subject.classification: Computer science
dc.subject.classification: Computer engineering
dc.subject.classification: Artificial intelligence
dc.title: Robust Representation Learning for Person Re-Identification
dc.type: Thesis
dc.type.dcmi: Text
dcterms.accessRights: pq_closed
thesis.degree.department: Computer Science
thesis.degree.grantor: University of Texas at San Antonio
thesis.degree.level: Doctoral
thesis.degree.name: Doctor of Philosophy

Files

Original bundle

Name: Zhang_utsa_1283D_13258.pdf
Size: 14.59 MB
Format: Adobe Portable Document Format