Data-driven transforms for exploration, visualization and classification of high-dimensional data
Abstract
Advances in data recording techniques allow the collection of massive amounts of data, often accompanied by external metadata. To gain a full understanding of these datasets, the metadata needs to be incorporated into the analysis. This dissertation focuses on three aspects of data-driven transforms: analyzing their effects, creating transforms that incorporate metadata directly into the dataset topology, and studying their applications.
We study transform effects using a set of new visual methods for analyzing dataset structure. The methods examine feature distributions and topological structure, and estimate whether the structure carries significant class information. We apply them to explore both the structure of a dataset and the effects of data-driven transforms on it.
We also propose data-driven transforms that incorporate metadata directly into the dataset topology. One such transform, the force feature space (FFS) transform, modifies the dataset topology based on class metadata to emphasize similarities between points in the same class and enhance class separability. FFS can be tailored to any dataset by changing the force definitions or adjusting the parameters. Combining an FFS transform with a low-dimensional projection increases the quality of visualizations. When used for classification, FFS offers alternative approaches that increase correctness and reliability. Analysis of the attractive and repulsive forces can also be used to increase the quality of feature detection.
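The abstract does not give the force definitions, so the following is only an illustrative sketch of the general idea of class-driven attraction and repulsion, not the FFS transform itself. It assumes a linear attractive force between points of the same class and a repulsive force between points of different classes whose magnitude decays as 1/distance. The function name `force_step` and the parameters `attract`, `repel`, and `eps` are hypothetical.

```python
import math

def force_step(points, labels, attract=0.1, repel=0.05, eps=1e-9):
    """Apply one illustrative force update and return the moved points.

    Assumed forces (not taken from the dissertation):
    - same class: pull toward the other point, proportional to displacement
    - different class: push away, with magnitude repel / distance
    """
    others = max(len(points) - 1, 1)
    moved = []
    for i, (p, label_p) in enumerate(zip(points, labels)):
        shift = [0.0] * len(p)
        for j, (q, label_q) in enumerate(zip(points, labels)):
            if i == j:
                continue
            diff = [qk - pk for pk, qk in zip(p, q)]
            dist = math.sqrt(sum(d * d for d in diff)) + eps
            if label_p == label_q:
                scale = attract                  # attraction toward q
            else:
                scale = -repel / (dist * dist)   # repulsion away from q
            for k, d in enumerate(diff):
                shift[k] += scale * d
        # Average the accumulated force over the other points.
        moved.append([pk + sk / others for pk, sk in zip(p, shift)])
    return moved

# Two classes of two points each in a 2-D feature space.
points = [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0], [1.5, 1.0]]
labels = [0, 0, 1, 1]
transformed = force_step(points, labels)
```

Under these assumed forces, one step moves same-class points closer together and pushes the two class centroids apart. Changing the force definitions or the `attract` and `repel` values corresponds loosely to the tailoring the abstract describes.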
Data-driven transforms provide alternative views of the dataset, revealing properties hidden in the original space. Understanding the effects and potential of data-driven transforms allows for better exploration of the transform space and increases the quality of analysis.