Analyzing the Geometric Structure of Deep Learning Decision Boundaries
Abstract
Training deep learning models is a remarkably effective method for finding function approximators. However, understanding the behavior of these trained models from a first-principles description is an open problem. One reason is that model training is a complex dynamical system, parameterized by the training step, initial condition, dataset, and learning algorithm, that is computationally intractable to study directly. Compounding this, trained models often exhibit counterintuitive properties, such as the existence of adversarial examples. Current deep learning theory lacks the tools necessary to answer many questions posed by empirical results, such as why adversarial attacks transfer between models and why improving robustness decreases standard performance. This dissertation provides the tools to formally answer some of these questions. First, this work defines a differentiable algorithm over a model's inputs, weights, and training set that exactly replicates model behavior across training. This enables exact measurement of each training example's contribution and measurement of the trained model's signal manifold. Second, this work provides a loss function that aligns a model's gradients to a fixed low-dimensional manifold. These tools are then applied to computer vision datasets to formally study the properties of adversarial robustness, explainability, and out-of-distribution detection. The goal of this dissertation is to provide techniques that advance the theoretical understanding of deep learning.
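To make the first tool concrete, the sketch below illustrates the general idea under simplifying assumptions: it replays full-batch gradient descent and accumulates a first-order estimate of each training example's contribution to the model's output on a held-out input. The function name train_with_attribution, the full-batch setting, the mean-squared-error loss, and the first-order (rather than exact) per-step decomposition are all illustrative assumptions, not the dissertation's actual construction, which replicates training exactly.

```python
import torch

def train_with_attribution(model, xs, ys, x_test, lr=0.1, steps=100):
    """Replay full-batch gradient descent, attributing the change in
    the model's output on x_test to individual training examples.

    Per step, the first-order change is
        df(x_test) ~= -lr * <grad_w f(x_test), sum_i grad_w loss_i>,
    so splitting the inner sum over i attributes the change to examples.
    """
    n = xs.shape[0]
    contrib = torch.zeros(n)
    params = [p for p in model.parameters() if p.requires_grad]
    for _ in range(steps):
        # Gradient of the test output w.r.t. the weights
        # (sum of outputs used as a simple scalar probe).
        g_test = torch.autograd.grad(model(x_test).sum(), params)
        per_example = []
        for i in range(n):
            loss_i = torch.nn.functional.mse_loss(
                model(xs[i:i + 1]), ys[i:i + 1])
            g_i = torch.autograd.grad(loss_i, params)
            per_example.append(g_i)
            # First-order change in f(x_test) caused by example i
            # at this step.
            dot = sum((gt * gi).sum() for gt, gi in zip(g_test, g_i))
            contrib[i] -= lr * dot.item()
        # Apply the summed gradient step (full-batch GD).
        with torch.no_grad():
            for p, *gs in zip(params, *per_example):
                p -= lr * sum(gs)
    return contrib
```

Summing the returned contributions recovers the first-order estimate of the total change in the model's output on x_test over training, which is the sense in which the decomposition measures "training contribution."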
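Similarly, a minimal sketch of a gradient-alignment penalty of the kind the second tool describes, assuming the fixed manifold is approximated by a linear subspace spanned by an orthonormal basis; the name alignment_loss, the two weighting coefficients, and the linear-subspace stand-in for the manifold are assumptions for illustration.

```python
import torch

def alignment_loss(model, x, y, basis, ce_weight=1.0, align_weight=0.1):
    """Cross-entropy plus a penalty on the component of the input
    gradient that falls outside span(basis).

    basis: (d, k) matrix with orthonormal columns spanning a fixed
    low-dimensional subspace (a linear stand-in for the manifold).
    """
    x = x.clone().requires_grad_(True)
    logits = model(x)
    ce = torch.nn.functional.cross_entropy(logits, y)
    # Input gradient of the loss, kept differentiable (create_graph)
    # so the penalty itself can be backpropagated through.
    g = torch.autograd.grad(ce, x, create_graph=True)[0]
    g = g.flatten(1)                    # (batch, d)
    on_manifold = g @ basis @ basis.T   # projection onto span(basis)
    off_manifold = g - on_manifold
    penalty = off_manifold.pow(2).sum(dim=1).mean()
    return ce_weight * ce + align_weight * penalty
```

Training against an objective of this shape pushes the model's input gradients toward the chosen subspace, which is one way gradient alignment can make saliency-style explanations concentrate on a designated signal manifold.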