Structured Segment Rescaling with Gaussian Processes for Parameter-Efficient ConvNets

Date

2024

Authors

Siddiqui, Bilal

Abstract

We study methods to transform existing Neural Networks (NNs) into more parameter-efficient variants using a novel mechanism for structured pruning. The pruning method, Structured Segment Rescaling (SSR), functions as a downsampler on model dimensions and utilizes rescaling modifiers. These modifiers act on a segment, a logical group of identically dimensioned blocks. We study the behavior of SSR in Convolutional Neural Networks (ConvNets) and its generalization from heuristics into a well-defined framework.
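
To make the notion of a segment and its rescaling modifiers concrete, the sketch below models a segment as a group of identically dimensioned blocks and applies hypothetical depth and width modifiers at instantiation time. The names (Segment, rescale, depth_mod, width_mod) are illustrative assumptions, not the thesis implementation.

```python
# Minimal sketch (assumed names, not the thesis code): a "segment" is a group of
# identically dimensioned blocks; SSR-style modifiers rescale its depth and width
# before the ConvNet is built, so the resulting tensors stay dense.
from dataclasses import dataclass

@dataclass
class Segment:
    num_blocks: int   # depth of the segment (number of identical blocks)
    channels: int     # width shared by all blocks in the segment

def rescale(segment: Segment, depth_mod: float, width_mod: float) -> Segment:
    """Apply depth and width modifiers, keeping at least one block and channel."""
    return Segment(
        num_blocks=max(1, round(segment.num_blocks * depth_mod)),
        channels=max(1, round(segment.channels * width_mod)),
    )

# Example: a ResNet-like stage of 6 blocks with 256 channels, aggressively rescaled.
stage = Segment(num_blocks=6, channels=256)
print(rescale(stage, depth_mod=0.5, width_mod=0.25))  # Segment(num_blocks=3, channels=64)
```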

Our novel structured pruning starts at model instantiation, where we begin with heuristics that explore radical segment rescales. The rescales construct ConvNets with varied segments that can take on new dimensions, where some or most blocks and channels may be pruned away. In contrast to iterative unstructured pruning, SSR is significantly more aggressive, requires a single training cycle, and purposefully targets parameter banks to completely avoid tensor sparsity. Since sparse tensors require compute comparable to dense tensors, this circumvention of sparsity places SSR uniquely among prior work. SSR significantly reduces computation as measured in General Matrix Multiplications (GeMMs); we show that optimized SSR modifiers can achieve up to a 5X lower GeMM compute load.
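
A back-of-the-envelope sketch of why rescaling at instantiation cuts dense compute directly: a convolution's multiply-accumulate count scales with its input and output channel counts, so shrinking width shrinks the dense GeMM itself, with no sparse kernels needed. The layer shape and modifier values below are illustrative assumptions, not measurements from the thesis.

```python
# Illustrative arithmetic only: MAC count of a k x k convolution over an H x W map.
def conv_macs(c_in: int, c_out: int, h: int, w: int, k: int = 3) -> int:
    return c_in * c_out * k * k * h * w

dense = conv_macs(256, 256, 14, 14)
rescaled = conv_macs(64, 64, 14, 14)   # a width modifier of 0.25 on both sides
print(f"per-layer compute reduction: {dense / rescaled:.0f}x")  # 16x for this layer
```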

The modifiers that rescale model segments in SSR are also augmented with an optimization step using a low-cost Gaussian Process (GP). The GP serves to approximate the optimal modifiers using an initial set of depth and width modifiers that enumerate only extreme rescales. ConvNets constructed from these initial modifiers are named the sentinels. The sentinels coarsely explore the modifier space and provide the data bedrock for training GPs. We utilize the CIFAR datasets and ResNets to validate our findings. SSR requires only 10^1 GPU hours to yield efficient new ConvNets that can facilitate edge inference. Over 10^5 ConvNets may be derived from any typical ConvNet, and these ConvNets need only be trained if their GP-predicted accuracies show that they are viable candidates for power-limited devices.
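
The sketch below illustrates the GP step under stated assumptions: sentinel ConvNets built from extreme (depth, width) modifier pairs are trained once, and their accuracies seed a Gaussian Process that predicts accuracy for untrained modifier combinations, so only promising candidates need a training run. The sentinel accuracies and the Matern kernel choice are hypothetical, not results from the thesis.

```python
# Sketch only: GP regression from modifier pairs to accuracy, seeded by sentinels.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# (depth_modifier, width_modifier) -> observed accuracy of each sentinel (illustrative values)
X_sentinels = np.array([[1.0, 1.0], [1.0, 0.25], [0.25, 1.0], [0.25, 0.25]])
y_accuracy  = np.array([0.94, 0.91, 0.92, 0.87])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
gp.fit(X_sentinels, y_accuracy)

# Query a candidate rescale: train it only if the predicted accuracy looks viable.
candidate = np.array([[0.5, 0.5]])
mean, std = gp.predict(candidate, return_std=True)
print(f"predicted accuracy: {mean[0]:.3f} +/- {std[0]:.3f}")
```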

Our sentinel models drop parameter count by over 65% and improve latency by 3X. We then optimize for better modifiers with GP modeling and show that up to 80% structured parameter reduction is possible. We observe that both depth and width modifiers can significantly reduce parameters. We also note that only depth modification decreases latency, because fewer blocks mean less serial computation. Lastly, applying depth and width modifiers simultaneously to segments significantly increases ConvNet compression. We demonstrate that <1% accuracy degradation and >90% parameter reduction are possible when modifiers are jointly optimized.
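
Rough arithmetic (illustrative assumptions, not thesis measurements) for why joint modifiers compound: if a segment's parameter count scales roughly with depth times width squared, a moderate cut on each axis already removes the large majority of parameters.

```python
# Illustrative only: fraction of parameters kept under joint depth/width rescaling,
# assuming parameters scale roughly as depth * width^2 within a segment.
depth_mod, width_mod = 0.5, 0.33
remaining = depth_mod * width_mod ** 2
print(f"approximate parameter reduction: {1 - remaining:.0%}")  # ~95% for this setting
```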

Keywords

Compression, Efficiency, Parameter Pruning

Department

Computer Science