Abstract
Deep neural networks typically involve a large number of parameters and computational operations. Pruning and quantization techniques have been widely used to reduce the complexity of deep models, and applying the two jointly can realize significantly higher compression ratios. However, separate optimization processes and the difficulty of choosing hyperparameters limit their simultaneous application. In this study, we propose a novel compression framework, termed quantized sparse training, that prunes and quantizes networks jointly in a unified training process. We integrate pruning and quantization into a gradient-based optimization process based on the straight-through estimator, which enables us to simultaneously train, prune, and quantize a network from scratch. The empirical results validate the superiority of the proposed methodology over recent state-of-the-art baselines with respect to both model size and accuracy. Specifically, quantized sparse training compresses VGG16 to a 135 KB model without any accuracy degradation, which is 40% of the model size achievable with the state-of-the-art pruning and quantization approach.
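To illustrate the idea described in the abstract, below is a minimal PyTorch sketch of a forward pass that prunes and quantizes weights jointly while a straight-through estimator passes gradients to the underlying full-precision weights. The class names (`PruneQuantSTE`, `QSLinear`), the magnitude-based pruning mask, and the symmetric uniform quantizer are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn


class PruneQuantSTE(torch.autograd.Function):
    """Forward: prune small-magnitude weights and uniformly quantize the rest.
    Backward: pass gradients straight through to the dense full-precision weights."""

    @staticmethod
    def forward(ctx, weight, sparsity, num_bits):
        # Magnitude pruning: zero out the smallest |w| up to the target sparsity.
        k = int(sparsity * weight.numel())
        threshold = weight.abs().flatten().kthvalue(max(k, 1)).values
        mask = (weight.abs() > threshold).to(weight.dtype)

        # Symmetric uniform quantization of the surviving weights.
        qmax = 2 ** (num_bits - 1) - 1
        scale = weight.abs().max().clamp(min=1e-8) / qmax
        q_weight = torch.round(weight / scale).clamp(-qmax, qmax) * scale
        return q_weight * mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: treat prune + quantize as identity in backward.
        return grad_output, None, None


class QSLinear(nn.Linear):
    """Linear layer whose effective weight is pruned and quantized on the fly,
    while the optimizer keeps updating the underlying full-precision weight."""

    def __init__(self, in_features, out_features, sparsity=0.9, num_bits=4):
        super().__init__(in_features, out_features)
        self.sparsity, self.num_bits = sparsity, num_bits

    def forward(self, x):
        w = PruneQuantSTE.apply(self.weight, self.sparsity, self.num_bits)
        return nn.functional.linear(x, w, self.bias)


if __name__ == "__main__":
    layer = QSLinear(64, 10)
    out = layer(torch.randn(8, 64))
    out.sum().backward()  # gradients flow to layer.weight via the STE
    print(out.shape, layer.weight.grad.shape)
```

Because both the mask and the quantizer are applied inside the forward pass, training, pruning, and quantization can proceed in a single optimization loop, which is the unified training process the abstract refers to.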
Original language | English
---|---
Article number | 3524066
Journal | ACM Transactions on Embedded Computing Systems
Volume | 21
Issue number | 5
DOIs |
State | Published - 8 Oct 2022
Bibliographical note
Publisher Copyright: © 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.
Keywords
- Deep learning
- joint pruning and quantization
- model compression
- neural network