Quantized Sparse Training: A Unified Trainable Framework for Joint Pruning and Quantization in DNNs

Jun Hyung Park, Kang Min Kim, Sangkeun Lee

Research output: Contribution to journal › Article › peer-review

10 Scopus citations

Abstract

Deep neural networks typically have extensive parameters and computational operations. Pruning and quantization techniques have been widely used to reduce the complexity of deep models, and the two techniques can be combined to achieve significantly higher compression ratios. However, separate optimization processes and the difficulty of choosing hyperparameters limit their joint application. In this study, we propose a novel compression framework, termed quantized sparse training, that prunes and quantizes networks jointly in a unified training process. We integrate pruning and quantization into a gradient-based optimization process based on the straight-through estimator. Quantized sparse training enables us to simultaneously train, prune, and quantize a network from scratch. The empirical results validate the superiority of the proposed methodology over recent state-of-the-art baselines with respect to both model size and accuracy. Specifically, quantized sparse training compresses VGG16 to a 135 KB model without any accuracy degradation, which is 40% of the model size achievable with the state-of-the-art joint pruning and quantization approach.
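
To make the idea in the abstract concrete, the sketch below shows one common way a straight-through estimator (STE) can fold pruning and quantization into ordinary gradient-based training: weights are magnitude-pruned and uniformly quantized in the forward pass, while gradients flow unchanged to the latent full-precision weights in the backward pass. This is only an illustrative PyTorch sketch, not the authors' implementation; the class names, the fixed pruning threshold, the fixed bit width, and the symmetric uniform quantizer are all assumptions made for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PruneQuantSTE(torch.autograd.Function):
    """Prune and quantize weights in the forward pass; pass gradients straight through."""

    @staticmethod
    def forward(ctx, w, threshold, num_bits):
        # Pruning step: zero out weights whose magnitude falls below the threshold.
        mask = (w.abs() > threshold).to(w.dtype)
        w_pruned = w * mask
        # Quantization step: uniform symmetric quantization of the surviving weights.
        max_val = w_pruned.abs().max().clamp(min=1e-8)
        scale = max_val / (2 ** (num_bits - 1) - 1)
        w_quant = torch.round(w_pruned / scale) * scale
        return w_quant * mask

    @staticmethod
    def backward(ctx, grad_output):
        # Straight-through estimator: the gradient w.r.t. the latent full-precision
        # weights is taken to be the gradient w.r.t. the pruned/quantized weights.
        return grad_output, None, None


class QSTLinear(nn.Module):
    """Hypothetical linear layer: latent full-precision weights are pruned and
    quantized on the fly in every forward pass (illustration only)."""

    def __init__(self, in_features, out_features, threshold=1e-2, num_bits=4):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_features, in_features) * 0.01)
        self.bias = nn.Parameter(torch.zeros(out_features))
        self.threshold = threshold  # pruning threshold (assumed fixed here)
        self.num_bits = num_bits    # quantization bit width (assumed fixed here)

    def forward(self, x):
        w_eff = PruneQuantSTE.apply(self.weight, self.threshold, self.num_bits)
        return F.linear(x, w_eff, self.bias)


if __name__ == "__main__":
    layer = QSTLinear(128, 64)
    out = layer(torch.randn(32, 128))
    out.sum().backward()  # gradients reach the latent full-precision weights via the STE
```

In the paper's framework the pruning and quantization decisions are part of the unified training objective itself; in this sketch the threshold and bit width are fixed hyperparameters purely to keep the example short.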

Original language: English
Article number: 3524066
Journal: ACM Transactions on Embedded Computing Systems
Volume: 21
Issue number: 5
DOIs
State: Published - 8 Oct 2022

Bibliographical note

Publisher Copyright:
© 2022 Copyright held by the owner/author(s). Publication rights licensed to ACM.

Keywords

  • Deep learning
  • joint pruning and quantization
  • model compression
  • neural network

