DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference


Abstract

To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models to low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work proposes an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of separate bit-fields to adapt to the distribution of DNN weights and activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy against speedup. Experimental results demonstrate that ImageNet inference accuracy with DyBit is 1.97% higher than the state of the art at 4-bit quantization, and that the proposed framework achieves up to 8.1× speedup compared with the original ResNet-50 model.
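To make the idea of a dynamically split bit-field concrete, below is a minimal, self-contained Python sketch of a toy variable-precision quantizer. It is not the DyBit format described in the paper: the field-splitting policy, the quantize_dynamic function, and the total_bits parameter are illustrative assumptions. The sketch only shows how a fixed bit budget could be divided between a range (exponent-like) field and a precision (fraction) field depending on a value's magnitude.

# Illustrative sketch only: NOT the paper's DyBit encoding, just a toy
# variable-length representation in the same spirit. The field layout and
# splitting policy below are assumptions for demonstration.
import math

def quantize_dynamic(x: float, total_bits: int = 4) -> float:
    """Quantize x with a crude dynamic split between a range (exponent)
    field and a precision (fraction) field, so the representation adapts
    to the value's magnitude."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = abs(x)
    exp = math.floor(math.log2(mag))          # coarse range of the value
    # Toy policy: values near 1.0 keep more fraction bits, far values fewer.
    range_bits = min(total_bits - 2, max(1, abs(exp)))
    frac_bits = total_bits - 1 - range_bits   # 1 bit reserved for the sign
    if frac_bits <= 0:
        return sign * 2.0 ** exp              # only the range survives
    step = 2.0 ** (exp - frac_bits)           # quantization step size
    return sign * round(mag / step) * step

if __name__ == "__main__":
    for v in [0.033, 0.71, 1.4, 9.6]:
        print(f"{v:>6} -> {quantize_dynamic(v, total_bits=4):.4f}")

The paper additionally pairs its encoding with a hardware-aware mixed-precision accelerator; the sketch above covers only a scalar encoding step under the stated assumptions.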

Citation (APA)

Zhou, J., Wu, J., Gao, Y., Ding, Y., Tao, C., Li, B., … Wong, N. (2024). DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(5), 1613–1617. https://doi.org/10.1109/TCAD.2023.3342730
