To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models into low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (&lt; 8 bits). This work proposes an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of its separate bit-fields to adapt to the distribution of DNN weights/activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy against speedup. Experimental results demonstrate that ImageNet inference accuracy via DyBit is 1.97% higher than the state-of-the-art at 4-bit quantization, and the proposed framework achieves up to 8.1× speedup compared with the original ResNet-50 model.
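To make the accuracy-vs-bitwidth tension concrete, the following is a minimal sketch of the fixed-bitwidth uniform quantization baseline that adaptive formats such as DyBit aim to improve on. It is an illustrative assumption, not the paper's DyBit encoding: the quantizer, the synthetic bell-shaped weight distribution, and all names here are hypothetical.

```python
import numpy as np

def uniform_quantize(x, bits):
    # Symmetric uniform quantizer: round x onto 2^(bits-1)-1 signed levels,
    # with the scale chosen from the largest absolute value.
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax)
    return q * scale  # dequantized values for error measurement

# Synthetic weight-like values: most mass near zero, a long tail —
# the kind of distribution that makes uniform low-bit grids wasteful.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=10_000)

for bits in (8, 4):
    err = np.mean((w - uniform_quantize(w, bits)) ** 2)
    print(f"{bits}-bit quantization MSE: {err:.2e}")
```

Running this shows the quantization error growing sharply from 8 bits to 4 bits, which is the regime (&lt; 8 bits) where the abstract reports DyBit's variable-length bit-fields recovering accuracy that a fixed uniform grid loses.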
Citation
Zhou, J., Wu, J., Gao, Y., Ding, Y., Tao, C., Li, B., … Wong, N. (2024). DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 43(5), 1613–1617. https://doi.org/10.1109/TCAD.2023.3342730