Daisy-chained systolic array and reconfigurable memory space for narrow memory bandwidth


Abstract

Edge computing infrastructures increasingly prioritize a small footprint and scalable, easy-to-estimate performance. In this paper, we propose the following to improve the footprint and the scalability of systolic arrays: (1) column multithreading, which reduces the number of physical units while maintaining performance even for back-to-back floating-point accumulations; (2) a cascaded peer-to-peer AXI bus for a scalable multichip structure, together with an intra-chip parallel local memory bus for low latency; (3) multilevel loop control in any unit to reduce the startup overhead, and adaptive operation shifting for efficient reuse of local memories. We designed a systolic array with a single-column × 64-row configuration in Verilog HDL, evaluated the frequency and the performance on an FPGA attached to a ZYNQ system as an AXI slave device, and evaluated the area with a TSMC 28 nm library and memory generator. We identified the following: (1) for a matrix multiplication / a convolution operation / a light-field depth extraction whose size is larger than the capacity of the local memory, the execution speed is 6.3× / 9.2× / 6.6× that of a similar systolic array (EMAX); (2) the estimated speed with a 4-chip configuration is 19.6× / 16.0× / 8.5×; (3) the size of a single chip is 8.4 mm² (0.31× that of EMAX), and the basic performance per area is 2.4×.
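The figures in the abstract can be combined to gauge how close the estimated 4-chip configuration comes to ideal 4× scaling, and what EMAX area the 0.31× ratio implies. The following is a minimal sketch that uses only the numbers quoted above; the function and variable names are hypothetical, and the derived values are back-of-the-envelope arithmetic rather than results reported by the authors. The 2.4× performance-per-area figure is not re-derived here because the abstract does not define "basic performance".

```python
# Speedups relative to EMAX, as quoted in the abstract.
SINGLE_CHIP = {
    "matrix multiplication": 6.3,
    "convolution": 9.2,
    "light-field depth extraction": 6.6,
}
FOUR_CHIP = {
    "matrix multiplication": 19.6,
    "convolution": 16.0,
    "light-field depth extraction": 8.5,
}


def scaling_gain(workload: str) -> float:
    """Ratio of the 4-chip estimate to the single-chip result (ideal: 4.0)."""
    return FOUR_CHIP[workload] / SINGLE_CHIP[workload]


def implied_emax_area_mm2(area_mm2: float = 8.4, ratio: float = 0.31) -> float:
    """EMAX area implied by 'single chip is 8.4 mm2, 0.31x of EMAX' (derived, not reported)."""
    return area_mm2 / ratio


if __name__ == "__main__":
    for name in SINGLE_CHIP:
        print(f"{name}: {scaling_gain(name):.2f}x gain from 4 chips (ideal 4.00x)")
    print(f"implied EMAX area: {implied_emax_area_mm2():.1f} mm2")
```

Under these figures, matrix multiplication gains roughly 3.1× from four chips, while light-field depth extraction gains only about 1.3×, so the benefit of the daisy-chained structure appears to depend strongly on the workload.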





Iwamoto, J., Kikutani, Y., Zhang, R., & Nakashima, Y. (2020). Daisy-chained systolic array and reconfigurable memory space for narrow memory bandwidth. IEICE Transactions on Information and Systems, E103D(3), 578–589. https://doi.org/10.1587/transinf.2019EDP7144
