A High-Throughput, Resource-Efficient Implementation of the RoCEv2 Remote DMA Protocol and its Application

Abstract

The use of application-specific accelerators in data centers has been state of the art for at least a decade, starting with the availability of general-purpose GPUs (GPGPUs), which achieve higher performance either overall or per watt. In most cases, these accelerators are coupled to their hosts via PCIe, which has drawbacks in interoperability, scalability, and power consumption. As a viable alternative to PCIe-attached FPGA accelerators, this paper proposes standalone FPGAs as Network-attached Accelerators (NAAs). To enable reliable communication for decoupled FPGAs, we present an RDMA over Converged Ethernet v2 (RoCEv2) communication stack for high-speed, low-latency data transfer, integrated into a hardware framework. For NAAs to replace PCIe-coupled FPGAs, the framework must provide similar throughput and latency with low resource usage. We show that our RoCEv2 stack achieves 100 Gb/s throughput with latencies below 4 μs while using about 10% of the available resources on a mid-range FPGA. To evaluate the energy efficiency of our NAA architecture, we built a demonstrator with 8 NAAs for machine-learning-based image classification. Based on our measurements, network-attached FPGAs are a strong alternative to the more energy-demanding PCIe-attached FPGA accelerators.
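To make the RoCEv2 encapsulation concrete, the following is a minimal host-side Python sketch. It is not the authors' hardware stack. It packs the 12-byte InfiniBand Base Transport Header (BTH) that RoCEv2 carries inside a UDP datagram (destination port 4791), and estimates what fraction of a 100 Gb/s line rate is left for payload once all per-packet headers are counted. The helper names `build_bth` and `wire_efficiency` are hypothetical. The sketch assumes IPv4, untagged Ethernet, RC RDMA WRITE Middle packets (BTH only, no RETH), and a 4096-byte RDMA path MTU, which requires jumbo frames on the link.

```python
import struct

ROCEV2_UDP_PORT = 4791             # IANA-assigned UDP destination port for RoCEv2
OPCODE_RC_RDMA_WRITE_MIDDLE = 0x07  # RC RDMA WRITE Middle: BTH only, no RETH

def build_bth(opcode: int, dest_qp: int, psn: int,
              p_key: int = 0xFFFF, ack_req: bool = False) -> bytes:
    """Pack a 12-byte Base Transport Header (hypothetical helper)."""
    if not (0 <= dest_qp < 1 << 24 and 0 <= psn < 1 << 24):
        raise ValueError("DestQP and PSN are 24-bit fields")
    flags = 0                       # SE | M | PadCnt | TVer, all zero here
    word1 = dest_qp                 # FECN/BECN/reserved byte = 0, then 24-bit DestQP
    word2 = ((1 << 31) if ack_req else 0) | psn  # AckReq bit, reserved bits, 24-bit PSN
    return struct.pack("!BBHII", opcode, flags, p_key, word1, word2)

# Per-packet byte counts on the wire (assumed: IPv4, no VLAN tag)
PREAMBLE_SFD, ETH_HEADER, IPV4_HEADER, UDP_HEADER = 8, 14, 20, 8
BTH_LEN, ICRC, FCS, IFG = 12, 4, 4, 12

def wire_efficiency(payload: int) -> float:
    """Fraction of line rate carrying RDMA payload (hypothetical helper)."""
    overhead = (PREAMBLE_SFD + ETH_HEADER + IPV4_HEADER + UDP_HEADER
                + BTH_LEN + ICRC + FCS + IFG)   # 82 bytes per packet
    return payload / (payload + overhead)

if __name__ == "__main__":
    bth = build_bth(OPCODE_RC_RDMA_WRITE_MIDDLE, dest_qp=0x123456, psn=1)
    print(bth.hex())                                   # 0700ffff0012345600000001
    print(f"{100 * wire_efficiency(4096):.2f} Gb/s")  # 98.04 Gb/s
```

Under these assumptions, roughly 98 Gb/s of a 100 Gb/s link is usable for payload with 4096-byte packets. Smaller path MTUs lower this bound, for example to about 92.6 Gb/s at 1024 bytes.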

Citation (APA)

Schelten, N., Steinert, F., Knapheide, J., Schulte, A., & Stabernack, B. (2022). A High-Throughput, Resource-Efficient Implementation of the RoCEv2 Remote DMA Protocol and its Application. ACM Transactions on Reconfigurable Technology and Systems, 16(1). https://doi.org/10.1145/3543176