Exoshuffle: An Extensible Shuffle Architecture

Abstract

Shuffle is one of the most expensive communication primitives in distributed data processing and is difficult to scale. Prior work addresses the scalability challenges of shuffle by building monolithic shuffle systems. These systems are costly to develop, and they are tightly integrated with batch processing frameworks that offer only high-level APIs such as SQL. New applications, such as ML training, require more flexibility and finer-grained interoperability with shuffle. They are often unable to leverage existing shuffle optimizations. We propose an extensible shuffle architecture. We present Exoshuffle, a library for distributed shuffle that offers competitive performance and scalability as well as greater flexibility than monolithic shuffle systems. We design an architecture that decouples the shuffle control plane from the data plane without sacrificing performance. We build Exoshuffle on Ray, a distributed futures system for data and ML applications, and demonstrate that we can: (1) rewrite previous shuffle optimizations as application-level libraries with an order of magnitude less code, (2) achieve shuffle performance and scalability competitive with monolithic shuffle systems, and break the CloudSort record as the world's most cost-efficient sorting system, and (3) enable new applications such as ML training to easily leverage scalable shuffle.
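As a rough illustration of what "shuffle as an application-level library on top of futures" can look like, the sketch below writes a simple map/reduce shuffle in ordinary Python. It is not Exoshuffle's code or Ray's API: it assumes the standard-library `concurrent.futures` executor as a single-machine stand-in for a distributed futures system, and the names `map_task`, `reduce_task`, and `simple_shuffle` are hypothetical. The point is only that the control logic (which tasks to launch and which bucket goes to which reducer) lives in application code, while the executor handles running tasks and handing results back.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def map_task(block: List[int], num_partitions: int) -> List[List[int]]:
    """Split one input block into num_partitions buckets by key."""
    buckets: List[List[int]] = [[] for _ in range(num_partitions)]
    for value in block:
        buckets[value % num_partitions].append(value)
    return buckets


def reduce_task(pieces: List[List[int]]) -> List[int]:
    """Merge every piece destined for one partition and sort the result."""
    merged = [value for piece in pieces for value in piece]
    return sorted(merged)


def simple_shuffle(blocks: List[List[int]], num_partitions: int) -> List[List[int]]:
    """Run all map tasks, then one reduce task per partition (hypothetical helper)."""
    with ThreadPoolExecutor() as pool:
        # Control plane: decide which tasks to launch.
        map_futures = [pool.submit(map_task, block, num_partitions) for block in blocks]
        map_outputs = [future.result() for future in map_futures]
        # Route bucket p of every map output to reducer p.
        reduce_futures = [
            pool.submit(reduce_task, [output[p] for output in map_outputs])
            for p in range(num_partitions)
        ]
        return [future.result() for future in reduce_futures]


if __name__ == "__main__":
    print(simple_shuffle([[5, 2, 9], [4, 7, 0], [3, 8, 1, 6]], 3))
```

In a distributed futures system the map outputs would stay in the cluster's object store and be passed to reducers by reference rather than collected in the driver; this sketch gathers them locally only to stay self-contained.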

Cite

Luan, F. S., Wang, S., Yagati, S., Kim, S., Lien, K., Ong, I., … Stoica, I. (2023). Exoshuffle: An Extensible Shuffle Architecture. In SIGCOMM 2023 - Proceedings of the ACM SIGCOMM 2023 Conference (pp. 564–577). Association for Computing Machinery, Inc. https://doi.org/10.1145/3603269.3604848
