Efficient Processing of Recursive Joins on Large-Scale Datasets in Spark

Abstract

MapReduce has become the dominant programming model for analyzing and processing large-scale data. However, the model has its limitations: it does not fully support iterative computation, caching mechanisms, or operations with multiple inputs, and it incurs high I/O and communication costs. One of the most complex operations extensively and expensively used in MapReduce is the recursive join, which demands exactly the processing characteristics that a MapReduce environment lacks. This research therefore proposes efficient solutions for processing recursive joins in Spark, a next-generation large-scale data processing engine that extends MapReduce. Our proposal eliminates a large amount of redundant data generated in repeated join steps and takes advantage of in-memory computing and the caching mechanism. Through experiments, we show that our solutions significantly improve the execution performance of recursive joins on large-scale datasets.
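
The approach the abstract describes (iterating join steps while pruning redundant intermediate tuples and caching reused data in memory) can be illustrated with a semi-naive transitive-closure computation on Spark RDDs. The following Scala sketch is illustrative only: the dataset path, variable names, and the use of subtract/distinct to prune duplicates are assumptions for the example, not the authors' actual implementation.

import org.apache.spark.sql.SparkSession

// A minimal sketch of a recursive join (transitive closure) on Spark RDDs.
// Each iteration joins only the newly discovered pairs (the "delta") with
// the base relation, so tuples from earlier steps are not re-joined, and
// RDDs that are reused across iterations are cached in memory.
object RecursiveJoinSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("RecursiveJoinSketch").getOrCreate()
    val sc = spark.sparkContext

    // Base relation of (src, dst) edges; the path is a placeholder.
    // Cached because it participates in every join iteration.
    val edges = sc.textFile("hdfs:///data/edges.csv")
      .map(_.split(","))
      .map(a => (a(0), a(1)))
      .cache()

    var closure = edges   // all pairs discovered so far
    var delta = edges     // pairs discovered in the previous iteration
    var newPairs = delta.count()

    while (newPairs > 0) {
      // Join delta (x, y) with edges (y, z) to derive candidate pairs (x, z).
      val candidates = delta
        .map { case (x, y) => (y, x) }   // re-key delta on its destination
        .join(edges)                     // yields (y, (x, z))
        .map { case (_, (x, z)) => (x, z) }

      // Keep only genuinely new pairs, eliminating redundant tuples.
      delta = candidates.distinct().subtract(closure).cache()
      newPairs = delta.count()
      closure = closure.union(delta).cache()
    }

    println(s"Transitive closure contains ${closure.count()} pairs")
    spark.stop()
  }
}

In a real deployment the RDD lineage grows with each iteration, so periodic checkpointing would also be needed; the sketch omits this for brevity.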

Citation (APA)

Phan, T. C., Phan, A. C., Tran, T. T. Q., & Trieu, N. T. (2020). Efficient Processing of Recursive Joins on Large-Scale Datasets in Spark. In Advances in Intelligent Systems and Computing (Vol. 1121 AISC, pp. 391–402). Springer. https://doi.org/10.1007/978-3-030-38364-0_35
