CLUS: Parallel subspace clustering algorithm on Spark

Abstract

Subspace clustering techniques were proposed to discover hidden clusters that exist only in certain subsets of the full feature space. However, the time complexity of such algorithms can grow exponentially with the dimensionality of the dataset. In addition, in current big data scenarios, datasets are often too large to fit on a single machine. This extremely high computational complexity, which results in poor scalability with respect to both the size and the dimensionality of these datasets, strongly motivates a parallelized subspace clustering algorithm able to handle large, high-dimensional data. To the best of our knowledge, there are no other parallel subspace clustering algorithms that run on top of new-generation distributed big data platforms such as MapReduce and Spark. In this paper we introduce CLUS, a novel parallel subspace clustering solution based on the SUBCLU algorithm. CLUS uses a new dynamic data partitioning method specifically designed to continuously optimize the varying size and content of the data required in each iteration, in order to take full advantage of Spark's in-memory primitives. This method minimizes communication cost between nodes, maximizes their CPU usage, and balances the load among them. Consequently, the execution time is significantly reduced. Finally, we conduct several experiments with a series of real and synthetic datasets to demonstrate the scalability and accuracy of the implementation, and its nearly linear speedup with respect to the number of nodes.
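For context, SUBCLU, the algorithm CLUS builds on, searches subspaces bottom-up: it clusters each single attribute first and then only examines a (k+1)-dimensional subspace if every one of its k-dimensional projections already contains clusters. The sketch below illustrates just that candidate-generation step in plain Python. It is not the CLUS implementation: the function name `generate_candidates` is hypothetical, the density-based clustering step is omitted, and nothing here reflects the Spark partitioning scheme described in the abstract.

```python
from itertools import combinations


def generate_candidates(clustered_subspaces):
    """Build (k+1)-dimensional candidate subspaces from k-dimensional ones.

    `clustered_subspaces` is an iterable of attribute-index collections,
    all assumed to have the same size k, in which clusters were found.
    A candidate is kept only if all of its k-dimensional subsets are in
    the input (Apriori-style pruning).
    """
    current = {frozenset(s) for s in clustered_subspaces}
    if not current:
        return []
    k = len(next(iter(current)))

    candidates = set()
    for a, b in combinations(current, 2):
        union = a | b
        # Two k-dim subspaces join only if they share k-1 attributes.
        if len(union) != k + 1:
            continue
        # Prune: every k-dim projection must already contain clusters.
        if all(frozenset(sub) in current for sub in combinations(union, k)):
            candidates.add(union)

    # Return a deterministic, sorted list of attribute lists.
    return sorted(sorted(c) for c in candidates)


if __name__ == "__main__":
    # Clusters were found in these 2-dimensional subspaces.
    two_dim = [{0, 1}, {0, 2}, {1, 2}, {2, 3}]
    print(generate_candidates(two_dim))
```

With the input above, only the subspace over attributes 0, 1 and 2 survives, because the other joins have a two-dimensional projection (for example attributes 0 and 3) in which no clusters were found. The pruning limits the search, but the number of possible subspaces still grows as 2^d in the number of attributes d, which is the scalability problem the abstract refers to.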

Cite

Zhu, B., Mara, A., & Mozo, A. (2015). CLUS: Parallel subspace clustering algorithm on Spark. In Communications in Computer and Information Science (Vol. 539, pp. 175–185). Springer Verlag. https://doi.org/10.1007/978-3-319-23201-0_20
