Comparative study of distributed deep learning tools on supercomputers

Abstract

With the growth in the scale of datasets and neural networks, training time is increasing rapidly. Distributed parallel training has been proposed to accelerate deep neural network training, and most efforts target GPU clusters. This paper focuses on the performance of distributed parallel training on the CPU clusters of supercomputer systems. Using resources of the Tianhe-2 supercomputer, we conduct an extensive evaluation of popular deep learning tools, including Caffe, TensorFlow, and BigDL, on several deep neural network models, including AutoEncoder, LeNet, AlexNet, and ResNet. The experimental results show that Caffe performs best in communication efficiency and scalability. BigDL is the fastest in computing speed, benefiting from its CPU optimizations, but it suffers from long communication delays due to its dependency on the MapReduce framework. The insights and conclusions from our evaluation provide a significant reference for improving the resource utilization of supercomputers in distributed deep learning.
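To make the kind of workload evaluated here concrete, the sketch below shows data-parallel training of a LeNet-style model on CPU workers using TensorFlow's Keras API. It is a minimal illustration only, assuming a recent TensorFlow 2.x release and hypothetical host names; it does not reproduce the paper's actual cluster configuration, launch scripts, or the TensorFlow version deployed on Tianhe-2.

```python
# Minimal sketch (not from the paper): data-parallel training on CPU workers
# with TensorFlow's MultiWorkerMirroredStrategy. Host names are hypothetical.
# The same script is launched once per node, each with its own "index".
import os
import json
import tensorflow as tf

os.environ["TF_CONFIG"] = json.dumps({
    "cluster": {"worker": ["node01:12345", "node02:12345"]},  # hypothetical hosts
    "task": {"type": "worker", "index": 0},  # set 0, 1, ... per worker
})

strategy = tf.distribute.MultiWorkerMirroredStrategy()

with strategy.scope():
    # LeNet-style CNN, one of the model families evaluated in the paper.
    model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(6, 5, activation="relu", input_shape=(28, 28, 1)),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(16, 5, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(120, activation="relu"),
        tf.keras.layers.Dense(84, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="sgd",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

# MNIST as a stand-in dataset; gradients are synchronized across workers
# after each step, so the global batch is split among the CPU nodes.
(x, y), _ = tf.keras.datasets.mnist.load_data()
x = (x / 255.0)[..., None].astype("float32")
model.fit(x, y, batch_size=256, epochs=1)
```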

Citation (APA)

Du, X., Kuang, D., Ye, Y., Li, X., Chen, M., Du, Y., & Wu, W. (2018). Comparative study of distributed deep learning tools on supercomputers. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 11334 LNCS, pp. 122–137). Springer Verlag. https://doi.org/10.1007/978-3-030-05051-1_9
