Optimization of BLAS on the Cell processor

Vaibhav Saxena; Prashant Agrawal; Yogish Sabharwal; Vijay K. Garg; Vimitha A. Kuruvilla; John A. Gunnels

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2008) 5374 LNCS 18-29

DOI: 10.1007/978-3-540-89894-8_6

Abstract

The unique architecture of the heterogeneous multi-core Cell processor offers great potential for high-performance computing. Its features include high memory bandwidth through DMA, user-managed local stores, and a SIMD architecture. In this paper, we present strategies for leveraging these features to develop a high-performance BLAS library. We propose techniques to partition and distribute data across the SPEs so that DMA is handled efficiently. We show that suitable pre-processing of the data leads to significant performance improvements when the data is unaligned. In addition, we combine two kernels to obtain better performance: a specialized high-performance kernel for the more frequently occurring cases and a generic kernel for boundary cases. Using these techniques for double precision, we obtain up to 70-80% of peak performance for different memory-bandwidth-bound level 1 and level 2 routines and up to 80-90% for computation-bound level 3 routines. © 2008 Springer Berlin Heidelberg.
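The two-kernel idea mentioned in the abstract can be illustrated with a minimal sketch in portable C. This is not the authors' implementation: the abstract does not say how the kernels divide the work, so the choice of routine (a level 1 DAXPY, y = alpha*x + y), the block width `BLOCK`, and the function names `daxpy_fast`, `daxpy_generic`, and `daxpy_two_kernel` are all assumptions made for illustration. The sketch uses no Cell SDK calls, DMA, or SIMD intrinsics; it only shows the dispatch pattern of sending the bulk of the data to a specialized kernel and the leftover boundary elements to a generic one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block width; not taken from the paper. */
#define BLOCK 4

/* Specialized kernel (assumed): handles only lengths that are a
 * multiple of BLOCK, so the loop body can be fully unrolled. */
static void daxpy_fast(size_t n, double alpha, const double *x, double *y)
{
    assert(n % BLOCK == 0);
    for (size_t i = 0; i < n; i += BLOCK) {
        y[i]     += alpha * x[i];
        y[i + 1] += alpha * x[i + 1];
        y[i + 2] += alpha * x[i + 2];
        y[i + 3] += alpha * x[i + 3];
    }
}

/* Generic kernel (assumed): handles any length, one element at a time. */
static void daxpy_generic(size_t n, double alpha, const double *x, double *y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}

/* Dispatcher: the largest multiple of BLOCK goes to the specialized
 * kernel, the remaining boundary elements go to the generic kernel. */
void daxpy_two_kernel(size_t n, double alpha, const double *x, double *y)
{
    size_t bulk = n - (n % BLOCK);
    if (bulk > 0)
        daxpy_fast(bulk, alpha, x, y);
    if (bulk < n)
        daxpy_generic(n - bulk, alpha, x + bulk, y + bulk);
}
```

Calling `daxpy_two_kernel(7, 2.0, x, y)` in this sketch processes the first four elements in the specialized kernel and the last three in the generic one. On the Cell itself the specialized path would presumably also depend on alignment and local-store constraints, which this sketch does not model.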

Citation (APA)

Saxena, V., Agrawal, P., Sabharwal, Y., Garg, V. K., Kuruvilla, V. A., & Gunnels, J. A. (2008). Optimization of BLAS on the Cell processor. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5374 LNCS, pp. 18–29). Springer Verlag. https://doi.org/10.1007/978-3-540-89894-8_6
