This paper presents adaptive execution techniques that determine whether automatically parallelized loops are executed in parallel or sequentially in order to maximize performance and scalability. The adaptation and performance estimation algorithms are implemented in a compiler preprocessor. The preprocessor inserts code that automatically determines, at compile time or at run time, how the parallelized loops are executed. We evaluate our techniques on a set of standard numerical applications written in Fortran 77, running on a distributed shared memory multiprocessor (SGI Origin2000). On average, the programs run 26%, 20%, 16%, and 10% faster with our techniques than the original parallel programs on 32, 16, 8, and 4 processors, respectively. One of the applications runs more than twice as fast as its original parallel version on 32 processors. © Springer-Verlag Berlin Heidelberg 2005.
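To make the idea of a run-time decision between parallel and sequential execution concrete, the following is a minimal sketch and not the preprocessor-generated code described in the paper. It assumes a simple policy: time a loop once in each mode, then reuse whichever mode was faster. The class `AdaptiveLoop`, its parameters, and the use of a Python thread pool are all hypothetical choices made for illustration; the paper's own estimation algorithms are not specified in this abstract.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class AdaptiveLoop:
    """Hypothetical run-time selector: times each mode once, then reuses the faster one."""

    def __init__(self, body, workers=4, timer=time.perf_counter):
        self.body = body          # loop body, called with one iteration index
        self.workers = workers    # assumed worker count for the parallel mode
        self.timer = timer        # injectable clock (assumption, eases testing)
        self.timings = {}         # mode name -> measured seconds
        self.mode = None          # set once both modes have been timed

    def _run_sequential(self, n):
        return [self.body(i) for i in range(n)]

    def _run_parallel(self, n):
        # pool.map returns results in iteration order
        with ThreadPoolExecutor(max_workers=self.workers) as pool:
            return list(pool.map(self.body, range(n)))

    def run(self, n):
        runners = {
            "sequential": self._run_sequential,
            "parallel": self._run_parallel,
        }
        if self.mode is not None:
            # Decision already made: execute in the chosen mode.
            return runners[self.mode](n)
        # Trial phase: time whichever mode has not been measured yet.
        trial = "sequential" if "sequential" not in self.timings else "parallel"
        start = self.timer()
        result = runners[trial](n)
        self.timings[trial] = self.timer() - start
        if len(self.timings) == 2:
            self.mode = min(self.timings, key=self.timings.get)
        return result
```

The first two calls to `run` act as trials, and every later call uses the selected mode. Threads in CPython rarely speed up CPU-bound loop bodies, so the sketch illustrates only the selection logic, not the speedups reported above.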
CITATION STYLE
Lee, J., & Moonesinghe, H. D. K. (2005). Adaptively increasing performance and scalability of automatically parallelized programs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2481 LNCS, pp. 203–217). Springer Verlag. https://doi.org/10.1007/11596110_14