
Conference paper

Auto-tuning Dense Vector and Matrix-vector Operations for Fermi GPUs

From

Department of Informatics and Mathematical Modeling, Technical University of Denmark

Scientific Computing, Department of Informatics and Mathematical Modeling, Technical University of Denmark

In this paper, we consider the automatic performance tuning of dense vector and matrix-vector operations on GPUs. Such operations form the backbone of level 1 and level 2 routines in the Basic Linear Algebra Subprograms (BLAS) library and are therefore of great importance in many scientific applications.

As examples, we develop single-precision CUDA kernels for the Euclidean norm (SNRM2) and the matrix-vector multiplication (SGEMV). The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture). We show that auto-tuning can be successfully applied to achieve high performance for dense vector and matrix-vector operations by appropriately utilizing the fine-grained parallelism of the GPU.
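
The record does not reproduce the kernels themselves, so the following is only a minimal sketch of the kind of tunable level 1 kernel the paper studies: a single-precision Euclidean-norm (SNRM2) partial reduction in CUDA whose thread-block size is exposed as a compile-time tuning parameter. The kernel name snrm2_partial, the grid-stride loop, and the launch configuration are illustrative assumptions, not the authors' implementation.

    // Sketch only: one candidate kernel variant. An auto-tuner would compile
    // and time this for a range of BLOCK_SIZE and grid sizes, keeping the
    // fastest combination for the target GPU.
    #include <cstdio>
    #include <cmath>
    #include <cuda_runtime.h>

    template <int BLOCK_SIZE>
    __global__ void snrm2_partial(const float *x, float *partial, int n)
    {
        __shared__ float sdata[BLOCK_SIZE];
        int tid = threadIdx.x;
        float sum = 0.0f;

        // Grid-stride loop: each thread accumulates squares of its elements.
        for (int i = blockIdx.x * BLOCK_SIZE + tid; i < n;
             i += gridDim.x * BLOCK_SIZE)
            sum += x[i] * x[i];

        sdata[tid] = sum;
        __syncthreads();

        // Tree reduction in shared memory down to one value per block.
        for (int s = BLOCK_SIZE / 2; s > 0; s >>= 1) {
            if (tid < s) sdata[tid] += sdata[tid + s];
            __syncthreads();
        }

        // The host sums the per-block results and takes the square root.
        if (tid == 0) partial[blockIdx.x] = sdata[0];
    }

    int main()
    {
        const int n = 1 << 20, blocks = 128;
        const int BS = 256;                       // candidate block size to tune
        float *x, *partial;
        cudaMallocManaged(&x, n * sizeof(float));
        cudaMallocManaged(&partial, blocks * sizeof(float));
        for (int i = 0; i < n; ++i) x[i] = 1.0f;  // expected norm: sqrt(n)

        snrm2_partial<BS><<<blocks, BS>>>(x, partial, n);
        cudaDeviceSynchronize();

        float sum = 0.0f;
        for (int i = 0; i < blocks; ++i) sum += partial[i];
        printf("snrm2 = %f (expected %f)\n", sqrtf(sum), sqrtf((float)n));
        cudaFree(x);
        cudaFree(partial);
        return 0;
    }

The block size governs occupancy, shared-memory use, and the depth of the reduction, which is why it is a natural tuning parameter on Fermi-class GPUs.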

Our tuned kernels achieve between 25% and 100% better performance than the current CUBLAS 3.2 library.

Language: English
Publisher: Springer
Year: 2012
Pages: 619-629
Proceedings: Parallel Processing and Applied Mathematics. 9th International Conference, PPAM 2011
Series: Lecture Notes in Computer Science
Journal subtitle: 9th International Conference, PPAM 2011
ISBN: 3642314635, 3642314643, 9783642314636, 9783642314643
ISSN: 0302-9743
Types: Conference paper
DOI: 10.1007/978-3-642-31464-3_63
