
DTU Findit

Conference paper

High-Performance Matrix-Vector Multiplication on the GPU

From:

Department of Informatics and Mathematical Modeling, Technical University of Denmark

Scientific Computing, Department of Informatics and Mathematical Modeling, Technical University of Denmark

In this paper, we develop a high-performance GPU kernel for one of the most popular dense linear algebra operations, the matrix-vector multiplication. The target hardware is the most recent Nvidia Tesla 20-series (Fermi architecture), which is designed from the ground up for scientific computing. We show that it is essentially a matter of fully utilizing the fine-grained parallelism of the many-core GPU to achieve high performance for dense matrix-vector multiplication.
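For reference, the operation in question is y = A·x for a dense m × n matrix A. A naive CUDA sketch with one thread per row of a row-major matrix is shown below; this is only an illustration of the operation, not the paper's tuned kernel, and the kernel name matvec_naive and the row-major layout are assumptions made for the example.

```cuda
#include <cuda_runtime.h>

// Naive illustration of y = A * x: one thread computes one element of y.
// A is m x n, stored row-major. The paper's kernel is far more refined
// (coalesced memory access, tuned thread-block shapes for Fermi, etc.).
__global__ void matvec_naive(const float *A, const float *x, float *y,
                             int m, int n)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < m) {
        float sum = 0.0f;
        for (int col = 0; col < n; ++col)
            sum += A[row * n + col] * x[col];
        y[row] = sum;
    }
}
```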

We show that auto-tuning can be successfully applied to the GPU kernel so that it performs well for all matrix shapes and sizes.
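A minimal sketch of the auto-tuning idea is given below, assuming a host-side search that simply times a few candidate thread-block sizes and keeps the fastest for a given (m, n); the function name pick_block_size and the candidate list are hypothetical, the loop reuses the matvec_naive kernel sketched above, and the paper's tuner explores a richer parameter space than this.

```cuda
#include <cuda_runtime.h>

// Hypothetical tuning pass (not the paper's auto-tuner): benchmark the kernel
// for a few candidate thread-block sizes on device arrays dA, dx, dy and
// return the fastest block size for this particular matrix shape.
int pick_block_size(const float *dA, const float *dx, float *dy, int m, int n)
{
    const int candidates[] = {64, 128, 256, 512};
    float best_ms = 1e30f;
    int best_block = candidates[0];

    for (int block : candidates) {
        int grid = (m + block - 1) / block;

        cudaEvent_t start, stop;
        cudaEventCreate(&start);
        cudaEventCreate(&stop);

        cudaEventRecord(start);
        matvec_naive<<<grid, block>>>(dA, dx, dy, m, n);
        cudaEventRecord(stop);
        cudaEventSynchronize(stop);

        float ms = 0.0f;
        cudaEventElapsedTime(&ms, start, stop);
        if (ms < best_ms) { best_ms = ms; best_block = block; }

        cudaEventDestroy(start);
        cudaEventDestroy(stop);
    }
    return best_block;
}
```

In practice such a search would be run once per matrix shape of interest and the winning configuration cached, so the tuning cost is amortized over repeated multiplications.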

Language: English
Publisher: Springer
Year: 2012
Pages: 377-386
Proceedings: Euro-Par 2011
Series: Lecture Notes in Computer Science
ISBN: 3642297366, 3642297374, 9783642297366 and 9783642297373
ISSN: 0302-9743
Types: Conference paper
DOI: 10.1007/978-3-642-29737-3_42
