Course detail

Parallel Computations on GPU

FIT-PCG

Acad. year: 2025/2026

The course covers the architecture and programming of graphics processing units (GPUs) from Nvidia and, in part, AMD. First, the GPU architecture is studied in detail. Then, the program execution model, based on hierarchical thread organisation and the SIMT model, is discussed. Next, the memory hierarchy and synchronization techniques are described. After that, the course explains the newer techniques of dynamic parallelism and data-flow processing, and concludes with the practical use of multi-GPU systems in environments with shared (NVLink) and distributed (MPI) memory. The second part of the course is devoted to high-level programming techniques and libraries based on the OpenACC technology.

Language of instruction

Czech

Number of ECTS credits

Mode of study

Not applicable.

Guarantor

prof. Ing. Jiří Jaroš, Ph.D.

Department

Department of Computer Systems (UPSY)

Entry knowledge

Knowledge gained in the AVS course and, in part, in the PRL and PPP courses.

Rules for evaluation and completion of the course

Assessment of two projects (14 hours in total), computer laboratories, and a midterm examination.

Missed labs can be made up on alternative dates.

Aims

To become familiar with the architecture and programming of graphics processing units in the area of general-purpose computing using the Nvidia libraries and the OpenACC standard. To learn how to design and implement accelerated programs that exploit the potential of GPUs. To gain knowledge of the available libraries for GPU programming.
Knowledge of parallel programming on GPUs in the area of general-purpose computing; orientation in the area of accelerated systems, libraries and tools.
Understanding of hardware limitations that affect the efficiency of software solutions.

Study aids

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Not applicable.

Recommended reading

Current PPT lecture slides in the eLearning system.
Chandrasekaran, S., and Juckeland, G.: OpenACC for Programmers: Concepts and Strategies, Addison-Wesley Professional, 2017, ISBN 978-0134694283.
Kirk, D., and Hwu, W.: Programming Massively Parallel Processors: A Hands-on Approach, Elsevier, 2010, 256 p., ISBN 978-0-12-381472-2.
Nvidia CUDA documentation: https://docs.nvidia.com/cuda/ (EN)
OpenACC documentation: https://www.openacc.org/ (EN)
Sanders, J., and Kandrot, E.: CUDA by Example: An Introduction to General-Purpose GPU Programming, Addison-Wesley, 2010.
Storti, D., and Yurtoglu, M.: CUDA for Engineers: An Introduction to High-Performance Parallel Computing, Addison-Wesley Professional, 1st edition, 2015, ISBN 978-0134177410.

Elearning

eLearning: currently opened course

Classification of course in study plans

Programme MITAI Master's
specialization NSEC, 0 year of study, winter semester, elective
specialization NNET, 0 year of study, winter semester, elective
specialization NMAL, 0 year of study, winter semester, elective
specialization NCPS, 0 year of study, winter semester, elective
specialization NHPC, 0 year of study, winter semester, compulsory, profile core courses
specialization NVER, 0 year of study, winter semester, elective
specialization NIDE, 0 year of study, winter semester, elective
specialization NISY, 0 year of study, winter semester, elective
specialization NEMB, 0 year of study, winter semester, elective
specialization NSPE, 0 year of study, winter semester, elective
specialization NBIO, 0 year of study, winter semester, elective
specialization NSEN, 0 year of study, winter semester, elective
specialization NVIZ, 0 year of study, winter semester, elective
specialization NGRI, 0 year of study, winter semester, elective
specialization NADE, 0 year of study, winter semester, elective
specialization NISD, 0 year of study, winter semester, elective
specialization NMAT, 0 year of study, winter semester, elective

Type of course unit

Lecture

26 hours, optional

Teacher / Lecturer

prof. Ing. Jiří Jaroš, Ph.D.
Ing. Oliver Kuník

Syllabus

Architecture and history of graphics processing units.
CUDA programming model, thread execution.
CUDA memory hierarchy.
Matrix multiplication and stencil computing.
Case studies of GPGPU algorithms.
Synchronization, reduction and prefix scan.
Dynamic parallelism and unified memory.
Stream processing, computation-communication overlapping.
Multi-GPU systems.
OpenACC standard.
Libraries and tools for GPU programming (OpenCL, HIP, OpenMP).

Exercise in computer lab

12 hours, optional

Teacher / Lecturer

Ing. Oliver Kuník

Syllabus

CUDA: Memory transfers, simple kernels.
CUDA: Shared memory.
CUDA: Texture and constant memory.
CUDA: Dynamic parallelism and unified memory.
OpenACC: basic techniques.
OpenACC: advanced techniques.

Project

14 hours, compulsory

Teacher / Lecturer

Ing. Oliver Kuník

Syllabus

Development of an application in Nvidia CUDA
Development of an application in OpenACC

