Course detail
Parallel Computations on GPU
FIT-PCG, Acad. year: 2020/2021
The course covers the architecture and programming of graphics processing units from Nvidia and, partially, AMD. First, the architecture of GPUs is studied in detail. Then, the program execution model based on hierarchical thread organisation and the SIMT model is discussed. Next, the memory hierarchy and synchronization techniques are described. After that, the course explains the novel techniques of dynamic parallelism and data-flow processing, concluding with the practical use of multi-GPU systems in environments with shared (NVLink) and distributed (MPI) memory. The second part of the course is devoted to high-level programming techniques and libraries based on the OpenACC technology.
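For illustration of the hierarchical grid/block/thread organisation, SIMT execution and host-device memory transfers mentioned above, a minimal CUDA sketch might look as follows (the vectorAdd kernel, array size and block size are illustrative assumptions, not course material):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread handles one element; blockIdx/blockDim/threadIdx expose the
// hierarchical grid-block-thread organisation executed in SIMT fashion.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                                // guard threads past the array end
        c[i] = a[i] + b[i];
}

int main()
{
    const int n = 1 << 20;                    // illustrative problem size
    const size_t bytes = n * sizeof(float);

    float *hA = new float[n], *hB = new float[n], *hC = new float[n];
    for (int i = 0; i < n; ++i) { hA[i] = 1.0f; hB[i] = 2.0f; }

    float *dA, *dB, *dC;                      // device (GPU) buffers
    cudaMalloc(&dA, bytes); cudaMalloc(&dB, bytes); cudaMalloc(&dC, bytes);
    cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice);

    const int threads = 256;                  // threads per block
    const int blocks  = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(dA, dB, dC, n);

    cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hC[0]);             // expected 3.0

    cudaFree(dA); cudaFree(dB); cudaFree(dC);
    delete[] hA; delete[] hB; delete[] hC;
    return 0;
}
```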
Language of instruction
Number of ECTS credits
Mode of study
Guarantor
Department
Learning outcomes of the course unit
Understanding of the hardware limitations that affect the efficiency of software solutions.
Prerequisites
Co-requisites
Planned learning activities and teaching methods
Assessment methods and criteria linked to learning outcomes
Exam prerequisites:
To obtain at least 20 of the 40 points awarded for the projects and the midterm examination.
Course curriculum
Work placements
Aims
Specification of controlled education, way of implementation and compensation for absences
- Missed labs can be made up on alternative dates.
- A make-up slot for missed labs will be scheduled in the last week of the semester.
Recommended optional programme components
Prerequisites and corequisites
Basic literature
Recommended reading
Kirk, D., and Hwu, W.: Programming Massively Parallel Processors: A Hands-on Approach, Elsevier, 2010, 256 p., ISBN 978-0-12-381472-2.
Nvidia CUDA documentation: https://docs.nvidia.com/cuda/ (EN)
OpenACC documentation: https://www.openacc.org/ (EN)
Storti, D., and Yurtoglu, M.: CUDA for Engineers: An Introduction to High-Performance Parallel Computing, Addison-Wesley Professional, 1st edition, 2015, ISBN 978-0134177410.
Classification of course in study plans
- Programme MITAI, Master's
specialization NISY, 0 year of study, winter semester, elective
specialization NADE, 0 year of study, winter semester, elective
specialization NBIO, 0 year of study, winter semester, elective
specialization NCPS, 0 year of study, winter semester, elective
specialization NEMB, 0 year of study, winter semester, elective
specialization NHPC, 0 year of study, winter semester, compulsory
specialization NGRI, 0 year of study, winter semester, elective
specialization NIDE, 0 year of study, winter semester, elective
specialization NISD, 0 year of study, winter semester, elective
specialization NMAL, 0 year of study, winter semester, elective
specialization NMAT, 0 year of study, winter semester, elective
specialization NNET, 0 year of study, winter semester, elective
specialization NSEC, 0 year of study, winter semester, elective
specialization NSEN, 0 year of study, winter semester, elective
specialization NSPE, 0 year of study, winter semester, elective
specialization NVER, 0 year of study, winter semester, elective
specialization NVIZ, 0 year of study, winter semester, elective
Type of course unit
Lecture
Teacher / Lecturer
Syllabus
- Architecture of graphics processing units.
- CUDA programming model, thread execution.
- CUDA memory hierarchy.
- Synchronization and reduction (see the sketch after this list).
- Dynamic parallelism and unified memory.
- Design and optimization of GPU algorithms.
- Stream processing, computation-communication overlapping.
- Multi-GPU systems.
- Nvidia Thrust library.
- OpenACC basics.
- OpenACC memory management.
- Code optimization with OpenACC.
- Libraries and tools for GPU programming.
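As a sketch of the "Synchronization and reduction" lecture topic above, a classic block-level sum reduction might combine shared memory with __syncthreads() roughly as follows (the blockSum kernel is a hypothetical example, not code distributed with the course):

```cuda
#include <cuda_runtime.h>

// Block-level sum reduction: each block reduces blockDim.x input elements into
// one partial sum written to out[blockIdx.x]. Shared memory keeps the per-block
// working set on chip; __syncthreads() orders the steps of the tree reduction.
// Assumes blockDim.x is a power of two.
__global__ void blockSum(const float *in, float *out, int n)
{
    extern __shared__ float s[];              // dynamically sized shared memory
    const int tid = threadIdx.x;
    const int i   = blockIdx.x * blockDim.x + tid;

    s[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();                          // all loads finished before reducing

    for (int stride = blockDim.x / 2; stride > 0; stride >>= 1) {
        if (tid < stride)
            s[tid] += s[tid + stride];
        __syncthreads();                      // whole step done before the next one
    }
    if (tid == 0)
        out[blockIdx.x] = s[0];               // one partial sum per block
}

// Launch sketch: blockSum<<<blocks, threads, threads * sizeof(float)>>>(dIn, dOut, n);
// The per-block partial sums can then be reduced on the host or in a second pass.
```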
Exercise in computer lab
Teacher / Lecturer
Syllabus
- CUDA: Memory transfers, simple kernels.
- CUDA: Shared memory.
- CUDA: Texture and constant memory.
- CUDA: Dynamic parallelism and unified memory.
- OpenACC: basic techniques (see the sketch after this list).
- OpenACC: advanced techniques.
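The OpenACC labs above start from directive-annotated loops of roughly this shape (the saxpy routine and its data clauses are an illustrative assumption, not the actual assignment):

```c
#include <stdio.h>

/* SAXPY with OpenACC: an accelerator-aware compiler (e.g. nvc -acc) offloads
 * the loop to the GPU; the data clauses describe the host-device transfers. */
void saxpy(int n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

int main(void)
{
    enum { N = 1 << 20 };
    static float x[N], y[N];                  /* static to keep them off the stack */
    for (int i = 0; i < N; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    saxpy(N, 3.0f, x, y);
    printf("y[0] = %f\n", y[0]);              /* expected 5.0 */
    return 0;
}
```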
Project
Teacher / Lecturer
Syllabus
- Development of an application in Nvidia CUDA
- Development of an application in OpenACC