Course detail
Parallel Data Processing
FEKT-MPA-PZP
Acad. year: 2023/2024
Parallelization using CPU. Parallelization using GPU (matrix operations, deep learning algorithms). Technologies: Apache Spark, Hadoop, Kafka, Cassandra. Distributed computations for operations: data transformation, aggregation, classification, regression, clustering, frequent patterns, optimization. Data streaming – basic operations, state operations, monitoring. Further technologies for distributed computations.
Language of instruction
English
Number of ECTS credits
6
Mode of study
Not applicable.
Guarantor
Department
Entry knowledge
Not applicable.
Rules for evaluation and completion of the course
Final exam.
The content and forms of instruction in the evaluated course are specified by a regulation issued by the lecturer responsible for the course and updated for every academic year.
Aims
The goal of the course is to introduce parallelization for data analysis using common processors (CPUs), graphics processors (GPUs), and distributed systems.
Students will acquire skills in the design and implementation of various forms of parallel systems for solving big data challenges. They will learn techniques for parallelizing computations on CPUs and GPUs, as well as techniques for distributed computation. Students will be able to use Apache Spark, Kafka, and Cassandra for distributed data processing with the following operations: data transformation, aggregation, classification, regression, clustering, and frequent patterns.
Study aids
Not applicable.
Prerequisites and corequisites
Not applicable.
Basic literature
Dasgupta, Nataraj. "Practical big data analytics: Hands-on techniques to implement enterprise analytics and machine learning using Hadoop, Spark, NoSQL and R." (2018) (EN)
Recommended reading
BARLAS, Gerassimos. Multicore and GPU programming: an integrated approach. ISBN 9780124171374 (EN)
Elearning
eLearning: currently opened course
Classification of course in study plans
- Programme MPC-TIT Master's 0 year of study, winter semester, compulsory-optional
- Programme MPAD-CAN Master's 2 year of study, winter semester, compulsory
- Programme MPA-EAK Master's 0 year of study, winter semester, compulsory-optional
Type of course unit