Course detail
Data Storage and Preparation
FIT-UPA
Acad. year: 2020/2021
The course focuses on modern database systems as typical data sources for knowledge discovery, and on the preparation of data for knowledge discovery. It covers extended relational (object-relational, with support for XML and JSON documents), spatial, and NoSQL database systems. For each, the corresponding database model, the way of working with data, and some indexing methods are explained. In the context of the knowledge discovery process, attention is paid to the descriptive characteristics of data and to visualization techniques used for data understanding. In addition, the course explains approaches to typical data pre-processing tasks for knowledge discovery, such as data cleaning, integration, transformation, and reduction. Approaches to information extraction from the web and several real case studies are also presented. As a part of the course, students solve a project focused on ...
Language of instruction
Number of ECTS credits
Mode of study
Guarantor
Department
Learning outcomes of the course unit
- Students are better able to work with data in various situations.
- Students improve their ability to complete small projects in a small team.
Prerequisites
- Fundamentals of relational databases and SQL.
- Object-oriented paradigm.
- Fundamentals of XML.
- Fundamentals of computational geometry.
- Fundamentals of statistics and probability.
Co-requisites
Planned learning activities and teaching methods
Assessment methods and criteria linked to learning outcomes
- Mid-term exam, held on a single date only; there is no resit.
- One project, to be completed and submitted by a given date during the term.
Exam prerequisites:
By the end of the term, a student must have earned at least 50% of the points obtainable during the term, that is, at least 20 points out of 40.
Plagiarism and unauthorized cooperation will result in the students involved not being classified, and disciplinary action may be initiated.
Course curriculum
Work placements
Aims
Specification of controlled education, way of implementation and compensation for absences
- Mid-term written exam: there is no resit; excused absences are handled by the guarantor.
- Formulation of the data mining task by the prescribed deadline; excused absences are handled by the assistant.
- Presentation of the project results by the prescribed deadline; excused absences are handled by the assistant.
- Final exam: the minimum number of points that can be awarded for the final exam is 20; otherwise, no points are assigned to the student. Excused absences are handled by the guarantor.
Recommended optional programme components
Prerequisites and corequisites
Basic literature
Recommended reading
Classification of course in study plans
- Programme MITAI Master's
specialization NISY, 0 year of study, winter semester, compulsory
specialization NADE, 1 year of study, winter semester, compulsory
specialization NBIO, 1 year of study, winter semester, compulsory
specialization NCPS, 1 year of study, winter semester, compulsory
specialization NEMB, 0 year of study, winter semester, compulsory
specialization NHPC, 0 year of study, winter semester, compulsory
specialization NGRI, 0 year of study, winter semester, compulsory
specialization NIDE, 1 year of study, winter semester, compulsory
specialization NISD, 1 year of study, winter semester, compulsory
specialization NMAL, 1 year of study, winter semester, compulsory
specialization NMAT, 0 year of study, winter semester, compulsory
specialization NNET, 1 year of study, winter semester, compulsory
specialization NSEC, 0 year of study, winter semester, compulsory
specialization NSEN, 1 year of study, winter semester, compulsory
specialization NSPE, 1 year of study, winter semester, compulsory
specialization NVER, 0 year of study, winter semester, compulsory
specialization NVIZ, 1 year of study, winter semester, compulsory
Type of course unit
Lecture
Teacher / Lecturer
Syllabus
- History of database technology and knowledge discovery, process of knowledge discovery.
- Object-oriented approach in databases.
- NoSQL databases I - introduction to NoSQL, CAP theorem and BASE, key-value databases, data partitioning and distribution.
- NoSQL databases II - data models in NoSQL databases (column, document, and graph databases), querying and data aggregation, NewSQL databases.
- Web scraping.
- Data preparation - data understanding: descriptive characteristics, visualization techniques, correlation analysis.
- Data preparation - data pre-processing I: data cleaning and integration.
- Data preparation - data pre-processing II: data reduction, imbalanced data, data transformation, other data pre-processing tasks.
- Mid-term exam.
- Languages and systems for knowledge discovery, real case studies.
- Support for working with XML and JSON documents in databases.
- Spatial databases.
- Indexing of multidimensional data.
Fundamentals seminar
Teacher / Lecturer
Syllabus
- Object-relational and spatial databases, data definition and manipulation, peculiarities
- Multimedia and XML databases, data indices
- NoSQL databases
Exercise in computer lab
Teacher / Lecturer
Syllabus
- Application binding to object-relational databases, application building in spatial databases
- Multimedia and XML databases, building and exploiting data indices
- NoSQL databases in applications
Project
Teacher / Lecturer
Syllabus
- Implementation and demonstration of processing of both structured and unstructured data, where the data may be of various kinds.