Course detail

Data Storage and Preparation

FIT-UPA, Acad. year: 2019/2020

The course introduces fundamental data classification from the viewpoint of data mining and knowledge discovery. It also provides insight into selected modern database systems, some of which are studied in depth: object-relational databases, spatial databases (including spatial data storage and indexing), NoSQL databases, XML databases, and multimedia databases. Advanced queries on relational databases are also discussed. The course then explains the process of data mining and knowledge discovery and the individual steps of this process, focusing on the typical data pre-processing tasks performed before potentially useful knowledge is extracted from the data. The process of data mining and knowledge discovery is demonstrated on selected use cases.

Language of instruction

Czech

Number of ECTS credits

5

Mode of study

Not applicable.

Learning outcomes of the course unit

Students will be able to classify data from the data mining and knowledge discovery viewpoint, store and manipulate data in suitable database systems, search for required data quickly, inspect data features, and prepare data for subsequent knowledge extraction.

- Students improve their ability to manipulate data in various situations.
- Students gain experience working on a small project as members of a small team.

Prerequisites

Fundamental relational data model theory. Formal design of relational databases. Data storage at the internal level. Data safety and integrity. Transactions. Conceptual modeling and database design from a conceptual model. SQL programming language. Fundamentals of computer graphics. Fundamentals of computational geometry. Object paradigm. Fundamentals of statistics and probability.

Co-requisites

Not applicable.

Planned learning activities and teaching methods

Not applicable.

Assessment methods and criteria linked to learning outcomes

  • Mid-term exam, held on a single date only; there is no possibility of a second attempt.
  • One project, to be completed and submitted by a given deadline during the term.

Exam prerequisites:
By the end of the term, a student must have obtained at least 50% of the points available during the term, i.e. at least 20 of 40 points.
Plagiarism and unauthorized cooperation will result in the students involved not being graded, and disciplinary action may be initiated.

Course curriculum

Not applicable.

Work placements

Not applicable.

Aims

The aim of the course is to explain fundamental data classification and the classification of data resources; to give deeper insight into selected database systems (object-relational, spatial, NoSQL, XML, and multimedia) and efficient data manipulation; and to provide core insight into the process of data mining and knowledge discovery and its individual steps, with a focus on data pre-processing and exploratory analysis.

Specification of controlled instruction, form of implementation, and compensation for absences

  • Mid-term exam: written; answers to the questions are given in full sentences; no second or alternative attempt is possible (20 points).
  • Project: one project (program development according to a given specification) with appropriate documentation (20 points).
  • Final exam: written; students answer the questions in full sentences. The maximum is 60 points, and at least 25 points must be obtained in the final exam; otherwise, no points are assigned to the student. The exam has one regular date and two corrective dates. The regular date is always held in fully written form. A corrective date is held either in fully written form or in a combined form (written and oral parts on a single day: written in the morning, oral in the afternoon). The form of each corrective date is announced as soon as the previous date has been evaluated; the combined form is used when no more than 16 students are registered for that date.
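The point scheme above can be checked with a small calculation. The following Python sketch encodes only the limits stated in this description (20 + 20 term points, a 20-point exam prerequisite, and a 60-point final exam with a 25-point minimum). The function names are hypothetical, and the sketch assumes that a final-exam result below 25 counts as zero exam points. It does not map totals to grades, because grade boundaries are not given here.

```python
from typing import Optional

# Point limits stated in the course description.
MIDTERM_MAX = 20
PROJECT_MAX = 20
TERM_MIN = 20  # at least 50% of the 40 term points
EXAM_MAX = 60
EXAM_MIN = 25  # minimum required from the final exam


def admitted_to_exam(midterm: int, project: int) -> bool:
    """Exam prerequisite: at least 20 of the 40 term points."""
    if not (0 <= midterm <= MIDTERM_MAX and 0 <= project <= PROJECT_MAX):
        raise ValueError("term points out of range")
    return midterm + project >= TERM_MIN


def exam_points(raw: int) -> int:
    """Points counted from the final exam.

    Assumption (hypothetical reading): a result below EXAM_MIN counts as 0.
    """
    if not 0 <= raw <= EXAM_MAX:
        raise ValueError("exam points out of range")
    return raw if raw >= EXAM_MIN else 0


def course_total(midterm: int, project: int, exam: int) -> Optional[int]:
    """Total out of 100, or None if the student is not admitted to the exam."""
    if not admitted_to_exam(midterm, project):
        return None
    return midterm + project + exam_points(exam)


if __name__ == "__main__":
    print(course_total(12, 15, 40))  # 67
    print(course_total(12, 15, 24))  # 27: exam result below 25 is not counted
    print(course_total(8, 10, 55))   # None: only 18 term points
```

Under these assumptions, the 40 term points alone can never reach a majority of the 100 available points, which is why the 25-point exam minimum matters.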

Recommended optional programme components

Not applicable.

Prerequisites and corequisites

Not applicable.

Basic literature

Not applicable.

Recommended reading

Not applicable.

Classification of course in study plans

  • Programme MITAI Master's

    specialization NGRI, any year of study, winter semester, compulsory
    specialization NSEC, any year of study, winter semester, compulsory
    specialization NEMB, any year of study, winter semester, compulsory
    specialization NHPC, any year of study, winter semester, compulsory
    specialization NISY, any year of study, winter semester, compulsory
    specialization NMAT, any year of study, winter semester, compulsory
    specialization NVER, any year of study, winter semester, compulsory
    specialization NADE, 1st year of study, winter semester, compulsory
    specialization NBIO, 1st year of study, winter semester, compulsory
    specialization NNET, 1st year of study, winter semester, compulsory
    specialization NVIZ, 1st year of study, winter semester, compulsory
    specialization NCPS, 1st year of study, winter semester, compulsory
    specialization NISD, 1st year of study, winter semester, compulsory
    specialization NIDE, 1st year of study, winter semester, compulsory
    specialization NMAL, 1st year of study, winter semester, compulsory
    specialization NSEN, 1st year of study, winter semester, compulsory
    specialization NSPE, 1st year of study, winter semester, compulsory

Type of course unit

 

Lecture

26 hours, optional

Teacher / Lecturer

Syllabus

  1. Introduction: course contents, data characteristics, introduction to data mining and knowledge discovery, recapitulation of the history of database technology development
  2. Object-relational DB, object-relational mapping, advanced SQL features.
  3. Spatial DB: spatial data storage and manipulation issues
  4. Spatial DB: possible solutions for spatial data storage
  5. Indexing in spatial DB I - points
  6. Indexing in spatial DB II - multi-dimensional objects
  7. Mid-term exam
  8. Multimedia and XML databases
  9. NoSQL databases
  10. Data mining and knowledge discovery process, data pre-processing in this process - data characteristics, exploratory data analysis
  11. Data pre-processing during data mining and knowledge discovery process - pre-processing methods
  12. Fundamental tasks in data mining and knowledge discovery, examples of corresponding methods
  13. Programming languages used for data mining and knowledge discovery, illustrative use-cases on data mining and knowledge discovery

Fundamentals seminar

6 hours, compulsory

Teacher / Lecturer

Syllabus

DEMO exercises
  1. Object-relational and spatial databases, data definition and manipulation, peculiarities
  2. Multimedia and XML databases, data indices
  3. NoSQL databases

Exercise in computer lab

6 hours, compulsory

Teacher / Lecturer

Syllabus

  1. Connecting applications to object-relational databases, building applications on spatial databases
  2. Multimedia and XML databases, building and exploiting data indices
  3. NoSQL databases in applications

Project

14 hours, compulsory

Teacher / Lecturer

Syllabus

  1. Creating and demonstrating the processing of both structured and unstructured data, where the data may be of various kinds.