Master's Thesis

Web Page Classification

Author of thesis: Ing. Roman Kolář

Acad. year: 2007/2008

Supervisor: Ing. Vladimír Bartík, Ph.D.

Reviewer: doc. Ing. Radek Burget, Ph.D.

Abstract:

This thesis addresses the problem of automatic web page classification using a classifier based on association rules. Classification is introduced as one of the data mining techniques, in the context of mining knowledge from text data. Many methods for classifying text documents are presented, with emphasis on the benefits of methods that use association rules.
The main goal of the work is to adapt a selected classification method to relational data and to design a web page classifier that classifies pages using their visual properties, namely the layout of independent sections on the page, and not (only) their textual data. The ARC-BC classification method is presented as the selected method and as one of the more interesting classifiers, one that combines the accuracy and understandability benefits of the other methods.

Keywords:

classification, classifier, Web, data mining, association rule, precision, data, discretization, category, structure, attribute, support, confidence, text, interval

Date of defence

17.06.2008

Result of the defence

Defended (thesis was successfully defended)

Grading

A

Language of thesis

Czech

Faculty

Department

Study programme

Information Technology (IT-MSC-2)

Field of study

Information Systems (MIS)

Supervisor’s report
Ing. Vladimír Bartík, Ph.D.

Grade proposed by supervisor: A

Reviewer’s report
doc. Ing. Radek Burget, Ph.D.

Grade proposed by reviewer: A

Responsibility: Mgr. et Mgr. Hana Odstrčilová