Applied result detail

FITLayout Web Page Segmentation Framework

BURGET, R.; MILIČKA, M.

Original Title

FITLayout Web Page Segmentation Framework

Type

Software

Abstract

FitLayout is an extensible web page segmentation framework written in Java. It defines a generic Java API for representing a rendered web page and its division into visual areas, and it provides a base for implementing page segmentation algorithms with a common application interface. As a sample segmentation method, it implements a previously published segmentation algorithm based on recursive visual area merging and separator detection. The framework includes tools for post-processing the segmentation result with different text or visual classification methods. Finally, it provides tools for controlling the segmentation process and examining the segmentation results through a graphical user interface. The segmentation result may be stored as RDF data for later analysis.
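
To make the idea of merging visual areas across separators more concrete, the following is a minimal sketch in Java. It is not the FitLayout API and not the published algorithm: the class `AreaMergeSketch`, the type `Rect`, the method `mergeVertically`, and the `maxGap` threshold are all hypothetical names chosen for illustration. The sketch assumes that rendered boxes are plain rectangles and that a vertical gap wider than `maxGap` pixels acts as a separator. It performs a single top-to-bottom pass, whereas the abstract describes a recursive procedure.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: hypothetical types, not part of the FitLayout API. */
public class AreaMergeSketch {

    /** Axis-aligned bounding rectangle of a rendered box, in pixels. */
    public static final class Rect {
        public final int x1, y1, x2, y2;

        public Rect(int x1, int y1, int x2, int y2) {
            this.x1 = x1;
            this.y1 = y1;
            this.x2 = x2;
            this.y2 = y2;
        }

        /** Smallest rectangle covering both this rectangle and the other one. */
        public Rect union(Rect o) {
            return new Rect(Math.min(x1, o.x1), Math.min(y1, o.y1),
                            Math.max(x2, o.x2), Math.max(y2, o.y2));
        }

        /** Vertical distance from the bottom of this rectangle to the top of the one below. */
        public int verticalGapTo(Rect below) {
            return below.y1 - y2;
        }
    }

    /**
     * Merges boxes from top to bottom. A gap wider than maxGap is treated
     * as a separator and starts a new visual area (assumed heuristic).
     */
    public static List<Rect> mergeVertically(List<Rect> boxes, int maxGap) {
        List<Rect> sorted = new ArrayList<>(boxes);
        sorted.sort(Comparator.comparingInt((Rect r) -> r.y1));

        List<Rect> areas = new ArrayList<>();
        for (Rect box : sorted) {
            if (!areas.isEmpty()) {
                Rect last = areas.get(areas.size() - 1);
                if (last.verticalGapTo(box) <= maxGap) {
                    // No separator detected: grow the current area.
                    areas.set(areas.size() - 1, last.union(box));
                    continue;
                }
            }
            // First box, or a separator was detected: open a new area.
            areas.add(box);
        }
        return areas;
    }

    public static void main(String[] args) {
        List<Rect> boxes = new ArrayList<>();
        boxes.add(new Rect(0, 0, 100, 20));    // first text line
        boxes.add(new Rect(0, 22, 100, 40));   // second line, 2 px below
        boxes.add(new Rect(0, 80, 100, 100));  // separate block, 40 px below
        System.out.println(mergeVertically(boxes, 5).size()); // prints 2
    }
}
```

In this sketch the first two boxes are merged into one area because they are only 2 px apart, while the third box starts a new area. A fuller implementation would presumably repeat the merging on the resulting areas and also consider horizontal gaps and visual style.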

Keywords

web page segmentation, document analysis, text classification, web page rendering

Location

http://www.fit.vutbr.cz/~burgetr/FITLayout/

Licence fee

Any other entity that wants to use this result must always acquire a license.
