Publication result details
BURGET, R.
Original title
Layout Based Information Extraction from HTML Documents
Type
Proceedings paper not indexed in WoS or Scopus
Original abstract
We propose a method of information extraction from HTML documents based on modelling the visual information in the document. A page segmentation algorithm is used to detect the document layout; the extraction process is then based on analysing the mutual positions of the detected blocks and their visual features. This approach is more robust than the traditional DOM-based methods, and it opens new possibilities for specifying the extraction task.
Keywords
page segmentation, layout analysis, information extraction
Published
23 September 2007
Publisher
IEEE Computer Society
Place
Curitiba
ISBN
0-7695-2822-8
Book
9th International Conference on Document Analysis and Recognition ICDAR 2007
Pages from
624
Pages to
629
Number of pages
6
BibTeX
@inproceedings{BUT28821,
  author="Radek {Burget}",
  title="Layout Based Information Extraction from HTML Documents",
  booktitle="9th International Conference on Document Analysis and Recognition ICDAR 2007",
  year="2007",
  pages="624--629",
  publisher="IEEE Computer Society",
  address="Curitiba",
  isbn="0-7695-2822-8"
}