Publication result detail

Automatic Web Document Restructuring Based on Visual Information Analysis

BURGET, R.

Original title

Automatic Web Document Restructuring Based on Visual Information Analysis

Type

Paper in proceedings indexed in WoS or Scopus

Original abstract

Many documents available on the current web have quite a complex structure that allows them to present various kinds of information. Apart from the main content, the documents usually contain headers and footers, navigation sections and other types of additional information. For many applications, such as document indexing or browsing on special devices, it is desirable that the main document information precede the additional information in the underlying HTML code. In this paper, we propose a method of document preprocessing that automatically restructures the document code according to this criterion. Our method is based on rendered document analysis. A page segmentation algorithm is used for detecting the basic blocks on the page, and the relevance of the individual parts is estimated from the visual properties of the text content.

Keywords

document restructuring, page analysis, page segmentation, block importance

Authors

BURGET, R.

RIV year

2016

Published

20.01.2010

Publisher

Springer Verlag

Place

Prague

ISBN

978-3-642-10686-6

Book

Advances in Intelligent Web Mastering - 2, Proceedings of the 6th Atlantic Web Intelligence Conference - AWIC'2009

Series

Advances in Intelligent and Soft Computing, Vol. 67

Pages from

61

Pages to

70

Page count

10

BibTeX

@inproceedings{BUT30224,
  author="Radek {Burget}",
  title="Automatic Web Document Restructuring Based on Visual Information Analysis",
  booktitle="Advances in Intelligent Web Mastering - 2, Proceedings of the 6th Atlantic Web Intelligence Conference - AWIC'2009",
  year="2010",
  series="Advances in Intelligent and Soft Computing, Vol. 67",
  pages="61--70",
  publisher="Springer Verlag",
  address="Prague",
  doi="10.1007/978-3-642-10687-3_6",
  isbn="978-3-642-10686-6"
}