Publication result detail

HTML Document Analysis for Information Extraction

BURGET, R.

Original Title

HTML Document Analysis for Information Extraction

English Title

HTML Document Analysis for Information Extraction

Type

Paper in proceedings outside WoS and Scopus

Original Abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, the HTML language primarily describes the look of the documents, and it doesn't contain facilities for describing the structure of the contained data. In this paper we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the look and the structure of HTML documents.

English abstract

Today's World Wide Web contains a vast amount of information stored in HTML documents. However, the HTML language primarily describes the look of the documents, and it doesn't contain facilities for describing the structure of the contained data. In this paper we propose a model of a Web site that describes the logical structure of the contained data. Furthermore, we propose methods for creating such a model by analyzing the look and the structure of HTML documents.

Keywords

HTML Analysis, Information Extraction

Key words in English

HTML Analysis, Information Extraction

Authors

BURGET, R.

RIV year

2011

Released

25.04.2002

Publisher

Faculty of Information Technology BUT

Location

Brno

ISBN

80-214-2116-9

Book

Proceedings of 8th EEICT conference

Pages from

426

Pages to

430

Pages count

5

BibTeX

@inproceedings{BUT10014,
  author="Radek {Burget}",
  title="HTML Document Analysis for Information Extraction",
  booktitle="Proceedings of 8th EEICT conference",
  year="2002",
  pages="426--430",
  publisher="Faculty of Information Technology BUT",
  address="Brno",
  isbn="80-214-2116-9"
}