Publication details

Simplified Progressive Data Mining

STRYKA, L.; CHMELAŘ, P.

Original title

Simplified Progressive Data Mining

Type

Paper in conference proceedings not indexed in WoS or Scopus

Original abstract

There are huge amounts of data stored in databases, but it is very difficult to make decisions based on this data. We propose the OLAM SE system (Self Explaining On-Line Analytical Mining), which is similar to Han's OLAM [5] in the idea of interactive data mining. The contribution is to simplify on-line analytical data mining for professionals who understand their data but want more significant, interesting and useful information. This is done by shielding internal concepts (associations, classifications, characterizations) and thresholds (supports, confidences) from the user and by a simple graphical interface that suggests the most relevant items.

OLAM SE determines the minimum support value from the required cover of the data using the entropy coding principle. This is automatically applied to the structure based on a given conceptual hierarchy, where present. We also determine a maximum threshold to avoid explaining knowledge that is obvious. The major part of the data is thus described by frequent patterns.
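As a rough illustration of deriving a minimum support from a required cover, the following Python sketch may help. It is not the OLAM SE algorithm: the entropy coding step is not reproduced, "cover" is assumed here to mean the fraction of records containing at least one retained item, and the function name `support_bounds` and the parameters `required_cover` and `max_support` are hypothetical.

```python
from collections import Counter


def support_bounds(transactions, required_cover=0.8, max_support=0.95):
    """Return (min_support, frequent_items) for a list of item lists.

    Assumption: cover = share of transactions that contain at least one
    retained item. Items with support above max_support are treated as
    "obvious" and skipped. This is a simplified stand-in, not the
    entropy-coding-based rule described in the abstract.
    """
    n = len(transactions)
    # Count each item once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    support = {item: c / n for item, c in counts.items()}

    # Non-obvious items, most frequent first (ties broken by name).
    candidates = sorted(
        (item for item, s in support.items() if s <= max_support),
        key=lambda item: (-support[item], item),
    )

    chosen = set()
    threshold = 0.0
    for item in candidates:
        chosen.add(item)
        threshold = support[item]
        covered = sum(1 for t in transactions if chosen & set(t))
        if covered / n >= required_cover:
            break  # enough records are described; stop lowering support

    # Every non-obvious item at or above the threshold counts as frequent.
    frequent = {i for i in candidates if support[i] >= threshold}
    return threshold, frequent
```

Lowering the threshold one item at a time keeps the sketch easy to follow; if the required cover cannot be reached, the lowest observed support is returned.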

The results are presented using a diagram notation similar to UML. In effect, it is a visual graph whose nodes are frequent data sets presented as packages containing subpackages (data concepts or items). Edges represent links or patterns between them. The user can progressively explore these patterns and get a detailed view of those of interest. Other possibly interesting sets are offered to the user without any further action. This is well suited to characterization and descriptive classification equivalent to normal Bayes.

Keywords

On-line data mining, concept hierarchy, frequent patterns, cover, obviosity

Authors

STRYKA, L.; CHMELAŘ, P.

Published

4 September 2007

Publisher

Wroclaw University of Technology

Place

Wroclaw

ISBN

978-83-7493-340-7

Book

Proceedings of the 16th International Conference on Systems Science

Pages from

378

Pages to

387

Number of pages

10

BibTeX

@inproceedings{BUT25331,
  author="Lukáš {Stryka} and Petr {Chmelař}",
  title="Simplified Progressive Data Mining",
  booktitle="Proceedings of the 16th International Conference on Systems Science",
  year="2007",
  pages="378--387",
  publisher="Wroclaw University of Technology",
  address="Wroclaw",
  isbn="978-83-7493-340-7"
}