Publication result detail

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

HLOSTA, M.; STRÍŽ, R.; KUPČÍK, J.; ZENDULKA, J.; HRUŠKA, T.

Original title

Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm

Type

Peer-reviewed article not indexed in WoS or Scopus

Original abstract

Imbalance in data classification is a frequently discussed problem that is not well handled by classical classification techniques. The problem we tackled was to learn a binary classification model from large data with an accuracy constraint for the minority class. We propose a new meta-learning method that creates initial models using cost-sensitive learning by logistic regression and uses these models as initial chromosomes for a genetic algorithm. The method has been successfully tested on a large real-world data set from our internet security research. Experiments prove that our method always leads to better results than the use of logistic regression or a genetic algorithm alone. Moreover, this method produces an easily understandable classification model.

Keywords

Imbalanced data, classification, genetic algorithm, logistic regression

Authors

HLOSTA, M.; STRÍŽ, R.; KUPČÍK, J.; ZENDULKA, J.; HRUŠKA, T.

RIV year

2014

Published

18.05.2013

ISSN

2010-3700

Journal

International Journal of Machine Learning and Computing

Volume

2013

Issue

3

Country

Republic of Singapore

Pages from

214

Pages to

218

Page count

5

URL

http://www.ijmlc.org/index.php?m=content&c=index&a=show&catid=36&id=304

BibTeX

@article{BUT103468,
  author="Martin {Hlosta} and Rostislav {Stríž} and Jan {Kupčík} and Jaroslav {Zendulka} and Tomáš {Hruška}",
  title="Constrained Classification of Large Imbalanced Data by Logistic Regression and Genetic Algorithm",
  journal="International Journal of Machine Learning and Computing",
  year="2013",
  volume="2013",
  number="3",
  pages="214--218",
  issn="2010-3700",
  url="http://www.ijmlc.org/index.php?m=content&c=index&a=show&catid=36&id=304"
}
