R&D Result Detail

Original Title

Learning document representations using subspace multinomial model

English Title

Learning document representations using subspace multinomial model

Type

Paper in proceedings (conference paper)

Original Abstract

Subspace multinomial model (SMM) is a log-linear model and can be used for learning low dimensional continuous representation for discrete data. SMM and its variants have been used for speaker verification based on prosodic features and phonotactic language recognition. In this paper, we propose a new variant of SMM that introduces sparsity and call the resulting model as ℓ1 SMM. We show that ℓ1 SMM can be used for learning document representations that are helpful in topic identification or classification and clustering tasks. Our experiments in document classification show that SMM achieves comparable results to models such as latent Dirichlet allocation and sparse topical coding, while having a useful property that the resulting document vectors are Gaussian distributed.

English abstract

Subspace multinomial model (SMM) is a log-linear model and can be used for learning low dimensional continuous representation for discrete data. SMM and its variants have been used for speaker verification based on prosodic features and phonotactic language recognition. In this paper, we propose a new variant of SMM that introduces sparsity and call the resulting model as ℓ1 SMM. We show that ℓ1 SMM can be used for learning document representations that are helpful in topic identification or classification and clustering tasks. Our experiments in document classification show that SMM achieves comparable results to models such as latent Dirichlet allocation and sparse topical coding, while having a useful property that the resulting document vectors are Gaussian distributed.

Keywords

Document representation, subspace modelling, topic identification, latent topic discovery

Key words in English

Document representation, subspace modelling, topic identification, latent topic discovery

Authors

KESIRAJU, S.; BURGET, L.; SZŐKE, I.; ČERNOCKÝ, J.

RIV year

2017

Released

08.09.2016

Publisher

International Speech Communication Association

Location

San Francisco

ISBN

978-1-5108-3313-5

Book

Proceedings of Interspeech 2016

Pages from

700

Pages to

704

Pages count

5

URL

https://www.researchgate.net/publication/307889473_Learning_Document_Representations_Using_Subspace_Multinomial_Model

BibTeX

@inproceedings{BUT132598,
  author="Santosh {Kesiraju} and Lukáš {Burget} and Igor {Szőke} and Jan {Černocký}",
  title="Learning document representations using subspace multinomial model",
  booktitle="Proceedings of Interspeech 2016",
  year="2016",
  pages="700--704",
  publisher="International Speech Communication Association",
  address="San Francisco",
  doi="10.21437/Interspeech.2016-1634",
  isbn="978-1-5108-3313-5",
  url="https://www.researchgate.net/publication/307889473_Learning_Document_Representations_Using_Subspace_Multinomial_Model"
}

Documents

kesiraju_interspeech2016_IS161634
