Publication result details

SoluProt: Prediction of Protein Solubility

HON, J.; MARUŠIAK, M.; MARTÍNEK, T.; ZENDULKA, J.; BEDNÁŘ, D.; DAMBORSKÝ, J.

Original title

SoluProt: Prediction of Protein Solubility

Type

Paper in proceedings not indexed in WoS or Scopus

Original abstract

Protein solubility poses a major bottleneck in the production of many therapeutic and industrially attractive proteins. Experimental solubilization attempts are plagued by relatively low success rates and often lead to the loss of biological activity. Therefore, any advance in computational prediction of protein solubility may reduce the cost of experimental studies significantly. Here, we propose a novel software tool, SoluProt, for prediction of solubility from protein sequence, based on machine learning and the TargetTrack database. SoluProt achieved the best accuracy (58.2%) and AUC (0.61) of all available tools on an independent balanced test set derived from the NESG database. While the absolute prediction performance is rather low, SoluProt can still help to reduce the costs of experimental studies significantly through efficient prioritization of protein sequences. The main contribution of SoluProt lies in improved preprocessing of noisy training data and a sensible selection of the sequence features included in the prediction model.

Keywords

protein, solubility, prediction, machine-learning

Authors

HON, J.; MARUŠIAK, M.; MARTÍNEK, T.; ZENDULKA, J.; BEDNÁŘ, D.; DAMBORSKÝ, J.

RIV year

2019

Published

17.08.2018

Publisher

Brno University of Technology

Place

Brno

ISBN

978-80-214-5679-2

Book

DAZ & WIKT 2018 Proceedings

Pages from

261

Pages to

265

Number of pages

5

BibTeX

@inproceedings{BUT155085,
  author="Jiří {Hon} and Martin {Marušiak} and Tomáš {Martínek} and Jaroslav {Zendulka} and David {Bednář} and Jiří {Damborský}",
  title="SoluProt: Prediction of Protein Solubility",
  booktitle="DAZ \& WIKT 2018 Proceedings",
  year="2018",
  pages="261--265",
  publisher="Brno University of Technology",
  address="Brno",
  isbn="978-80-214-5679-2",
  url="https://www.fit.vut.cz/research/publication/11808/"
}