Publication result details

SoluProt: prediction of soluble protein expression in Escherichia coli

HON, J.; MARUŠIAK, M.; MARTÍNEK, T.; KUNKA, A.; ZENDULKA, J.; BEDNÁŘ, D.; DAMBORSKÝ, J.

Original title

SoluProt: prediction of soluble protein expression in Escherichia coli

English title

SoluProt: prediction of soluble protein expression in Escherichia coli

Type

WoS article

Original abstract

Motivation: Poor protein solubility hinders the production of many therapeutic and industrially useful proteins. Experimental efforts to increase solubility are plagued by low success rates and often reduce biological activity. Computational prediction of protein expressibility and solubility in Escherichia coli using only sequence information could reduce the cost of experimental studies by enabling prioritisation of highly soluble proteins.
Results: A new tool for sequence-based prediction of soluble protein expression in Escherichia coli, SoluProt, was created using the gradient boosting machine technique with the TargetTrack database as a training set. When evaluated against a balanced independent test set derived from the NESG database, SoluProt's accuracy of 58.4% and AUC of 0.60 exceeded those of a suite of alternative solubility prediction tools. There is also evidence that it could significantly increase the success rate of experimental protein studies. SoluProt is freely available as a standalone program and a user-friendly webserver at https://loschmidt.chemi.muni.cz/soluprot/.

Keywords

protein solubility, machine-learning

Keywords in English

protein solubility, machine-learning

Authors

HON, J.; MARUŠIAK, M.; MARTÍNEK, T.; KUNKA, A.; ZENDULKA, J.; BEDNÁŘ, D.; DAMBORSKÝ, J.

RIV year

2022

Published

01.01.2021

ISSN

1367-4803

Journal

BIOINFORMATICS

Volume

37

Issue

1

Country

United Kingdom of Great Britain and Northern Ireland

Pages from

23

Pages to

28

Number of pages

6

URL

BibTeX

@article{BUT168540,
  author="Jiří {Hon} and Martin {Marušiak} and Tomáš {Martínek} and Antonín {Kunka} and Jaroslav {Zendulka} and David {Bednář} and Jiří {Damborský}",
  title="SoluProt: prediction of soluble protein expression in Escherichia coli",
  journal="BIOINFORMATICS",
  year="2021",
  volume="37",
  number="1",
  pages="23--28",
  doi="10.1093/bioinformatics/btaa1102",
  issn="1367-4803",
  url="https://www.fit.vut.cz/research/publication/12368/"
}
