Journal article · Conference paper

KinMutRF: a random forest classifier of sequence variants in the human protein kinase superfamily

From

[1] Spanish National Cancer Research Centre

[2] Technical University of Denmark

[3] Department of Systems Biology, Technical University of Denmark

[4] Integrative Systems Biology, Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark

[5] Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark

Background: The association between aberrant signal processing by protein kinases and human diseases such as cancer was established long ago. However, understanding the link between sequence variants in the protein kinase superfamily and the mechanisms underlying complex traits at the molecular level remains challenging: cells tolerate most genomic alterations, and only a minor fraction disrupt molecular function sufficiently to drive disease.

Results: KinMutRF is a novel random-forest method to automatically identify pathogenic variants in human kinases. Twenty-six decision trees, implemented as a random forest, weigh a battery of features that characterize the variants: a) at the gene level, including membership in a Kinbase group and Gene Ontology terms; b) at the PFAM domain level; and c) at the residue level, including the types of amino acids involved, changes in biochemical properties, and functional annotations from UniProt, Phospho.ELM and FireDB.
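To make the random-forest idea concrete, here is a minimal from-scratch sketch of the general technique: each tree is grown on a bootstrap sample of the training variants, each split considers only a random subset of features, and the forest predicts by majority vote. Only the number of trees (26) comes from the abstract. The numeric feature encoding, the Gini split criterion, the maximum depth and the square-root feature-subset size are illustrative assumptions, and the function names (`train_forest`, `predict_forest`) are hypothetical; this is not the KinMutRF implementation.

```python
import math
import random
from collections import Counter

# Assumed encoding (hypothetical): each variant is a list of numeric features,
# e.g. 0/1 flags for Kinbase group or PFAM domain membership, or a biochemical
# change score. Labels: 1 = pathogenic, 0 = neutral.

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most frequent label (used for leaves and for the forest vote)."""
    return Counter(labels).most_common(1)[0][0]

def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a CART-style tree; each split examines a random feature subset."""
    if depth >= max_depth or len(set(y)) == 1 or len(y) < 2:
        return majority(y)  # leaf: predicted class
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        for t in sorted({row[f] for row in X}):
            left = [i for i, row in enumerate(X) if row[f] <= t]
            right = [i for i, row in enumerate(X) if row[f] > t]
            if not left or not right:
                continue  # threshold does not separate anything
            # Weighted impurity of the two children; lower is better.
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority(y)
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left],
                       depth + 1, max_depth, n_feats, rng),
            build_tree([X[i] for i in right], [y[i] for i in right],
                       depth + 1, max_depth, n_feats, rng))

def predict_tree(node, row):
    """Walk internal (feature, threshold, left, right) nodes down to a leaf."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(X, y, n_trees=26, max_depth=5, seed=0):
    """Bagging: each tree sees a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n_feats = max(1, int(math.sqrt(len(X[0]))))  # assumed subset size
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])
```

With toy data such as `X = [[l, l, l] for l in [0, 1] * 10]` and `y = [row[0] for row in X]`, `train_forest(X, y)` builds 26 trees and `predict_forest(forest, [1, 1, 1])` votes for class 1. Bootstrapping and per-split feature sampling decorrelate the trees, which is what lets the averaged vote generalize better than any single tree.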

KinMutRF identifies disease-associated variants satisfactorily (Acc: 0.88, Prec: 0.82, Rec: 0.75, F-score: 0.78, MCC: 0.68) when trained and cross-validated with the 3689 human kinase variants from UniProt that have been annotated as neutral or pathogenic. All unclassified variants were excluded from the training set.
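The reported figures are standard binary-classification metrics. The sketch below shows how they are usually computed from a confusion matrix, assuming "pathogenic" is the positive class. The abstract does not give the underlying counts, so none are reconstructed here. It also checks that the reported F-score is consistent with the reported precision and recall, since F = 2PR / (P + R) = 2(0.82)(0.75) / 1.57 ≈ 0.78. The function names are hypothetical.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, F-score and MCC from confusion-matrix counts.

    Assumes the positive class is 'pathogenic' and that no denominator is zero.
    """
    acc = (tp + tn) / (tp + fp + tn + fn)
    prec = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * prec * rec / (prec + rec)
    # Matthews correlation coefficient: balanced even with unequal class sizes.
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"acc": acc, "prec": prec, "rec": rec, "f1": f1, "mcc": mcc}

def f_score(prec, rec):
    """Harmonic mean of precision and recall."""
    return 2 * prec * rec / (prec + rec)

# Consistency check using only the precision and recall reported above.
print(round(f_score(0.82, 0.75), 2))  # 0.78
```

MCC is worth reporting alongside accuracy here because neutral and pathogenic variants need not be equally frequent, and MCC penalizes a classifier that simply favors the majority class.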

Furthermore, KinMutRF is discussed with respect to two independent kinase-specific sets of mutations not included in training and testing: Kin-Driver (643 variants) and Pon-BTK (1495 variants). Moreover, we provide predictions for the 848 protein kinase variants in UniProt that remained unclassified.

A public implementation of KinMutRF, including documentation and examples, is available online (http://kinmut2.bioinfo.cnio.es). The source code for local installation is released under a GPL version 3 license, and can be downloaded from https://github.com/Rbbt-Workflows/KinMut2.

Conclusions: KinMutRF is capable of classifying kinase variation with good performance.

Predictions by KinMutRF compare favorably in a benchmark with other state-of-the-art methods (i.e. SIFT, Polyphen-2, MutationAssessor, MutationTaster, LRT, CADD, FATHMM, and VEST). Kinase-specific features rank as the most informative in terms of information gain and are likely responsible for the improvement in prediction performance.
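Information gain measures how much knowing a feature's value reduces uncertainty about the class label: IG(Y; X) = H(Y) − Σ_v P(X = v) H(Y | X = v). The sketch below computes it for a discrete feature. How the original analysis discretized continuous features or aggregated scores is not described in the abstract, and the feature in the usage example is hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """H(labels) minus the size-weighted entropy of labels within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical example: a 0/1 flag that perfectly separates the classes
# carries a full bit of information; an unrelated flag carries none.
labels = [0, 0, 1, 1]
print(information_gain([0, 0, 1, 1], labels))  # 1.0
print(information_gain([0, 1, 0, 1], labels))  # 0.0
```

Ranking every feature by this score gives the kind of ordering referred to above, where a higher value means the feature alone separates neutral from pathogenic variants more cleanly.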

This advocates for the development of family-specific classifiers able to exploit the discriminatory power of features unique to individual protein families.

Language: English
Publisher: BioMed Central
Year: 2016
Pages: 396
Proceedings: VarI-SIG at ISMB 2015
ISSN: 1471-2164
Types: Journal article and Conference paper
DOI: 10.1186/s12864-016-2723-1
ORCIDs: 0000-0003-0316-5866 and Gonzalez-Izarzugaza, Jose Maria
