Submission 3

Intrinsically disordered proteins (IDPs) act as hubs in interaction networks and play key roles in signaling pathways owing to their conformational flexibility and their tendency to expose short linear peptides (i.e., functional motifs). Consequently, dysregulation of IDPs has emerged as a critical factor in many diseases, including cancer, metabolic disorders, and diseases of protein homeostasis. However, the systematic association between IDPs and disease phenotypes has so far been poorly investigated. The Human Phenotype Ontology (HPO) organizes human diseases and their associated genes into hierarchical classes based on the phenotypes they present. Computational prediction of gene-HPO relationships has great potential to accelerate discovery and to prioritize candidate disease genes. In this work, we evaluated an ensemble-based model in which protein sequences are encoded with an amino acid scale that captures IDP-specific features. This encoding is motivated by the fact that disordered proteins and regions have an amino acid composition significantly different from that of ordered proteins. We developed a model that exploits the intrinsic features of this protein class and showed that it outperforms proteome-wide baseline methods at predicting associations between IDP-encoding genes and HPO terms.

Acknowledgments: This work was supported by the Ministry of Education, Science and Technological Development of the Republic of Serbia (Grant No. 173001). The author(s) would like to acknowledge the contribution of COST Action BM1405.
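The abstract does not name the amino acid scale or describe how per-residue values become model input. The following is a minimal sketch of one common way to turn a scale into a fixed-length feature vector, under stated assumptions. The Kyte-Doolittle hydropathy scale is swapped in purely for illustration, because low hydropathy is one compositional signal associated with disorder; it is not claimed to be the IDP-specific scale used in this work. The function names `scale_profile` and `encode`, and the `window` and `n_bins` parameters, are hypothetical.

```python
# Kyte-Doolittle hydropathy values, used here only as a stand-in scale
# (assumption: the IDP-specific scale from the abstract is not named).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def scale_profile(seq, scale, window=9):
    """Hypothetical helper: sliding-window mean of per-residue scale values.

    Residues missing from the scale (e.g. 'X') are skipped. Sequences
    shorter than the window collapse to a single whole-sequence mean.
    """
    values = [scale[aa] for aa in seq.upper() if aa in scale]
    if not values:
        return []
    if len(values) < window:
        return [sum(values) / len(values)]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def encode(seq, scale=KYTE_DOOLITTLE, window=9, n_bins=10):
    """Hypothetical encoder: average the profile into n_bins equal segments.

    This gives every protein a vector of the same length, regardless of
    sequence length, so it can be fed to a standard classifier.
    """
    profile = scale_profile(seq, scale, window)
    if not profile:
        raise ValueError("sequence contains no residues covered by the scale")
    n = len(profile)
    features = []
    for b in range(n_bins):
        start = b * n // n_bins
        # Guarantee a non-empty segment even when n < n_bins.
        end = max((b + 1) * n // n_bins, start + 1)
        segment = profile[start:end]
        features.append(sum(segment) / len(segment))
    return features


if __name__ == "__main__":
    print(encode("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"))
```

In a setup like the one described, such vectors would serve as input to an ensemble classifier trained with one binary label per HPO term. That training step is not shown here, and the specific ensemble method used in this work is not stated in the abstract.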