Submission 24

Sustained efforts to understand complex biological processes and agro-economic plant traits at different scales are generating data at unprecedented rates, and each new technological advance accelerates this further. In the biological sciences, researchers access this highly valuable experimental data mostly through peer-reviewed scientific literature. However, published articles are written and distributed in a format that is neither machine-readable nor uniformly structured. Managing this data therefore requires systematic protocols that extract the embedded information into a computer-indexable format, which computational algorithms can subsequently use. MCDRP (Manually Curated Database of Rice Proteins; www.genomeindia.org/biocuration) addresses these data curation issues. The database uses data curation models developed in-house that enable digitization of the experimental data itself. Every aspect of each data point in an experiment (whether depicted graphically or pictorially) is digitized and can thus be retrieved with a simple database search. The use of universal ontologies and other standard notations provides the semantic structure that supports integration and comparison of large, complex and cross-linked data sets in MCDRP. Digitizing and integrating data from over 9000 different experiments contained in more than 500 research articles translated experimental assays into ‘Trait Ontology’ (TO) and ‘Gene Ontology’ (GO) annotations. Co-functional networks can now be drawn by analyzing proteins that share a common ‘Trait’ or ‘Biological Process’. Among the 394 trait-associated and 1234 GO-associated proteins, physical interaction data have been digitized for 76 proteins in MCDRP.
Probabilistic functional gene networks can be drawn by combining the digitized data supporting protein-trait and protein-GO links with protein-protein interaction data. Analysis of these networks indicates several putative, as yet unknown functional associations between rice proteins. This biocuration endeavor can lead to hypothesis-driven studies that can be tested and validated, eventually revealing functions for previously uncharacterized genes.
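The idea of linking proteins that share a trait or biological process, and then overlaying physical interaction data, can be illustrated with a minimal Python sketch. It is not the MCDRP implementation: the function names, the protein identifiers (`P1` to `P4`) and the term labels (`trait_A`, `process_B`) are hypothetical placeholders, and the sketch only records shared evidence per protein pair rather than assigning probabilities to the links.

```python
from collections import defaultdict
from itertools import combinations


def cofunctional_edges(annotations):
    """Link every pair of proteins that share an annotation term.

    annotations: iterable of (protein, term) pairs, where a term stands
    for a TO trait or a GO biological process (hypothetical labels here).
    Returns {(protein_a, protein_b): set of shared terms}, with each
    pair stored in sorted order.
    """
    proteins_by_term = defaultdict(set)
    for protein, term in annotations:
        proteins_by_term[term].add(protein)

    edges = defaultdict(set)
    for term, proteins in proteins_by_term.items():
        for a, b in combinations(sorted(proteins), 2):
            edges[(a, b)].add(term)
    return dict(edges)


def merge_with_interactions(edges, interactions):
    """Overlay physical protein-protein interactions on the shared-term edges."""
    network = {
        pair: {"shared_terms": set(terms), "physical": False}
        for pair, terms in edges.items()
    }
    for a, b in interactions:
        pair = tuple(sorted((a, b)))
        entry = network.setdefault(pair, {"shared_terms": set(), "physical": False})
        entry["physical"] = True
    return network


# Hypothetical example data, not taken from MCDRP.
annotations = [
    ("P1", "trait_A"),
    ("P2", "trait_A"),
    ("P1", "process_B"),
    ("P2", "process_B"),
    ("P3", "process_B"),
]
interactions = [("P2", "P1"), ("P3", "P4")]

network = merge_with_interactions(cofunctional_edges(annotations), interactions)
for pair, evidence in sorted(network.items()):
    print(pair, sorted(evidence["shared_terms"]), evidence["physical"])
```

Pairs supported by several shared terms, or by both a shared term and a physical interaction, would be natural candidates for follow-up; how such evidence is weighted into a probabilistic score is left open here.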