Submission 12

Many important problems in BioCuration, such as MeSH indexing and protein function prediction, can be modeled as large-scale multi-label learning problems. Using a learning-to-rank framework, we have developed MeSHLabeler and DeepMeSH for large-scale MeSH indexing, and GOLabeler for protein function prediction. DeepMeSH achieved first place in both the BioASQ4 and BioASQ5 challenges, and MeSHLabeler achieved first place in both the BioASQ2 and BioASQ3 challenges. Specifically, on BioASQ3 challenge data with 6000 citations, DeepMeSH achieved a Micro F-measure of 0.6323, about 2% higher than the 0.6218 of MeSHLabeler and about 12% higher than the 0.5637 of MTI (NLM's official solution). For protein function prediction, extensive experiments on large-scale datasets revealed numerous favorable aspects of GOLabeler, including a significant performance advantage over state-of-the-art AFP (Automated Function Prediction) methods. According to the initial evaluation of CAFA3 (the Critical Assessment of protein Function Annotation algorithms) in July 2017, GOLabeler achieved first place in terms of F-max among around 200 submissions from around 50 labs worldwide.
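
For readers unfamiliar with the metric, the following is a minimal sketch of how a micro-averaged F-measure can be computed for multi-label predictions, and how the relative improvements quoted above follow from the reported scores. It assumes the standard definition of micro-averaging (true positives, false positives, and false negatives pooled over all citations before computing precision and recall); it is not the official BioASQ evaluation code. The function names and the toy label sets are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Set


def micro_f_measure(gold: Iterable[Set[str]], predicted: Iterable[Set[str]]) -> float:
    """Micro-averaged F-measure: pool counts over all items, then compute F."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in the gold set
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, as a fraction."""
    return (new - baseline) / baseline


# Toy example (hypothetical data): two citations with gold and predicted labels.
gold = [{"Humans", "Neoplasms"}, {"Mice", "Proteins", "Algorithms"}]
pred = [{"Humans", "Neoplasms", "Mice"}, {"Mice", "Proteins"}]
print(f"toy micro-F: {micro_f_measure(gold, pred):.4f}")

# Relative gains implied by the scores reported in the abstract.
print(f"DeepMeSH vs MeSHLabeler: {100 * relative_gain(0.6323, 0.6218):.1f}%")
print(f"DeepMeSH vs MTI:         {100 * relative_gain(0.6323, 0.5637):.1f}%")
```

Because micro-averaging pools counts across citations, frequently assigned labels weigh more heavily in the score than rare ones.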