SnoReport 2.0: new features and a refined Support Vector Machine to improve snoRNA identification

Background: snoReport uses RNA secondary structure prediction combined with machine learning as the basis to identify the two main classes of small nucleolar RNAs, the box H/ACA snoRNAs and the box C/D snoRNAs. Here, we present snoReport 2.0, which substantially improves and extends the original method by: extracting new features for both box C/D and box H/ACA snoRNAs; developing a more sophisticated technique in the SVM training phase, with recent data from vertebrate organisms and a careful choice of the SVM parameters C and γ; and using updated versions of the tools and databases used to construct the original version of snoReport. To validate the new version and to demonstrate its improved performance, we tested snoReport 2.0 in different organisms.

Results: Results of the training and test phases for box H/ACA and box C/D snoRNAs, in both versions of snoReport, are discussed. Validation on real data was performed to evaluate the predictions of snoReport 2.0. Our program was applied to a set of previously annotated sequences, some of them experimentally confirmed, from humans, nematodes, drosophilids, platypus, chickens and Leishmania. Predictions improved significantly for vertebrates, since the training phase used data from these organisms, and H/ACA box snoRNA identification also improved for the other organisms.

Conclusion: We presented snoReport 2.0, a method to predict H/ACA box and C/D box snoRNAs that efficiently finds true positives and avoids false positives in vertebrate organisms. The H/ACA box snoRNA classifier showed an F-score of 93 % (an improvement of 10 % over the previous version), while the C/D box snoRNA classifier showed an F-score of 94 % (an improvement of 14 %). In addition, both classifiers exhibited performance measures above 90 %. These results show that snoReport 2.0 avoids false positives and false negatives, allowing high-quality snoRNA prediction. In the validation phase, snoReport 2.0 predicted 67.43 % of the previously annotated snoRNAs of vertebrate organisms, for both classes. For nematodes and drosophilids, 69 % and 76.67 % of H/ACA box snoRNAs were predicted, respectively, showing that snoReport 2.0 identifies snoRNAs well in vertebrates and also identifies H/ACA box snoRNAs in invertebrate organisms.

Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-1345-6) contains supplementary material, which is available to authorized users.
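
The F-scores quoted in the Conclusion combine precision and recall into a single number. As a minimal sketch, assuming the standard F1 definition (the harmonic mean of precision and recall, which the abstract itself does not spell out), the measure can be computed from confusion counts as follows. The function name `f_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f_score(tp: int, fp: int, fn: int, beta: float = 1.0) -> float:
    """F-beta score from confusion counts.

    tp, fp, fn are true positives, false positives and false negatives.
    With beta = 1 this is the harmonic mean of precision and recall.
    """
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical counts for illustration only (not results from the paper).
print(f"F1 = {f_score(tp=90, fp=10, fn=10):.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a high F-score requires that both false positives and false negatives stay low, which matches the way the abstract interprets the measure.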

Identity

Description: The Identity category includes attributes that support the identification of the resource.

Field Value
PID https://www.doi.org/10.1186/s12859-016-1345-6
PID pmid:28105919
PID pmc:PMC5249026
URL https://link.springer.com/article/10.1186/s12859-016-1345-6
URL http://repositorio.unb.br/bitstream/10482/32111/1/ARTIGO_SnoReport%202.0.pdf
URL https://dblp.uni-trier.de/db/journals/bmcbi/bmcbi17S.html#OliveiraCBSWH16
URL http://dx.doi.org/10.1186/s12859-016-1345-6
URL https://core.ac.uk/display/81068626
URL http://link.springer.com/content/pdf/10.1186/s12859-016-1345-6.pdf
URL http://repositorio.unb.br/handle/10482/32111
URL http://publica.fraunhofer.de/documents/N-428902.html
URL https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-016-1345-6
URL https://academic.microsoft.com/#/detail/2566190197
URL http://europepmc.org/articles/PMC5249026
URL https://bmcbioinformatics.biomedcentral.com/track/pdf/10.1186/s12859-016-1345-6
URL https://doi.org/10.1186/s12859-016-1345-6
URL https://dx.doi.org/10.1186/s12859-016-1345-6
Access Modality

Description: The Access Modality category includes attributes that report the modality of exploitation of the resource.

Field Value
Access Right Open Access
Attribution

Description: Authorships and contributors

Field Value
Author Fabrizio Costa, 0000-0002-4900-995X
Author Jana Schor, 0000-0003-1200-6234
Author Maria Emilia Walter, 0000-0001-6822-931X
Author Peter F. Stadler, 0000-0002-5016-5191
Publishing

Description: Attributes about the publishing venue (e.g. journal) and deposit location (e.g. repository)

Field Value
Collected From Europe PubMed Central; PubMed Central; ORCID; Datacite; UnpayWall; Fraunhofer-ePrints; Crossref; LAReferencia - Red Federada de Repositorios Institucionales de Publicaciones Científicas Latinoamericanas; Microsoft Academic Graph; CORE (RIOXX-UK Aggregator)
Hosted By Europe PubMed Central; Repositório Institucional da UnB - Universidade de Brasília (UnB); SpringerOpen; Fraunhofer-ePrints; BMC Bioinformatics
Journal BMC Bioinformatics, 17, S18
Publication Date 2016-12-15
Publisher Springer Nature
Additional Info
Field Value
Country Brazil; Germany
Language English
Resource Type Other literature type; Article; UNKNOWN
keyword Ribonucleic acid (Ácido ribonucléico)
system:type publication
Management Info
Field Value
Source https://science-innovation-policy.openaire.eu/search/publication?articleId=dedup_wf_001::a5899c960e2c3c2e272f262622eb491a
Author jsonws_user
Last Updated 23 December 2020, 14:43 (CET)
Created 23 December 2020, 14:43 (CET)