RT Journal Article
SR Electronic
T1 FALDO: A semantic standard for describing the location of nucleotide and protein feature annotation
JF bioRxiv
FD Cold Spring Harbor Laboratory
SP 002121
DO 10.1101/002121
A1 Jerven Bolleman
A1 Christopher J Mungall
A1 Francesco Strozzi
A1 Joachim Baran
A1 Michel Dumontier
A1 Raoul J. P. Bonnal
A1 Robert Buels
A1 Robert Hoehndorf
A1 Takatomo Fujisawa
A1 Toshi-aki Katayama
A1 Peter J. A. Cock
YR 2014
UL http://biorxiv.org/content/early/2014/02/01/002121.abstract
AB Background: Nucleotide and protein sequence feature annotations are essential to understanding biology at the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. Description: We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Representing sequence positions in the same data format, independent of file formats, allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Conclusions: Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources.
Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores. Abbreviations: BED, Browser Extensible Data (file format); DDBJ, DNA Data Bank of Japan; EMBL, European Molecular Biology Laboratory; FALDO, Feature Annotation Location Description Ontology; GFF, Generic Feature Format; GFF3, Generic Feature Format version 3; GTF, Gene Transfer Format, a variant of GFF; GVF, Genome Variation Format, an extension to GFF3; INSDC, International Nucleotide Sequence Database Collaboration; OWL, Web Ontology Language (note acronym is OWL, not WOL); RDF, Resource Description Framework; SPARQL, SPARQL Protocol and RDF Query Language; UniProtKB, Universal Protein Knowledgebase; VCF, Variant Call Format.