To increase the global interoperability of genomic epidemiology contextual information (genomics metadata, lab, clinical and epidemiology data) to support data sharing and integration for infectious disease surveillance and outbreak investigations, as well as other public health activities. To achieve consensus and wide adoption, and to develop GenEpiO according to best practices, the consortium embraces collaborative development and open-use, open-source technology.
The aim of the International GenEpiO Consortium is to engage global expertise to further develop the Genomic Epidemiology Application Ontology (GenEpiO) in four key areas: Food, Antimicrobial Resistance, Disease Surveillance and Mobile Elements. Using online tools, domain experts in different disciplines can contribute and curate terms, participate in technical ontology development, and provide feedback on content and architecture. Consensus among research, regulatory, industry and public health user groups will facilitate wide uptake and implementation of GenEpiO, enabling faster and more efficient data integration and data sharing across borders. The consortium anticipates that the standardization GenEpiO offers for describing the generation of genomic sequences and phylogenomic trees, specimen types and sample collection methods, and pathogen exposures and laboratory reporting will contribute to the reproducibility of results necessary for the validation and clinical accreditation of whole genome sequencing pipelines.
Specifically, the goals of GenEpiO are:
- To further develop the GenEpiO controlled vocabulary in the areas of food, antimicrobial resistance, disease surveillance and mobile elements
- To build GenEpiO semantic architecture according to OBO Foundry principles to enable more complex querying across data sets
- To engage international domain experts to provide knowledge, feedback and technical assistance in curation, design and implementation of GenEpiO
- To encourage uptake of GenEpiO across different platforms worldwide to better enable real-time communication of results and reporting
The subdomains we have selected for GenEpiO range from the initial process of sample collection, whether in a human clinical context or in an animal, plant or substance-based environmental context, through to lab processing, sequencing, phylogenetic analysis and outbreak detection. This diagram sketches GenEpiO’s current and future subdomains.
To realize the full potential of applying microbial genomics to infectious disease surveillance and outbreak investigations (a.k.a. genomic epidemiology or GenEpi), it is critical to integrate genome sequence data with contextual information about the host and pathogen. However, such contextual data, ranging from clinical sample descriptions and lab test results to pathogen genotype and phenotype information and epidemiological exposure data, are often institution-specific and lack interoperable standards. Ontologies, or structured controlled vocabularies, combined with semantic web technologies are increasingly used as a foundation of biomedical informatics, providing a flexible and powerful way to enable system interoperability and computer-friendly data representation across disparate data domains and data providers. For an ontology to be widely accepted, it needs to have practical applications and to be accessible to, and supported by, user and developer communities. GenEpiO currently contains fields and terms to describe the various genomics, laboratory, clinical and epidemiological processes used in genomic epidemiology workflows. This standardized vocabulary has been imported from over 25 OBO Foundry ontologies representing many different domains of knowledge, and also contains novel semantics developed by the GenEpiO team (insert figure of process and domain ontologies). An introductory slideshow highlighting the importance, utility and advantages of GenEpiO with regard to data standardization, data integration and data sharing is available here.
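As a rough illustration of why a shared vocabulary helps, the sketch below maps two laboratories' local specimen labels onto common term identifiers so that their records can be queried together. Everything in it is an assumption made for illustration: the `EX:` identifiers, the labels, the lab mappings and the `harmonize` helper are hypothetical and are not taken from GenEpiO or any OBO Foundry ontology.

```python
# Hypothetical shared vocabulary: every identifier and label below is a
# placeholder invented for illustration, not a real GenEpiO or OBO term.
SHARED_TERMS = {
    "EX:0000001": "stool specimen",
    "EX:0000002": "blood specimen",
}

# Each institution keeps its own local labels and maps them to shared IDs.
LAB_A_MAPPING = {"feces": "EX:0000001", "whole blood": "EX:0000002"}
LAB_B_MAPPING = {"Stool": "EX:0000001", "BLD": "EX:0000002"}


def harmonize(records, mapping, field="specimen"):
    """Return copies of records with the local label replaced by a shared term ID.

    Labels missing from the mapping become None so they can be flagged for curation.
    """
    harmonized = []
    for record in records:
        updated = dict(record)
        updated[field] = mapping.get(record[field])
        harmonized.append(updated)
    return harmonized


lab_a = [{"sample_id": "A-1", "specimen": "feces"}]
lab_b = [
    {"sample_id": "B-7", "specimen": "Stool"},
    {"sample_id": "B-8", "specimen": "swab"},  # no mapping defined for this label
]

combined = harmonize(lab_a, LAB_A_MAPPING) + harmonize(lab_b, LAB_B_MAPPING)

# Records from both labs can now be selected with a single identifier.
stool_samples = [r["sample_id"] for r in combined if r["specimen"] == "EX:0000001"]
unmapped = [r["sample_id"] for r in combined if r["specimen"] is None]
print(stool_samples)
print(unmapped)
```

Keeping the mapping separate from the records means each institution can retain its local labels while still exchanging data in shared terms; labels without a mapping surface as `None` and so are not silently merged.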
To engage public health and research communities and to promote richer genomic metadata for GenEpi analysis, the Lead GenEpiO Development Team aims to provide a web portal and associated communication and code repository environments for collaborative development of GenEpiO based on the widely used OBO Foundry standard (http://www.obofoundry.org/). The basis for GenEpiO has been developed in consultation with public health practitioners and researchers as part of the Canada-wide IRIDA.ca project (http://www.irida.ca/data-integration/), and with a growing international group of collaborators. An OWL file with the draft version of the GenEpiO ontology resource is available at https://github.com/genepio/genepio/.
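For readers who want to inspect an OWL file programmatically, the following is a minimal sketch that lists class IRIs and labels using only the Python standard library. It assumes the file is serialized as RDF/XML and works on a tiny made-up snippet rather than the actual GenEpiO file; the `example.org` IRIs, the labels and the `list_classes` helper are hypothetical. For a full ontology, a dedicated library such as rdflib or owlready2 would likely be more robust.

```python
import xml.etree.ElementTree as ET

# Standard RDF, RDFS and OWL namespace IRIs.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}

# Made-up miniature ontology in RDF/XML; not actual GenEpiO content.
SAMPLE_OWL = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/EX_0000001">
    <rdfs:label>specimen collection process</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/EX_0000002">
    <rdfs:label>sequencing process</rdfs:label>
  </owl:Class>
</rdf:RDF>
"""


def list_classes(owl_xml):
    """Map each top-level owl:Class IRI to its rdfs:label (None if unlabeled)."""
    root = ET.fromstring(owl_xml)
    about_attr = "{%s}about" % NS["rdf"]
    classes = {}
    for cls in root.findall("owl:Class", NS):
        iri = cls.get(about_attr)
        label = cls.find("rdfs:label", NS)
        if iri is not None:  # skip anonymous classes without an rdf:about IRI
            classes[iri] = label.text if label is not None else None
    return classes


for iri, label in list_classes(SAMPLE_OWL).items():
    print(iri, label)
```

To try this on a downloaded file, the same function could be given the file's text, for example `list_classes(open(path, encoding="utf-8").read())`, where `path` is wherever the OWL file was saved locally.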