GenEpiO is designed for use in outbreak investigations and in food and environmental pathogen surveillance. Since human and animal infectious disease outbreaks can be similar in many ways, it captures the commonalities of both kinds of host.
The GenEpiO project, thanks to funding provided by Genome Canada and Genome BC, has identified four subdomains to focus on in the next few years:
SurvO will define key indicators of reportable disease surveillance systems. Surveillance draws key data points from laboratory results and epidemiologic reports to provide trending statistics for disease monitoring. Deviation from the normal trend could be a sign of an outbreak. Key terminology elements will be drawn from the BC Centre for Disease Control Laboratory and Epidemiology Check Lists to form the surveillance ontology. The focus will be to introduce sufficient logical relationships to support automatic discovery of case definitions and to improve the algorithms currently used to detect outbreaks (traditionally based on averaging case data over a fixed period of time).
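The traditional averaging approach mentioned above can be illustrated with a minimal sketch. It assumes weekly case counts and flags any week whose count exceeds the mean of the preceding weeks by more than a chosen number of standard deviations. The function name, window length, threshold, and sample counts are all hypothetical, chosen only for illustration; they are not taken from any BC Centre for Disease Control system.

```python
from statistics import mean, stdev


def flag_outbreak_weeks(weekly_counts, window=5, threshold_sd=2.0):
    """Return indices of weeks whose count exceeds the baseline limit.

    The baseline for week i is the `window` weeks immediately before it,
    and the limit is baseline mean + threshold_sd * baseline standard
    deviation. Both parameters are illustrative assumptions.
    """
    if window < 2:
        raise ValueError("window must be at least 2 to compute a standard deviation")
    flagged = []
    for i in range(window, len(weekly_counts)):
        baseline = weekly_counts[i - window:i]
        limit = mean(baseline) + threshold_sd * stdev(baseline)
        if weekly_counts[i] > limit:
            flagged.append(i)
    return flagged


# Made-up weekly case counts with one obvious spike at index 7.
counts = [4, 5, 3, 6, 4, 5, 4, 15, 5, 4]
print(flag_outbreak_weeks(counts))  # prints [7]
```

A fixed-window average like this ignores seasonality and case definitions, which is the kind of limitation that richer logical relationships in a surveillance ontology are meant to help address.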
The FoodOn food ontology was created to fulfill the need in food safety inspection and in foodborne pathogen outbreak investigations for a standardized food vocabulary that extends across international borders. We have also augmented FoodOn with terms from existing OBO Foundry ontologies such as NCBITaxon, CHEBI, and UBERON to provide more reasoning power and broader applicability. FoodOn documents over 2,000 raw plant and animal food sources, along with their taxonomic identifiers. In addition, FoodOn includes (as a subset in foodon_siren.owl) the CFSAN Scientific Information Retrieval and Exchange Network (SIREN) food database, which contains over 9,000 food products and which we foresee could form a basis for a globally unique set of food product identifiers. More information on how to use FoodOn in your project is available at the Wiki.
The IRIDA-GenEpiO team identified a few comprehensive food indexing systems (LanguaL and FoodEx2), as well as a number of loose categorizations of food by type, including lists from the Canadian Food Inspection Agency (CFIA), the U.S. Department of Agriculture, and the United Nations Food and Agriculture Organization (FAO) AGROVOC vocabulary. Unfortunately, the latter are not easily merged or mapped to each other, as they were developed for different purposes and before the advent of the OWL format. For this reason, the IRIDA-GenEpiO team initiated work on the FoodOn “farm to fork” OWL ontology with collaborators who have a similar need for a broad food ontology product. FoodOn consortium partners (who work with, for example, allergens and aspects of agricultural practice) will maintain their respective subdomains of FoodOn.
The GenEpiO FoodOn team’s approach was to map significant parts of the LanguaL™ food product indexing thesaurus into FoodOn in order to describe food products by various facets. LanguaL™ has a long history of international collaboration. It was begun in the late 1970s by the Center for Food Safety and Applied Nutrition (CFSAN) of the United States Food and Drug Administration (FDA) as an ongoing co-operative effort of specialists in food technology, information science and nutrition. Since then, LanguaL™ has been developed in collaboration with the US National Cancer Institute (NCI) and, more recently, its European partners, notably in France, Denmark, Switzerland and Hungary. Since 1996, the European LanguaL™ Technical Committee has administered the thesaurus, and it has been applied to numerous food databases in Canada, the USA, and Europe.
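To make the idea of describing food products by facets more concrete, here is a minimal sketch of faceted indexing and lookup. The facet names (`food_source`, `preservation`, `part`), the product records, and the helper `match_facets` are hypothetical and do not reproduce actual LanguaL™ or FoodOn terms or identifiers.

```python
from dataclasses import dataclass, field


@dataclass
class FoodProduct:
    """A food product described by independent facets (illustrative only)."""
    name: str
    facets: dict = field(default_factory=dict)


def match_facets(products, required):
    """Return names of products whose facets include every required value."""
    return [
        p.name
        for p in products
        if all(p.facets.get(key) == value for key, value in required.items())
    ]


# Hypothetical records; real entries would carry ontology identifiers.
products = [
    FoodProduct("smoked salmon fillet",
                {"food_source": "salmon", "preservation": "smoked", "part": "fillet"}),
    FoodProduct("fresh salmon fillet",
                {"food_source": "salmon", "preservation": "none", "part": "fillet"}),
    FoodProduct("smoked ham",
                {"food_source": "pig", "preservation": "smoked", "part": "leg"}),
]

# Products sharing one facet value, regardless of food source.
print(match_facets(products, {"preservation": "smoked"}))
# Narrowing with a second facet.
print(match_facets(products, {"preservation": "smoked", "food_source": "salmon"}))
```

Because each facet is independent, an investigator could in principle query across otherwise unrelated products that share a processing step or source, which flat category lists do not support well.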
ARO is a key component of the Comprehensive Antibiotic Resistance Database (CARD), a recently developed resource for AMR molecular surveillance that includes a curated collection of AMR gene and mutation sequences, bioinformatics models for their detection, and software, in the form of the Resistance Gene Identifier (RGI), for finding them in raw genomic and metagenomic sequences. Its goal is to foster and build tools for a world where sequencing of pathogen genomes is commonplace in public health, environmental, and agricultural settings.
ARO helps build a common framework for the sharing of AMR data, providing over 3,400 terms describing resistance genes, their products and associated phenotypes. These terms are used within the CARD to organize data and models. A common usage of the ARO is to collate and organize resistome predictions of the Resistance Gene Identifier (RGI), as recently performed for >400 isolates of Pseudomonas aeruginosa representative of the major sub-groupings of this pathogen. The CARD curated data and the RGI have been used extensively for analysis of pathogen genomes, analysis of metagenomic data, and construction of Resfams; the CARD currently averages ~2,000 unique visitors and hundreds of RGI analyses per month. It is increasingly used by academia, government, and industry for analysis of raw genome sequence and has led to important collaborations with the NML, Public Health Ontario, the US Department of Agriculture, Chercheur Fibrose Kystique Canada (Quebec), and the National Center for Biotechnology Information (NIH, USA). ARO will be upgraded to the Basic Formal Ontology (BFO) standard.
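Collating resistome predictions by ontology term, as described above, can be sketched as a simple grouping step. The input format (pairs of isolate identifier and term label), the function name `collate_by_term`, and the placeholder labels are assumptions made for illustration; they are not the RGI output format or real ARO accessions.

```python
from collections import defaultdict


def collate_by_term(predictions):
    """Group isolate identifiers under each predicted resistance term.

    `predictions` is an iterable of (isolate_id, term_label) pairs.
    Duplicate pairs are collapsed, and isolates are returned sorted.
    """
    isolates_by_term = defaultdict(set)
    for isolate_id, term_label in predictions:
        isolates_by_term[term_label].add(isolate_id)
    return {term: sorted(ids) for term, ids in isolates_by_term.items()}


# Placeholder data: term labels stand in for ontology terms.
predictions = [
    ("isolate_01", "TERM_A"),
    ("isolate_02", "TERM_A"),
    ("isolate_02", "TERM_B"),
    ("isolate_03", "TERM_B"),
    ("isolate_02", "TERM_A"),  # repeated prediction, counted once
]

for term, isolates in sorted(collate_by_term(predictions).items()):
    print(term, len(isolates), isolates)
```

Keying the summary on a shared vocabulary term rather than on free-text gene names is what lets results from different isolates and studies be compared directly.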
MobiO will aid in tracking horizontal gene transfer among pathogens. Virulence factors and certain classes of AMR genes are disproportionately associated with mobile elements, so we will devise an ontology to represent these elements of growing medical interest. Terms and definitions will be drawn from the textbook Mobile DNA III and our previous reviews of this topic. Existing ontology resources (e.g. https://bioportal.bioontology.org/ontologies/MEGO, which appears to be inactive) will also be reviewed and re-used as appropriate. Conceptual overlaps with ARO will also be resolved.