Our choice of the OWL ontology format and the OBOFoundry framework for GenEpiO stems from the desire to see that our various users and stakeholders – from front-end lab and epidemiology staff, to back-end project managers, system designers, software developers, IT implementers, and ontology maintainers – are all satisfied by GenEpiO’s relative simplicity and ease of implementation. This piece explains our general motivation for choosing the Web Ontology Language (OWL) format for GenEpiO, and the benefits of operating within the OBOFoundry family of ontologies.

A common vocabulary for end users

Often an institutional project to interface an internal data system with another system is not given the mandate to benefit from any data standards besides the ones existing at either end of the cable. Considering an interface project as a direct mapping of fields, terms or reports may seem simpler, faster and safer than a solution involving the murky world of ontologies. However, this one-off, peer-to-peer project approach has a number of perhaps hidden deficits. Each such interface remains a borough with its own slang, rather than speaking a global language that implementers and users may increasingly be familiar with; and the same problem occurs all over again when data exchange is desired with a third system. The difference between basic syntactic compatibility and semantic interoperability is explored in this ISO 13606 standard document.

The mandate to treat each new peer-to-peer system implementation as one that benefits from an emerging and dominant global standard data dictionary is basically an effort to get rid of the slang that individual systems exhibit. In this new paradigm an integration project will adopt 3rd party vocabulary standards between the two systems as much as possible – the same 3rd party standards being adopted by other peer-to-peer projects. In this way an institution’s encyclopedia of often externally-provided data dictionaries finally sees much more reuse of its volumes and implementation code, smoothing of data transfer, and reduced training costs and expertise overhead.

We view the OWL ontology format and BFO interoperability as the most attractive option that is on the way to dominating the digital highway. This yields an experience where end users increasingly find they have seen vocabulary and lookup functionality for various interfaces elsewhere in the institution or in 3rd party systems. Internationally accepted taxonomy pick lists (e.g. http://www.itis.gov), ISO codes for countries, etc. have been used by database projects, but many hierarchies of terms are now available in exactly the same OWL format, and to a level of detail that satisfies professional users. The Human Phenotype Ontology, HP (https://bioportal.bioontology.org/ontologies/HP) is a great example of this, as just about every term from headache to dysarthria is in there, making it attractive for use in electronic health records.

A coherent resource for project managers and system designers

An organizational issue occurs with many centrally-administered large vocabularies that operate in isolation from each other: although an agency’s tight control of its vocabulary is required to maintain a high standard, this makes it a lengthy – and even futile – process for non-agency projects to request extensions or improvements to the given vocabulary for their own use. This is especially true of multidisciplinary projects, and ones dealing with new vocabulary, both of which are focal points of GenEpiO’s genomic epidemiology mandate. We’ve found the OWL ontology platform a great solution to this dilemma. Thanks to the OWL format and the OBO community, terms and relations can be cherry-picked from many domain-specific ontologies into one cohesive ontology. Changes in the domain ontologies can either be imported or ignored if not pertinent. Individual domain curators can proceed with their changes without necessarily impacting any of the downstream ontologies that import from them. The uptake of an ontology and the resources of its curation team can vary, so one needs to be selective about which ontologies to rely upon.
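The cherry-picking idea can be sketched in a few lines of Python. This is only a toy model, not how OntoFox or any OWL tool actually works: the ontologies are plain dictionaries, and every identifier, label and function name below (for example EX_ANATOMY:0000001 and import_terms) is a hypothetical placeholder. It shows that an umbrella ontology copies only the terms it asks for, and that an upstream change reaches it only when the import is refreshed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    term_id: str  # hypothetical identifier, not a real OBO ID
    label: str
    source: str   # name of the upstream ontology that owns the term


# Two toy "upstream" domain ontologies; all content is made up for illustration.
ANATOMY = {
    "EX_ANATOMY:0000001": Term("EX_ANATOMY:0000001", "lung", "anatomy"),
    "EX_ANATOMY:0000002": Term("EX_ANATOMY:0000002", "liver", "anatomy"),
}
GEOGRAPHY = {
    "EX_GEO:0000001": Term("EX_GEO:0000001", "province", "geography"),
}


def import_terms(umbrella, upstream, wanted_ids):
    """Copy only the wanted terms from an upstream ontology into the umbrella."""
    for term_id in wanted_ids:
        umbrella[term_id] = upstream[term_id]
    return umbrella


# Assemble an umbrella ontology from selected terms of each domain.
umbrella = {}
import_terms(umbrella, ANATOMY, ["EX_ANATOMY:0000001"])
import_terms(umbrella, GEOGRAPHY, ["EX_GEO:0000001"])

# An upstream curator relabels a term. The umbrella is unaffected ...
ANATOMY["EX_ANATOMY:0000001"] = Term("EX_ANATOMY:0000001", "lung (organ)", "anatomy")
label_before_refresh = umbrella["EX_ANATOMY:0000001"].label

# ... until the downstream ontology chooses to re-import the change.
import_terms(umbrella, ANATOMY, ["EX_ANATOMY:0000001"])
label_after_refresh = umbrella["EX_ANATOMY:0000001"].label
```

In this sketch the downstream curator decides when to pick up upstream edits, which mirrors the "imported or ignored" choice described above.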

Generally the OBO family of ontologies are intended to be orthogonal to each other – i.e. a term concept should only be defined once in one of the member ontologies – but as with the anatomy-related BRENDA, UBERON, and FMA ontologies, or with the geographic GAZETTEER and GEO, there is currently domain overlap and one can pick favourites. We should acknowledge that there are still frustrations with criss-crossed term definitions or misused or duplicately named relations that curators need to consolidate in OBOFoundry – these are the growing pains of this collective “crowd-sourced” system. A key benefit of OBO is that since each curation team has a narrower mandate (as opposed to broad-domain projects like the Systematized Nomenclature of Medicine (SNOMED)), they are likely to be receptive to extending terms in their domain or correcting granularity issues that often occur as these dictionaries evolve. Consequently an ontologist often has the choice of inserting a new term permanently within their own ontology, or adding it temporarily and then getting it added to an “upstream” ontology (by a request to the curators). Finally, there is an annotation system that enables one to reference equivalent terms. To our knowledge, no other system of organizing terminology on the web has this functionality. We find it the most elegant method for standardizing vocabulary across the globe, and for encouraging software implementations to work towards a unified future.

Ontology tools for software developers and IT implementers

Currently there is an open-source family of tools that enables one to identify needed terms (Ontobee, BioPortal, AberOWL, OLS), to assemble an umbrella ontology from import files (OntoFox), and to curate it with additional elements and logical relations (Protégé).

Commercial use of ontologies is generally still trailing academic use, probably because much work in the biological and medical sciences involves a flood of new discoveries whose vocabulary has to be managed globally in order to avoid rampant confusion – so academics have had the mandate to confront the problem as a community. There are commercial tools for managing ontologies and other semantic content, and there are commercial success stories such as https://www.palantir.com and http://www.cognitum.eu.

Whether one’s project is a stand-alone reporting system or is about data exchange, the depth and complexity of the domains that ontologies cover become evident as one peruses them for terms, suggesting that database work and ongoing maintenance can be reduced by reusing the desired parts of their vocabulary.

There is also a whole other fine-grained logic level of detail one can implement in order to finesse more intelligence using an OWL “reasoner”. One limitation here is that the queries for extracting this information are as specific and complex as the logical structure one has to enforce in one’s knowledge base to support them, so extracting information is just as complex as writing SQL queries, albeit more powerful. Other limitations (for humans) exist in the “open world” default logic, which requires more statements to make the reasoner understand that, for example, the only possible choices for a pick-list are the ones stated (and there is no meta-logic available to just say “This list is complete”). There is no natural language interface we know of via the OWL platform yet to this reasoning technology (although some companies like Cognitum are working towards this), and so it is deeper work to get answers to useful but “canned” questions this way. Work in this area aims to satisfy the quest to utilize “Big Data” since it can directly access different semantic web databases on the net.
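The open-world point about pick-lists can be illustrated with a small Python sketch. It is a toy model written for this piece, not an OWL reasoner and not any library’s API: the function names, the list_is_closed flag and the sample values are all assumptions made for illustration. Under a closed-world reading (as in a typical database lookup), an unlisted value is simply rejected; under an open-world reading, an unlisted value is merely unknown until an explicit closure statement has been added.

```python
def closed_world_allows(stated_values, candidate):
    """Database-style check: anything not stated is treated as false."""
    return candidate in stated_values


def open_world_allows(stated_values, candidate, list_is_closed=False):
    """Open-world-style check (toy model).

    Returns True if the candidate is stated, False only when the list has
    been explicitly declared complete, and None (unknown) otherwise.
    """
    if candidate in stated_values:
        return True
    return False if list_is_closed else None


# Hypothetical pick-list of specimen types.
specimen_types = {"blood", "stool", "urine"}

stated = open_world_allows(specimen_types, "blood")
unstated_closed_world = closed_world_allows(specimen_types, "saliva")
unstated_open_world = open_world_allows(specimen_types, "saliva")
unstated_after_closure = open_world_allows(
    specimen_types, "saliva", list_is_closed=True
)
```

The extra list_is_closed argument stands in for the additional statements a knowledge base needs before a reasoner can conclude that nothing outside the list is permitted.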

It may be extra work for a given project to go and ferret out the terms it needs for pick-lists or the variables (measurable terms) related to its bio-medical goals. But the payback shows up with lower maintenance costs when it comes to maintaining that vocabulary, with its imported content automatically evolving and expanding in capability. More benefits accrue when an agency wants to integrate systems that already reference the same terminology lists (or well-defined equivalents). That is the application development future we want GenEpiO to satisfy.