GEEM Specifications
On the portal.html “Specification” tab, and in the stand-alone form.html “View Specification” menu, a specification can be downloaded in a choice of formats.
The plain-text tabular formats may be the most accessible to software developers, since they spell out each term’s label and definition, as well as synonyms and numeric and textual field constraints.
- GEEM core nodes .tsv : The “core” nodes file contains all field terms in a form, except for the underlying choices of categorical input fields.
- GEEM core edges .tsv : Specifies the relations between each parent form element and its child elements.
- GEEM all nodes .tsv : Includes the core nodes plus all underlying categorical choices.
- GEEM all edges .tsv : Includes the core edges plus all relations that make up categorical hierarchies.
Note that a nodes table can list a particular ontology term more than once; for example, the same country list may occur on a form for both sample location and specimen collection agency location. The “GEEM all edges .tsv” file repeats both form fields’ choices in full.
Also note that some categorical pick-list fields are flagged with a “lookup” user interface feature. This flag is set in the GEEM raw specification when a field’s choices are too numerous or unwieldy to include directly in a form (for example, all the municipalities in the world as possible sample sites). The lookup feature indicates that further choices can be fetched from online ontology lookup services such as OLS; however, implementers may want to provide their own cacheable term lookup system for this functionality. This feature is currently not listed in the core/all nodes tables.
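Because a term can appear more than once in a nodes table, its id alone does not identify a row. A minimal sketch, assuming the rows have already been read into dicts carrying the path and id columns described below, finds the reused terms by grouping paths by id. The ex: ids and paths are hypothetical placeholders, not real GEEM identifiers.

```python
from collections import defaultdict


def reused_terms(rows):
    """Map each ontology id that occurs more than once to the paths it occurs under."""
    paths_by_id = defaultdict(list)
    for row in rows:
        paths_by_id[row["id"]].append(row["path"])
    return {term_id: paths for term_id, paths in paths_by_id.items() if len(paths) > 1}


# Hypothetical rows; real GEEM ids and paths will differ.
rows = [
    {"path": "ex:form/ex:sample_location", "id": "ex:country"},
    {"path": "ex:form/ex:agency_location", "id": "ex:country"},
    {"path": "ex:form", "id": "ex:sample_id"},
]
print(reused_terms(rows))
```

In this sketch the pair (path, id), rather than id alone, serves as the row key.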
Columns in a “core/all nodes .tsv” table:
- datatype: One of the XML Schema datatypes allowed under OWL, or “model”, which indicates a grouping of form fields. The xmls:anyURI datatype indicates that the given item is a categorical variable or an underlying choice of one.
- path: Forward-slash (/) delimited list of the form parts between the top-level form and the given term or component. This is a full path rather than just a parent identifier, to avoid ambiguity when a form field or component appears in more than one form section.
- id: Ontology ID of the term or component.
- uiLabel: User interface label of the term.
- uiDefinition: User interface definition of the term.
- help: User interface help text for the term.
- minValue: For a numeric variable, minimum value constraint
- maxValue: For a numeric variable, maximum value constraint
- minLength: For a string variable, minimum string length constraint
- maxLength: For a string variable, maximum string length constraint
- pattern: A keyword like “email” that describes a pre-defined string format, or a regular expression constraining a textual or numeric field entry.
- format: Specifies a predefined format, e.g. “dateFormat:ISO 8601”.
- preferred_unit: If a field has more than one unit choice, this indicates the preferred unit.
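The constraint columns above are enough to check a single entered value. The following is a minimal sketch, not GEEM’s own validator: it reads a hypothetical two-row nodes table with Python’s csv module and checks minValue/maxValue, minLength/maxLength and pattern. The PATTERN_KEYWORDS table (including its email regex), the ex: ids and the xmls:decimal/xmls:string values are assumptions for illustration; any pattern value not found in that table is treated as a regular expression.

```python
import csv
import io
import re

# Hypothetical keyword table: GEEM's own pattern keywords and their exact
# definitions may differ. Any other pattern value is treated as a regular expression.
PATTERN_KEYWORDS = {"email": r"[^@\s]+@[^@\s]+\.[^@\s]+"}


def check_value(row, value):
    """Return constraint violations for one text entry against one nodes-table row."""
    term = row["id"]
    errors = []
    # Treat the entry as numeric only when a numeric bound is given.
    if row.get("minValue") or row.get("maxValue"):
        try:
            number = float(value)
        except ValueError:
            return [f"{term}: {value!r} is not numeric"]
        if row.get("minValue") and number < float(row["minValue"]):
            errors.append(f"{term}: {value!r} below minValue {row['minValue']}")
        if row.get("maxValue") and number > float(row["maxValue"]):
            errors.append(f"{term}: {value!r} above maxValue {row['maxValue']}")
    if row.get("minLength") and len(value) < int(row["minLength"]):
        errors.append(f"{term}: shorter than minLength {row['minLength']}")
    if row.get("maxLength") and len(value) > int(row["maxLength"]):
        errors.append(f"{term}: longer than maxLength {row['maxLength']}")
    pattern = row.get("pattern")
    if pattern:
        regex = PATTERN_KEYWORDS.get(pattern, pattern)
        if re.fullmatch(regex, value) is None:
            errors.append(f"{term}: {value!r} does not match pattern {pattern!r}")
    return errors


# Hypothetical two-row nodes table (ids and datatype values are illustrative only).
lines = [
    ["datatype", "path", "id", "uiLabel", "minValue", "maxValue", "minLength", "maxLength", "pattern"],
    ["xmls:decimal", "ex:form", "ex:age", "age", "0", "150", "", "", ""],
    ["xmls:string", "ex:form/ex:contact", "ex:email", "email", "", "", "", "100", "email"],
]
tsv = "\n".join("\t".join(cells) for cells in lines) + "\n"
rows = {row["id"]: row for row in csv.DictReader(io.StringIO(tsv), delimiter="\t")}

print(check_value(rows["ex:age"], "200"))  # ["ex:age: '200' above maxValue 150"]
print(check_value(rows["ex:email"], "not-an-email"))
```

Empty cells come back from csv.DictReader as empty strings, so an unset constraint is simply skipped.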
Columns in a “core/all edges .tsv” table:
- relation: Type of relation between parent and child, e.g. “component” or “choice”.
- path: Parent term (or component) identifier, including the path to that parent.
- child_id: Child term (or component) identifier.
- minCardinality: The minimum number of instances of the child field or component allowed under the parent.
- maxCardinality: The maximum number of instances of the child field or component allowed under the parent. For example, this allows one to specify up to 3 phone numbers.
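An edges table can be indexed by parent path and used to check how many entries were supplied for each child. This is a sketch under stated assumptions, not GEEM’s engine. It assumes that an empty minCardinality means 0 and an empty maxCardinality means no upper bound, and it checks only “component” edges. The ex: ids and the counts dictionary are hypothetical.

```python
import csv
import io
from collections import defaultdict


def index_children(edge_rows):
    """Group edge rows by parent path, preserving table order."""
    children = defaultdict(list)
    for edge in edge_rows:
        children[edge["path"]].append(edge)
    return children


def cardinality_errors(children, parent_path, counts):
    """Compare per-child entry counts under one parent with min/maxCardinality.

    Assumes an empty minCardinality means 0 and an empty maxCardinality means
    no upper bound; GEEM's own defaults may differ.
    """
    errors = []
    for edge in children.get(parent_path, []):
        if edge["relation"] != "component":
            continue  # "choice" edges list pick-list options; not counted here
        n = counts.get(edge["child_id"], 0)
        low = int(edge["minCardinality"] or 0)
        high = edge["maxCardinality"]
        if n < low:
            errors.append(f"{edge['child_id']}: {n} given, at least {low} required")
        if high and n > int(high):
            errors.append(f"{edge['child_id']}: {n} given, at most {high} allowed")
    return errors


# Hypothetical edges: up to 3 phone numbers and exactly one contact name.
edge_tsv = "\n".join("\t".join(cells) for cells in [
    ["relation", "path", "child_id", "minCardinality", "maxCardinality"],
    ["component", "ex:form/ex:contact", "ex:phone_number", "0", "3"],
    ["component", "ex:form/ex:contact", "ex:contact_name", "1", "1"],
]) + "\n"
children = index_children(csv.DictReader(io.StringIO(edge_tsv), delimiter="\t"))
print(cardinality_errors(children, "ex:form/ex:contact", {"ex:phone_number": 4}))
```

A renderer that only distinguishes optional from required fields could derive that flag from the same rows, treating a field as required when minCardinality is 1 or more.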
The form JSON (and YAML) format is a hierarchical data structure whose elements represent form sections and fields, in order. Hierarchical categorical pick-lists are represented recursively, as ordered dictionaries within dictionaries.
- GEEM form.json
- GEEM form.yaml
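Nested ordered dictionaries of this kind can be walked with a short recursive function. The following is a minimal sketch that assumes a pick-list is a mapping from choice keys to dictionaries of sub-choices, with an empty dictionary at the leaves; the real GEEM form.json keys and nesting will differ. It relies on Python’s json.loads and dict both preserving key order.

```python
import json


def walk_choices(choices, depth=0):
    """Yield (depth, key) for each entry of a nested choice dictionary, depth-first, in order."""
    for key, sub_choices in choices.items():
        yield depth, key
        if isinstance(sub_choices, dict):
            yield from walk_choices(sub_choices, depth + 1)


# Hypothetical fragment; the real GEEM form.json keys and nesting differ.
picklist = json.loads('{"Europe": {"France": {}, "Spain": {}}, "Asia": {"Japan": {}}}')
for depth, key in walk_choices(picklist):
    print("  " * depth + key)
```

This prints the choices indented by depth, in the order they appear in the file.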
The raw JSON (and YAML) format is the relatively flat format used to drive the GEEM website itself. Each ontology term, including units, gets a top-level entry. The hierarchy of elements in a form is constructed from a given term id by looking up that term’s components and, for categorical variable fields, its choices.
- GEEM raw.json
- GEEM raw.yaml
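Rebuilding the hierarchy from a flat, id-keyed specification is a recursive expansion. This sketch assumes each entry may carry hypothetical "components" and "choices" lists of child ids and a "uiLabel" string; the actual GEEM raw.json layout may use different keys. It also guards against cycles and lists one full path per occurrence.

```python
def build_tree(raw, term_id, _ancestors=()):
    """Expand a flat id-keyed spec into a nested tree rooted at term_id.

    Assumes hypothetical "components"/"choices" lists of child ids and a
    "uiLabel" string per entry; GEEM's raw.json may use other keys.
    """
    if term_id in _ancestors:
        raise ValueError(f"cycle detected at {term_id}")
    entry = raw.get(term_id, {})
    node = {"id": term_id, "label": entry.get("uiLabel", term_id), "children": []}
    for key in ("components", "choices"):
        for child_id in entry.get(key, []):
            node["children"].append(build_tree(raw, child_id, _ancestors + (term_id,)))
    return node


def iter_paths(node, prefix=""):
    """Yield a slash-delimited path for every node in the tree."""
    here = f"{prefix}/{node['id']}" if prefix else node["id"]
    yield here
    for child in node["children"]:
        yield from iter_paths(child, here)


# Hypothetical flat spec: one country list reused under two locations.
raw = {
    "ex:form": {"uiLabel": "sample form", "components": ["ex:sample_location", "ex:agency_location"]},
    "ex:sample_location": {"uiLabel": "sample location", "components": ["ex:country"]},
    "ex:agency_location": {"uiLabel": "agency location", "components": ["ex:country"]},
    "ex:country": {"uiLabel": "country", "choices": ["ex:france", "ex:spain"]},
    "ex:france": {"uiLabel": "France"},
    "ex:spain": {"uiLabel": "Spain"},
}
tree = build_tree(raw, "ex:form")
for path in iter_paths(tree):
    print(path)
```

In this sketch the reused country list is expanded in full under each parent, so each occurrence gets its own unambiguous path.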
Form submission of data:
One last option is the “GEEM form submission”, which shows one example of how form entries could be encoded for submission to a server (it isn’t a field specification per se). Implementers may have other form-to-server needs that require different solutions. The prototype GEEM form renderer works directly from the “GEEM raw.json” specification. It currently does not implement the minCardinality and maxCardinality restrictions beyond marking whether a field or component is optional or required. In other words, a specification may allow 3 phone numbers, but the form engine currently only supports entering one. This will be remedied in the near future.
Future formats
We plan to offer the following formats as further ways of gathering, encoding and managing data and, in the case of OntoFox, of reusing ontology components in another ontology:
- Web-based spreadsheet: This will provide data validation and entry according to the given specification fields. Tabular data can be uploaded or cut and pasted into this spreadsheet.
- REDCap: The REDCap (http://redcap.vanderbilt.edu) system holds data specifications and provides data management forms, driven by them, in tablet- and mobile-phone-friendly formats. GEEM aims to provide a REDCap specification format so that ontology-driven standards can more easily be converted into REDCap forms.
- OntoFox: The OntoFox system (http://ontofox.hegroup.org/) generates an OWL file with terms, axioms and annotations directly from a given textual term specification file.