Just Turtle and RDF vs OWL examples: the CPEV and FIBO

This is a more concrete follow-up to the previous somewhat theoretical post on ontologies not being just RDF.

I tried hard to think of how other people might approach the syntax issues and assumptions. When I asked someone involved in developing a vocabulary represented in a Turtle document about it, I got a befuddled reply. Maybe therein lies a problem. Regardless, I’ve seen a number of people editing the text files directly rather than through a user interface that helps the modeller avoid syntax errors. Either way, we’ll look at two documents in this post: SEMIC’s Core Public Event Vocabulary, aimed at EU public administration interoperability, because it was the most recent of the Core Vocabularies with a published update, and the Financial Industry Business Ontology, because it has so many files. Both are actively maintained, and each has a user community that, to the best of my knowledge, does not overlap with the other.

The Core Public Event Vocabulary CPEV

Having noted the manual editing of the serialisation, I imagine they need a plain RDF or Turtle syntax checker at least, so let’s try that. The first hit in DuckDuckGo is http://ttl.summerofcode.be/ and I copied the Core Public Event Vocabulary (CPEV) into it. And…

Yay! It is valid Turtle syntax. The verdict is satisfying, so why bother looking beyond it? Because we want that type-level vocabulary it claims to be, conformant to OWL so that anyone can use it for their ontology-driven, Semantic Web and W3C standards compliant application, be it to foster interoperability or to serve a stand-alone knowledge-driven software application (or use it in graphRAG if you insist).

Let me try another tool, Protégé, the go-to ontology editor for most ontology developers, which comes with the reliable OWL API, under the assumption that I don’t have any other options. When I try to load and parse the CPEV by URL from the GitHub repository where it is published, i.e., https://github.com/SEMICeu/Core-Public-Event-Vocabulary/blob/master/releases/1.1.0/voc/core-public-event.ttl, it returns a number of syntax errors. That’s the inconvenience of publishing ontologies on GitHub: that URL points to an HTML page, so you have to click the ‘raw’ button and load by URL from the resulting address instead, being https://raw.githubusercontent.com/SEMICeu/Core-Public-Event-Vocabulary/refs/heads/master/releases/1.1.0/voc/core-public-event.ttl. It loads. Downloading the core-public-event.ttl locally and opening it also works, in the sense of opening in Protégé without any imports. No imports were declared, either, although multiple prefixes have been used.
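The blob-to-raw rewrite can also be done without clicking around. Below is a minimal sketch, assuming the branch or tag name is a single path segment; the function name github_raw_url is made up for illustration. It produces the shorter raw form without the refs/heads part, which, to my knowledge, GitHub serves as well for a branch such as master.

```python
from urllib.parse import urlsplit


def github_raw_url(blob_url: str) -> str:
    """Turn a GitHub 'blob' page URL into a raw-content URL (hypothetical helper)."""
    parts = urlsplit(blob_url)
    segments = parts.path.strip("/").split("/")
    # Expected layout: <owner>/<repo>/blob/<ref>/<path to file...>
    if parts.netloc != "github.com" or len(segments) < 5 or segments[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *file_path = segments
    # Assumption: <ref> contains no slash (true for 'master', not for every branch).
    return "https://raw.githubusercontent.com/" + "/".join([owner, repo, ref, *file_path])


print(github_raw_url(
    "https://github.com/SEMICeu/Core-Public-Event-Vocabulary/"
    "blob/master/releases/1.1.0/voc/core-public-event.ttl"
))
```

The resulting URL is what one would paste into an ontology editor’s open-from-URL dialogue; the page URL from the browser’s address bar returns HTML, not Turtle.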

Inspecting the content, curiously, certain fields in the header section are empty (see image on the right). The file itself certainly does have content for those items, using FOAF. There’s also a foaf:Person class with a number of odd instances of the _:genid2147483656 variety, which it is not supposed to have; nor are they present in the ttl file, which lists the actual names of the editors, so something does not add up. It might be tempting to blame the software ‘downloaded from the Internet’ that my operating system warns me about and to prioritise the text file one can inspect directly in a text editor, but bear with me for a little longer.

Since there’s something amiss with the persons, and so something to do with FOAF usage, one could try to import FOAF explicitly to see what happens, or delete the empty metadata fields, or convert it to the required exchange syntax RDF/XML or another syntax to attempt to gain insight into what the issues are. Protégé’s error messages were a bit cryptic when I tried the latter with functional style syntax. I’ll save you reading through a few dead ends.

Was all this a productive use of my time? Most definitely not. A few of the erstwhile students of my ontology engineering course didn’t think so, either, and developed the OWL classifier that finds out straight away the DL fragment your ontology is in and which OWL Species it is in, and it presents a list of violations of the other OWL Species profiles, if any, using two OWL API versions, for OWL and OWL 2. Here’s a screenshot from the downloaded CPEV ttl and one where I had imported FOAF:

The core-public-event.ttl contains 80 OWL 2 DL violations that cause it to be in OWL 2 Full only, and the one where I had imported FOAF still contained 31 violations. Importing FOAF resolved 68 “undeclared x” issues, but added 19 more punning issues, which at some point looked promising to pursue. I started to dig in earnest.

There are undeclared classes (e.g., foaf:Person) reported, undeclared annotation properties, punning problems (e.g., accessibility and event number), and so on and so forth. However, more of a smoking gun were the two detections of “Use of reserved vocabulary for class IRI: rdf:langString”, which remain in both lists of violations: rdf:langString is not a permitted datatype for a data property range in OWL, and it shouldn’t have been reported as a class either. The rdf:langString datatype was new in RDF 1.1 of 2014 and is intended for language-tagged strings (“A literal is a language-tagged string if the third element is present.”), yet OWL 2 was last updated in 2012. The OWL API added support for it in February 2015 nonetheless. Removing rdf:langString from the range declarations of its two uses, the declared data properties accessibility and event number, resolves all but one category of issues. It resolves so many because rdf:langString is treated as a class whereas it ought to be a datatype, ‘confusing’ the rest of CPEV’s content.
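When editing a Turtle file by hand, a crude text scan can at least point at the range declarations in question before any OWL tooling gets involved. The following is a rough sketch only: it is a regular expression over lines, not a Turtle parser, it catches just the prefixed form written on one line, and the sample vocabulary (the ex: namespace and property names) is made up rather than copied from CPEV.

```python
import re

# Hypothetical excerpt in the style of a Turtle vocabulary; not copied from CPEV.
SAMPLE_TTL = """\
@prefix ex: <http://example.org/voc#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

ex:accessibility a owl:DatatypeProperty ;
    rdfs:range rdf:langString .

ex:eventNumber a owl:DatatypeProperty ;
    rdfs:range xsd:integer .
"""

# Only matches the prefixed names on a single line (a deliberate simplification).
RANGE_PATTERN = re.compile(r"rdfs:range\s+rdf:langString\b")


def lines_with_langstring_range(turtle_text: str) -> list[int]:
    """Return 1-based line numbers where rdf:langString is given as a range."""
    return [
        number
        for number, line in enumerate(turtle_text.splitlines(), start=1)
        if RANGE_PATTERN.search(line)
    ]


print(lines_with_langstring_range(SAMPLE_TTL))  # prints [8]
```

It is no substitute for a profile checker, since it says nothing about the other violations, but it narrows down where to look in a text editor.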

Is a language tag for event number so relevant that we have to leave behind DL-based ontologies? No, that data type should be an xsd:integer or else rdf:PlainLiteral, and if someone wants to add ‘rd’, ‘de’ or ‘ieme’ after a 3 in a user interface, then that has to be sorted out by the surface realiser, where it ought to be addressed anyway. Must the accessibility data property have a language tag? I appreciate there will be one somewhere in the information system, but a language-tagged blurb about how the venue of the event is accessible for people with mobility restrictions does not warrant a transgression into undecidability of the vocabulary. Nor is it ontologically defensible. Any information system can handle that requirement trivially without destroying ontology-based data access prospects and without forcing undecidable FOL reasoners upon us. Add another field ‘in language’ if you must (which also gives the freedom for more languages and dialects than pre-set ones, even more so with MoLA) or maybe ontolex-lemon is of use here.

That out of the way, the one remaining category of issue is the anonymous individuals in OWL 2 EL and OWL 2 QL, profiles that, considering that few DL-based OWL features are used, do make sense to aim for in the interest of scalable applications. The blank nodes that generate new anonymous individuals in the ontology file are due to the unnecessary nesting of the list of editors, which forced the introduction of blank nodes, rather than adding the editors one by one as an individual each. That is, instead of a pattern like

  <http://www.w3.org/2001/02pd/rec54#editor> [
    a foaf:Person;
    foaf:firstName "x";
    foaf:lastName "y"
  ], [
    a foaf:Person;
    foaf:firstName "z";
    foaf:lastName "w"
  ];

in the ontology metadata, adding a separate entry for each, like

  <http://www.w3.org/2001/02pd/rec54#editor> "x y" ;
  <http://www.w3.org/2001/02pd/rec54#editor> "z w" ;

addresses that problem.

With that nested list removed as well, the modified CPEV file is in all OWL 2 profiles too. Is a list of blank nodes worth forfeiting CPEV usage in ontology-based data access systems, when it can be solved simply by adding each editor individually? I don’t think so. Or: what are the arguments for why the listing that causes the blank nodes is preferable over scalable use of CPEV?

The Financial Industry Business Ontology FIBO

FIBO has many files in its GitHub repository, and I randomly picked one, being Legal Capacity https://github.com/edmcouncil/fibo/blob/master/FND/Law/LegalCapacity.rdf. The W3C RDF Validator is ok with it (see image on the right).

Testing it in the OWL Classifier, it also turned out to be in OWL 2 Full:

and a copy of the full text of the list of violations is pasted here to see the entire entries:

1 - Use of undeclared annotation property: owl:minQualifiedCardinality in annotation [Annotation(owl:minQualifiedCardinality "0"^^xsd:nonNegativeInteger) in AnnotationAssertion(owl:minQualifiedCardinality _:genid650 "0"^^xsd:nonNegativeInteger)]

2 - Use of undeclared annotation property: owl:minQualifiedCardinality in annotation [Annotation(owl:minQualifiedCardinality "0"^^xsd:nonNegativeInteger) in AnnotationAssertion(owl:minQualifiedCardinality _:genid649 "0"^^xsd:nonNegativeInteger)]

3 - Use of unknown datatype: rdf:langString [DatatypeDefinition(<https://www.omg.org/spec/Commons/TextDatatype/Text> DataUnionOf(rdf:langString xsd:string )) in OntologyID(OntologyIRI(<https://www.omg.org/spec/Commons/TextDatatype/>) VersionIRI(<https://www.omg.org/spec/Commons/20221101/TextDatatype/>))]

4 - Use of reserved vocabulary for annotation property IRI: owl:minQualifiedCardinality [AnnotationAssertion(owl:minQualifiedCardinality _:genid649 "0"^^xsd:nonNegativeInteger) in OntologyID(OntologyIRI(<https://spec.edmcouncil.org/fibo/ontology/FND/Utilities/Analytics/>) VersionIRI(<https://spec.edmcouncil.org/fibo/ontology/master/latest/FND/Utilities/Analytics/>))]

5 - Use of reserved vocabulary for annotation property IRI: owl:minQualifiedCardinality [AnnotationAssertion(owl:minQualifiedCardinality _:genid650 "0"^^xsd:nonNegativeInteger) in OntologyID(OntologyIRI(<https://spec.edmcouncil.org/fibo/ontology/FND/Utilities/Analytics/>) VersionIRI(<https://spec.edmcouncil.org/fibo/ontology/master/latest/FND/Utilities/Analytics/>))]

Setting aside the langString, the “AnnotationAssertion(owl:minQualifiedCardinality” is clearly the problem: cardinality constraints are not there for annotation, but to be used in class expressions, and they are reserved for it at that. This is a problem for both Legal Capacity and the Utilities/Analytics it imports. Someone had added minQualifiedCardinality as an annotation property, likely by accident:

It is not allowed to be so if it is to be a DL-based OWL ontology, because it’s reserved vocabulary. Note that “An OWL 2 Full ontology document is any RDF/XML document”, and since it validates, one could argue it’s an OWL ontology. Yet, the DL constructs used are merely ALCHIQ(D), or: it most certainly uses fewer than the OWL 2 DL features one could use. Would one want to forfeit decidable automated reasoning for an unused annotation property? I think not.
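This particular slip can be looked for directly in an RDF/XML file. Here is a minimal sketch, assuming the declaration is written as an owl:AnnotationProperty element with an rdf:about attribute; it would miss other ways of writing the same triple, and the sample fragment is hypothetical rather than the actual FIBO source. The allow-list holds the annotation properties that, as far as I recall from the structural specification, OWL 2 predefines.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
RESERVED_NAMESPACES = (RDF, RDFS, "http://www.w3.org/2001/XMLSchema#", OWL)

# Annotation properties predefined by OWL 2, so seeing them declared is harmless.
BUILT_IN = {RDFS + name for name in ("label", "comment", "seeAlso", "isDefinedBy")} | {
    OWL + name
    for name in ("deprecated", "versionInfo", "priorVersion",
                 "backwardCompatibleWith", "incompatibleWith")
}

# Hypothetical RDF/XML fragment for illustration; not the actual FIBO source.
SAMPLE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:owl="{OWL}">
  <owl:AnnotationProperty rdf:about="{OWL}minQualifiedCardinality"/>
  <owl:AnnotationProperty rdf:about="http://example.org/voc#explanatoryNote"/>
  <owl:AnnotationProperty rdf:about="{RDFS}label"/>
</rdf:RDF>"""


def reserved_annotation_properties(rdf_xml: str) -> list[str]:
    """List declared annotation properties whose IRI is reserved vocabulary."""
    root = ET.fromstring(rdf_xml)
    flagged = []
    for element in root.iter(f"{{{OWL}}}AnnotationProperty"):
        iri = element.get(f"{{{RDF}}}about", "")
        # Flag IRIs in a reserved namespace unless OWL 2 predefines them.
        if iri.startswith(RESERVED_NAMESPACES) and iri not in BUILT_IN:
            flagged.append(iri)
    return flagged


print(reserved_annotation_properties(SAMPLE))
```

On the sample it flags only the owl:minQualifiedCardinality declaration and leaves the ordinary and the predefined annotation property alone.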

Where to go from here?

These sorts of issues are hard to find manually, and even harder once the size and complexity of the ontology and the modules it imports increase. Having eight OWL Species to choose from, in a number of different concrete syntaxes, doesn’t make the debugging task any easier either. That’s why my students chose to develop the tool. They did so a while ago, however: the OWL classifier (GitHub repo) I used was developed in 2016 and only works with older JDKs due to backwards incompatibility of Java. It was a mini-project topic of the course, the CS honours students (Brian Mc George, Aashiq Parker, and Muhummad Patel) graduated and moved on, and new students want to do new things. You’ll have to set up an older version of the JDK to avail of the OWL Classifier to catch syntax violation issues with basic explanations. It doesn’t solve the syntax problem yet, but it pinpoints, or at least directs you to, where the violations are that make the file RDF/Turtle but not the lightweight OWL many an RDF-oriented modeller is after.

Finally, I don’t want to merely complain; I want to help. Writing up this and the previous post as an unwieldy GitHub issue for a single ttl/rdf file would be a bit much and, going by the ttl and rdf files that exist in disparate repos, the point deserves to be known for more than one such file.

It’s good to see how much of the Semantic Web technologies actually made it into industry and public administration, especially considering all the boasting (and bullying?) by the LLM groupies. I’d like to see it taking a step up towards further effective interoperability in the EU and beyond.

No, an ontology isn’t ‘just RDF’

Over the past few years that I’ve been peeking and dabbling outside computer science and the ivory tower of academia more than before, I noticed a disturbing trend, or perhaps even entrenched practice, of talk about “RDF ontologies” and of “ontologies really being no more than just RDF graphs”. But just because ontologies in OWL are expected to be serialised in RDF/XML as the required exchange syntax according to the standard – and optionally in another specified format, such as Turtle (an acronym of Terse RDF Triple Language), OWL/XML, functional style syntax, or Manchester syntax – and have a mapping into RDF, it doesn’t make them ‘RDF ontologies’. Why isn’t an ontology ‘just RDF’?

A very short non-technical answer is that while ontologies (formalised in OWL) can be seen as an RDF graph when that particular serialisation language is chosen, not all RDF graphs are ontologies, and since there are valid serialisations of OWL ontologies that are not graphs (e.g., OWL/XML), not all OWL ontologies are RDF graphs when you encounter the document. A longer explanation follows, where I’ve adapted sidebar 7 of my textbook to make its contents more suitable for a blogpost, and for that reason it also has a little from the ‘encoding peculiarities’ section and sidebar 12.

Abstract versus concrete syntax

(source: my textbook 2nd edition, p103)

As a preliminary, recall that the OWL 2 Web Ontology Language has been a W3C recommendation since 2009, with a 2nd edition in 2012, and the latest RDF 1.1 for publishing and linking data has been a recommendation since 2014 (RDF 1.2 is on the way). And, for what it’s worth (one certainly can squabble about certain parts – visuals have limitations), here’s an adjusted Semantic Web layer cake, with more standards, less crazy on the colours, and relevant DIKW pyramid concepts on the right.

Let’s consider “Figure 1 The structure of OWL 2” from the OWL 2 overview, reproduced below, and the lime-green oval in the centre with the “mapping”-labelled arrows between OWL and RDF, and, why not, also that orange rectangle at the top-centre with the “RDF” mention and the “Turtle” on the right as well. Perhaps that’s what might cause a reader to simplify it all to equate an ontology with an RDF graph, and save their ontology with a .rdf or, as problematic, .ttl extension instead of saving the ontology in RDF/XML format with an .owl extension to indicate it’s meant to be an ontology in OWL.

“The structure of OWL”, Figure 1 of the OWL 2 document overview.

An OWL 2 ontology is represented as an instance of the OWL 2 structural specification, which is independent of concrete OWL 2 exchange syntaxes. Put differently, an OWL 2 ontology has one structural specification and will be written and stored in one or more concrete exchange syntaxes. RDF/XML is one such exchange syntax, and the mandatory one at that; functional style syntax is an optional exchange syntax, and so are Turtle, Manchester syntax and OWL/XML. Here are two examples of those concrete syntaxes and their respective rendering in a GUI: berry being a fruiting body and carnivorous plants eating animals from a tutorial on improving an ontology, as lazy screenshots of Protégé that fit in my laptop window.

The data and information modellers among you may draw a parallel with UML: there’s the standard that specifies valid UML diagrams, and for a modelling tool one can choose how to turn the UML class diagram into flat text to manipulate and store it, be it OMG’s XMI, some other XML, JSON, your home-grown pet language, or even store it in RDF by treating the diagram as a graph. Does that make the UML class diagram alternatingly a tree data structure, a graph, and a collection of name/value pairs or, schizophrenically, all of them at the same time? No, a UML class diagram remains exactly that, regardless of the implementation choice for the serialisation language.

Back to OWL. The particular structure of an OWL ontology can be mapped into an RDF graph for a concrete computer-processable serialisation of the ontology. Any Description Logic-based OWL ontology still has a direct semantics that is model-theoretic. That mapping into what is syntactically an RDF graph does not change the semantics of the ontology if it’s DL-based (i.e., in any of OWL DL, OWL Lite, OWL 2 DL, OWL 2 EL, OWL 2 QL, or OWL 2 RL), or: the ontology does not swap into a graph-based semantics by serialising it in RDF or its Turtle dialect, and we can still send it to the DL-based automated reasoner.

Another way of looking at it is that for concretely writing down the ontology for computational use, we abuse/avail of some syntax that was already specified somewhere for another purpose – representing data and information on the Web – that’s reused here for a different purpose – serialising an ontology where knowledge is represented. It’s a bit like abusing UML class diagram notation to visualise key aspects of an ontology for communicative purpose because it’s around already, people are more familiar with UML notation, and it saves you inventing and explaining a new visual notation.

There are two key reasons why a distinction is made between an abstract structural specification and concrete syntaxes. First, the abstract structure serves as a pivot that then can be linked to multiple concrete syntaxes, compared to generating many-to-many mappings between all exchange syntaxes. Second, additional practical conveniences can be added to concrete syntaxes that do not affect the logical theory (the ontology). For instance, a concrete syntax may have an abbreviatedIRI feature to simplify processing long IRI strings and it may have extras for ontology annotations.
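The abbreviated IRI convenience is purely lexical, which a few lines can show: the short form expands to the same full IRI, so the logical theory is untouched. This is an illustrative sketch only; the ex: namespace and the expand helper are made up, and real parsers handle more cases (default prefixes, escapes) than this does.

```python
# Prefix table such as a concrete syntax might declare; 'ex' is hypothetical.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "ex": "http://example.org/bicycles#",
}


def expand(abbreviated_iri: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand prefix:localName into the full IRI it stands for."""
    prefix, separator, local_name = abbreviated_iri.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {abbreviated_iri!r}")
    return prefixes[prefix] + local_name


print(expand("ex:ElectricalBicycle"))
print(expand("rdfs:subClassOf"))
```

Whether a file spells out the long IRI or the short one, the entity referred to is the same.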

If you look at the fine print of the mapping specification from OWL into RDF, that is, not the convenient table but some parts of the surrounding text, you’ll notice the ‘snag’ that it isn’t simply 1:1. Ontology O and the transformation of it into RDF syntax, T(O), works anyhow, yes, but it’s the “The mapping presented in Section 3 can be used to transform an RDF graph G satisfying certain restrictions into an OWL 2 DL ontology OG” that makes a difference (bold face added). Whatever is in the graph needs to adhere to what’s described in that Section 3; if it doesn’t, it’s still a graph, but it just isn’t an OWL ontology. Consequently, RDF tools that are great for processing lots of instances aren’t necessarily adequate for OWL ontologies – if the tool’s feature set doesn’t boast adherence to those “certain restrictions”, then they aren’t adequate as tools for ontologies for sure.
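To make the forward direction T(O) a bit more tangible, here is a toy sketch for one axiom shape only: a SubClassOf axiom whose classes are either named or an existential restriction. It follows the pattern of the mapping table (a fresh blank node typed owl:Restriction, with owl:onProperty and owl:someValuesFrom), but the tuple encoding, the function name and the ex: names are assumptions of mine, not anything prescribed by the standard.

```python
from itertools import count


def subclass_axiom_to_triples(sub, sup):
    """Map SubClassOf(sub, sup) to RDF triples for a tiny fragment of OWL 2."""
    triples = []
    fresh = count(1)

    def node_for(expr):
        if isinstance(expr, str):      # a named class is represented by its IRI
            return expr
        kind, prop, filler = expr      # only ("some", property, class expression)
        if kind != "some":
            raise ValueError(f"unsupported class expression: {kind!r}")
        node = f"_:x{next(fresh)}"     # each restriction gets a fresh blank node
        filler_node = node_for(filler)
        triples.extend([
            (node, "rdf:type", "owl:Restriction"),
            (node, "owl:onProperty", prop),
            (node, "owl:someValuesFrom", filler_node),
        ])
        return node

    sub_node, sup_node = node_for(sub), node_for(sup)
    triples.append((sub_node, "rdfs:subClassOf", sup_node))
    return triples


# SubClassOf(ex:CarnivorousPlant ObjectSomeValuesFrom(ex:eats ex:Animal))
for triple in subclass_axiom_to_triples(
    "ex:CarnivorousPlant", ("some", "ex:eats", "ex:Animal")
):
    print(" ".join(triple), ".")
```

One axiom becomes four triples that only make sense together. A graph with, say, the owl:onProperty triple missing is still a perfectly fine RDF graph, yet it no longer maps back to an OWL 2 DL axiom, which is the gist of those “certain restrictions”.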

RDF Schema?

Perhaps the people who talk about ‘RDF ontologies’ mean lightweight ontologies or vocabularies in RDFS, short for RDF Schema. RDFS is based on the RDF Semantics specification and is intended for type-level information and can help guide what to add to a graph. You can declare classes, properties, class hierarchies, property hierarchies, domain and range restrictions, and a few other things like labels, see-also, and bags, but not more substantive knowledge about the subject.

It won’t let you declare characteristics of properties (e.g., inverse, transitive), nor local range restrictions (e.g., that for a class Person specifically, the property hasName has as range xsd:string), nor complex concept descriptions (e.g., that class Bicycle is defined by the union of Human-powered bicycle and Electrical bicycle), nor cardinality restrictions (e.g., each Electrical Bicycle has exactly 1 motor), nor disjointness axioms (e.g., nothing can be both Apple and Orange), not to mention that one can mess up/around, like using vocabulary of the language (e.g., stating that rdfs:Class rdfs:subClassOf ex:a).

If you were thinking in the direction of a schema for RDF, and so RDFS, yet an ontology regardless, then you probably had intended to say an ontology in OWL 2 Full. Reasoning over OWL 2 Full is undecidable, so it’s not as if forfeiting all the nice modelling features would reward you with good performance. Or: this may not be what you really want to have.

Ontologies in data stores

Perhaps the people who talk about ‘RDF ontologies’ meant something else. There are, for the lack of a better term, ‘encoding peculiarities’. I could store my ontology about, say, electrical bicycles in a relational database as well, if I so fancy. For the class hierarchy, I can create a 2-column table called Taxonomy, and store it there:

and so on for other tables, like a hasPart table with four columns: one for the whole, one for the part, and two for the basic constraints (universally or existentially quantified, number restrictions). Mathematically, that trick has turned my classes and properties into values. Not that most people would care, because we can look at it and think of it as if they were classes. Computationally, some tasks will go faster. Regardless, we can take R2RML and convert the relational database to RDF, and voilà, we have the ontology as an RDF graph at the level of individuals. It’s mathematical and technological gymnastics, but anyone who understands the stretching wouldn’t talk of an RDF ontology, but keep the performance optimisation hack under wraps and of no concern to the modeller.
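The table trick can be mimicked in a few lines with an in-memory SQLite database. This is a sketch under assumptions: the column names, the class names, the namespace and the ex:hasSuperclass property are all made up, and the hand-rolled conversion at the end merely stands in for what an R2RML mapping would do.

```python
import sqlite3

# Hypothetical namespace and class names, used only for illustration.
BASE = "http://example.org/bicycles#"
ROWS = [
    ("ElectricalBicycle", "Bicycle"),
    ("HumanPoweredBicycle", "Bicycle"),
    ("Bicycle", "Vehicle"),
]


def taxonomy_as_triples(rows):
    """Store subclass/superclass pairs as rows, then read them back as triples."""
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE Taxonomy (subclass TEXT, superclass TEXT)")
    connection.executemany("INSERT INTO Taxonomy VALUES (?, ?)", rows)
    stored = connection.execute(
        "SELECT subclass, superclass FROM Taxonomy ORDER BY subclass"
    ).fetchall()
    connection.close()
    # The class names are plain values now; each row becomes one triple between
    # two resources, linked by the made-up property hasSuperclass.
    return [
        (f"<{BASE}{sub}>", f"<{BASE}hasSuperclass>", f"<{BASE}{sup}>")
        for sub, sup in stored
    ]


for triple in taxonomy_as_triples(ROWS):
    print(" ".join(triple), ".")
```

Note what came out: three resources related by an ordinary property, so the classes have become individuals in the graph. Nothing in those triples tells a DL reasoner that Bicycle is a class subsumed by Vehicle.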

When I look at the .ttl files of the SEMIC Core Vocabularies, for instance, such as the most recent release that happens to be v1.1 of the core public event vocabulary, it looks like the intent is the first case, i.e., where an OWL ontology is serialised in Turtle, as is the case for QUDT and others. If OWL 2 Full or any of the DL-based OWL 2 languages was intended, they should have had an .owl extension to indicate the ‘specialness’ of that Turtle file. It is not much different for the Financial Industry Business Ontology (FIBO) where, although the syntax isn’t even in Turtle or simple RDF, the file extension is still .rdf rather than .owl. I don’t mean to pick on these, but just happen to know of them and they originate from different communities.

In closing

As per Conformance (normative) of OWL 2, there are OWL 2 Full, OWL 2 DL, OWL 2 EL, OWL 2 QL, and OWL 2 RL ontology documents. Not ‘RDF ontology’ documents. They can be serialised in, at least, RDF/XML, Functional Style Syntax, OWL/XML, Turtle, and Manchester Syntax. Let’s not conflate the ontology with merely one of its exchange syntax serialisation options. More precise terminology may help communicating better, like tasting one’s vocabulary agreement tea.

Of course, other modelling languages exist that can be used for representing an ontology on paper or for computational use that are also not RDF graphs, such as Common Logic. Also, a tool such as Protégé can easily convert between the exchange syntax formats specified in the OWL standard and a few others (export to LaTeX, render it visually in OntoGraf, and whatnot). If you fancy the ontology to be in OWL/XML so you can use Owlready2 in a Python programming environment, go for it – just make sure it’s an OWL 2 ontology. It’s conformance to the OWL standard that counts for all those ontologies in the Semantic Web we weave in order to not end up knitting knots that would become too daunting to disentangle.

p.s.: I’ll have more concrete examples in the next post, which I’ll finish up in a day or two, zooming in on CPEV and FIBO.