As Linked Data engineers, we often make the case that Linked Data lets us align data automatically. Yet I’ve rarely seen a demo where this is really the case. I vividly remember a conversation I had with Kasia Bourée, one of the leads behind the Transmodel Ontology, at the European Commission about 10 years ago: she asked me why on earth she would have to adopt RDF, because all she had seen was that, in the Linked Data world, people also just make spreadsheets and manually align data. She was right: OWL was still too theoretical, and we were indeed creating alignments between ontologies manually, instead of relying on the links between those ontologies. That was before SHACL and before agentic AI. In this blog post I show, step by step and with demos you can edit in the page itself, how OWL 2 RL reasoning, SKOS reasoning and SHACL validation can work together in the browser using the RDFJS Inference Engine and Eyeling to fully automate data integration.

RDF gives us a way to write statements. The question is what happens when we start writing reusable knowledge about those statements. The key idea is old and still underused: a small generic rule can make many domain-specific facts useful. This post is also an answer to a comment by Enno Meijers, who saw the promise of automatically generating actionable mappings from formally defined alignments, but was still missing a practical way to move forward.

N3 is an RDF superset that lets us write rules. A rule has an antecedent and a consequent: if the pattern on the left is known, the pattern on the right can be derived. The syntax is intentionally close to Turtle, but adds formulas between curly braces and the implication symbol =>. Let’s start with the simplest possible version: if something is a :HydrogenBus, infer that it is a :ZeroEmissionVehicle.

Demo 1: an executable N3 rule

The pane contains both a tiny hard-coded alignment rule and the RDF data it runs on. Try changing :HydrogenBus or :ZeroEmissionVehicle, and run the rule with Eyeling.

The first demo simply hard-codes one alignment rule. It says: whenever a resource is a :HydrogenBus, derive that the same resource is a :ZeroEmissionVehicle. That is fine for one class, but it becomes unmanageable as soon as we need hundreds of class and property alignments. We would be copying the same rule pattern over and over again, only changing the vocabulary terms.
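As a minimal sketch, a hard-coded alignment of this kind could look as follows. The `http://example.org/` namespace and the resource `:bus42` are assumptions made for illustration; they are not taken from the demo pane.

```n3
@prefix : <http://example.org/> .

# Data: one vehicle, typed with a local class.
:bus42 a :HydrogenBus .

# Hard-coded alignment rule: every :HydrogenBus is also a :ZeroEmissionVehicle.
{ ?vehicle a :HydrogenBus . } => { ?vehicle a :ZeroEmissionVehicle . } .
```

A forward-chaining N3 reasoner should derive `:bus42 a :ZeroEmissionVehicle .` from this input. Aligning a second class would require a second copy of the rule with only the class names changed.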

This is why we need an ontology language such as OWL. For this first step, the RDFS/OWL notion of subclassing is enough. Instead of writing a separate N3 rule for :HydrogenBus, :CargoBike, :ElectricFerry and every other local term, we write generic N3 rules once against the ontology language. Then each project only has to annotate its own vocabulary with statements such as rdfs:subClassOf. The RDFJS Inference Engine loads those generic OWL 2 RL rules in the next demo.

In other words, we move from writing rules about our domain classes to writing rules about the ontology language itself. A tiny fragment of such a rule profile could look like this:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Class alignment: instances of a subclass are also instances of its superclass.
{
  ?class rdfs:subClassOf ?superClass .
  ?thing a ?class .
}
=>
{
  ?thing a ?superClass .
} .

# Property alignment: values of a subproperty are also values of its superproperty.
{
  ?property rdfs:subPropertyOf ?superProperty .
  ?subject ?property ?object .
}
=>
{
  ?subject ?superProperty ?object .
} .

The important difference is that these rules never mention :HydrogenBus or :ZeroEmissionVehicle. They only mention vocabulary primitives such as rdfs:subClassOf and rdfs:subPropertyOf. Once these generic rules exist, a concrete vocabulary only needs RDF statements that say how its classes and properties relate to other classes and properties. For OWL 2 RL, a pragmatic subset of OWL 2 designed for rule-based inferencing, I’ve written a fully spec-compliant N3 rule file, which is one of the rule profiles in rdfjs-inference-engine.

Demo 2: OWL 2 RL starts with generic rules

The demo loads the OWL 2 RL N3 rule profile and uses the vocabulary on the left as background knowledge. Try changing :HydrogenBus into :CargoBike in both panes, or add another subclass step, and run the inference again.

Demo 2 does the same conceptual alignment as Demo 1, but it moves the project-specific knowledge into the ontology. The rule profile is no longer about hydrogen buses. It is about the meaning of rdfs:subClassOf: whenever data says that something is an instance of a class, and the ontology says that this class is a subclass of another class, the reasoner may derive the broader class as well. This is the first step towards alignments that are maintainable: the rules stay generic, while the vocabulary statements carry the project-specific meaning.
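The split between generic rules and project-specific vocabulary can be sketched as below. The namespace, `:bus42` and the extra `:Vehicle` class are assumptions for illustration, and only the single subclass rule is included instead of the full OWL 2 RL profile, so that the sketch runs on its own.

```n3
@prefix : <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Vocabulary (background knowledge): project-specific, but plain RDF, no rules.
:HydrogenBus rdfs:subClassOf :ZeroEmissionVehicle .
:ZeroEmissionVehicle rdfs:subClassOf :Vehicle .

# Data.
:bus42 a :HydrogenBus .

# Generic rule: mentions only rdfs:subClassOf, never a domain class.
{
  ?class rdfs:subClassOf ?superClass .
  ?thing a ?class .
}
=>
{
  ?thing a ?superClass .
} .
```

From this input a reasoner should derive that `:bus42` is a `:ZeroEmissionVehicle` and, through the second subclass step, a `:Vehicle`. Adding `:CargoBike rdfs:subClassOf :ZeroEmissionVehicle .` extends the alignment without touching the rule.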

From rules to reusable specification artefacts

The abstraction above is the reason I find rule-based reasoning so interesting for knowledge engineering. We can write the semantics of a vocabulary once, and apply it again and again to concrete data. The rule profiles in rdfjs-inference-engine currently include an N3 implementation of OWL 2 RL, SKOS Core, SHACL Core and a draft SHACL 1.2 Core extension layer.

That means knowledge engineering is not limited to hand-writing transformations or queries for every case. We can keep working with the artefacts we already know:

  • Ontologies express class hierarchies, property hierarchies, domains, ranges, equivalences, inverses and other reusable vocabulary-level knowledge.
  • Taxonomies, often in SKOS, organize concepts and semantic relations.
  • Shapes, often in SHACL, describe what a system expects, validates or wants as output.

These artefacts do not have to live in isolation. They can be combined in one reasoning step. This is where the story becomes more powerful than a single subclass example.

Combining OWL and SKOS

Suppose I have a local catalog vocabulary. In that vocabulary, a dataset topic is called :DatasetConcept, and the relation between related topics is called :relatedTopic. I can align this local vocabulary to SKOS with a small ontology: :DatasetConcept is a subclass of skos:Concept, and :relatedTopic is equivalent to skos:related.

Now OWL 2 RL and SKOS start helping each other. OWL/RDFS reasoning lifts the data into SKOS. SKOS reasoning then adds the symmetric relation and the broader SKOS semantic relation. This is a tiny example, but it is exactly the type of pattern that matters when different catalogues, data spaces or APIs use slightly different vocabularies.
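The interplay can be made concrete with a hedged sketch. The catalog namespace and the two topic resources are assumptions, and the five rules are a hand-picked fragment standing in for the full OWL 2 RL and SKOS Core profiles. The two statements about skos:related restate what the SKOS data model defines: the property is symmetric and a subproperty of skos:semanticRelation.

```n3
@prefix : <http://example.org/catalog#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

# Alignment ontology: local catalog terms mapped to SKOS.
:DatasetConcept rdfs:subClassOf skos:Concept .
:relatedTopic owl:equivalentProperty skos:related .

# Fragment of the SKOS data model, stated as RDF.
skos:related a owl:SymmetricProperty ;
    rdfs:subPropertyOf skos:semanticRelation .

# Data in the local vocabulary.
:transportData a :DatasetConcept ;
    :relatedTopic :timetableData .

# Generic rules (a small fragment of what the rule profiles provide).
{ ?c rdfs:subClassOf ?d . ?x a ?c . } => { ?x a ?d . } .
{ ?p owl:equivalentProperty ?q . ?s ?p ?o . } => { ?s ?q ?o . } .
{ ?p owl:equivalentProperty ?q . ?s ?q ?o . } => { ?s ?p ?o . } .
{ ?p a owl:SymmetricProperty . ?s ?p ?o . } => { ?o ?p ?s . } .
{ ?p rdfs:subPropertyOf ?q . ?s ?p ?o . } => { ?s ?q ?o . } .
```

The OWL/RDFS rules should lift the data into SKOS (`:transportData a skos:Concept` and `:transportData skos:related :timetableData`). The symmetry and subproperty rules then add `:timetableData skos:related :transportData` and the corresponding skos:semanticRelation statements in both directions.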

Demo 3: OWL 2 RL + SKOS Core

The ontology aligns a local catalog vocabulary to SKOS. Try changing :transportData into :mobilityData, or add a second :relatedTopic value, and check how the SKOS entailments follow.

I think this is the core of a pragmatic Linked Data workflow. We do not have to force everyone to use the same vocabulary from day one. We can let systems publish terms that make sense in their own context, and later introduce reusable alignments when interoperability becomes valuable. The more generic the rule profile, the more useful the alignment triples become.

SHACL validation as inferencing

SHACL is usually introduced as validation. A data graph either conforms to a shape graph or it does not. But operationally, SHACL validation can also be seen as inferencing: from data and shapes, the processor derives a validation report. The inferred knowledge is not “this person is a vehicle”, but “this focus node violates this constraint”.

This perspective becomes especially interesting when SHACL is combined with OWL 2 RL. The shape can target something that is not explicitly stated in the data, but is inferred by the ontology. The following example is based on Holger Knublauch’s grandfather example, discussed in his LinkedIn post. We added the variant where OWL 2 RL infers that ex:P1 is an ex:Grandfather, after which SHACL can validate the node through sh:targetClass.
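To illustrate the pattern, the input for such a run could be modelled as in the sketch below. This is neither Holger Knublauch’s original example nor the exact content of the demo: apart from ex:P1, ex:Grandfather and ex:name, every term (ex:Male, ex:hasChild, ex:hasGrandchild, ex:P2, ex:P3, ex:GrandfatherShape) is an assumption. The graph contains no rules of its own, so it only produces results when combined with the OWL 2 RL and SHACL Core rule profiles.

```n3
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Ontology: a grandchild is the child of a child (property chain).
ex:hasGrandchild owl:propertyChainAxiom ( ex:hasChild ex:hasChild ) .

# Ontology: a male with at least one grandchild is a grandfather (class expression).
[ a owl:Class ;
  owl:intersectionOf (
      ex:Male
      [ a owl:Restriction ;
        owl:onProperty ex:hasGrandchild ;
        owl:someValuesFrom owl:Thing ]
  )
] rdfs:subClassOf ex:Grandfather .

# Data: ex:P1 is never typed as ex:Grandfather and has no ex:name.
ex:P1 a ex:Male ; ex:hasChild ex:P2 .
ex:P2 ex:hasChild ex:P3 .

# Shape: targets a class that only exists after inference.
ex:GrandfatherShape a sh:NodeShape ;
    sh:targetClass ex:Grandfather ;
    sh:property [
        sh:path ex:name ;
        sh:minCount 1 ;
        sh:datatype xsd:string
    ] .
```

With OWL 2 RL applied first, the property chain yields `ex:P1 ex:hasGrandchild ex:P3`, and the class expression then classifies ex:P1 as an ex:Grandfather. Only at that point does sh:targetClass select ex:P1, so the missing ex:name can be reported.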

Demo 4: OWL classification feeding SHACL validation

OWL 2 RL infers that ex:P1 is a grandfather through the property chain and class expression. SHACL then reports that this grandfather has no ex:name. Try adding ex:name "Alice" to ex:P1 in the data and run the demo again.

This changes how I think about validation. SHACL is not only a gatekeeper at the end of a pipeline. It can be part of the reasoning workflow itself. First infer what the incoming data means using ontologies and taxonomies. Then infer what is wrong, missing or acceptable using shapes. When the validation report is just another RDF graph, it can be logged, audited, explained and used by another process.

From validation to automated shape-to-shape planning

There is another, less obvious, role for SHACL. Imagine your application accepts incoming data according to a SHACL shape. In practice, you might reject compatible data simply because it uses a different vocabulary. That is frustrating: the data is semantically compatible, but not syntactically shaped the way your application expects.

What if a processor could look at the incoming shape, the outgoing application profile, and the ontology alignments between the involved vocabularies? It could then generate a small reasoning plan for exactly the rules that matter. This is particularly interesting for streams, where we process a long sequence of messages that have the same shape. We can afford to spend effort once on the shape-level plan, and then apply a compact runtime to every message.
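The sketch below shows what the inputs and the resulting plan might look like. All ex: terms and both shape names are assumptions for illustration, and the three rules at the end are written by hand to show the kind of plan a planner could emit. The source does not show how the engine actually computes the plan, and this sketch does not try to.

```n3
@prefix ex: <http://example.org/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .

# Vocabulary alignments between the local terms and SOSA.
ex:Reading rdfs:subClassOf sosa:Observation .
ex:value rdfs:subPropertyOf sosa:hasSimpleResult .
ex:takenAt rdfs:subPropertyOf sosa:resultTime .

# Incoming shape: what each message in the stream is known to contain.
ex:IncomingShape a sh:NodeShape ;
    sh:targetClass ex:Reading ;
    sh:property [ sh:path ex:value ; sh:minCount 1 ] ,
                [ sh:path ex:takenAt ; sh:minCount 1 ] ,
                [ sh:path ex:debugOnly ] .

# Outgoing shape: what the application expects.
ex:OutgoingShape a sh:NodeShape ;
    sh:targetClass sosa:Observation ;
    sh:property [ sh:path sosa:hasSimpleResult ; sh:minCount 1 ] ,
                [ sh:path sosa:resultTime ; sh:minCount 1 ] .

# Shape-level plan, derived once: one specialised rule per term that
# occurs in the incoming shape and aligns to a term in the outgoing shape.
{ ?m a ex:Reading . } => { ?m a sosa:Observation . } .
{ ?m ex:value ?v . } => { ?m sosa:hasSimpleResult ?v . } .
{ ?m ex:takenAt ?t . } => { ?m sosa:resultTime ?t . } .
```

Because ex:debugOnly has no alignment to anything in the outgoing shape, no rule is generated for it. The runtime applied to every message therefore consists of three pattern matches instead of a full OWL 2 RL profile.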

Demo 5: SHACL in/out shape planning over RDF Messages

The four panes describe the vocabulary alignments, a small RDF Messages log, the trusted incoming shape and the desired outgoing shape. Try adding another message, or removing ex:debugOnly from the incoming data: the output remains focused on the SOSA application profile.

I like this example because it turns shapes into more than documentation or validation contracts. Shapes become optimization hints. They tell the engine what kind of data is expected and what kind of result is useful. In a stream-processing setting, this can make a significant difference: we do not want to rediscover the same reasoning plan for every individual message if the message structure is stable.

What this means for interoperability

The important point is not that every project should run a full reasoner everywhere. The point is that the semantics of our specification artefacts can be made explicit and executable. OWL alignments can turn local vocabulary into a target model. SKOS rules can materialize semantic relations in a taxonomy. SHACL rules can derive validation reports. SHACL shapes can even help select and optimize the reasoning plan that should be applied to a stream.

This fits the idea of eventual interoperability. You can start from your own ontology, taxonomy and application profile, and later add alignments when another application needs your data. The reusable rule profiles do the repetitive work. The project-specific work becomes defining the right artefacts: the vocabulary, the taxonomy, the shape, and the alignments between them.

Future work

I see three obvious directions to make this even more useful.

First, we should involve policy and trust reasoning. In data spaces and cross-organisational data flows, it is not enough to know that a message can be aligned to your application profile. You also need to know whether you are allowed to use it, under which conditions, and whether it is trustworthy enough for the state change you are about to make. DPV and ODRL reasoning could help make those trust relations executable as well, so that every step of the flow knows what it may do.

Second, we need better inferencing for datatypes and units. Some cases are almost embarrassingly straightforward. If a shape expects xsd:gYear and a source provides xsd:dateTime, the processor can derive the year and construct a new typed literal. In N3, that kind of conversion can be made explicit as a rule.

Runnable N3 sketch: xsd:dateTime to xsd:gYear

Edit the timestamp or the predicate names and run the rule with Eyeling.

Some datatype transformations can be expressed as small N3 rules: inspect a literal, transform its value, and recompose a new typed literal.
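One possible form of such a rule is sketched below. The ex: predicates and the sample timestamp are assumptions. The sketch relies on the N3 built-ins log:dtlit and string:scrape, and I have not verified that every reasoner, Eyeling included, supports them.

```n3
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix log: <http://www.w3.org/2000/10/swap/log#> .
@prefix string: <http://www.w3.org/2000/10/swap/string#> .

# Data: a timestamp where the target shape expects only a year.
ex:report1 ex:publishedAt "2024-03-15T09:30:00Z"^^xsd:dateTime .

{
  ?resource ex:publishedAt ?timestamp .
  # Inspect: split the typed literal into lexical form and datatype IRI.
  (?lexical xsd:dateTime) log:dtlit ?timestamp .
  # Transform: keep the leading year digits of the lexical form.
  (?lexical "^(-?[0-9]{4,})-") string:scrape ?year .
  # Recompose: build a new literal with datatype xsd:gYear.
  (?year xsd:gYear) log:dtlit ?gYear .
}
=>
{
  ?resource ex:publicationYear ?gYear .
} .
```

Under these assumptions the rule should attach an xsd:gYear literal for 2024 to ex:report1. The year is read from the lexical form as written, so a timestamp such as 2024-12-31T23:30:00-05:00 stays in 2024 even though it falls in 2025 when normalised to UTC.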

The case becomes more interesting when we move from datatype casting to unit conversion. If a shape expects Celsius and an incoming message contains Fahrenheit, the processor needs to apply a formula: \(C = (F - 32) \times \frac{5}{9}\). This is still rule-like, but it is no longer only a matter of changing the datatype IRI.

This is where I am excited by Wikifunctions, a Wikimedia project initiated by one of my personal heroes, Denny Vrandečić. Wikifunctions describes reusable functions, including for instance a Fahrenheit to Celsius conversion function. Today, such functions are commonly implemented in languages such as JavaScript and Python. I think an N3 translation would make a lot of sense as well, because it would formalize these functions as reusable Linked Data rules that can participate in the same reasoning workflow as OWL, SKOS and SHACL.

Runnable N3 sketch: Fahrenheit to Celsius

Change the Fahrenheit value, run the rule, and inspect the derived Celsius literal and Wikifunctions provenance link.

A sketch of how a Wikifunctions-style Fahrenheit-to-Celsius function could be expressed as an N3 rule.
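A rule of this kind might look like the following sketch. The ex: predicates and the sample reading are assumptions, and the Wikifunctions provenance link is left out because I do not want to guess the function identifier. The arithmetic uses the N3 math: built-ins, so the result depends on the reasoner supporting them.

```n3
@prefix ex: <http://example.org/> .
@prefix math: <http://www.w3.org/2000/10/swap/math#> .

# Data: a reading in degrees Fahrenheit.
ex:reading1 ex:temperatureFahrenheit 212 .

{
  ?reading ex:temperatureFahrenheit ?f .
  (?f 32) math:difference ?offset .     # F - 32
  (?offset 5) math:product ?scaled .    # (F - 32) * 5
  (?scaled 9) math:quotient ?c .        # ((F - 32) * 5) / 9
}
=>
{
  ?reading ex:temperatureCelsius ?c .
} .
```

For the sample value the steps are 212 - 32 = 180, 180 × 5 = 900 and 900 / 9 = 100, so the derived Celsius value should be 100; the exact lexical form of the number depends on the reasoner. Multiplying before dividing keeps the intermediate values integral for inputs like this one.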

It becomes even more complex when the conversion is not only numerical but spatial. In Flanders, we commonly need built-ins to translate WKT literals between Belgian Lambert 72, Lambert 2008 and WGS84. That kind of geospatial projection involves coordinate reference systems, precision choices and domain-specific libraries. Yet the architectural pattern remains the same: a reusable function can be described, identified and invoked from a rule, after which the derived RDF can satisfy the shape expected by an application. Unit, datatype and geospatial conversion will never be completely trivial, but many practical transformations are reusable enough to become knowledge engineering components.

Third, there is currently much ado about vocabulary hubs. I increasingly see vocabulary hubs as part of the data portal story. Each dataset in a portal should be described using SHACL shapes, and those shapes make explicit which vocabularies are being used. The vocabulary descriptions in the same portal can then expose statements such as subclassing, subproperties, equivalences or inverses. That makes the portal not only a catalogue of datasets, but also a catalogue of potential alignments. If the portal can connect dataset shapes to vocabulary descriptions, it can help generate exactly the kind of alignment rules demonstrated above.

Applying this to Flanders

Flanders is a very interesting test case for this idea. On data.vlaanderen.be, application profiles already document the expected shapes of data exchanges. The Persoon Basis application profile documents constraints for exchanging person data, and the Adresregister application profile does the same for address-register data. At the same time, the underlying OSLO vocabularies already contain semantic links to broader vocabularies. A good small example is OSLO Adres: properties such as adres:Adresvoorstelling.huisnummer, adres:gemeentenaam and adres:land are linked to the W3C LOCN vocabulary.

The example below does not copy those alignment statements into the page. Instead, it fetches the OSLO Adres vocabulary directly from its Turtle URL, loads the generic OWL 2 RL rule profile, and runs the reasoner over a small input graph that uses OSLO address terms. The output is filtered to the broader W3C LOCN address terms.

Demo 6: OSLO Adres → W3C LOCN through a URL lookup

The ontology URL is fixed and read-only: the demo fetches the published OSLO vocabulary itself, then runs OWL 2 RL reasoning over the input data below.

The output is the important part: from OSLO input data, the reasoner derives a broader locn:Address type and LOCN properties such as locn:locatorDesignator, locn:postName and locn:adminUnitL1. A project can therefore publish data using OSLO and still expose a W3C LOCN-shaped view without hand-writing a transformation for every dataset. The same multi-layer idea also works one step earlier: a local team can first align its own vocabulary to OSLO, and OSLO can then provide the next bridge towards Europe. This is eventual interoperability at multiple layers, backed by the vocabulary descriptions that data portals already publish.

P.S.

For this blog post I built the RDFJS Inference Engine, which is now published on npm as rdfjs-inference-engine and can be reused in your own projects. Thanks to Raf Buyle for the inspirational chat on the train that led to the first experiments on explaining how alignments can be automated. Thanks also to Jos De Roo and Patrick Hochstenbach for the late-night discussions on our reasoning channel, which validated some of these ideas and invalidated others.