Automating alignments – Pieter Colpaert

As Linked Data engineers, we often make the case that Linked Data lets us align data automatically. Yet I’ve rarely seen a demo where this is really the case. I vividly remember a conversation I had with Kasia Bourée, one of the leads behind the Transmodel Ontology, at the European Commission about 10 years ago: she asked me why on earth she would have to adopt RDF, because all she had seen was that, in the Linked Data world, people also just make spreadsheets and manually align data. She was right: OWL was still too theoretical, and we were indeed creating alignments between ontologies manually, instead of relying on the links between those ontologies. That was before SHACL and before agentic AI. In this blog post I show, step by step and with demos you can edit in the page itself, how OWL 2 RL reasoning, SKOS reasoning and SHACL validation can work together in the browser using the RDFJS Inference Engine and Eyeling to fully automate data integration.

RDF gives us a way to write statements. The question is what happens when we start writing reusable knowledge about those statements. The key idea is old and still underused: a small generic rule can make many domain-specific facts useful. This post is also an answer to a comment by Enno Meijers, who saw the promise of automatically generating actionable mappings from formally defined alignments, but was still missing a practical way to move forward.

N3 is an RDF superset that lets us write rules. A rule has an antecedent and a consequent: if the pattern on the left is known, the pattern on the right can be derived. The syntax is intentionally close to Turtle, but adds formulas between curly braces and the implication symbol =>. Let’s start with the simplest possible version: if something is a :HydrogenBus, infer that it is a :ZeroEmissionVehicle.

Demo 1: an executable N3 rule

The pane contains both a tiny hard-coded alignment rule and the RDF data it runs on. Try changing :HydrogenBus or :ZeroEmissionVehicle, and run the rule with Eyeling.

N3 rule and data

@prefix : <https://example.org/mobility#> .

# Data: the source system only says that bus17 is a HydrogenBus.
:bus17 a :HydrogenBus .

# Alignment rule: for this one local class, derive the target class.
{
  ?thing a :HydrogenBus .
} => {
  ?thing a :ZeroEmissionVehicle .
} .

The first demo simply hard-codes one alignment rule. It says: whenever a resource is a :HydrogenBus, derive that the same resource is a :ZeroEmissionVehicle. That is fine for one class, but it becomes unmanageable as soon as we need hundreds of class and property alignments. We would be copying the same rule pattern over and over again, only changing the vocabulary terms.

This is why we need an ontology language such as OWL, and in this first step more specifically the RDFS/OWL idea of subclassing. Instead of writing a separate N3 rule for :HydrogenBus, :CargoBike, :ElectricFerry and every other local term, we write generic N3 rules once against the ontology language. Then each project only has to annotate its own vocabulary with statements such as rdfs:subClassOf. The RDFJS Inference Engine loads those generic OWL 2 RL rules in the next demo.

In other words, we move from writing rules about our domain classes to writing rules about the ontology language itself. A tiny fragment of such a rule profile could look like this:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Class alignment: instances of a subclass are also instances of its superclass.
{
  ?class rdfs:subClassOf ?superClass .
  ?thing a ?class .
}
=>
{
  ?thing a ?superClass .
} .

# Property alignment: values of a subproperty are also values of its superproperty.
{
  ?property rdfs:subPropertyOf ?superProperty .
  ?subject ?property ?object .
}
=>
{
  ?subject ?superProperty ?object .
} .

The important difference is that these rules never mention :HydrogenBus or :ZeroEmissionVehicle. They only mention vocabulary primitives such as rdfs:subClassOf and rdfs:subPropertyOf. Once these generic rules exist, a concrete vocabulary only needs RDF statements that say how its classes and properties relate to other classes and properties. For OWL 2 RL, a pragmatic OWL 2 profile designed for rule-based inferencing, I’ve written a fully spec-compliant N3 rule file.
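To make the mechanics concrete, here is a minimal Python sketch of what a reasoner does with these two generic rules: it keeps applying them until no new triples appear. It is an illustration only, not how Eyeling or the RDFJS Inference Engine work internally. Triples are plain string tuples, and the :operatedBy, :managedBy and :depotA terms are hypothetical names added to exercise the property rule.

```python
# Minimal sketch: triples are (subject, predicate, object) tuples of strings.
RDF_TYPE = "rdf:type"
SUB_CLASS_OF = "rdfs:subClassOf"
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def apply_generic_rules(graph):
    """Apply both generic rules once and return every triple they derive."""
    derived = set()
    for (sub, pred, sup) in graph:
        if pred == SUB_CLASS_OF:
            # { ?class rdfs:subClassOf ?superClass . ?thing a ?class }
            #   => { ?thing a ?superClass }
            derived |= {(s, RDF_TYPE, sup) for (s, p, o) in graph
                        if p == RDF_TYPE and o == sub}
        elif pred == SUB_PROPERTY_OF:
            # { ?property rdfs:subPropertyOf ?superProperty . ?s ?property ?o }
            #   => { ?s ?superProperty ?o }
            derived |= {(s, sup, o) for (s, p, o) in graph if p == sub}
    return derived


def closure(graph):
    """Forward-chain until a fixpoint: stop when a pass adds nothing new."""
    known = set(graph)
    while True:
        new = apply_generic_rules(known) - known
        if not new:
            return known
        known |= new


vocabulary = {
    (":HydrogenBus", SUB_CLASS_OF, ":ZeroEmissionVehicle"),
    (":operatedBy", SUB_PROPERTY_OF, ":managedBy"),  # hypothetical properties
}
data = {
    (":bus17", RDF_TYPE, ":HydrogenBus"),
    (":bus17", ":operatedBy", ":depotA"),  # hypothetical data
}

asserted = vocabulary | data
inferred = closure(asserted) - asserted
for triple in sorted(inferred):
    print(*triple)
```

Note that the two rule bodies never change when the vocabulary grows: adding a :CargoBike class only means adding one more triple to the vocabulary set.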

Demo 2: OWL 2 RL starts with generic rules

The demo loads the OWL 2 RL N3 rule profile and uses the vocabulary on the left as background knowledge. Try changing :HydrogenBus into :CargoBike in both panes, or add another subclass step, and run the inference again.

Ontology / vocabulary

@prefix : <https://example.org/mobility#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

# Vocabulary knowledge: the local classes are annotated once.
# The generic OWL 2 RL rules know how to interpret rdfs:subClassOf.
:HydrogenBus rdfs:subClassOf :ZeroEmissionVehicle .
:ZeroEmissionVehicle rdfs:subClassOf :Vehicle .

Data

PREFIX : <https://example.org/mobility#>

# Source data: no target classes are stated here.
# They are derived from the vocabulary knowledge on the left.
:bus17 a :HydrogenBus .

Demo 2 does the same conceptual alignment as Demo 1, but it moves the project-specific knowledge into the ontology. The rule profile is no longer about hydrogen buses. It is about the meaning of rdfs:subClassOf: whenever data says that something is an instance of a class, and the ontology says that this class is a subclass of another class, the reasoner may derive the broader class as well. This is the first step towards alignments that are maintainable: the rules stay generic, while the vocabulary statements carry the project-specific meaning.

From rules to reusable specification artefacts

The abstraction above is the reason I find rule-based reasoning so interesting for knowledge engineering. We can write the semantics of a vocabulary once, and apply it again and again to concrete data. The rule profiles in rdfjs-inference-engine currently include N3 implementations of OWL 2 RL, SKOS Core, SHACL Core and a draft SHACL 1.2 Core extension layer.

That means knowledge engineering is not limited to hand-writing transformations or queries for every case. We can keep working with the artefacts we already know:

Ontologies express class hierarchies, property hierarchies, domains, ranges, equivalences, inverses and other reusable vocabulary-level knowledge.
Taxonomies, often in SKOS, organize concepts and semantic relations.
Shapes, often in SHACL, describe what a system expects, validates or wants as output.

These artefacts do not have to live in isolation. They can be combined in one reasoning step. This is where the story becomes more powerful than a single subclass example.

Combining OWL and SKOS

Suppose I have a local catalog vocabulary. In that vocabulary, a dataset topic is called :DatasetConcept, and the relation between related topics is called :relatedTopic. I can align this local vocabulary to SKOS with a small ontology: :DatasetConcept is a subclass of skos:Concept, and :relatedTopic is equivalent to skos:related.

Now OWL 2 RL and SKOS start helping each other. OWL/RDFS reasoning lifts the data into SKOS. SKOS reasoning then adds the symmetric relation and the broader SKOS semantic relation. This is a tiny example, but it is exactly the type of pattern that matters when different catalogues, data spaces or APIs use slightly different vocabularies.
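The interplay can be sketched in a few lines of Python. This is a hand-picked subset of the rules, written under my own simplifying assumptions: only rdfs:subClassOf and owl:equivalentProperty from the OWL side, and only the symmetry of skos:related plus its link to skos:semanticRelation from the SKOS side. The real rule profiles cover far more.

```python
TYPE = "rdf:type"

ontology = {
    (":DatasetConcept", "rdfs:subClassOf", "skos:Concept"),
    (":relatedTopic", "owl:equivalentProperty", "skos:related"),
}
data = {
    (":openData", TYPE, ":DatasetConcept"),
    (":openData", ":relatedTopic", ":transportData"),
}


def step(graph):
    """One pass over a small OWL + SKOS rule subset; returns derived triples."""
    out = set()
    for (s, p, o) in graph:
        if p == "rdfs:subClassOf":
            # OWL/RDFS: instances of the subclass are instances of the superclass.
            out |= {(x, TYPE, o) for (x, q, c) in graph if q == TYPE and c == s}
        elif p == "owl:equivalentProperty":
            # OWL: equivalent properties share their statements in both directions.
            out |= {(x, o, y) for (x, q, y) in graph if q == s}
            out |= {(x, s, y) for (x, q, y) in graph if q == o}
        elif p == "skos:related":
            # SKOS: skos:related is symmetric ...
            out.add((o, "skos:related", s))
            # ... and a subproperty of skos:semanticRelation.
            out.add((s, "skos:semanticRelation", o))
    return out


def closure(graph):
    """Repeat the pass until no new triples are derived."""
    known = set(graph)
    while True:
        new = step(known) - known
        if not new:
            return known
        known |= new


asserted = ontology | data
inferred = closure(asserted) - asserted
for triple in sorted(inferred):
    print(*triple)
```

The SKOS rules only fire because the OWL rule first rewrote :relatedTopic into skos:related. The symmetric statement then flows back through the equivalence, so :transportData also gets a :relatedTopic link to :openData in the local vocabulary.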

Demo 3: OWL 2 RL + SKOS Core

The ontology aligns a local catalog vocabulary to SKOS. Try changing :transportData into :mobilityData, or add a second :relatedTopic value, and check how the SKOS entailments follow.

Ontology / SKOS configuration

@prefix : <https://example.org/catalog#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

:DatasetConcept rdfs:subClassOf skos:Concept .
:relatedTopic owl:equivalentProperty skos:related .

Data

PREFIX : <https://example.org/catalog#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

:openData a :DatasetConcept ;
  :relatedTopic :transportData .

I think this is the core of a pragmatic Linked Data workflow. We do not have to force everyone to use the same vocabulary from day one. We can let systems publish terms that make sense in their own context, and later introduce reusable alignments when interoperability becomes valuable. The more generic the rule profile, the more useful the alignment triples become.

SHACL validation as inferencing

SHACL is usually introduced as validation. A data graph either conforms to a shape graph or it does not. But operationally, SHACL validation can also be seen as inferencing: from data and shapes, the processor derives a validation report. The inferred knowledge is not “this person is a vehicle”, but “this focus node violates this constraint”.

This perspective becomes especially interesting when SHACL is combined with OWL 2 RL. The shape can target something that is not explicitly stated in the data, but is inferred by the ontology. The following example is based on Holger Knublauch’s grandfather example, discussed in his LinkedIn post. We added the variant where OWL 2 RL infers that ex:P1 is an ex:Grandfather, after which SHACL can validate the node through sh:targetClass.

Demo 4: OWL classification feeding SHACL validation

OWL 2 RL infers that ex:P1 is a grandfather through the property chain and class expression. SHACL then reports that this grandfather has no ex:name. Try adding ex:name "Alice" to ex:P1 in the data and run the demo again.

Ontology + SHACL shapes

@prefix ex: <https://example.org/shacl12-grandfather#> .
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Grandfather
  a owl:Class ;
  rdfs:subClassOf ex:Person ;
  owl:equivalentClass [
    a owl:Class ;
    owl:intersectionOf (
      [ a owl:Restriction ; owl:onProperty ex:gender ; owl:hasValue "male" ]
      [ a owl:Restriction ; owl:onProperty ex:hasGrandchild ; owl:someValuesFrom owl:Thing ]
    )
  ] .

ex:hasGrandchild
  a owl:ObjectProperty ;
  owl:propertyChainAxiom ( ex:child ex:child ) .

ex:GrandfatherShape
  a sh:NodeShape ;
  sh:targetClass ex:Grandfather ;
  sh:property [
    sh:path ex:name ;
    sh:minCount 1
  ] .

Data

PREFIX ex: <https://example.org/shacl12-grandfather#>

ex:P1 a ex:Person ;
  ex:child ex:P1_1, ex:P1_2 ;
  ex:gender "male" .

ex:P1_1 a ex:Person ;
  ex:child ex:P1_1_1 .

ex:P1_1_1 a ex:Person .

ex:P1_2 a ex:Person ;
  ex:child ex:P1_2_1, ex:P1_2_2 .

ex:P1_2_1 a ex:Person .
ex:P1_2_2 a ex:Person .

This changes how I think about validation. SHACL is not only a gatekeeper at the end of a pipeline. It can be part of the reasoning workflow itself. First infer what the incoming data means using ontologies and taxonomies. Then infer what is wrong, missing or acceptable using shapes. When the validation report is just another RDF graph, it can be logged, audited, explained and used by another process.
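The two-phase workflow described above can be sketched in Python as follows. It is a simplified illustration under stated assumptions: the property chain and the class expression from Demo 4 are hard-coded as two steps instead of being read from the ontology, only sh:minCount is checked, and the report is a list of dictionaries rather than a real sh:ValidationReport graph.

```python
TYPE = "rdf:type"

data = {
    ("ex:P1", TYPE, "ex:Person"),
    ("ex:P1", "ex:gender", "male"),
    ("ex:P1", "ex:child", "ex:P1_1"),
    ("ex:P1", "ex:child", "ex:P1_2"),
    ("ex:P1_1", TYPE, "ex:Person"),
    ("ex:P1_1", "ex:child", "ex:P1_1_1"),
    ("ex:P1_1_1", TYPE, "ex:Person"),
}


def infer(graph):
    """Phase 1: derive what the data means (hard-coded for this example)."""
    g = set(graph)
    # Property chain ( ex:child ex:child ) => ex:hasGrandchild
    g |= {(a, "ex:hasGrandchild", c)
          for (a, p, b) in g if p == "ex:child"
          for (b2, q, c) in g if q == "ex:child" and b2 == b}
    # Class expression: gender "male" AND at least one ex:hasGrandchild value.
    males = {s for (s, p, o) in g if p == "ex:gender" and o == "male"}
    grandparents = {s for (s, p, o) in g if p == "ex:hasGrandchild"}
    g |= {(s, TYPE, "ex:Grandfather") for s in males & grandparents}
    return g


def validate_min_count(graph, target_class, path, min_count):
    """Phase 2: derive a report for a sh:targetClass + sh:minCount shape."""
    report = []
    focus_nodes = sorted(s for (s, p, o) in graph if p == TYPE and o == target_class)
    for node in focus_nodes:
        values = [o for (s, p, o) in graph if s == node and p == path]
        if len(values) < min_count:
            report.append({
                "focusNode": node,
                "resultPath": path,
                "sourceConstraintComponent": "sh:MinCountConstraintComponent",
            })
    return report


# Without inference the shape has no focus nodes, so nothing is reported.
report_raw = validate_min_count(data, "ex:Grandfather", "ex:name", 1)
# With inference, ex:P1 becomes a focus node and the missing name is found.
report_inferred = validate_min_count(infer(data), "ex:Grandfather", "ex:name", 1)
# Adding a name makes the inferred grandfather conform.
named = data | {("ex:P1", "ex:name", "Alice")}
report_named = validate_min_count(infer(named), "ex:Grandfather", "ex:name", 1)

print(report_raw, report_inferred, report_named, sep="\n")
```

The first report is empty for the wrong reason: nobody is explicitly typed as ex:Grandfather, so the shape never looks at ex:P1. Only the combination of both phases reveals the missing name.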

From validation to automated shape-to-shape planning

There is another, less obvious, role for SHACL. Imagine your application accepts incoming data according to a SHACL shape. In practice, you might reject compatible data simply because it uses a different vocabulary. That is frustrating: the data is semantically compatible, but not syntactically shaped the way your application expects.

What if a processor could look at the incoming shape, the outgoing application profile, and the ontology alignments between the involved vocabularies? It could then generate a small reasoning plan for exactly the rules that matter. This is particularly interesting for streams, where we process a long sequence of messages that have the same shape. We can afford to spend effort once on the shape-level plan, and then apply a compact runtime to every message.

Demo 5: SHACL in/out shape planning over RDF Messages

The four panes describe the vocabulary alignments, a small RDF Messages log, the trusted incoming shape and the desired outgoing shape. Try adding another message, or remove ex:debugOnly from the incoming data: the output remains focused on the SOSA application profile.

Ontology / alignments

@prefix ex:   <https://example.org/shape-planning#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

sosa:madeBySensor
  rdfs:domain sosa:Observation ;
  rdfs:range sosa:Sensor .

sosa:resultTime rdfs:domain sosa:Observation .
sosa:hasSimpleResult rdfs:domain sosa:Observation .

sosa:hasFeatureOfInterest
  rdfs:domain sosa:Observation ;
  rdfs:range sosa:FeatureOfInterest .

ex:observedBy rdfs:subPropertyOf sosa:madeBySensor .
ex:observedAt rdfs:subPropertyOf sosa:resultTime .
ex:temperatureCelsius rdfs:subPropertyOf sosa:hasSimpleResult .
ex:observedFeature rdfs:subPropertyOf sosa:hasFeatureOfInterest .

RDF Messages log

VERSION "1.2-messages"
PREFIX ex:   <https://example.org/shape-planning#>
PREFIX xsd:  <http://www.w3.org/2001/XMLSchema#>

ex:obs1 ex:observedBy ex:sensor1 ;
        ex:observedAt "2026-06-16T10:00:01Z"^^xsd:dateTime ;
        ex:temperatureCelsius "18.1"^^xsd:decimal ;
        ex:observedFeature ex:platformA .
ex:obs1 ex:debugOnly "drop me 1" .

MESSAGE
ex:obs2 ex:observedBy ex:sensor2 ;
        ex:observedAt "2026-06-16T10:00:02Z"^^xsd:dateTime ;
        ex:temperatureCelsius "18.3"^^xsd:decimal ;
        ex:observedFeature ex:platformA .
ex:obs2 ex:debugOnly "drop me 2" .

MESSAGE
ex:obs3 ex:observedBy ex:sensor3 ;
        ex:observedAt "2026-06-16T10:00:03Z"^^xsd:dateTime ;
        ex:temperatureCelsius "18.6"^^xsd:decimal ;
        ex:observedFeature ex:platformB .
ex:obs3 ex:debugOnly "drop me 3" .

Incoming SHACL shape

@prefix ex:   <https://example.org/shape-planning#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:ObservationInputShape
  a sh:NodeShape ;
  sh:closed true ;
  sh:targetClass sosa:ObservationMessage ;
  sh:property [ sh:path ex:observedBy ; sh:minCount 1 ; sh:maxCount 1 ; sh:nodeKind sh:IRI ] ;
  sh:property [ sh:path ex:observedAt ; sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:dateTime ] ;
  sh:property [ sh:path ex:temperatureCelsius ; sh:minCount 1 ; sh:maxCount 1 ; sh:datatype xsd:decimal ] ;
  sh:property [ sh:path ex:observedFeature ; sh:minCount 1 ; sh:maxCount 1 ; sh:nodeKind sh:IRI ] .

Outgoing SHACL shape

@prefix ex:   <https://example.org/shape-planning#> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

ex:ObservationOutputShape
  a sh:NodeShape ;
  sh:targetSubjectsOf sosa:madeBySensor ;
  sh:property [ sh:path sosa:madeBySensor ; sh:maxCount 1 ] ;
  sh:property [ sh:path sosa:resultTime ; sh:maxCount 1 ; sh:datatype xsd:dateTime ] ;
  sh:property [ sh:path sosa:hasSimpleResult ; sh:maxCount 1 ; sh:datatype xsd:decimal ] ;
  sh:property [ sh:path sosa:hasFeatureOfInterest ; sh:maxCount 1 ; sh:nodeKind sh:IRI ] ;
  sh:property [ sh:path rdf:type ; sh:hasValue sosa:Observation ] .

ex:SensorOutputShape
  a sh:NodeShape ;
  sh:targetObjectsOf sosa:madeBySensor ;
  sh:property [ sh:path rdf:type ; sh:hasValue sosa:Sensor ] .

ex:FeatureOutputShape
  a sh:NodeShape ;
  sh:targetObjectsOf sosa:hasFeatureOfInterest ;
  sh:property [ sh:path rdf:type ; sh:hasValue sosa:FeatureOfInterest ] .

Inspect the generated N3 alignment rules

After running the shape-planning demo, the page displays the generated N3 runtime used for the stream. The interesting point is how short it becomes: the SHACL input and output shapes tell the engine which source predicates can occur and which target predicates are useful. The ontology then specializes the generic OWL 2 RL rules into a small set of direct rules, such as turning ex:observedBy into sosa:madeBySensor. For a long stream of similar RDF Messages, this means the expensive thinking happens once at the shape level, while every message is processed with a compact runtime.

I like this example because it turns shapes into more than documentation or validation contracts. Shapes become optimization hints. They tell the engine what kind of data is expected and what kind of result is useful. In a stream-processing setting, this can make a significant difference: we do not want to rediscover the same reasoning plan for every individual message if the message structure is stable.
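The plan-once, run-per-message idea can be sketched in Python. This is a hypothetical illustration, not the planner that the RDFJS Inference Engine actually generates: the alignments and shape paths from Demo 5 are copied into dictionaries by hand, only direct rdfs:subPropertyOf, rdfs:domain and rdfs:range statements are considered, and literals are simplified to plain strings.

```python
# Alignments from the Demo 5 ontology, written out as lookup tables.
SUB_PROPERTY_OF = {
    "ex:observedBy": "sosa:madeBySensor",
    "ex:observedAt": "sosa:resultTime",
    "ex:temperatureCelsius": "sosa:hasSimpleResult",
    "ex:observedFeature": "sosa:hasFeatureOfInterest",
}
DOMAIN = {
    "sosa:madeBySensor": "sosa:Observation",
    "sosa:resultTime": "sosa:Observation",
    "sosa:hasSimpleResult": "sosa:Observation",
    "sosa:hasFeatureOfInterest": "sosa:Observation",
}
RANGE = {
    "sosa:madeBySensor": "sosa:Sensor",
    "sosa:hasFeatureOfInterest": "sosa:FeatureOfInterest",
}

# sh:path values of the incoming and outgoing shapes.
input_paths = ["ex:observedBy", "ex:observedAt",
               "ex:temperatureCelsius", "ex:observedFeature"]
output_paths = {"sosa:madeBySensor", "sosa:resultTime",
                "sosa:hasSimpleResult", "sosa:hasFeatureOfInterest"}


def plan(input_paths, output_paths):
    """Shape-level step, done once: keep only the rules that can matter."""
    rules = {}
    for source in input_paths:
        target = SUB_PROPERTY_OF.get(source)
        if target in output_paths:
            rules[source] = (target, DOMAIN.get(target), RANGE.get(target))
    return rules


def run(rules, message):
    """Message-level step: a single pass with the specialized rules."""
    out = set()
    for (s, p, o) in message:
        if p not in rules:
            continue  # e.g. ex:debugOnly is not part of the plan
        target, domain, range_ = rules[p]
        out.add((s, target, o))
        if domain:
            out.add((s, "rdf:type", domain))
        if range_:
            out.add((o, "rdf:type", range_))
    return out


messages = [
    {("ex:obs1", "ex:observedBy", "ex:sensor1"),
     ("ex:obs1", "ex:observedAt", "2026-06-16T10:00:01Z"),
     ("ex:obs1", "ex:temperatureCelsius", "18.1"),
     ("ex:obs1", "ex:observedFeature", "ex:platformA"),
     ("ex:obs1", "ex:debugOnly", "drop me 1")},
    {("ex:obs2", "ex:observedBy", "ex:sensor2"),
     ("ex:obs2", "ex:observedAt", "2026-06-16T10:00:02Z"),
     ("ex:obs2", "ex:temperatureCelsius", "18.3"),
     ("ex:obs2", "ex:observedFeature", "ex:platformA"),
     ("ex:obs2", "ex:debugOnly", "drop me 2")},
]

rules = plan(input_paths, output_paths)          # computed once
outputs = [run(rules, m) for m in messages]      # applied to every message
for triple in sorted(outputs[0]):
    print(*triple)
```

The design choice is the split between plan and run: plan looks at the ontology and the shapes, while run never touches them again and does one dictionary lookup per incoming triple.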

What this means for interoperability

The important point is not that every project should run a full reasoner everywhere. The point is that the semantics of our specification artefacts can be made explicit and executable. OWL alignments can turn local vocabulary into a target model. SKOS rules can materialize semantic relations in a taxonomy. SHACL rules can derive validation reports. SHACL shapes can even help select and optimize the reasoning plan that should be applied to a stream.

This fits the idea of eventual interoperability. You can start from your own ontology, taxonomy and application profile, and later add alignments when another application needs your data. The reusable rule profiles do the repetitive work. The project-specific work becomes defining the right artefacts: the vocabulary, the taxonomy, the shape, and the alignments between them.

Next to automating alignments, we should also involve policy and trust reasoning. In data spaces and cross-organisational data flows, it is not enough to know that a message can be aligned to your application profile. You also need to know whether you are allowed to use it, under which conditions, and whether it is trustworthy enough for the state change you are about to make. DPV and ODRL reasoning could help make those trust relations executable as well, so that every step of the flow knows what it may do. This is one of the next steps I’m going to pursue: demonstrating that alignments (interoperability), assertions (trust) and usage control policy checks (legal compliance) can all be handled in the same process, effectively realizing the Trustflows vision.

EDIT: I used to have a section here about future plans in which we could also automatically align datatypes. This has now been implemented and is being showcased in the more recent blog post on Why Linked Data.

Applying this to Flanders

Flanders is a very interesting test case for this idea. On data.vlaanderen.be, application profiles already document the expected shapes of data exchanges. The Persoon Basis application profile documents constraints for exchanging person data, and the Adresregister application profile does the same for address-register data. At the same time, the underlying OSLO vocabularies already contain semantic links to broader vocabularies. A good small example is OSLO Adres: properties such as adres:Adresvoorstelling.huisnummer, adres:gemeentenaam and adres:land are linked to the W3C LOCN vocabulary.

The example below does not copy those alignment statements into the page. Instead, it fetches the OSLO Adres vocabulary directly from its Turtle URL, loads the generic OWL 2 RL rule profile, and runs the reasoner over a small input graph that uses OSLO address terms. The output is filtered to the broader W3C LOCN address terms.

Demo 6: OSLO Adres → W3C LOCN through a URL lookup

The ontology URL is fixed and read-only: the demo fetches the published OSLO vocabulary itself, then runs OWL 2 RL reasoning over the input data below.

Input data using OSLO Adres terms

PREFIX ex:   <https://example.org/flanders-demo#>
PREFIX adres: <https://data.vlaanderen.be/ns/adres#>

# The input graph uses OSLO terms.
# The page fetches https://data.vlaanderen.be/ns/adres.ttl
# to discover how those OSLO terms align to W3C LOCN.
ex:adres1
  adres:Adresvoorstelling.huisnummer "10" ;
  adres:Adresvoorstelling.busnummer "2A" ;
  adres:gemeentenaam "Gent"@nl ;
  adres:land "België"@nl .

The output is the important part: from OSLO input data, the reasoner derives a broader locn:Address type and LOCN properties such as locn:locatorDesignator, locn:postName and locn:adminUnitL1. A project can therefore publish data using OSLO and still expose a W3C LOCN-shaped view without hand-writing a transformation for every dataset. The same multi-layer idea also works one step earlier: a local team can first align its own vocabulary to OSLO, and OSLO can then provide the next bridge towards Europe. This is eventual interoperability at multiple layers, backed by the vocabulary descriptions that data portals already publish.

P.S.

For this blog post I coded up the RDFJS Inference Engine, which is now launched on npm and can be reused in your own projects as well: rdfjs-inference-engine. Thanks to Raf Buyle for the inspirational chat on the train that led to the first experiments on explaining how alignments can be automated. Also thanks to Jos De Roo and Patrick Hochstenbach for the late-night discussions on our reasoning channel, effectively validating (or invalidating some of) these ideas.