<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://pietercolpaert.be/feed.xml" rel="self" type="application/atom+xml" /><link href="https://pietercolpaert.be/" rel="alternate" type="text/html" /><updated>2026-02-05T08:06:18+00:00</updated><id>https://pietercolpaert.be/feed.xml</id><title type="html">Pieter Colpaert</title><subtitle>Professor public Web APIs and Linked Data</subtitle><author><name>Pieter Colpaert</name></author><entry><title type="html">Eventual Interoperability</title><link href="https://pietercolpaert.be/interoperability/2026/01/08/eventual-interoperability.html" rel="alternate" type="text/html" title="Eventual Interoperability" /><published>2026-01-08T00:00:00+00:00</published><updated>2026-01-08T00:00:00+00:00</updated><id>https://pietercolpaert.be/interoperability/2026/01/08/eventual-interoperability</id><content type="html" xml:base="https://pietercolpaert.be/interoperability/2026/01/08/eventual-interoperability.html"><![CDATA[<div class="teaser">
  <p>
    Do you start by sitting around the table, agreeing on common semantics, and forcing every system to speak the same language from day one?
    Or do you just share what you have in whatever format you have?
    The pragmatic choice for your goals today often makes collaboration harder later on.
    Well-intentioned projects prioritizing interoperability often stall waiting for consensus, and when agreement finally comes, it comes at the cost of oversimplified data models that do not capture the full level of detail you can provide.
    In this blog post I introduce the concept of <strong>eventual interoperability</strong>, which avoids getting stuck on making trade-off decisions and having to wait for consensus.
    Instead it promises you can move fast, align in a cost-efficient manner, and assert your facts when the business case is clear.
  </p>
</div>
<p>
  I don’t want interoperability to stall any progress on reaching a project’s main goals.
  If it does stall, it needs to be worth the effort and deliver short-term business value.
  If it does not do that for your project, then working on interoperability will be seen as a nuisance and a distraction.
  It then feels like advocating for future business goals that might never become relevant.
</p>
<p>
  Yet, interoperability is most noticeable when it’s absent.
  It’s when you are frustrated that you have to enter the same data over and over again, or when your route planner doesn’t contain the schedule of the bus service you want to use.
  In your day-to-day job, it’s when you identify an added value that needs data to flow from one system into another, but then you have to sit through too many alignment meetings on too many levels before you can actually make the end result work.
  <strong>Why can’t these systems simply work together?</strong>
</p>
<p>
  This tension is palpable: end-users will eventually need interoperability, but at the start of the project, it feels like premature optimization.
  We need an approach that allows you to start quickly, so that you can reach your business goals without additional friction even if there’s no standard yet, and that can evolve over time by design.
</p>
<h4>From rich to aligned semantics</h4>
<p>
  By <em>rich semantics</em>, I mean the full, nuanced way you describe your own domain so your system can make precise decisions.
  By <em>aligned semantics</em>, I mean the shared vocabulary and constraints that make your data usable across systems.
  Rich semantics, on the one hand, fit our own project goals perfectly, but are not interoperable across systems.
  Aligned semantics, on the other hand, sacrifice this richness in favor of being able to reuse the data in another system.
  When offered the choice, I believe well-intentioned data engineers often choose aligned semantics prematurely.
  Do we have to make a choice though?
</p>
<p>
  <strong>Eventual interoperability</strong> means you document and design your data interactions for your own work first, and align with other systems later when there is a concrete need. 
  It avoids forcing an early trade-off between semantic richness and alignment. 
  Instead, it treats interoperability as a sequence: first capture the full meaning you can provide, then create alignments when collaboration, adoption, or reuse actually depends on them.
  However, it is not an argument for ignoring interoperability at the start:
  stable identifiers and explicit semantics (even if they are local to your domain) remain the foundations that keep future alignment feasible and inexpensive.
</p><p>
  In that regard, the principle <strong>interoperability by design</strong> (cf. <a href="https://interoperable-europe.ec.europa.eu/collection/iopeu-monitoring/glossary/term/interoperability-design">the EIF’s definition</a>) is often misunderstood.
  While the majority of the burden of data integration has been on the consumers so far, the point is not to instead move all integration work to data producers.
  Eventual interoperability is about balancing that burden of integration across the ecosystem, rather than assuming it should sit entirely with producers or entirely with consumers.
  In any case, data engineers building consumer pipelines today are already used to dealing with heterogeneous models, with ideas like <a href="https://www.databricks.com/glossary/medallion-architecture">Medallion Architecture (~2020)</a> or <a href="https://www.confluent.io/learn/what-is-shift-left/">shift-left architectures (~2024)</a> gaining traction.
  Whatever you publish is pulled through ingestion, normalization, validation, and enrichment pipelines, until it can be used to help create insights in a product.
  The question is how these pipelines can become smarter and more cost-efficient with symbolic AI or Linked Data techniques.
</p>
<figure>
  <img src="/img/eventualinterop-burden.svg" style="margin: 0 auto; display: block;" alt="Eventual interoperability">
  <figcaption>The burden of aligning data has always been on the integrator. Linked Data is not about shifting this burden to the data producers entirely, but about distributing the burden equally in the ecosystem.</figcaption>
</figure>
<p>
  A practical implication is that interoperability becomes&mdash;or maybe always was&mdash;a <strong>one-to-many process</strong>. 
  If you preserve rich semantics at the source, you retain optionality: 
  the same dataset can be aligned to multiple target models, standards, and downstream systems (including systems that do not yet exist).
</p>
<h4>Linked Data and eventual alignments</h4>
<p>
  I often see Linked Data and Knowledge Graph engineers overcomplicating their projects for interoperability’s sake.
  Driven by good intentions, engineers start reusing terms from existing vocabularies immediately—often by cherry-picking predicates and classes across many ontologies.
  I believe this is a form of <i>aligning too early</i>: it introduces up-front coordination cost and semantic ambiguity without materially reducing later integration work.
  The underlying reason is that most systems do not integrate against individual terms in isolation, but against shapes or profiles: expected graph patterns plus constraints that define what a consumer can reliably act upon.
</p><p>
  Robert Sanderson (a leading voice in the world of Linked Data for Digital Humanities) has also argued against cherry-picking terms across many ontologies, for a slightly different reason.
  He argues that those kinds of projects tend to produce confusing solutions, which he calls Frankenstein’s monster-like models.
</p>
<blockquote>
  <p>There is a big difference between reusing models and ontologies, and cherry-picking individual terms from ontologies. 
    The first is important, the second is at worst dangerous and at best confusing. 
    When you run into a relationship you need that doesn't exist in your foundational ontology or profile, the typical advice is “Find it in some other ontology and reuse it.”&hellip; I disagree.</p>
  <cite>» Robert Sanderson in <a href="https://www.linkedin.com/posts/robert-sanderson_ontology-design-patterns-part-7-predicate-activity-7401287626453991425-uOxh/#:~:text=There%20is%20a,it.%22%20...%20I%20disagree">a LinkedIn post</a>.</cite>
</blockquote>
<p>
  Avoiding cherry-picking terms does not mean avoiding Linked Data altogether. 
  On the contrary: Linked Data provides strong building blocks for eventual interoperability, precisely because it supports explicit alignments.
  The key is to keep your source semantics internally coherent, and externalize interoperability as mappings that can be introduced by the right actor at the right time.
  In practice, those mappings can take the form of SPARQL <code>CONSTRUCT</code> queries, or rule-based interpretations of recurring ontological patterns.
</p>
<p>
  Starting with your own vocabulary can take away a lot of the burden and uncertainty when engineering your own project with interoperability by design at heart.
  As Niklas Emegård recently described it, it can feel like a <a href="https://niklasemegard.medium.com/the-secret-no-ontology-rdf-hack-nobody-tells-you-0165fe7d9003">“secret Linked Data hack”</a>: you model what you actually need, using terms that make sense in your own context, and resist the urge to prematurely converge.
  When alignment becomes necessary, vocabularies can be linked across systems and projects, and those links can be reused wherever the same interaction patterns apply.
</p>
<figure>
<figure class="highlight"><pre><code class="language-sparql" data-lang="sparql"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
</pre></td><td class="code"><pre><span class="k">CONSTRUCT</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nv">?student</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">other</span><span class="o">:</span><span class="ss">Student</span><span class="w"> </span><span class="p">;</span><span class="w">
           </span><span class="nn">other</span><span class="o">:</span><span class="ss">name</span><span class="w"> </span><span class="nv">?name</span><span class="w"> </span><span class="p">;</span><span class="w">
           </span><span class="nn">other</span><span class="o">:</span><span class="ss">identifier</span><span class="w"> </span><span class="nv">?studentID</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="nv">?course</span><span class="w"> </span><span class="nn">other</span><span class="o">:</span><span class="ss">hasStudent</span><span class="w"> </span><span class="nv">?student</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="k">WHERE</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nv">?student</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">my</span><span class="o">:</span><span class="ss">Student</span><span class="w"> </span><span class="p">;</span><span class="w">
           </span><span class="nn">my</span><span class="o">:</span><span class="ss">name</span><span class="w"> </span><span class="nv">?name</span><span class="w"> </span><span class="p">;</span><span class="w">
           </span><span class="nn">my</span><span class="o">:</span><span class="ss">hasStudentID</span><span class="w"> </span><span class="nv">?studentID</span><span class="w"> </span><span class="p">;</span><span class="w">
           </span><span class="nn">my</span><span class="o">:</span><span class="ss">enrolledInCourse</span><span class="w"> </span><span class="nv">?course</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
<figcaption>An example to show that SPARQL CONSTRUCT queries can be used to translate one data shape into another.</figcaption>
</figure>
<p>
  The <code>CONSTRUCT</code> query above assumes a specific source shape and produces a target shape.
  Creating interoperability between systems then becomes a matter of maintaining a set of such transformations—versioned, testable, and scoped to concrete interactions—rather than prematurely converging on a single shared vocabulary.
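  To make the one-to-many aspect tangible: the same source shape can feed several transformations. Below is a minimal sketch of a second alignment, this time towards a purely hypothetical <code>third:</code> vocabulary (it does not exist; it merely stands in for the next consumer that shows up).
</p>
<figure>
<figure class="highlight"><pre><code class="language-sparql" data-lang="sparql"># A second transformation over the same my: source shape,
# towards a hypothetical third: target vocabulary
CONSTRUCT {
  ?student a third:Learner ;
           third:fullName ?name .
  ?course third:hasParticipant ?student .
} WHERE {
  ?student a my:Student ;
           my:name ?name ;
           my:enrolledInCourse ?course .
}</code></pre></figure>
<figcaption>A sketch of a second <code>CONSTRUCT</code> query over the same source shape: the source model stays stable while the set of transformations grows with every new consumer.</figcaption>
</figure>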
<p>
  If you want to hear these ideas from someone else, the great Ora Lassila (Amazon) recently gave <a href="https://youtu.be/Atf4DVKGuMg?t=2391">a talk at re:Invent 2025</a> in which he exemplifies reasoning using SPARQL <code>CONSTRUCT</code> queries.
  The talk shows how this ties in with <i>Generative AI</i>, and explains why he calls this kind of Linked Data approach <i>Symbolic AI</i>, as the knowledge graph can produce entailments.
</p><p>
  I think Ora Lassila would agree that writing these transformations by hand as SPARQL queries does not scale indefinitely.
  Once alignments are made explicit at the vocabulary level, they can be reused across multiple shapes.
  Simple statements such as equivalence or inversion between terms already provide enough structure to generalize alignments, and to derive concrete transformations from them.
  In that sense, vocabulary-level alignments act as inputs for shape-level interoperability.
</p>
<figure>
<figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
</pre></td><td class="code"><pre><span class="nn">other:</span><span class="n">Student</span><span class="w"> </span><span class="nn">owl:</span><span class="n">equivalentClass</span><span class="w"> </span><span class="nn">my:</span><span class="n">Student</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="nn">other:</span><span class="n">name</span><span class="w"> </span><span class="nn">owl:</span><span class="n">equivalentProperty</span><span class="w"> </span><span class="nn">my:</span><span class="n">name</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="nn">other:</span><span class="n">identifier</span><span class="w"> </span><span class="nn">owl:</span><span class="n">equivalentProperty</span><span class="w"> </span><span class="nn">my:</span><span class="n">hasStudentID</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="nn">other:</span><span class="n">hasStudent</span><span class="w"> </span><span class="nn">owl:</span><span class="n">inverseOf</span><span class="w"> </span><span class="nn">my:</span><span class="n">enrolledInCourse</span><span class="w"> </span><span class="p">.</span>
</pre></td></tr></tbody></table></code></pre></figure>
<figcaption>With a couple of RDF triples, we can already align terms on the vocabulary level, giving input for aligning shapes.</figcaption>
</figure>
<p>
  With just a small number of RDF statements, it becomes possible to express how terms in one vocabulary relate to terms in another.
  Those relationships can then be used to generate SPARQL <code>CONSTRUCT</code> queries automatically, or to guide their systematic creation.
  This is also where rule-based approaches, such as N3 (see <a href="https://notation3.org/">Notation3</a>), become relevant: they allow expressing interaction patterns once, and applying them repeatedly as new alignments are introduced.
</p>
<figure>
<figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
</pre></td><td class="code"><pre><span class="c1"># More vocabulary statements from the OWL vocabulary</span><span class="w">
</span><span class="nn">owl:</span><span class="n">equivalentClass</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">owl:</span><span class="n">SymmetricProperty</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="nn">owl:</span><span class="n">equivalentProperty</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">owl:</span><span class="n">SymmetricProperty</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="nn">owl:</span><span class="n">inverseOf</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">owl:</span><span class="n">SymmetricProperty</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="c1"># Rule for symmetric properties</span><span class="w">
</span><span class="p">{</span><span class="w">
  </span><span class="n">?p</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">owl:</span><span class="n">SymmetricProperty</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="n">?s</span><span class="w"> </span><span class="n">?p</span><span class="w"> </span><span class="n">?o</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="n">=</span><span class="err">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">?o</span><span class="w"> </span><span class="n">?p</span><span class="w"> </span><span class="n">?s</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="c1"># Rule for equivalent classes</span><span class="w">
</span><span class="p">{</span><span class="w">
  </span><span class="n">?c1</span><span class="w"> </span><span class="nn">owl:</span><span class="n">equivalentClass</span><span class="w"> </span><span class="n">?c2</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="n">?x</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="n">?c1</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="n">=</span><span class="err">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">?x</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="n">?c2</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="c1"># Rule for equivalent properties</span><span class="w">
</span><span class="p">{</span><span class="w">
  </span><span class="n">?p1</span><span class="w"> </span><span class="nn">owl:</span><span class="n">equivalentProperty</span><span class="w"> </span><span class="n">?p2</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="n">?s</span><span class="w"> </span><span class="n">?p1</span><span class="w"> </span><span class="n">?o</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="n">=</span><span class="err">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">?s</span><span class="w"> </span><span class="n">?p2</span><span class="w"> </span><span class="n">?o</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="c1"># Rule for inverse properties</span><span class="w">
</span><span class="p">{</span><span class="w">
  </span><span class="n">?p1</span><span class="w"> </span><span class="nn">owl:</span><span class="n">inverseOf</span><span class="w"> </span><span class="n">?p2</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="n">?s</span><span class="w"> </span><span class="n">?p1</span><span class="w"> </span><span class="n">?o</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="n">=</span><span class="err">&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="n">?o</span><span class="w"> </span><span class="n">?p2</span><span class="w"> </span><span class="n">?s</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w"> </span><span class="p">.</span>
</pre></td></tr></tbody></table></code></pre></figure>
<figcaption>You can see these vocabulary alignments and rules as an interaction pattern that creates interoperability between <code>my</code> and the <code>other</code> shape. They are a great example of how we can scale up creating alignments, by stating how terms relate to other terms on the vocabulary level. 
  For that purpose, we use the Web Ontology Language (OWL). You can play with this yourself in the <a href="https://eyereasoner.github.io/eyeling/demo.html#%40prefix%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E.%0A%40prefix%20other%3A%20%3Chttps%3A%2F%2Fexample.org%2Fother%23%3E.%0A%40prefix%20my%3A%20%3Chttps%3A%2F%2Fexample.org%2Fmy%23%3E.%0A%0A%23%20Demo%20data%0A%3Chttps%3A%2F%2Fpietercolpaert.be%2F%23me%3E%20a%20my%3AStudent%20%3B%0A%20%20my%3Aname%20%22Pieter%20Colpaert%22%20%3B%0A%20%20my%3AhasStudentID%20%22123456789%22%20%3B%0A%20%20my%3AenrolledInCourse%20%3Chttps%3A%2F%2Fpietercolpaert.be%2Fteaching%2Fkg%2F%23course%3E%20.%0A%0A%23%20Alignments%0Aother%3AStudent%20owl%3AequivalentClass%20my%3AStudent%20.%0Aother%3Aname%20owl%3AequivalentProperty%20my%3Aname%20.%0Aother%3Aidentifier%20owl%3AequivalentProperty%20my%3AhasStudentID%20.%0Aother%3AhasStudent%20owl%3AinverseOf%20my%3AenrolledInCourse%20.%0A%0A%23%20More%20vocabulary%20statements%20from%20the%20OWL%20vocabulary%0Aowl%3AequivalentClass%20a%20owl%3ASymmetricProperty%20.%0Aowl%3AequivalentProperty%20a%20owl%3ASymmetricProperty%20.%0Aowl%3AinverseOf%20a%20owl%3ASymmetricProperty%20.%0A%23%20Rule%20for%20symmetric%20properties%0A%7B%0A%20%20%3Fp%20a%20owl%3ASymmetricProperty%20.%0A%20%20%3Fs%20%3Fp%20%3Fo%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fo%20%3Fp%20%3Fs%20.%0A%7D%20.%20%0A%0A%23%20Rule%20for%20equivalent%20classes%0A%7B%0A%20%20%3Fc1%20owl%3AequivalentClass%20%3Fc2%20.%0A%20%20%3Fx%20a%20%3Fc1%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fx%20a%20%3Fc2%20.%0A%7D%20.%0A%0A%23%20Rule%20for%20equivalent%20properties%0A%7B%0A%20%20%3Fp1%20owl%3AequivalentProperty%20%3Fp2%20.%0A%20%20%3Fs%20%3Fp1%20%3Fo%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fs%20%3Fp2%20%3Fo%20.%0A%7D%20.%0A%0A%23%20Rule%20for%20inverse%20properties%0A%7B%0A%20%20%3Fp1%20owl%3AinverseOf%20%3Fp2%20.%0A%20%20%3Fs%20%3Fp1%20%3Fo%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fo%20%3Fp2%20%3Fs%20.%0A%7D%20.">Eyeling reasoning playground</a>.</figcaption>
</figure>
<p>
  Once this mechanism is understood, as <a href="https://eyereasoner.github.io/eyeling/demo.html#%40prefix%20owl%3A%20%3Chttp%3A%2F%2Fwww.w3.org%2F2002%2F07%2Fowl%23%3E.%0A%40prefix%20other%3A%20%3Chttps%3A%2F%2Fexample.org%2Fother%23%3E.%0A%40prefix%20my%3A%20%3Chttps%3A%2F%2Fexample.org%2Fmy%23%3E.%0A%0A%23%20Demo%20data%0A%3Chttps%3A%2F%2Fpietercolpaert.be%2F%23me%3E%20a%20my%3AStudent%20%3B%0A%20%20my%3Aname%20%22Pieter%20Colpaert%22%20%3B%0A%20%20my%3AhasStudentID%20%22123456789%22%20%3B%0A%20%20my%3AenrolledInCourse%20%3Chttps%3A%2F%2Fpietercolpaert.be%2Fteaching%2Fkg%2F%23course%3E%20.%0A%0A%23%20Alignments%0Aother%3AStudent%20owl%3AequivalentClass%20my%3AStudent%20.%0Aother%3Aname%20owl%3AequivalentProperty%20my%3Aname%20.%0Aother%3Aidentifier%20owl%3AequivalentProperty%20my%3AhasStudentID%20.%0Aother%3AhasStudent%20owl%3AinverseOf%20my%3AenrolledInCourse%20.%0A%0A%23%20More%20vocabulary%20statements%20from%20the%20OWL%20vocabulary%0Aowl%3AequivalentClass%20a%20owl%3ASymmetricProperty%20.%0Aowl%3AequivalentProperty%20a%20owl%3ASymmetricProperty%20.%0Aowl%3AinverseOf%20a%20owl%3ASymmetricProperty%20.%0A%23%20Rule%20for%20symmetric%20properties%0A%7B%0A%20%20%3Fp%20a%20owl%3ASymmetricProperty%20.%0A%20%20%3Fs%20%3Fp%20%3Fo%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fo%20%3Fp%20%3Fs%20.%0A%7D%20.%20%0A%0A%23%20Rule%20for%20equivalent%20classes%0A%7B%0A%20%20%3Fc1%20owl%3AequivalentClass%20%3Fc2%20.%0A%20%20%3Fx%20a%20%3Fc1%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fx%20a%20%3Fc2%20.%0A%7D%20.%0A%0A%23%20Rule%20for%20equivalent%20properties%0A%7B%0A%20%20%3Fp1%20owl%3AequivalentProperty%20%3Fp2%20.%0A%20%20%3Fs%20%3Fp1%20%3Fo%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fs%20%3Fp2%20%3Fo%20.%0A%7D%20.%0A%0A%23%20Rule%20for%20inverse%20properties%0A%7B%0A%20%20%3Fp1%20owl%3AinverseOf%20%3Fp2%20.%0A%20%20%3Fs%20%3Fp1%20%3Fo%20.%0A%7D%20%3D%3E%20%7B%0A%20%20%3Fo%20%3Fp2%20%3Fs%20.%0A%7D%20.">demonstrated by Eyeling</a>, it becomes clear why this approach fits so naturally with eventual interoperability.
  Alignments can be introduced incrementally, applied at different points in the data pipeline, and revised without destabilizing the source model.
  The question remains: how do we make all of that manageable as part of the assets, such as shapes and vocabulary terms, in a specification strategy?
</p>
<h4>Specifications as building blocks</h4>
<p>
  When you start a project, you begin from the business case and describe what needs to happen.
  Something comes in, some decision logic runs, and something comes out.
  That input may come from a user who writes into your system, or from a dataset you fetch elsewhere.
  The output may be a state change of your own resources, or a response that another system will act upon.
  It is this processor—the part where you validate, transform, and decide—that you want to design with interoperability mechanisms in mind.
  If you do that well, your service is not only useful locally, but can also operate in a cross-border context without a redesign.
</p>
<figure>
  <img src="/img/organisational-in-out.svg"/>
  <figcaption>A very simplified view of an organisational system in charge of certain processes or procedures.</figcaption>
</figure>
<p>
  I therefore approach a project, an API, or even a data space connector, as a collection of <strong>interaction patterns</strong>.
  An interaction pattern is a protocol or procedure that describes a repeatable exchange: what the parties send, what is checked, and what is produced.
  Some patterns are generic and recur across domains—think <a href="https://en.wikipedia.org/wiki/Verifiable_credentials">verifying credentials</a> using the <a href="https://www.w3.org/TR/vc-data-model-2.0/">W3C Verifiable Credentials</a> pattern, or doing discovery and contract negotiation using the <a href="https://eclipse-dataspace-protocol-base.github.io/DataspaceProtocol/2025-1-err1/#catalog-protocol">Dataspace Protocol</a>.
  Other patterns are domain-specific: they capture how <em>your</em> business process works, and they encode the decisions that make your organisation’s service what it is.
</p><p>
  This is the engine behind reusability.
  Instead of building one monolithic API that bakes everything into a single bespoke interface, I can compose a system out of interaction patterns.
  Each interaction pattern relies on input and output <em>shapes</em> to define what comes in and what goes out.
  These shapes in turn reuse terms from vocabularies, and vocabularies become the level on which I can express alignments and reuse them across patterns and across projects.
</p>
<figure id="reusable-components">
  <img width="75%" style="margin: 0 auto; display: block;" src="/img/reusable-dataspace-components.svg" alt="Reusable specifications as building blocks">
  <figcaption>Reusable specifications stack: project goals → interaction patterns → shapes → vocabularies → alignments.</figcaption>
</figure>
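<p>
  To make the notion of a shape slightly more tangible: a shape describes the graph pattern a consumer can rely on, such as “every <code>my:Student</code> has a name, a student ID, and at least one course”. A minimal sketch of such a check, written here as a plain SPARQL query over the <code>my:</code> vocabulary from the earlier examples (in practice you would typically express the shape in SHACL), could look as follows.
</p>
<figure>
<figure class="highlight"><pre><code class="language-sparql" data-lang="sparql"># Report my:Student instances that miss a property the shape expects
SELECT ?student ?missing WHERE {
  ?student a my:Student .
  VALUES ?missing { my:name my:hasStudentID my:enrolledInCourse }
  FILTER NOT EXISTS { ?student ?missing ?value . }
}</code></pre></figure>
<figcaption>A sketch of a shape-level check: it lists, per <code>my:Student</code> instance, the expected properties that are missing. The shape, not the individual term, is what a consumer integrates against.</figcaption>
</figure>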
<p>
  This layered approach lets me keep authenticity where it matters.
  I can reuse interaction patterns as a whole for generic functionality, and I can still define domain-specific patterns using my own shapes and vocabulary.
  Later, when a business case appears, I can link my vocabulary to another domain model—provided I captured enough semantics up front.
  If I only collect a single “name”, it becomes hard to map to a model that distinguishes first and last name.
  At the same time, I do not want to force that split, because not all cultures worldwide share the concept of first and last name (hat-tip to my colleague Sitt Min Oo here).
  This is exactly why I keep insisting on semantic richness at the source: it preserves optionality without forcing early consensus.
</p><p>
  None of this is a manifesto against standardisation: quite the opposite.
  When a standard interaction pattern exists and fits the business case, adopting it as a whole is often the fastest path to interoperability and to working software.
  Good examples are standards such as the <a href="https://www.w3.org/TR/vc-data-model-2.0/">W3C Verifiable Credentials</a> specification or the <a href="https://eclipse-dataspace-protocol-base.github.io/DataspaceProtocol/2025-1-err1/#catalog-protocol">Dataspace Protocol</a> that define an interaction pattern together with the shapes that make that pattern executable.
  Such standardised interaction patterns are among the most reusable building blocks one can use in an API.
  Also for your domain-specific interaction patterns, standardisation remains important to make alignments scale.
  Standardised reference models are helpful assets: if everyone aligns to the same terms, alignments become more cost-efficient.
</p><p>
  Neither is it a manifesto for chaos: this abstraction does not come for free.
  It requires better governance of specifications as first-class assets: interaction patterns, shapes, vocabularies, and the alignments between them.
  The cost is additional conceptual structure next to the API surface.
  The benefit however is that these components become reusable across projects and organisations, enabling more generic data infrastructure—an important property for data intermediaries and cross-organisational platforms.
</p>
<h4>Conclusion</h4>
<p>
  I’m hopeful 2026 will be the year of a new approach to interoperability.
  There’s a new school of pragmatists rising, working on interoperable solutions with Semantic Web and Linked Data technologies at the core. 
  The biggest shift is to not align from the start, but to choose technology that supports reusability of components, evolvability, and, of course, eventual alignment of the resources you control.
</p><p>
  We must ensure interoperability is not perceived by management as a distraction, but as an extremely useful asset.
  This is certainly relevant in public services, as the Interoperable Europe Act now mandates an interoperability assessment at the start of a project by a public administration in Europe.
  The consequence should be the adoption of a layered approach that makes API specifications more reusable.
  And, when cross-border use cases are identified, your domain-specific interaction patterns should be able to remain stable by just creating alignments on the vocabulary level.
</p><p>
  So, in summary, don’t prematurely adopt existing terms just for interoperability’s sake.
  <strong>Start from your own terms. You can align later on</strong> when it is important.
  Spend effort early on preparing for interoperability by minting stable identifiers and documenting explicit semantics.
  Approach your API as a puzzle of interaction patterns.
  Some patterns will be generic standards you can reuse; others will be domain-specific.
  Document the latter, and adopt a layered specification approach in which interaction patterns use shapes, and shapes in turn reuse terms from vocabularies.
  Vocabulary-level alignments then become the driver for scaling <strong>eventual interoperability</strong> across projects and ecosystems.
</p>
<h3>P.S.</h3>
<h5>Trustflows</h5>
<p>
  <a href="https://solidcommunity.be/events/solidlab-closing-event/">On the 19th of January</a>, our four-year research project called SolidLab comes to an end.
  We’ll launch a continuation of the community under a new name: Trustflows.
  Trustflows starts from three main learnings:
  <ol>
    <li>We’ll design data systems as data flows from the initial write to one or more read interfaces.</li>
    <li>We’ll separate authorisation from storage, so that a separate component can grant access to a resource, even if it travels across storage systems managed by different organisations.</li>
    <li>We’ll adopt a specification strategy along the lines of what I explained in this post.</li>
  </ol>
   Want to be there as well? It’s <a href="https://solidcommunity.be/events/solidlab-closing-event/">open to anyone who registers</a>!
</p>
<h5>The European Interoperability Framework (EIF)</h5>
<p>
  The Commission's first <a href="https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=COM:2025:860:FIN">Annual Report on Interoperability in the Union</a>
  sheds light on the state of play as the Act is moving from legislation to practice.
  Governance is now in place as the Interoperable Europe Board has been established,
  the European Interoperability Framework from 2017 (mainly focussing on Open Data) has been adopted by all Member States into a National Interoperability Framework,
  the <a href="https://interoperable-europe.ec.europa.eu/">Interoperable Europe Portal and Community</a> are operational,
  <a href="https://interoperable-europe.ec.europa.eu/collection/assessments">interoperability assessments</a> became mandatory, 
  and <a href="https://interoperable-europe.ec.europa.eu/collection/interoperability-regulatory-sandboxes">sandboxes</a> and the <a href="https://interoperable-europe.ec.europa.eu/collection/interoperable-europe-academy">Interoperable Europe Academy</a> are growing.
  Finally, a catalogue of <a href="https://interoperable-europe.ec.europa.eu/interoperable_solutions">interoperable solutions</a> is also being created.
</p>
<p>
  It’s worth keeping an eye on these developments, certainly with the European Interoperability Framework that is going to be revised in 2026.
</p>
<h5>SHACL Rules</h5>
<p>
  Next to N3 rules, a new kid on the block will be <a href="https://www.w3.org/TR/shacl12-rules/">SHACL 1.2 rules</a>.
  It was just published as a first public working draft.
  It starts from SPARQL CONSTRUCT queries, and then adds N3-like functionality to it by defining a RULES clause.
</p><p>
  I’ll be keeping an eye on the spec to see what the benefits could be over N3, which already has a very long history.
  For now, I’ll keep using the latter.
</p>]]></content><author><name>Pieter Colpaert</name></author><category term="interoperability" /><summary type="html"><![CDATA[A pragmatic approach to interoperability: model your data richly for your own needs, then allow the eventual creation of one‑to‑many alignments.]]></summary></entry><entry><title type="html">SEMIC2025 Trip report</title><link href="https://pietercolpaert.be/conferences/2025/11/28/semic-trip-report.html" rel="alternate" type="text/html" title="SEMIC2025 Trip report" /><published>2025-11-28T00:00:00+00:00</published><updated>2025-11-28T00:00:00+00:00</updated><id>https://pietercolpaert.be/conferences/2025/11/28/semic-trip-report</id><content type="html" xml:base="https://pietercolpaert.be/conferences/2025/11/28/semic-trip-report.html"><![CDATA[<div class="teaser">
  <p>
    SEMIC is a conference bringing together the European Semantic Interoperability Community.
    It is usually organised by the country holding the presidency of the Council of the European Union, so this year it took us to Copenhagen in Denmark.
    The conference runs for two days: the first day brings workshops on topical subjects, while the second day hosts policy discussions in the context of the Interoperable Europe Act.
  </p>
</div>
<p>
  On the first day in the morning, I followed the workshop on data spaces, with the great <a href="https://www.linkedin.com/in/marcellogrita/">Marcello Grita</a> rocking the stage as the moderator.
  The biggest takeaway for me was in the talk of <a href="https://www.linkedin.com/in/valentina-staveris-21131812/">Valentina Staveris</a>:
  she explained the <a href="https://digital-strategy.ec.europa.eu/en/policies/simpl">SIMPL project</a> to empower European data spaces.
  It consists of three main parts: 
  <ol>
    <li><a href="https://code.europa.eu/simpl/simpl-open">SIMPL-Open</a>: an open-source software stack that powers data spaces and other cloud-to-edge federation initiatives.</li>
    <li>SIMPL-Labs: an environment for data spaces to experiment with open-source software and assess their level of interoperability with SIMPL. Specifically, sectoral data spaces in their early stages will be able to experiment with the deployment, maintenance, and support of the open-source software stack before deploying it for their own needs. Furthermore, more mature data spaces will be able to use SIMPL-Labs to assess their level of interoperability with SIMPL-Open.</li>
    <li>SIMPL-Live: distinct instances of the SIMPL-Open software stack deployed for specific sectoral data spaces where the European Commission itself plays an active role in their management.</li>
  </ol>
  I found this way of approaching a data space project refreshing: it covers the experimental and innovative part of a project, the actual deployments, and the established open-source components.
  On the <a href="https://simpl-programme.ec.europa.eu/dashboard/annual-event-2026">29th of January 2026</a>, there will be a community event in Brussels I’ll attend.
</p>
<figure>
  <img src="/img/SEMIC2025.jpg"/>
  <figcaption>Anastasia Sofou and Sander Van Dooren of the SEMIC Linked Data Event Streams team welcoming the participants to our workshop.</figcaption>
</figure>
<p>
  In the afternoon, we held our Linked Data Event Streams workshop.
  Anastasia Sofou and Sander Van Dooren opened the session and introduced the theme of the workshop: towards more sustained interoperability with LDES.
  Emilija Stojmenova Duh then introduced interoperability from the perspective of the next-generation European Interoperability Framework that is on its way.
  </p>
  <p>
    The keynote was given by Piotr Sowiński of the company <a href="https://www.neverblink.eu/">Neverblink</a>.
    He presented his work on <a href="https://jelly-rdf.github.io/dev/">Jelly</a> and how we are already collaborating on a spec called <a href="https://w3c-cg.github.io/rsp/spec/messages">RDF Messages</a> as part of the RDF Stream Processing community group.
    This way, Jelly and LDES share a common basis to build on.
  </p><p>
    I then presented the updated LDES spec and its features as it evolved greatly across 2025.
    This marked the official launch of the end result of a truly intense trajectory.
</p>
<figure class="fullwidth">
  <iframe src="https://docs.google.com/presentation/d/e/2PACX-1vRnEn1d1FUyWQECY9OYvnAOE0YzwMGW4kfygkN2nPhjDWToHkbzdQz7AdrJ7gFlsg/pubembed?start=false&loop=false&delayms=3000" frameborder="0" width="1280" height="749" allowfullscreen="true" mozallowfullscreen="true" webkitallowfullscreen="true"></iframe>
  <figcaption>The slides of the launch of the new LDES specification.</figcaption>
</figure><p>
  Pavlina Fragkou then presented how <a href="https://data.europa.eu">data.europa.eu</a> is slowly but surely on their way to adopting LDES in their workflows.
  Arne Stabenau presented how he implemented an LDES server for the cultural heritage domain to support LDES adoption in Europeana.
  Gert De Tant and Brecht Van de Vyvere showed their LDES implementations for Westtoer as well as the water data space.
  Ranko Orlic presented his work during the Local Digital Twin Toolbox project where he worked on MIM compliance and SHACL shapes for LDES. He also launched the LDES server he built in .NET. Check out <a href="https://ldes-server.net">https://ldes-server.net</a>.
  Julián Rojas then presented the work on the <a href="https://rdf-connect.github.io/">RDF-Connect framework</a> and how you can build LDES-to-SPARQL pipelines, for example.
  Anikó from the Publications Office and Mantas from Lithuania then presented possible future pilots.
</p>
<p>
  The future for LDES is bright: what started as a small community with interesting ideas is now a growing and welcoming community with more and more implementations and tooling.
</p><p>
  On the second day of SEMIC, it was great to see the strong interest in the new version of the European Interoperability Framework (EIF).
  As part of the informal expert group, I have been working with a group of 9 experts to revise the EIF that was last updated in 2017 towards a 2026 version.
  This was quite a challenge, but we delivered our first draft on time and it now has to undergo a couple of review cycles.
</p>
<figure>
  <img src="/img/SEMIC2025-2.jpg"/>
  <figcaption>A panel on the new European Interoperability Framework</figcaption>
</figure>
<p>
  Overall, SEMIC always feels like a homecoming to me and I will keep attending it: it’s an amazing community of people working towards a common goal, both from the policy perspective and from the technical perspective.
  It’s where both perspectives meet.
  </p><p>
  It was great to catch up with old and new friends, who are too many to mention individually: <a href="https://www.linkedin.com/in/maria-keet/">Maria Keet</a>, <a href="https://www.linkedin.com/in/tonzijlstra/">Ton Zijlstra</a>, Tanja Ronzhina, Matthias Palmér, ...
  There was also a great delegation from Belgium/Flanders present.
</p>]]></content><author><name>Pieter Colpaert</name></author><category term="conferences" /><summary type="html"><![CDATA[In November 2025, I went to the SEMIC conference in Copenhagen. This is the trip report, in which I talk about data spaces, LDES and the EIF.]]></summary></entry><entry><title type="html">Named graphs and RDF messages</title><link href="https://pietercolpaert.be/linkeddata/2025/09/30/named-graphs.html" rel="alternate" type="text/html" title="Named graphs and RDF messages" /><published>2025-09-30T00:00:00+00:00</published><updated>2025-09-30T00:00:00+00:00</updated><id>https://pietercolpaert.be/linkeddata/2025/09/30/named-graphs</id><content type="html" xml:base="https://pietercolpaert.be/linkeddata/2025/09/30/named-graphs.html"><![CDATA[<div class="teaser">
  <p>
    I was often confused when I overheard logicians talk meticulously&mdash;like only logicians can&mdash;about the semantics of named graphs that were <i>left open</i>.
    Not being a logician myself, I often use named graphs in my projects and I didn’t realize I could possibly have been doing anything wrong.
    I’m pretty sure I’m not the only one:
    Are these logicians just being pedantic? Or should we pay close attention to what they’re saying, or even worse, should we abandon named graphs altogether?
    In this post, I give my pragmatic view on named graphs and why I believe they are here to stay.
    I hope my view is one&mdash;how wonderful that would be&mdash;logicians can agree with.
    However, I also criticize RDF1.1, not for having open semantics for named graphs, but for not having the concept of “RDF messages”.
  </p>
</div>
<p>
  After all these years since RDF1.1, named graphs are still very controversial today in a small circle of people.
  The reason is quite simple: the fourth element added to a triple, the named graph, <a href="https://www.w3.org/TR/rdf11-datasets/">was left semantically undefined</a>.
  We have to understand that named graphs were already <a href="https://www.sciencedirect.com/science/article/abs/pii/S1570826805000235">a well-discussed idea back in 2005 (Carroll et al., 2005)</a>, but <a href="https://www.w3.org/TR/rdf11-concepts/">the RDF1.1 recommendation</a> introducing the fourth element was only finalized in 2014.
  Different technologies already had their own uses and implementations of named graphs, and the RDF1.1 group could not agree on a single specific meaning.
</p>
<div class="quote">
  <p>The RDF Working Group did not define a formal semantics for a multiple graph data model because none of the semantics presented before could obtain consensus. Choosing one or another of the propositions before would have gone against some deployed implementations. Therefore, the Working Group discussed the possibility to define several semantics, among which an implementation could choose, and provide the means to declare which semantics is adopted.</p>
  <p class="quote-src">Src: <a href="https://www.w3.org/TR/rdf11-datasets/">RDF 1.1: On Semantics of RDF Datasets</a></p>
</div>
<p>
  Granted, a working group note did state that the precise interpretation could be provided explicitly in the metadata of a service you’re using.
  The only problem is that, more than a decade later, I have not seen this being done, and I don’t know of a vocabulary that would allow you to do so.
</p>
<p>
  Pragmatically, I don’t believe this is as big of a problem as some would dare to state.
  When importing quads from another source, however, you’ll need to know what you’re doing.
  In this post, I’ll try to gradually explain that either way, whether the source is using named graphs or not, you’ll have to interpret the quads.
  Only after a translation step, which we’ll call a “contextual assertion” process, will you be able to import them into your store.
</p>
<figure>
<figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
</pre></td><td class="code"><pre><span class="nl">&lt;https://pietercolpaert.be/&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">&lt;https://pietercolpaert.be/#me&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">foaf:</span><span class="n">Person</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
  <figcaption>“I’m a person”&mdash;Putting a statement in the context of a named graph can serve many different purposes. If you don’t know the system it is used in, you cannot automatically understand what it is being used for. I’ll come back to refute this snippet later on.</figcaption>
</figure>

<p>
  We’ll first dive into 2 very different examples of how named graphs are used in practice today.
  We’ll then discuss the issues that arise from the open semantics, and I’ll give my advice on how to use named graphs in your next project.
  Finally, I’ll introduce the idea of RDF messages, which I believe is a necessary addition to RDF1.1.
</p>
<h4>An example: processing an RDF stream</h4>
<figure>
<figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="nl">&lt;https://pietercolpaert.be/#2025-09-23&gt;</span><span class="w">
    </span><span class="nn">prov:</span><span class="n">generatedAtTime</span><span class="w"> </span><span class="s">"2025-09-23T12:00:00Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="nl">&lt;https://pietercolpaert.be/#2025-09-23&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">&lt;https://pietercolpaert.be/#me&gt;</span><span class="w"> </span><span class="nn">foaf:</span><span class="n">age</span><span class="w"> </span><span class="mi">36</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
  <figcaption>A typical example of an RDF Stream Processing message: a set of statements, those mentioned in this particular interaction in the named graph, are generated at this specific time.</figcaption>
</figure>
<p>
  There is an <a href="https://www.w3.org/groups/cg/rsp/">RDF Stream Processing (RSP) community</a> that creates statements as in the example above: it uses the named graph to put certain triples in context.
  This way you can select the ones that are useful for your system.
    You should not consider these triples unless you are specifically interested in the statement made at that specific time.
  SPARQL can then help you query for exactly those statements you are interested in, cf. the example below.
</p>
<figure>
<figure class="highlight"><pre><code class="language-sparql" data-lang="sparql"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
</pre></td><td class="code"><pre><span class="k">SELECT</span><span class="w"> </span><span class="nv">?time</span><span class="w"> </span><span class="nv">?age</span><span class="w">
</span><span class="k">WHERE</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="k">GRAPH</span><span class="w"> </span><span class="nv">?g</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nn">&lt;https://pietercolpaert.be/#me&gt;</span><span class="w"> </span><span class="nn">foaf</span><span class="o">:</span><span class="ss">age</span><span class="w"> </span><span class="nv">?age</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="p">}</span><span class="w">
  </span><span class="nv">?g</span><span class="w"> </span><span class="nn">prov</span><span class="o">:</span><span class="ss">generatedAtTime</span><span class="w"> </span><span class="nv">?time</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="k">ORDER</span><span class="w"> </span><span class="k">BY</span><span class="w"> </span><span class="k">DESC</span><span class="p">(</span><span class="nv">?time</span><span class="p">)</span><span class="w"> </span><span class="k">LIMIT</span><span class="w"> </span><span class="mi">1</span>
</pre></td></tr></tbody></table></code></pre></figure>
<figcaption>When querying the default graph, you will not (and should not) be able to find my age. You will only be able to find my current age if you query a specific graph, and filter on the graph with the latest timestamp.</figcaption>
</figure>

<h4>Another example: Uniprot’s partitioning</h4>
<p>
  One of the biggest RDF datasets everyone should know about is <a href="https://www.uniprot.org/">Uniprot</a>.
  UniProt is a knowledge graph providing context about protein sequences and is maintained by a consortium of international bioinformatics institutes.
  UniProt does not use named graphs for putting triples in context, but simply to organize them into partitions.
  For example, all statements about diseases can be found in the named graph <a href="https://sparql.uniprot.org/diseases"><code>https://sparql.uniprot.org/diseases</code></a>.
  The <a href="https://sparql.uniprot.org/">documentation of their SPARQL endpoint</a> says there are 21 named graphs, although all named graphs are available in the default graph, so we should not worry about them.
</p>
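<p>
  Addressing such a partition is then simply a matter of naming the graph in your query. A minimal sketch (the actual count obviously depends on the release):
</p>
<figure>
<figure class="highlight"><pre><code class="language-sparql" data-lang="sparql"># Count the triples in the UniProt diseases partition
SELECT (COUNT(*) AS ?triples) WHERE {
  GRAPH &lt;https://sparql.uniprot.org/diseases&gt; {
    ?s ?p ?o .
  }
}</code></pre></figure>
<figcaption>Querying a partition by its graph name. Because UniProt also exposes all named graphs through the default graph, the same triples can be reached without the <code>GRAPH</code> clause.</figcaption>
</figure>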
<p>
  Making all triples in all named graphs also queryable from the default graph is default behavior across popular triple stores such as <a href="https://graphdb.ontotext.com/documentation/11.1/query-behavior.html#what-are-named-graphs:~:text=The%20dataset%E2%80%99s%20default%20graph%20contains%20the%20merge%20of%20the%20database%E2%80%99s%20default%20graph%20AND%20all%20the%20database%20named%20graphs%3B">GraphDB</a> or <a href="https://blog.metaphacts.com/the-default-graph-demystified#:~:text=Triplestores%20that%20use%20an%20inclusive,part%20of%20the%20default%20graph.">RDF4J</a>.
  Others, such as <a href="https://jena.apache.org/tutorials/sparql_datasets.html#accessing-the-dataset:~:text=This%20is%20the%20default%20graph%20only%20%2D%20nothing%20from%20the%20named%20graphs%20because%20they%20aren%E2%80%99t%20queried%20unless%20explicitly%20indicated%20via%20GRAPH.">Apache Jena</a> or the <a href="https://comunica.dev">Comunica query engine</a>, only query the statements that are explicitly in the default graph, or make this behaviour configurable.
  This is thus where you need to know what you’re doing.
  If we were to load the data from the RDF Stream Processing example into GraphDB or RDF4J with their default configurations, our default graph would contain multiple contradicting statements.
  For example, it would state that I’m both 35 years old and 36 years old, without any remaining possibility to check the context of each statement.
</p>
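<p>
  A quick illustration: assume an earlier RSP message stating <code>foaf:age 35</code> was loaded alongside the one above. A simple query on the merged default graph then returns both values, and the <code>prov:generatedAtTime</code> statements can no longer be joined in to tell them apart.
</p>
<figure>
<figure class="highlight"><pre><code class="language-sparql" data-lang="sparql"># Querying the default graph of a store that merged all named graphs
SELECT ?age WHERE {
  &lt;https://pietercolpaert.be/#me&gt; foaf:age ?age .
}
# Returns both 35 and 36; the originating contexts are gone</code></pre></figure>
<figcaption>After merging all named graphs into the default graph, both ages are asserted globally, without their context.</figcaption>
</figure>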
<p>
  From the vendor perspective, I understand that they need a way to organize their work.
  In the case of UniProt, although I didn’t check whether this is actually how they work, I can imagine that when a new release of the database is published, they drop a named graph entirely and load the new dump for that part in its place.
    The company RedPencil even has a system in which they <a href="https://ceur-ws.org/Vol-3565/QuWeDa2023-paper2.pdf">rewrite SPARQL queries based on role-based access control rules</a>.
  The latter, however, is not just logical partitioning anymore: a named graph becomes available in the default graph when the access conditions are met.
</p>
<h4>From open semantics to functional use</h4>
<!-- Part 1: is is context, quoting or logical partitions -->
<h5>Issue 1: Logical partitions vs. contextual assertions</h5>
<p>
  Since RDF 1.1 left the semantics of named graphs open, practitioners have no choice but to adopt a functional perspective: how do we actually use them?
    The <strong>first issue</strong> is recognizing the different interpretations. Named graphs can be seen as <em>contextual assertions</em> (the triples hold in a certain context), as <em>quotations</em> (someone said these triples), or as <em>logical partitions</em> (all triples are true globally, just organized into buckets).
</p><p>
  Pragmatically, I believe we only need one semantics: <strong>contextual assertions</strong> in which a named graph means “the statements inside are true in this context”.
  Quoting is a special case, where the context (“Pieter said so”) is more important than the payload.
  Partitioning is then yet another special case, where the system implicitly asserts a specific context, or maybe all contexts, as globally true.
</p><p>
  Take the RSP example above: the fact that I’m 36 at a particular moment is only useful once the client has validated the context.
  Consumers decide which context matters, depending on their task. A time travel app might show my profile from 10 years ago, while another application may only want the latest context.
  SPARQL makes this straightforward: you can simply add conditions on <code>?g</code> to, for example, only keep the most recent graph.
</p><p>Quoting follows the same design pattern. The triples go in a named graph, but now the context metadata drives the query. For example, you may want to count how many things Pieter has been saying, without the need for asserting the statements in the named graphs.
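  A minimal sketch of such a query, assuming the attribution is recorded on the graph name with, say, <code>prov:wasAttributedTo</code> (the earlier examples only record <code>prov:generatedAtTime</code>):
</p>
<figure>
<figure class="highlight"><pre><code class="language-sparql" data-lang="sparql"># Count the statements attributed to Pieter, without asserting any of them
SELECT (COUNT(*) AS ?statements) WHERE {
  ?g prov:wasAttributedTo &lt;https://pietercolpaert.be/#me&gt; .
  GRAPH ?g { ?s ?p ?o . }
}</code></pre></figure>
<figcaption>Counting quoted statements: the quads stay in their named graphs, and only the metadata about those graphs drives the query.</figcaption>
</figure>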
<p>
  Partitioning is yet another special case. Systems such as UniProt use named graphs for organizational purposes: all triples appear in the default graph, and for the consumer’s sake, the named graphs are hardly relevant.
  In the RedPencil system, access control rules determine which named graphs are exposed through the default graph. From the consumer’s perspective, this still looks like contextual assertions—just that the server has already decided which contexts to include.
</p><p>
  Problems arise, not because of the open semantics, but because some systems <em>only</em> support partitioning: they automatically merge all named graphs into the default graph.
  In such stores, importing contextual data collapses into global assertions, making it impossible to recover the original contexts.
  For example, you’ll find that Pieter is both 35 and 36 years old.
  If your store is specific to one context (e.g., all things in this store are true at this moment), this is fine.
  But if you want to preserve contextual assertions, you either need an import-time processing step that flattens the data consistently, or move to a different store.
  Some RDF stores let you configure which graphs contribute to the default graph, which can already be a first step toward treating partitions as explicit contexts.
</p>
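<p>
  As a minimal sketch of what querying contextual assertions can look like (assuming, as in the earlier examples, that each graph is described with <code>prov:generatedAtTime</code>), a consumer could keep only the most recently generated context:
</p>
<figure>
<pre><code class="language-sparql">PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;
PREFIX prov: &lt;http://www.w3.org/ns/prov#&gt;
# Keep only the age asserted in the most recently generated context
SELECT ?age WHERE {
  GRAPH ?g { &lt;https://pietercolpaert.be/#me&gt; foaf:age ?age . }
  ?g prov:generatedAtTime ?generated .
}
ORDER BY DESC(?generated)
LIMIT 1
</code></pre>
  <figcaption>A sketch of selecting statements by their context: the condition on <code>?g</code> does the work, not the graph name. The quoting case works the same way, filtering on who made the statements instead of when they were generated.</figcaption>
</figure>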
<p>
  <strong>&gt; My advice: </strong>for your next project, treat named graphs as contextual assertions by default, and only treat them as partitions when that is explicitly the desired behaviour.
</p>
<h5>Issue 2: The named graph identifier</h5>
<!-- Part 2: does the named graph point at the set of triples, or is it just an identifier for the context? It’s certainly the latter, but the first is again a special case...  -->
<p>
  The <strong>second issue</strong> relates to how you name your graphs, as the fact that there are no strict rules can lead to confusion.
  The way I used the named graph in the first example in this post&mdash;using my website’s identifier as a named graph IRI&mdash;is problematic.
    Someone might use that identifier for something else, such as saying that my webpage contains 42 triples (it <a href="https://query.linkeddatafragments.org/#datasources=https%3A%2F%2Fpietercolpaert.be&query=SELECT%20%28count%28*%29%20as%20%3Fcount%29%20WHERE%20%7B%0A%20%20%20%20%3Fs%20%3Fp%20%3Fo%20%0A%7D">actually does</a> at the time of writing this post).
  However, if in my context, I want it to say that the named graph on my system is a container for 1 triple, then there will be a <em>semantic collision</em> (I don’t think anyone coined this term in Linked Data yet&mdash;it adds the right amount of drama).
</p>
<figure> <figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="nl">&lt;https://pietercolpaert.be/&gt;</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">&lt;https://pietercolpaert.be/#me&gt;</span><span class="w"> </span><span class="nn">foaf:</span><span class="n">age</span><span class="w"> </span><span class="mi">36</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="nl">&lt;https://pietercolpaert.be/&gt;</span><span class="w"> </span><span class="nn">void:</span><span class="n">triples</span><span class="w"> </span><span class="mi">42</span><span class="w"> </span><span class="p">.</span><span class="w"> </span><span class="c1"># 💣</span><span class="w">
</span><span class="nl">&lt;https://pietercolpaert.be/&gt;</span><span class="w"> </span><span class="nn">void:</span><span class="n">triples</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">.</span><span class="w">  </span><span class="c1"># 💥</span>
</pre></td></tr></tbody></table></code></pre></figure>
   <figcaption>Using a named graph IRI that is also an information resource is a recipe for generating semantic collisions.</figcaption>
</figure>
<p>
  This simple example of using a named graph IRI that is also an information resource is only the tip of the iceberg.
  We want to be able to talk about our specific context, so it’s a good idea to use an IRI that you control.
  Not only because you want to avoid this kind of collision, but also because the source may still change its contents or change the definition of the context later on.
  From the moment a named graph travels across systems, the context changes: it has now become data that traveled from one system to another, so I believe it should get a new identifier.
  That identifier can however be used to point to the original context.
</p>
<p>
  Renaming graphs when they travel across systems is however not default behaviour in RDF.
  At least, it is not for named graphs with an IRI.
  If you use <strong>blank nodes as graph names</strong>, and these blank nodes travel across systems, they will get a new identifier automatically.
</p><p>
  <strong>Blank Node Graphs</strong> immediately also solve other problems, such as the one of accidentally&mdash;or intentionally&mdash;overwriting someone else’s named graph.
  Furthermore, you cannot dereference a blank node, nor name it from another document, so you’re certain that the quads in this interaction are complete, and you know that you have the full context about this graph as well.
  Certainly when working with contextual assertions, the name itself becomes less important anyway: you’re selecting a graph based on a description of the context.
</p>
<figure> <figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
</pre></td><td class="code"><pre><span class="nn">_:</span><span class="n">b0</span><span class="w"> </span><span class="p">{</span><span class="w">
    </span><span class="nl">&lt;https://pietercolpaert.be/#me&gt;</span><span class="w"> </span><span class="nn">foaf:</span><span class="n">age</span><span class="w"> </span><span class="mi">36</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="nn">_:</span><span class="n">b0</span><span class="w"> </span><span class="nn">void:</span><span class="n">triples</span><span class="w"> </span><span class="mi">1</span><span class="w"> </span><span class="p">;</span><span class="w">
     </span><span class="nn">prov:</span><span class="n">generatedAtTime</span><span class="w"> </span><span class="s">"2025-09-29T16:34:00Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">.</span>
</pre></td></tr></tbody></table></code></pre></figure>
   <figcaption>Using a Blank Node Graph avoids confusion: a consumer must rename, and therefore nobody can accidentally add statements in or about this graph from another source.</figcaption>
</figure>
<p>
  What do you do with inferred knowledge then, one might wonder?
    Inferring from this data that my approximate birth year is 1989 will spawn a new context: a context that derived insights from this one.
</p>
<p>
  <strong>&gt; My advice:</strong> Treat all triples from an external source, even when they are in the default graph, as statements made in a specific context of that source. This means we always need an “assertion process” before loading data from somewhere. Some sources may explicitly have multiple contexts, for which they can use named graphs, and you can use the data about these named graphs in your assertion process (e.g., you may only be interested in statements made after a certain generation time, as sketched below). You should never blindly import named graph IRIs into your own system. As a data publisher, you can help consumers automatically rename their graphs by using blank nodes as graph names.
</p>
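<p>
  As a minimal sketch of such an assertion process over locally staged quads (the cut-off date and the use of <code>prov:generatedAtTime</code> as the selection criterion are illustrative assumptions), a consumer could copy only the statements from contexts it decides to trust into its own default graph:
</p>
<figure>
<pre><code class="language-sparql">PREFIX prov: &lt;http://www.w3.org/ns/prov#&gt;
PREFIX xsd:  &lt;http://www.w3.org/2001/XMLSchema#&gt;
# Assert only the statements from contexts we decide to trust:
# here, every graph generated after an illustrative cut-off time
# is copied into our own default graph.
INSERT { ?s ?p ?o }
WHERE {
  GRAPH ?g { ?s ?p ?o }
  ?g prov:generatedAtTime ?generated .
  FILTER(?generated &gt;= "2025-01-01T00:00:00Z"^^xsd:dateTime)
}
</code></pre>
  <figcaption>A sketch of an assertion process: the decision of which contexts to assert is made explicitly, instead of being an accident of how the store merges graphs.</figcaption>
</figure>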
<p>
  Some stores, however, are not optimized for following my advice, as they don’t index on graph name.
    Such an index is certainly necessary when we want to be able to query across many small graphs.
  Either way, the idea remains: when using named graphs, you need to know what you’re doing.
</p>
<h4>Publishing RDF messages and asserting data from them</h4>
<p>
  In our work at <a href="https://knows.idlab.ugent.be">the Knowledge on Web-Scale team</a>, we keep running into the same challenge: 
  how to package and transmit RDF in a way that not only carries the data itself, but also its context&mdash;provenance, trust, credential boundaries, or member extraction hints.
  This holds whether we are dealing with verifiable credentials in wallets, with trust-preserving event streams in writable Linked Data nodes, or with extracting the set of quads that makes up a member of a <a href="https://ldes.tech">Linked Data Event Stream</a>.
</p>
<p>
  Consider the example below: I want to indicate that there is an event stream for public access, with a message that contains context data (such as the fact that it is signed and generated at a specific time) as well as the statements containing my age.
  Using that context, a consumer can select the statements that are of interest to them at the right trust level: in this example, the statement has been signed by an authority.
</p>
<figure> <figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
</pre></td><td class="code"><pre><span class="nl">&lt;stream1&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">ldes:</span><span class="n">EventStream</span><span class="p">;</span><span class="w">
  </span><span class="nn">tree:</span><span class="n">member</span><span class="w"> </span><span class="nl">&lt;message/2025-09-23/1&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
  </span><span class="nn">tree:</span><span class="n">view</span><span class="w"> </span><span class="nl">&lt;&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">  
  </span><span class="nn">acl:</span><span class="n">accessRights</span><span class="w"> </span><span class="s">"public"</span><span class="w"> </span><span class="p">.</span><span class="w">

</span><span class="nl">&lt;message/2025-09-23/1&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Message</span><span class="w"> </span><span class="p">;</span><span class="w">
  </span><span class="nn">ex:</span><span class="n">received</span><span class="w"> </span><span class="s">"2025-09-25T16:30:00Z"</span><span class="p">^^</span><span class="nn">xsd:</span><span class="n">dateTime</span><span class="w"> </span><span class="p">;</span><span class="w">
  </span><span class="nn">ex:</span><span class="n">content</span><span class="w"> </span><span class="nn">_:</span><span class="n">b0</span><span class="w"> </span><span class="p">;</span><span class="w">
  </span><span class="nn">ex:</span><span class="n">signature</span><span class="w"> </span><span class="nn">_:</span><span class="n">b1</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="c1"># The exact content used for that context in a blank node graph</span><span class="w">
</span><span class="nn">_:</span><span class="n">b0</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nl">&lt;https://pietercolpaert.be/#me&gt;</span><span class="w"> </span><span class="nn">foaf:</span><span class="n">age</span><span class="w"> </span><span class="mi">36</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span><span class="w">
</span><span class="c1"># The signature: a named graph is needed</span><span class="w">
</span><span class="c1"># otherwise we cannot keep the triples of the signature with the message</span><span class="w">
</span><span class="nn">_:</span><span class="n">b1</span><span class="w"> </span><span class="p">{</span><span class="w">
  </span><span class="nl">&lt;Sig1&gt;</span><span class="w"> </span><span class="nn">ex:</span><span class="n">signatureValue</span><span class="w"> </span><span class="s">"MEUCIQDh..."</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">signedBy</span><span class="w"> </span><span class="nl">&lt;https://pietercolpaert.be/#me&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">signsGraph</span><span class="w"> </span><span class="nn">_:</span><span class="n">b0</span><span class="w"> </span><span class="p">;</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">signatureAlgorithm</span><span class="w"> </span><span class="s">"RS256"</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="p">}</span>
</pre></td></tr></tbody></table></code></pre></figure>
  <figcaption>Blank Node Graphs are effective for associating a context with a set of triples. In this example, however, we are forced to also use named graphs for something else: packaging triples together as part of one message. It works, but it’s clumsy. If only there were a better syntax-level tool to state the boundaries of an RDF message.</figcaption>
</figure>
<p>
  There’s a problem for our use case though: how do you select all the quads that are part of this event stream’s message?
  The quads are undeniably grouped together: adding them to this server was an atomic operation.
  In a streaming protocol like websockets, this would have been a message that was sent and received by a listener.
  In the <a href="https://jelly-rdf.github.io/dev/">binary RDF serialization Jelly</a> we could have used their concept of <em>frames</em>.
  However, in the RDF1.1 specification, there’s nothing we can rely on.
</p>
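<p>
  For this particular example you could get away with following the message’s explicit links to its graphs. The sketch below assumes an <code>ex:</code> namespace (the example leaves it undeclared) and assumes the store kept stable names for the blank node graphs after import; the point is exactly that you cannot rely on such links in general:
</p>
<figure>
<pre><code class="language-sparql">PREFIX ex: &lt;http://example.org/ns#&gt;
# Collect the quads of one message by following its explicit links
# to its content and signature graphs. This only works because this
# particular message links both graphs; RDF 1.1 gives no such guarantee.
SELECT ?g ?s ?p ?o WHERE {
  ?message a ex:Message ;
           (ex:content|ex:signature) ?g .
  GRAPH ?g { ?s ?p ?o }
}
</code></pre>
  <figcaption>An ad-hoc way to gather the quads of one message. It depends entirely on the message describing its own graphs, which is not something the RDF1.1 data model guarantees.</figcaption>
</figure>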
<p>
  Instead, we need to use a heuristic that will never be perfect.
  In a W3C Member Submission in 2005 predating named graphs, <a href="https://www.w3.org/submissions/CBD/">Concise Bounded Description (CBD)</a> was coined as a unit of specific knowledge about a resource that could be interchanged between semantic web agents.
  I believe the author, Patrick Stickler, bumped into the same limitation we’re bumping into today. The only difference is that we do have named graphs, so we can solve our problem slightly differently.
</p>
<figure> <figure class="highlight"><pre><code class="language-turtle" data-lang="turtle"><table class="rouge-table"><tbody><tr><td class="gutter gl"><pre class="lineno">1
2
3
4
5
6
7
8
9
10
11
12
</pre></td><td class="code"><pre><span class="c1">## The Concise Bounded Description of entity 1 is</span><span class="w">
</span><span class="c1">## the subject-based star pattern including the</span><span class="w">
</span><span class="c1">## subject-based star patterns of the blank nodes it refers to</span><span class="w">
</span><span class="nl">&lt;entity1&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Type</span><span class="p">;</span><span class="w">
  </span><span class="nn">ex:</span><span class="n">label</span><span class="w"> </span><span class="s">"an entity"</span><span class="w"> </span><span class="p">;</span><span class="w">
  </span><span class="nn">ex:</span><span class="n">otherEntity</span><span class="w"> </span><span class="nl">&lt;entity2&gt;</span><span class="w"> </span><span class="p">;</span><span class="w">
  </span><span class="nn">ex:</span><span class="n">otherEntity</span><span class="w"> </span><span class="p">[</span><span class="w">
    </span><span class="nn">ex:</span><span class="n">label</span><span class="w"> </span><span class="s">"Yet another entity"</span><span class="w"> </span><span class="p">.</span><span class="w">
  </span><span class="p">]</span><span class="w"> </span><span class="p">.</span><span class="w">
</span><span class="c1">## Not part of the CBD of entity1</span><span class="w">
</span><span class="nl">&lt;entity2&gt;</span><span class="w"> </span><span class="k">a</span><span class="w"> </span><span class="nn">ex:</span><span class="n">Type</span><span class="w"> </span><span class="p">;</span><span class="w">
  </span><span class="nn">ex:</span><span class="n">label</span><span class="w"> </span><span class="s">"another entity"</span><span class="w"> </span><span class="p">.</span>
</pre></td></tr></tbody></table></code></pre></figure>
  <figcaption>The Concise Bounded Description of a resource, as defined in 2005 before named graphs existed, includes the statements about that resource while excluding explicit knowledge about any other resource that can be obtained separately from the same source.</figcaption>
</figure>
<p>
  The CBD of a resource <code>s</code> is defined as the set of triples consisting of all triples with subject <code>s</code>, plus recursively the CBDs of all blank nodes that are the object of a triple in the CBD.
    It was a pragmatic solution, and a heuristic at best. The fact that it was limited to subject-based star patterns put a limit on its usefulness.
</p><p>
  Today, <a href="https://w3id.org/ldes/specification/#members">in Linked Data Event Streams</a>, we actually use the CBD of the member IRI as the package for the triples in a message.
    We have however extended it to now include all triples that are in a blank node graph that is mentioned in the CBD.
  This way, we can include the signature triples in the message as well.
  While this works, maybe this should not have been necessary.
  The signature triples would have made perfect sense in the default graph otherwise.
</p>
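<p>
  As a rough illustration (a non-recursive approximation; the <code>BASE</code> is an assumption so the relative IRIs from the figure above resolve), the CBD of <code>&lt;entity1&gt;</code> could be approximated as follows:
</p>
<figure>
<pre><code class="language-sparql">BASE &lt;http://example.org/&gt;  # assumed base for the relative IRIs
# A one-level approximation of the CBD of &lt;entity1&gt;: its own star pattern,
# plus the star patterns of the blank nodes it points to. A full CBD would
# recurse through nested blank nodes, and the LDES member extraction
# additionally pulls in blank node graphs mentioned in the result.
CONSTRUCT {
  &lt;entity1&gt; ?p ?o .
  ?o ?p2 ?o2 .
}
WHERE {
  &lt;entity1&gt; ?p ?o .
  OPTIONAL {
    ?o ?p2 ?o2 .
    FILTER(isBlank(?o))
  }
}
</code></pre>
  <figcaption>A sketch of a CBD-style extraction in SPARQL. Because plain SPARQL cannot easily recurse through arbitrarily nested blank nodes, real implementations ship this as code rather than as a single query.</figcaption>
</figure>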
<h5>RDF1.1 critique: serializations like TRiG should have the concept of “RDF Messages”</h5>
<p>
  When you use named graphs in a big file with quads and you want to fetch data from one specific named graph, you’ll need to process the full file.
  RDF does not attach semantics to where in a file you mention a quad.
  For JSON-LD, there’s a <a href="https://www.w3.org/TR/json-ld11-streaming/#triple-ordering-recommended">working group note</a> that specifies recommended ordering of triples in a document for streaming parsing.
  However, you will still need to wait for the full file to be parsed to be certain that you got a complete package of quads.
  By design, this comes with a performance penalty.
</p>
<p>
  The fix can be quite straightforward by adding a small feature to the existing syntaxes:
    a pragma that lets parsers know when an RDF message starts and when it ends.
  It has for example already been added to the recently proposed binary serialization <a href="https://jelly-rdf.github.io/dev/specification/serialization/#stream-frames">Jelly RDF</a>, where it is called “frames”.
  You could compare this idea to the idea of <a href="https://en.wikipedia.org/wiki/JSON_streaming">streaming JSON documents</a> such as Newline Delimited JSON, but then also for RDF-based serializations.
  Maybe this idea of RDF Messages could be something that gets standardized by the <a href="https://www.w3.org/community/rsp/">RDF Stream Processing Community Group</a>.
  Something like newline delimited JSON-LD has been discussed in the <a href="https://github.com/w3c/sparql-dev/issues/140">SPARQL specification issue tracker</a>.
</p>
<h5>SPARQL1.1 critique: you cannot construct data into multiple named graphs</h5>
<p>
  The idea of contextual assertions means that we need a query language to select the data we’re interested in, and then construct data into our own context.
    For such a context, I advise using blank node graphs.
  SPARQL today, however, does not allow you to CONSTRUCT data into a named graph.
  This is an open issue in the SPARQL issue tracker: <a href="https://github.com/w3c/sparql-dev/issues/31">#31</a>.
  This of course does not break our ideas, but it would be useful to attach a derived context to the data you’re constructing.
</p>
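<p>
  You can approximate this today with SPARQL Update, although you then have to mint an IRI for the derived context yourself (the graph IRI, <code>ex:approximateBirthYear</code> and the derivation triple below are illustrative assumptions; a blank node graph name is not allowed in this position):
</p>
<figure>
<pre><code class="language-sparql">PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;
PREFIX prov: &lt;http://www.w3.org/ns/prov#&gt;
PREFIX ex:   &lt;http://example.org/ns#&gt;
# SPARQL Update can target a named graph where CONSTRUCT cannot:
# derive an approximate birth year (2025 being the year of writing)
# and record which context it was derived from.
INSERT {
  GRAPH &lt;urn:example:derived-context&gt; {
    ?person ex:approximateBirthYear ?birthYear .
  }
  &lt;urn:example:derived-context&gt; prov:wasDerivedFrom ?g .
}
WHERE {
  GRAPH ?g { ?person foaf:age ?age . }
  BIND(2025 - ?age AS ?birthYear)
}
</code></pre>
  <figcaption>A sketch of constructing data into a derived context with SPARQL Update. A CONSTRUCT that could emit quads, ideally into a blank node graph, would make this pattern more natural.</figcaption>
</figure>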
<h3>P.S.</h3>
<p>
  Piotr Sowiński of Neverblink, who is the lead behind the Jelly RDF initiative, is going to give the keynote in the <a href="https://interoperable-europe.ec.europa.eu/collection/semic-conference">Linked Data Event Streams workshop at the SEMIC conference</a>.
</p><p>
  In TREE/LDES we first tried to work around the open semantics of named graphs by having a <a href="https://treecg.github.io/specification/#member-extraction-algorithm">very complex member extraction algorithm</a>.
    I regret having spent so much time trying to get it right, while in the end it’s just a heuristic.
  We even tried to position a concept called Shape Topologies, to try to reconstruct the RDF Message based on the SHACL shape of the member.
  It would have been much simpler if we had had the concept of RDF messages.
</p>
<p>
  Having <a href="https://josd.github.io/">Jos De Roo</a>&mdash;without whom my understanding about blank nodes and named graphs would never be where it is today&mdash;around in the office has some perks: you occasionally get a reminder of historic discussions during lunch.
    <a href="https://en.wikipedia.org/wiki/Pat_Hayes">Pat Hayes</a> would apparently call confusing use and mention the “mother of all bugs”.
  The single most fundamental mistake in Web and semantic web architecture is failing to distinguish between talking about the thing and talking about the name of the thing.
</p><p>
  I’ve been told that I follow my own interpretation of the nineties work on <a href="http://jmc.stanford.edu/articles/mccarthy-buvac-98/context.pdf">“formalizing context” by the logicians McCarthy and Buvač</a>.
    We can write this context proposition more formally as <code>istrue(context, φ)</code>.
  The set of statements in φ is true in a context; you can thus assert φ when your application’s state indeed asserts this context as true, for example because the source is trusted:
  <code>istrue(SourceContext, φ) ∧ trustedByClient(SourceContext) ⇒ istrue(ClientTruth, φ)</code>.
</p>
<p>
  It was Tobias Rebert who initially triggered me through a <a href="https://www.linkedin.com/feed/update/urn:li:activity:7372189350584807424/">LinkedIn post</a> to write this blog post: I was a bit frustrated that he would see labeling named graphs as something you’d do while drinking coffee, just like cleaning your <em>home</em> or <em>My Documents folder</em>.
  I couldn’t just leave a comment among the many that had started appearing, because I needed more nuance than that.
  Tobias, if you read this, do let me know whether you agree with me!
</p>
<p>
  <em>EDIT 2025-10-01:</em> I was asked what my view on RDF1.2 is, and whether it wouldn’t solve the problem discussed here.
  I argue that there is no problem where people see one (the open semantics).
  Where I do see a small problem (streaming RDF messages), RDF1.2 doesn’t provide a solution, and it’s too late to propose this to the group at this stage.
  The revived RSP group might be a better place to propose this, as it’s also a typical problem in streaming scenarios.
  Triple Terms in RDF1.2 are a solution for something different yet complementary, which will help datasets like Wikidata come up with a more standardized solution to the property-based triple annotation system they have today: not for contextual assertions, but for tracking the provenance of a resulting asserted statement.
  It’s very complementary to the idea though.
</p>]]></content><author><name>Pieter Colpaert</name></author><category term="linkeddata" /><summary type="html"><![CDATA[I’ve seen named graphs being called the most horrendous mistake in RDF standardization, to a necessary feature for which, without it, RDF wouldn’t be where it is today. In this blog post I try to explain my interpretation of all this, and how I believe named graphs are here to stay. I also introduce a future outlook, in which I believe we need to talk about RDF messages.]]></summary></entry><entry><title type="html">SEMANTiCS2025 Trip report</title><link href="https://pietercolpaert.be/conferences/2025/09/08/semantics-trip-report.html" rel="alternate" type="text/html" title="SEMANTiCS2025 Trip report" /><published>2025-09-08T00:00:00+00:00</published><updated>2025-09-08T00:00:00+00:00</updated><id>https://pietercolpaert.be/conferences/2025/09/08/semantics-trip-report</id><content type="html" xml:base="https://pietercolpaert.be/conferences/2025/09/08/semantics-trip-report.html"><![CDATA[<div class="teaser">
  <p>
    SEMANTiCS is an international conference that brings together companies and research working on technologies that aim to be interoperable through semantics.
    In 2025, we visited the conference with a bunch of colleagues, as well as with the companies from the DiSHACLed project.
  </p>
</div>
<p>
  As I wasn’t able to sort out my family logistics in the first week of school in a way that would let me attend the whole of SEMANTiCS2025 in person, I attended the first day online.
  It’s great that hybrid conferences nowadays give you these options; this way I also got to experience both sides of the hybrid conference.
</p>
<h4>The keynote of Hannah Bast</h4>
<p>
  The keynote of Hannah Bast was as thought-provoking as it was impressive.
  At the start of my PhD on linked transport data her publications <a href="https://ad-publications.cs.uni-freiburg.de/ESA_transferpatterns_BCEGHRV_2010.pdf">on route planning</a> were instrumental for my own research.
  Today she’s shaking up the world of triple stores with her work on the QLever SPARQL engine that is able to beat the state of the art on many fronts for querying truly big knowledge graphs such as OpenStreetMap, Uniprot, Pubchem or Wikidata.
  During the keynote she was very down to earth about benchmarks: although QLever scored pretty well according to their own tests on these benchmarks, they are still working on yet another SPARQL benchmark to have a more honest comparison. <a href="https://ad-publications.cs.uni-freiburg.de/ISWC_sparqloscope_BKTU_2025.pdf">The work on SPARQLoscope</a> is to be presented at ISWC2025 later this year.
</p>
<p>
  A line during the keynote that certainly stuck was: “professors don’t code”.
  This was given as a reason why academic software is often of low quality: students have to learn the coding part by themselves.
  Hannah Bast herself is certainly an exception to this statement.
  I personally did not have the impression that “professors don’t code” within our field, certainly as I had her as an example during my PhD.
  I also often found my own supervisor at the time, Ruben Verborgh, deep down in coding projects.
</p><p>
  Nevertheless, I do not think my own strengths can be found in my coding qualities.
  I do write code, but that’s more incidental&mdash;like peeling potatoes when you want to make fries.
  While I have tried to position decent software in the past, my focus today is on <a href="https://pietercolpaert.be/interoperability/2025/09/03/four-types-specification-artefacts">building interoperable dataspace ecosystems through specifications (see previous blog post)</a>.
  Luckily in our <a href="https://knows.idlab.ugent.be">Knowledge on Web-Scale team we have multiple professors</a>, each with a different set of qualities.
  I would have liked to talk this over with Hannah Bast over a coffee break, but wasn’t able to talk to her before she had to leave&mdash;that will have to happen on a next occasion.
</p>

<h4>Our team in an organizing role</h4>
<p>
  Our team took on quite some responsibilities for organizing workshops and tutorials:
  <ul>
    <li>It’s certain that the world of Linked Data needs better developer tooling. Ruben Taelman, Jerven Bolleman and Jindřich Mynarz co-organized the Developer Workshop.
    <li>A note to self: if you let your PhD students organize a tutorial at a conference, the tooling they are organizing this tutorial for is all of a sudden going to be polished. <a href="https://smessaert.be/">Ieben Smessaert</a> and <a href="https://www.linkedin.com/in/arthur-vercruysse-baa625211/?originalSubdomain=be">Arthur Vercruysse</a> organized the RDF-Connect tutorial for building data pipelines across environments. You can still <a href="https://rdf-connect.github.io/Tutorial-SEMANTiCS2025/slides/?full#agenda">do the tutorial yourself at home and ask them for feedback</a>. The tutorial is going to be extended into a full-day tutorial at the ISWC2025 conference.
    <li>The <a href="https://semantic-transportation.github.io/sem4tra-kg-website/">Semantics for Transport workshop</a> was organized by a new organization committee this year. I hope it’s the start of renewed interest in the topic!
    <li>The <a href="https://w3id.org/nxdg/2025">NXDG workshop on next generation data governance</a>, about technologies like the Data Privacy Vocabulary (DPV) and the Open Digital Rights Language (ODRL) organized by Harshvardhan Pandit and Beatriz Esteves.
  </ul>
</p>
<blockquote class="mastodon-embed" data-embed-url="https://mastodon.social/@pietercolpaert/115140093569865011/embed" style="background: #FCF8FF; border-radius: 8px; border: 1px solid #C9C4DA; margin: 0; max-width: 540px; min-width: 270px; overflow: hidden; padding: 0;"> <a href="https://mastodon.social/@pietercolpaert/115140093569865011" target="_blank" style="align-items: center; color: #1C1A25; display: flex; flex-direction: column; font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Oxygen, Ubuntu, Cantarell, 'Fira Sans', 'Droid Sans', 'Helvetica Neue', Roboto, sans-serif; font-size: 14px; justify-content: center; letter-spacing: 0.25px; line-height: 20px; padding: 24px; text-decoration: none;"> <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="32" height="32" viewBox="0 0 79 75"><path d="M63 45.3v-20c0-4.1-1-7.3-3.2-9.7-2.1-2.4-5-3.7-8.5-3.7-4.1 0-7.2 1.6-9.3 4.7l-2 3.3-2-3.3c-2-3.1-5.1-4.7-9.2-4.7-3.5 0-6.4 1.3-8.6 3.7-2.1 2.4-3.1 5.6-3.1 9.7v20h8V25.9c0-4.1 1.7-6.2 5.2-6.2 3.8 0 5.8 2.5 5.8 7.4V37.7H44V27.1c0-4.9 1.9-7.4 5.8-7.4 3.5 0 5.2 2.1 5.2 6.2V45.3h8ZM74.7 16.6c.6 6 .1 15.7.1 17.3 0 .5-.1 4.8-.1 5.3-.7 11.5-8 16-15.6 17.5-.1 0-.2 0-.3 0-4.9 1-10 1.2-14.9 1.4-1.2 0-2.4 0-3.6 0-4.8 0-9.7-.6-14.4-1.7-.1 0-.1 0-.1 0s-.1 0-.1 0 0 .1 0 .1 0 0 0 0c.1 1.6.4 3.1 1 4.5.6 1.7 2.9 5.7 11.4 5.7 5 0 9.9-.6 14.8-1.7 0 0 0 0 0 0 .1 0 .1 0 .1 0 0 .1 0 .1 0 .1.1 0 .1 0 .1.1v5.6s0 .1-.1.1c0 0 0 0 0 .1-1.6 1.1-3.7 1.7-5.6 2.3-.8.3-1.6.5-2.4.7-7.5 1.7-15.4 1.3-22.7-1.2-6.8-2.4-13.8-8.2-15.5-15.2-.9-3.8-1.6-7.6-1.9-11.5-.6-5.8-.6-11.7-.8-17.5C3.9 24.5 4 20 4.9 16 6.7 7.9 14.1 2.2 22.3 1c1.4-.2 4.1-1 16.5-1h.1C51.4 0 56.7.8 58.1 1c8.4 1.2 15.5 7.5 16.6 15.6Z" fill="currentColor"/></svg> <div style="color: #787588; margin-top: 16px;">Post by @pietercolpaert@mastodon.social</div> <div style="font-weight: 500;">View on Mastodon</div> </a> </blockquote> <script data-allowed-prefixes="https://mastodon.social/" async src="https://mastodon.social/embed.js"></script>
<h4>A fishbowl session on dataspaces and whether semantics still play a role</h4>
<p>
  I brought an opening statement <a href="https://www.linkedin.com/feed/update/urn:li:activity:7369394926196846597?utm_source=share&utm_medium=member_desktop&rcm=ACoAAAd1sJMBDYWfFjzsYQJLrYrBahKjM2q9oxo">in the fishbowl</a> in which I referred to the <a href="https://eclipse-dataspace-protocol-base.github.io/DataspaceProtocol/">Eclipse dataspace protocol</a>.
  This specification uses JSON-LD, ODRL and DCAT.
  How can anyone claim that semantics would not be of interest to dataspaces? They use our work!
</p>
<p>
  What remains however is that, on the data plane (ODRL and DCAT are in the “control plane”, to set up the data connection), we still have to fight the same fight.
  If we want interoperable instance data, we will still need to bring semantics to the individual domains.
  That’s not a characteristic of dataspaces specifically, but of working with data in general.
</p>
<p>
  In fact, the dataspaces protocol adds an important missing piece of the puzzle that we often forget within the Linked Data world.
  Next to vocabularies and application profiles, <a href="https://pietercolpaert.be/interoperability/2025/09/03/four-types-specification-artefacts">we should also work on interaction patterns, such as the contract negotiation steps in the dataspaces protocol</a>.
</p>
<h4>The DiSHACLed lunch</h4>
<p>
  We have a Flemish-funded project going on called DiSHACLed. For that project we’re working on (i) extending DCAT with SHACL shapes for discovery algorithms in data portals, (ii) SHACL-based data pipelines with RDF-Connect, (iii) extending SHACL with UI features (Ieben from our team is following the W3C data shapes working group for that reason), and (iv) business models for dataspace actors.
  We’re doing this with <a href="https://elody.eu/">Inuits’ Elody team</a>, <a href="https://redpencil.io/">RedPencil</a> and <a href="https://sirus.be/">Sirus</a>: three high-potential companies in the world of semantics and interoperable data services.
</p>
<figure>
  <img src="/img/dishacled-at-semantics.jpg"></img>
<figcaption>The DiSHACLed team at SEMANTiCS</figcaption>
</figure>
<p>
  RedPencil is well-known within the SEMANTiCS community already.
  For Elody, it was a first time encounter.
  We had a lunch together to discuss the conference and discuss how to approach certain tasks within the project differently now.
</p>
<h4>Jelly and nanopublications</h4>
<p>Probably the biggest aha-erlebnis I got during the conference was with <a href="https://jelly-rdf.github.io/">Jelly</a> of my namesake Piotr Sowiński.
  He had a presentation in which he showed the great speed ups (proceedings yet to be published...) of implementing this within nanopubs.
  During a coffee break I expressed my skepticism that it is really the binary format that did the trick.
  However, I was shown to be wrong: with the help of an LLM I coded up a Python script (I’m not a Python developer, and Jelly doesn’t have NodeJS support at this moment) that clearly showed <a href="https://github.com/pietercolpaert/treeprofile-vs-jelly">an impressive increase in throughput</a>.
  The test is a bit simplistic, but even if you would add the complexity, the increase will remain substantial.
  Let’s make sure Linked Data Event Streams can work with Jelly.
</p>
<h3>P.S.</h3>
<blockquote class="mastodon-embed" data-embed-url="https://mastodon.social/@pietercolpaert/115153283911548667/embed" style="background: #FCF8FF; border-radius: 8px; border: 1px solid #C9C4DA; margin: 0; max-width: 540px; min-width: 270px; overflow: hidden; padding: 0;"> <a href="https://mastodon.social/@pietercolpaert/115153283911548667" target="_blank" style="align-items: center; color: #1C1A25; display: flex; flex-direction: column; font-family: system-ui, -apple-system, BlinkMacSystemFont, 'Segoe UI', Oxygen, Ubuntu, Cantarell, 'Fira Sans', 'Droid Sans', 'Helvetica Neue', Roboto, sans-serif; font-size: 14px; justify-content: center; letter-spacing: 0.25px; line-height: 20px; padding: 24px; text-decoration: none;"> <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" width="32" height="32" viewBox="0 0 79 75"><path d="M63 45.3v-20c0-4.1-1-7.3-3.2-9.7-2.1-2.4-5-3.7-8.5-3.7-4.1 0-7.2 1.6-9.3 4.7l-2 3.3-2-3.3c-2-3.1-5.1-4.7-9.2-4.7-3.5 0-6.4 1.3-8.6 3.7-2.1 2.4-3.1 5.6-3.1 9.7v20h8V25.9c0-4.1 1.7-6.2 5.2-6.2 3.8 0 5.8 2.5 5.8 7.4V37.7H44V27.1c0-4.9 1.9-7.4 5.8-7.4 3.5 0 5.2 2.1 5.2 6.2V45.3h8ZM74.7 16.6c.6 6 .1 15.7.1 17.3 0 .5-.1 4.8-.1 5.3-.7 11.5-8 16-15.6 17.5-.1 0-.2 0-.3 0-4.9 1-10 1.2-14.9 1.4-1.2 0-2.4 0-3.6 0-4.8 0-9.7-.6-14.4-1.7-.1 0-.1 0-.1 0s-.1 0-.1 0 0 .1 0 .1 0 0 0 0c.1 1.6.4 3.1 1 4.5.6 1.7 2.9 5.7 11.4 5.7 5 0 9.9-.6 14.8-1.7 0 0 0 0 0 0 .1 0 .1 0 .1 0 0 .1 0 .1 0 .1.1 0 .1 0 .1.1v5.6s0 .1-.1.1c0 0 0 0 0 .1-1.6 1.1-3.7 1.7-5.6 2.3-.8.3-1.6.5-2.4.7-7.5 1.7-15.4 1.3-22.7-1.2-6.8-2.4-13.8-8.2-15.5-15.2-.9-3.8-1.6-7.6-1.9-11.5-.6-5.8-.6-11.7-.8-17.5C3.9 24.5 4 20 4.9 16 6.7 7.9 14.1 2.2 22.3 1c1.4-.2 4.1-1 16.5-1h.1C51.4 0 56.7.8 58.1 1c8.4 1.2 15.5 7.5 16.6 15.6Z" fill="currentColor"/></svg> <div style="color: #787588; margin-top: 16px;">Post by @pietercolpaert@mastodon.social</div> <div style="font-weight: 500;">View on Mastodon</div> </a> </blockquote> <script data-allowed-prefixes="https://mastodon.social/" async src="https://mastodon.social/embed.js"></script>
<p>Next year, we’ll be the local host of SEMANTiCS and hope to welcome you all in Ghent!</p>
<p>
  Notes to self for organizing next year:
  <ul>
    <li>Make sure people’s names are on both sides of the name tag
    <li>Make sure the proceedings are available before the conference: not being able to go through the paper while attending talks breaks how some people, including myself, try to attend talks efficiently
    <li>Try to get a brass band during the conference dinner (Thanks Jean-Marc Acke for the suggestion)
  </ul>
</p>]]></content><author><name>Pieter Colpaert</name></author><category term="conferences" /><summary type="html"><![CDATA[I went to SEMANTiCS2025 with the team in Vienna. I will especially remember: the keynote of Hannah Bast, the fish bowl discussion, and the work on the Jelly serialization. Our team took on quite some responsibilities as well, organizing a bunch of workshops and a tutorial.]]></summary></entry><entry><title type="html">Four types of specification artefacts</title><link href="https://pietercolpaert.be/interoperability/2025/09/03/four-types-specification-artefacts.html" rel="alternate" type="text/html" title="Four types of specification artefacts" /><published>2025-09-03T00:00:00+00:00</published><updated>2025-09-03T00:00:00+00:00</updated><id>https://pietercolpaert.be/interoperability/2025/09/03/four-types-specification-artefacts</id><content type="html" xml:base="https://pietercolpaert.be/interoperability/2025/09/03/four-types-specification-artefacts.html"><![CDATA[<div class="teaser">
  <p>
    Interoperability isn’t about creating the one standard to rule them all.
    It’s about creating small reusable pieces of the puzzle.
    In the world of Linked Data we are moving in the right direction, but we don’t go far enough, which today still leads to poor adoption despite the right ideas being there.
    I argue we need four types of reusable artefacts in specification:
    vocabularies, application profiles, interaction patterns, and implementation guides.
    Together, they form a toolbox for connecting systems across domains:
    implementation guides provide the much-needed answers developers need to build their systems, while interaction patterns provide composable functionality on top of the application profiles.
    Allow me to elaborate…
  </p>
</div>
<p>
  It’s not a secret that I’m a big fan of the approach Linked Data is taking when it comes to interoperability.
  The technology is based&mdash;as the name implies&mdash;around the idea of being able to link to each others’ concepts using global web identifiers, or simply web addresses (IRIs).
  For the domain models, this already means a split in types of specifications into <strong>vocabularies</strong> and <strong>application profiles</strong>.
  Application profiles are well-defined “schemas” that tell data providers what shape their data needs to adhere to.
  Yet, it’s the vocabularies that make the terms used in these profiles reusable in others, as various schemas can reference the same concepts, and those concepts in turn can be interlinked again in case a project would like to align in a later phase.
</p>
<p>
  It’s this kind of separation in types of specifications or artefacts of a consensus that made me wonder whether other types of specs should exist.
  Outside of the Linked Data world I often see specifications that try to cover everything by defining APIs and schemas in one go.
  This way, they skip the reuse of existing terms and patterns.
  However, Linked Data specifications often don’t cover enough.
  For example, if you want to build a metadata catalog, there is a W3C standard for that: the <a href="https://www.w3.org/TR/vocab-dcat-3/">DCAT vocabulary</a>.
  The European application profile&mdash;SEMIC’s <a href="https://interoperable-europe.ec.europa.eu/collection/semic-support-centre/dcat-ap">DCAT-AP</a>&mdash;then defines what EU organizations must support in order to work within aggregators.
  However, when you then actually want to do something with a data portal, all options are left open.
  E.g., how do you take a copy of the data portal and stay in sync with it? What is the procedure to add a new dataset to the portal?
  There is no answer to be found in either the DCAT vocabulary or the European application profile.
</p>
<p>
  I was part of a SEMIC pilot on the question “how do I take a copy of a DCAT-compliant data catalog and stay in sync with it afterwards”.
  We had been working on a specification for an <strong>interaction pattern</strong> between clients and servers for replication and synchronization and thought we could apply it exactly to this domain.
  However, it remains quite abstract how to build this specifically for data catalogs.
  For that purpose, we’ve built an <strong>implementation guide</strong> called <a href="https://semiceu.github.io/LDES-DCAT-AP-feeds/index.html">DCAT-AP Feeds</a> in collaboration with <a href="https://www.digg.se/en">DIGG, the digitization agency in Sweden</a>.
  The implementation guide reuses the application profiles for DCAT-AP, as well as the Linked Data Event Streams interaction patterns for staying synchronized with event streams.
  While it did not add a lot of normative text, it brings all of this together in a specification that is ready to be implemented for aggregating data catalogs across Europe.
</p>
<p>
  How we approached DCAT-AP Feeds is also how I see this happening in other specifications.
  When these interaction patterns become <strong>composable</strong> into implementation guides, that’s when we’re going to see interesting reuse happen.
  Let’s go through the 4 types of specifications, and I’ll show what I think are good examples of this composability.
</p>
<figure>
  <img width="70%" style="margin: 0 auto; display: block;" src="/img/asset-types.svg" alt="Asset types diagram">
  <figcaption>We’ll introduce 4 specification artefacts: vocabularies, shapes, interaction patterns, and implementation guides (APIs/Connectors/dev docs/...). Maintaining them as separate artefacts ensures a better reusability.</figcaption>
</figure>
<h4>1. Vocabularies – agreeing on words</h4>
<p>
  Vocabularies are simply lists of web addresses with a certain meaning one can use.
  Their meaning is explained in full text.
  The DCAT vocabulary for example contains definitions for terms like Dataset, Catalog or DataService.
  You can decide whether you agree with the definition and reuse the term, or you can decide to be more specific, and instead create your own vocabulary.
  Vocabularies can contain classes and properties for a domain model you may want to instantiate, but equally as well it can contain code lists or a taxonomy.
  In more advanced projects, also complex relations between terms can be described in something we would start calling an ontology.
</p>
<p>
  It’s hardly possible to “comply” to a vocabulary, while this is of course something you would expect from a standard.
  You can reuse terms&mdash;or link up to terms&mdash;and this way make sure semantic interoperability problems will take less effort to solve.
  However, it is clear that this is only a first step towards solving interoperability.
  We will need more tools than just vocabularies&hellip;
</p>
<h5>Technologies</h5>
<ul>
  <li><strong>RDFS</strong> (RDF Schema&mdash;<i>although I wouldn’t call this a schema anymore today</i>) to describe classes and properties,</li>
  <li><strong>SKOS</strong> (The Simple Knowledge Organization System) to describe code lists and taxonomies,</li>
  <li><strong>OWL</strong> (The Web Ontology Language) to describe more complex relations between terms.</li>
</ul>
<p>
  As these are quite established RDF technologies, Large Language Models are particularly good at helping you to create those. Check out <a href="https://chat.mistral.ai/chat?q=I%E2%80%99m%20creating%20an%20RDF%20vocabulary%20for%20father%2C%20mother%2C%20person%2C%20kid%20and%20the%20relations%20between%20them.%20Generate%20a%20turtle%20code%20example%20using%20RDFS.">this example using Mistral.ai</a> (prompt: “I’m creating an RDF vocabulary for father, mother, person, kid and the relations between them. Generate a turtle code example using RDFS.”).</p>

<h5>Example: governmental vocabulary initiatives in Europe</h5>
<p>
  In Europe, it is common to maintain vocabularies at different levels: regional, national, and European.
  Each of these initiatives publishes terms that can be reused across projects, lowering the cost of integration.
  At the European level, SEMIC maintains the so-called <a href="https://op.europa.eu/en/web/eu-vocabularies/corevocs">Core Vocabularies</a> such as the Core Person Vocabulary and the Core Location Vocabulary.
  These are used in many application profiles, ensuring that a “person” or “address” is described in the same way across member states.
</p>
<p>
  In Flanders, we have <a href="https://data.vlaanderen.be">the Open Standards for Linking Organizations (OSLO) initiative</a>.
  OSLO publishes RDF vocabularies for concepts such as addresses, mobility, culture and many others, which are then applied in local data exchanges and reused by cities and regions.
  The OSLO-initiative is not in contradiction with the European vocabularies: it reuses IRIs where there’s a perfect match, and it links up to broader terms where relevant.
  Other countries have similar efforts, such as Finland that maintains its own reusable vocabularies at <a href="https://finto.fi">finto.fi</a>.
  I believe every country should have an entrypoint to their vocabularies like this.
</p>
<h5>Example: ActivityStreams</h5>
<p>
  ActivityStreams is a vocabulary originally developed at the W3C to describe social web activities.
  It defines concepts such as “Person”, “Note”, “Like”, and “Follow”; but also more generic concepts such as a “Create”, “Update” and “Delete”.
  This vocabulary is reused in the <a href="https://www.w3.org/TR/activitypub/">ActivityPub protocol</a>, which powers federated social networks like Mastodon.
  Thanks to the vocabulary, a “Like” expressed in one system can be understood in another, even if the systems themselves were not built together.
  In the DCAT-AP Feeds implementation guide, we also re-use the semantics of a “Create”, “Update” and “Delete” from this vocabulary.
  It’s a great illustration of how vocabularies allow reuse across completely different applications.
</p>
<h4>2. Application profiles – agreeing on shapes</h4>
<p>
  We’ve established that a vocabulary alone is not enough.
  You also need to specify what shape of data an application actually expects.
  While RDF vocabularies have been around since the late nineties already, application profiles are a more recent development. 
  They gained traction when it became clear that applications also need to agree on which terms are required, which are optional, and how they fit together.
</p>
<p>
  Profiles make these expectations explicit so that producers and consumers know exactly what to exchange, and this can be <strong>validated</strong>.
  It is thus possible to <strong>comply</strong> to an application profile, yet this is still to be taken with a pinch of salt:
  which group of statements you decide to validate at which phase of a process also matters, and this is not typically part of an application profile. For that, we will have to refer to the next section.
</p>
<h5>Technologies</h5>
<p>
  <strong>SHACL</strong> (the Shapes Constraint Language) describes shapes for RDF graphs and is widely used in European application profiles.
</p>
<p>
  <strong>ShEx</strong> is an alternative shape language with a more compact syntax. I haven’t personally used it, but have encountered it in health care use cases.
</p>
<p>Both technologies are being discussed in the free online book “<a href="">Validating RDF</a>”.</p>
<h5>Example: governmental initiatives</h5>
<p>
  At the European level, SEMIC and the Publications Office of the EU maintain the registry of official application profiles such as <a href="https://op.europa.eu/en/web/eu-vocabularies/dcat-ap">DCAT-AP for EU data portals</a>, or <a href="https://ec.europa.eu/isa2/solutions/core-public-service-vocabulary-application-profile-cpsv-ap_en/">CPSV-AP for describing public services</a>. These profiles shape how vocabularies are applied for specific use cases.
  In contrast to the DCAT vocabulary, you can <strong>validate</strong> the DCAT-AP application profile, and thus <strong>comply</strong> to the specification.
  The European Commission <a href="https://data.europa.eu/mqa/shacl-validator-ui/data-provision">makes a validator available for the current and previous versions of DCAT-AP</a>.
  If your data input does not validate there, it is not going to make it on <a href="https://data.europa.eu">data.europa.eu</a>.
</p>
<p>
  In <a href="https://data.vlaanderen.be">Flanders</a>, the vocabularies initiative also comes with application profiles, and SHACL artefacts, that guide local governments in publishing interoperable datasets. 
  Other countries host national schema catalogs. In France, <a href="https://schema.data.gouv.fr/">schema.data.gouv.fr</a> serves as a central registry for public-sector data schemas, allowing producers to discover, document, and align their data models to commonly used formats.
  Similarly, <a href="https://schema.gov.it/">Italy</a> and other EU member states have their own initiatives to document and publish application-specific schemas, supporting interoperability at the national level.
  Non-Linked Data schemas can also be found there, structuring CSV files or validating JSON structures with JSON Schema.
  While these initiatives have their merit, I believe we should be unambiguous about whether certain terms follow the same semantics or not.
  The decoupling of application profiles and vocabularies is an important one.
</p>
<h5>Example: the dataspace protocol</h5>
<p>
  In the world of dataspaces, the <a href="https://eclipse-dataspace-protocol-base.github.io/DataspaceProtocol/">Dataspace Protocol</a> relies on JSON-LD combined with JSON Schema validation to ensure that contracts and dataset descriptions can be interpreted consistently across participants.
  This is a very pragmatic approach to bridge the gap between <a href="https://pietercolpaert.be/interoperability/2025/08/22/levels-of-ambition.html">interoperability within an app ecosystem and cross-app interoperability</a>.
  Instead of SHACL or ShEx, a serialization specific validation method is used.
  The drawback is that the protocol is now unnecessarily coupled to JSON, and that it also standardizes the structure of the JSON document instead of the shape of the graph.
  The benefit is that this looks very familiar to developers who have worked on client-server implementations in the past.
  All in all, these are just implementation choices that reach a similar goal.
</p>

<h5>Example: The European Union Agency for Railways (ERA)</h5>
<p>
  In the railway sector, ERA <a href="https://gitlab.com/era-europa-eu/public/interoperable-data-programme/era-ontology/era-ontology/-/tree/main/era-shacl">has published SHACL shapes</a> that specify how railway entities such as stations and tracks must be described. 
  National railway companies may have richer internal models, but when they want to interoperate at the European level, these shapes define the minimum requirements.
  This way, a European Railway Infrastructure (RINF) Knowledge Graph can be created, which today is a fundamental tool for the railway sector to understand what vehicles will be compatible with a certain railway route across the borders of EU member states.
</p>
<h4>3. Interaction patterns – agreeing on flows</h4>
<p>
  Interoperability is not just about validating data or reusing vocabulary terms, but also about how systems or agents interact with each other.
  These interactions can be low-level, such as how to exchange messages over HTTP, or high-level, such as describing organizational procedures.
</p>
<p>
  A story without technology I like to tell is how you change your first name.
  You cannot take your identity card, scratch the name and replace it with another using a permanent marker.
  Instead, you need to follow an official procedure, usually defined by your municipality.
  For example, here is <a href="https://www.brussels.be/change-first-name">the information page from Brussels</a> documenting the steps you need to take.
  Once you’ve completed the process, you are issued a new identity card, and other systems—such as the population register—are updated as well.
</p>
<figure>
  <img src="/img/identity-card.png"></img>
  <figcaption>Not the right way to change your name, yet this is often how we implement our web services today.</figcaption>
</figure>
<p>
  On the Web we often forget this procedural layer.
  We simply overwrite data, re-upload a dump, or push an updated knowledge graph, without any guarantee that the right process was followed.
  By making interaction patterns explicit, we can attach <strong>trust</strong> and compliance levels: when a system proves that a certain procedure was followed, consumers can rely on it.
</p>
<p>
  <strong>Interaction patterns</strong> are reusable flows, comparable to state machines or flowcharts, to achieve a certain goal.
  They define not only the messages exchanged, but also the order in which they happen and the conditions under which they succeed.
  In order to do so, they can reuse terms from vocabularies and can define the shape of the data they expect in application profiles.
</p><p>
  We can already see them in action across domains, from liking someone’s post on social media, to data synchronization, to contract negotiation in dataspaces.
  Such patterns should be <strong>composable</strong>: I might want to synchronize a dataset, but also tell someone I “liked” the dataset after I negotiated access to it.
</p>
<h5>Technologies</h5>
<ul>
  <li><strong>Developer documentation</strong> explaining in a higher level fashion how the process works using state machines, flowcharts or sequence diagrams, or lower-level documentation such as HTTP protocol bindings,</li>
  <li><strong>Hypermedia controls</strong> to describe the next possible steps in an interaction,</li>
  <li><strong>Rule languages</strong>, such as <a href="https://notation3.org/">Notation3</a> or <a href="https://www.oxfordsemantic.tech/faqs/what-is-datalog">datalog</a>, to formalize and automate such state transitions,</li>
  <li><strong>Procedure extensions to CPSV-AP</strong> or other workflow notations for higher-level organizational processes.</li>
</ul>
<p>
  While the previous sections already have established tooling within and outside of the Linked Data world, convergence on technologies for interaction patterns is still ongoing.
  The majority of specs that I consider good examples of interaction patterns today are written as developer documentation.
  Developer (or LLM?) documentation means that those patterns will still be hard-coded.
  It would be nice if we could build an abstraction for those patterns, so that engines can automatically understand the interaction pattern and do not need to have their code adapted (this is what I called <a href="https://pietercolpaert.be/interoperability/2025/08/22/levels-of-ambition">ambition level 3: cross-engine interoperability</a> in my previous post).
</p><p>
  There is however no consensus yet on this abstraction layer, and I also doubt whether there will ever be one definitive one.
  The idea of hypermedia controls in APIs has not really reached the adoption one would have imagined, even though they were positioned as one of the constraints of the REST architectural style by Fielding in the early 2000s.
  However, I still believe this is the way to go: when fetching a page, you should also get the descriptions&mdash;so-called hypermedia controls&mdash;of where you can go from here.
  Various Linked Data initiatives adopted this idea.
  E.g., the Linked Data Platform (LDP) is a <a href="https://www.w3.org/ns/ldp">vocabulary</a>, <a href="https://www.w3.org/TR/ldp/">application profile and set of interaction patterns with HTTP protocol bindings</a> for read-write Linked Data information resources.
  When you implement these interaction patterns, a client will be able to understand how to read the contents of elements in a possibly paginated container, and how to change their representations.
  LDP is then again adopted <a href="https://solidproject.org/TR/">by the Solid project for building personal data vaults</a>, that takes a subset of the interaction patterns within LDP, and extends it with access control and user profiles (WebID).
  Other specifications like <a href="https://www.hydra-cg.com/">Hydra</a>, <a href="https://w3id.org/tree/specification">TREE</a>, <a href="https://www.w3.org/TR/wot-thing-description11/">Web of Things</a>, or <a href="https://www.w3.org/TR/activitystreams-core/#collections">ActivityStreams collections</a> also adopted hypermedia at the heart of their interaction patterns.
</p>
<p>
  CPSV-AP is an application profile to describe public services in Europe.
  It would be nice if CPSV-AP were extended to also contain a description of the procedures that are otherwise just described in full text (cfr. the information page on changing your first name).
  This was an experiment <a href="https://biblio.ugent.be/publication/01GQEXWF67PEW73F0JHEPYG5VT">already back in 2021 in Flanders with OSLO-steps</a>.
</p>
<h5>Example: ActivityPub</h5>
<p>
  In ActivityPub, the protocol behind Mastodon, interaction patterns are at the core.
  <a href="https://www.w3.org/wiki/ActivityPub/Primer/Like_activity">When you “like” a post</a>, there’s a defined flow: your server creates a “Like” activity, delivers it to the author’s server, and that server then updates its counters. 
  The same goes for following someone, posting, or resharing content. 
  These flows are reusable: any implementation that supports the ActivityPub protocol understands what a “Like” or “Follow” means, even if the implementations were initially built for entirely different communities.
</p>
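<p>
  ActivityPub exchanges these activities as JSON-LD over HTTP; purely as an illustration, the same information can be written in Turtle using the ActivityStreams vocabulary. The actor, object, and activity IRIs below are hypothetical.
</p>
<pre><code>@prefix as: &lt;https://www.w3.org/ns/activitystreams#&gt; .

# Hypothetical "Like" activity, normally serialized as JSON-LD by ActivityPub servers.
&lt;https://social.example/activities/42&gt;
    a as:Like ;
    as:actor  &lt;https://social.example/users/alice&gt; ;  # who liked it
    as:object &lt;https://blog.example/posts/7&gt; ;        # what was liked
    as:to     &lt;https://blog.example/users/bob&gt; .      # the post's author, who gets notified
</code></pre>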

<h5>Example: synchronization with LDES</h5>
<p>
  Another example is <a href="https://w3id.org/ldes/specification">Linked Data Event Streams (LDES)</a>. 
  Here, the interaction pattern defines how clients can replicate a dataset and stay in sync with updates over time. 
  Whether the source is a cultural heritage collection, traffic sensor data, or a national data portal, the replication flow remains the same: fetch the most recent view, then follow links to receive incremental updates. 
  Only the vocabulary and application profile differ, which makes the replication pattern composable and reusable across domains.
</p>
<p>
  LDES itself is also a good example of this separation of specs:
  it consists of a vocabulary, application profiles for validating the pages, and the interaction patterns.
  For that reason, the spec itself is written from a consumer perspective.
  The LDES application profile also reuses the TREE hypermedia vocabulary, reusing semantics where it makes sense.
</p>
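<p>
  A minimal sketch of what this can look like on the wire, in Turtle: the ldes: and tree: terms are the ones from the LDES and TREE specifications, while the URLs, page layout, and member are made up.
</p>
<pre><code>@prefix ldes: &lt;https://w3id.org/ldes#&gt; .
@prefix tree: &lt;https://w3id.org/tree#&gt; .
@prefix dct:  &lt;http://purl.org/dc/terms/&gt; .
@prefix xsd:  &lt;http://www.w3.org/2001/XMLSchema#&gt; .

# Hypothetical event stream with one view (an entry page) a client can start from.
&lt;https://data.example/addresses/feed&gt; a ldes:EventStream ;
    ldes:timestampPath dct:created ;
    tree:view &lt;https://data.example/addresses/feed?page=3&gt; ;
    tree:member &lt;https://data.example/addresses/42#v1&gt; .

# The page advertises where newer members can be found; a client staying in sync
# simply keeps following these relations.
&lt;https://data.example/addresses/feed?page=3&gt;
    tree:relation [
        a tree:GreaterThanOrEqualToRelation ;
        tree:path dct:created ;
        tree:value "2026-01-01T00:00:00Z"^^xsd:dateTime ;
        tree:node &lt;https://data.example/addresses/feed?page=4&gt;
    ] .
</code></pre>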

<h5>Example: contract negotiation in the Dataspace Protocol</h5>
<p>
  In dataspaces, data exchange usually requires a contract that specifies terms of use. 
  The <a href="https://eclipse-dataspace-protocol-base.github.io/DataspaceProtocol/">Dataspace Protocol</a> therefore defines an interaction pattern for contract negotiation. 
  It specifies the sequence of messages (offer, counter-offer, agreement) as well as their bindings to HTTP. 
  Participants in a dataspace can thus automate negotiations while still reusing existing vocabularies such as DCAT for dataset descriptions or ODRL for usage control policies. 
  This is a prime example of combining vocabularies, application profiles, and interaction patterns into a coherent whole.
</p>
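<p>
  As a rough sketch of how those pieces fit together: the offer exchanged during such a negotiation can be an ODRL policy over a DCAT dataset. The ODRL and DCAT terms below are real; the participant, offer, and dataset IRIs are hypothetical, and the Dataspace Protocol message envelope around the offer is not shown.
</p>
<pre><code>@prefix odrl: &lt;http://www.w3.org/ns/odrl/2/&gt; .
@prefix dcat: &lt;http://www.w3.org/ns/dcat#&gt; .

# Hypothetical offer a provider could put on the table during contract negotiation.
&lt;https://provider.example/offers/123&gt; a odrl:Offer ;
    odrl:assigner &lt;https://provider.example/participant&gt; ;
    odrl:permission [
        odrl:target &lt;https://provider.example/datasets/traffic&gt; ;
        odrl:action odrl:use
    ] .

&lt;https://provider.example/datasets/traffic&gt; a dcat:Dataset .
</code></pre>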

<h5>Example: evaluating ODRL policies with N3 rules (FORCE)</h5>
<p>
  ODRL gives us a shared language for usage control, but its evaluation semantics are still underspecified—different engines can interpret the same policy differently. 
  The <a href="https://w3id.org/force"><em>Framework for ODRL Rule Compliance through Evaluation</em> (FORCE) tackles</a> this by defining a repeatable interaction pattern for policy evaluation and by shipping a tested evaluator plus a common report model.
  Instead of having to hard-code the rules for evaluating such ODRL policies, <a href="https://openreview.net/pdf?id=IcZ0C8zd4B">they are described in Notation 3 (N3)</a>.
  An engine runs those <strong>N3 rules</strong> to decide which permissions/obligations/prohibitions are active and returns a machine- and human-readable compliance report. 
</p>
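<p>
  To give a flavour of what such rules look like, here is an illustrative N3 rule in the spirit of FORCE, not its actual rule set: the odrl: terms are real, the ex: report property is a made-up placeholder.
</p>
<pre><code>@prefix odrl: &lt;http://www.w3.org/ns/odrl/2/&gt; .
@prefix ex:   &lt;https://example.org/report#&gt; .  # hypothetical report vocabulary

# "If a policy grants a permission to use a target, report that permission as active."
{
    ?policy odrl:permission ?perm .
    ?perm   odrl:action odrl:use ;
            odrl:target ?target .
}
=&gt;
{
    ?perm ex:activeFor ?target .
} .
</code></pre>
<p>
  Because the rule is data, swapping in a different evaluation strategy means shipping different rules, not patching every engine.
</p>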

<h4>4. Implementation guides – agreeing on practice</h4>
<p>
  Even with vocabularies, profiles, and interaction patterns, developers still need clear instructions to follow when implementing a specific use case. 
  That’s where <strong>implementation guides</strong> come in: they combine all artefacts into an end-to-end recipe, lowering the entry barrier for developers.
</p>
<p>
  For example, in <a href="https://github.com/SEMICeu/LDES-implementation-reports">the SEMIC pilots on LDES</a>, the implementation guides walk implementers through how to publish a dataset as a stream. 
  Instead of just defining vocabularies and patterns in the abstract, the guide gives concrete examples, step-by-step instructions, and reference implementations. 
  This made it possible for multiple domains to reuse the same replication pattern with only small adjustments.
</p>
<p>
  Another strong example is the set of <a href="https://oots.pages.code.europa.eu/tdd/apidoc/">Once-Only Technical System (OOTS)</a> specifications. 
  They provide the API descriptions for governmental procedures documented in the Single Digital Gateway Regulation (SDGR), such as how a citizen can change their address across borders. 
  The guides describe the flow end-to-end: which vocabularies to use (e.g. Core Person, Core Location), which application profiles to validate (e.g. CPSV-AP), and which interaction patterns to follow (e.g. verifiable credentials). 
  They could go further in this vision, but already show the power of an implementation guide as a binding document between law, policy, and technology.
</p>
<p>
  Implementation guides complete the picture: they are the glue that ensures vocabularies, application profiles, and interaction patterns move from paper into running code. 
  Without them, interoperability risks staying theoretical. With them, it becomes practice.
</p>
<h3>P.S.</h3>
<p>
  Seeing interoperability through these four artefacts helps avoid both extremes: the chaos of everyone doing their own thing, and the rigidity of forcing one grand standard. 
  Instead, we can identify what already exists at each level, reuse it, and only invent what’s missing. 
  This perspective also makes it easier to carry lessons across domains: a museum and a mobility operator may not share vocabularies, but they can certainly reuse the same interaction patterns or learn from each other’s implementation guides.
</p>
<p>
  With this vision, I don’t believe we should build domain-specific software anymore. 
  No domain is so unique that it requires domain-specific data pipelines. 
  There will always be opportunities to maximize the reuse of interaction patterns.
  If a suitable one does not yet exist, we can define it in such a way that others can reuse it too. 
  That is how we move from isolated solutions to an ecosystem of reusable building blocks.
</p>
</div>]]></content><author><name>Pieter Colpaert</name></author><category term="interoperability" /><summary type="html"><![CDATA[Interoperability isn’t about creating the one standard to rule them all. It’s about reusable artefacts. I argue there are four types: vocabularies, application profiles, interaction patterns, and implementation guides. Together, they form a toolbox for connecting systems across domains.]]></summary></entry><entry><title type="html">Interoperability ambition levels</title><link href="https://pietercolpaert.be/interoperability/2025/08/22/levels-of-ambition.html" rel="alternate" type="text/html" title="Interoperability ambition levels" /><published>2025-08-22T00:00:00+00:00</published><updated>2025-08-22T00:00:00+00:00</updated><id>https://pietercolpaert.be/interoperability/2025/08/22/levels-of-ambition</id><content type="html" xml:base="https://pietercolpaert.be/interoperability/2025/08/22/levels-of-ambition.html"><![CDATA[<div>
  <div class="teaser">
    <p>
      Well-intentioned initiatives to improve interoperability often run into the same problem: people have very different expectations of what “making services interoperable” actually means.
      If the goal is simply to connect two systems, the task is relatively straightforward: implement the connection and hard-code the alignments. That’s been common practice for years, and even a solid business model for many integrators.
      Today, generative AI can accelerate this process, but it will never make it truly <i>scalable</i> or <i>reliably correct</i>.
      What we need instead is a higher level of ambition from data providers. Rather than assuming one-off manual integrations, providers can prepare their systems to be ready for integration with many others.
      That’s the real quest toward interoperability, and it requires us to think in terms of different levels of ambition.
     </p>
  </div>
  <p>
    The four layers of interoperability in <a href="https://ec.europa.eu/isa2/sites/default/files/eif_brochure_final.pdf">the European Interoperability Framework (EIF3, 2018)</a>&mdash;legal, organizational, semantic, and technical&mdash;have been instrumental as a guiding mechanism.
    It highlighted that interoperability is multi-faceted: it cannot be solved by software engineers alone.
    Instead, it must be tackled in an integrated way: legal concerns with a legal team, organisational aspects with managers, technical hurdles with software engineers, all while making sure everyone shares the same understanding of the terms used.
    It introduced a new level of abstraction on top of how public administrations (the main target of the EIF) would look at making services run beyond the borders of their own organization.
    Interoperability becomes a governance concern you need to think about in advance instead of in hindsight.
  </p>
  <figure>
    <img src="/img/eif.png"/>
    <figcaption>The <a href="https://ec.europa.eu/isa2/sites/default/files/eif_brochure_final.pdf">European Interoperability Framework (EIF3)</a> introduced a multi-faceted way of thinking about interoperability.</figcaption>
  </figure>
  <p>
    Today this kind of layering has become well-understood.
    However, not all interoperability challenges are solved: we are still taking the first steps towards automating integrations so that they become cost-effective, reliable, and scalable.
    Next to these four layers, it is time to take the next step in interoperability governance and <strong>look at where we are today, where we should be heading tomorrow, and what our ambition will be in the long term</strong>.
  </p><p>
    A simple example of where we are today is the <a href="https://gbfs.org/">General Bikeshare Feed Specification (GBFS)</a>.
    GBFS is a set of JSON schemas that makes it easier to discover and use shared mobility modes.
    It is undeniable that this specification has had a positive effect on interoperability for apps to integrate the availability of shared mobility in a region.
    However, the interoperability it creates remains limited to exactly those clients that were specifically coded against the GBFS schemas.
    There’s no reuse of semantics or specific interaction patterns that would help non-GBFS clients to still perform a task on the data, for example, to show it on a map, or to study the data as time series, or to show opening hours of specific services.
  </p>
  <p>
    For that kind of <strong>cross-application reuse</strong> to become possible, we need to raise our ambitions.  
    Instead of each ecosystem inventing its own schema and API, we need a way to separate concerns more clearly: vocabularies that define shared terms, application profiles that define how those terms are used in a context, and interaction patterns that describe the workflows or exchanges between systems.  
    Add to that global identifiers that work across domains, and you get the foundation for reuse.
    This is where interoperability stops being about “connecting system A to system B” and becomes about building common building blocks that any system can adopt. 
  </p>
  <p>
    One separation we know well in the world of Linked Data is between <strong>vocabularies</strong> and <strong>application profiles</strong>.  
    Vocabularies provide the global identifiers for domain specific terms.  
    Application profiles, on the other hand, assemble these terms from one or more vocabularies into a schema that a particular system or service expects.  
    For example, an application can use the property <a href='http://www.opengis.net/ont/geosparql#asWKT'><code>geo:asWKT</code></a> from the OGC GeoSPARQL vocabulary to indicate that something has geospatial coordinates.  
    This pattern can then be reused by many other systems, even if they are working in different domains.  
    Not everyone will choose the same property to describe geospatial data, but that’s where <strong>alignments</strong> come in.
    The global identifiers can be reused to provide a mapping between patterns.
  </p>
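  <p>
    A small Turtle sketch of what this reuse and alignment look like in practice; the resource IRIs and the legacy property are hypothetical, while geo:asWKT and its datatype are the real GeoSPARQL ones.
  </p>
<pre><code>@prefix geo:  &lt;http://www.opengis.net/ont/geosparql#&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .

# Two unrelated domains reuse the same property for coordinates.
&lt;https://museums.example/artwork/17&gt;
    geo:asWKT "POINT(4.3517 50.8466)"^^geo:wktLiteral .
&lt;https://mobility.example/stops/902&gt;
    geo:asWKT "POINT(3.7174 51.0543)"^^geo:wktLiteral .

# A system with its own legacy property can publish an alignment instead of rewriting its data.
&lt;https://legacy.example/ns#locationWKT&gt; rdfs:subPropertyOf geo:asWKT .
</code></pre>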
  <p>
    A publication that has so far not received the attention it deserves, and that elaborates on this exact challenge, is <a href="https://ruben.verborgh.org/articles/web-api-ecosystem/">“A Web API ecosystem through feature-based reuse”</a> by <a href="https://www.maastrichtuniversity.nl/mj-dumontier">prof. Michel Dumontier</a> (known from his work on the <a href="https://www.go-fair.org/fair-principles/">FAIR principles</a>) and <a href="https://ruben.verborgh.org">prof. Ruben Verborgh</a>.  
    APIs were originally intended to make automated connections easier, but in practice they often add complexity.  
    Each new API introduces its own contract, forcing developers to write a dedicated client, which leads to one client per API.  
    The paper argues for a shift to <strong>feature-based reuse</strong>: instead of treating every API as a silo, describe the features it provides and reuse features that others have already documented.  
    This way, a client only needs to implement a set of reusable features once, and it will work across multiple APIs.  
    It creates a looser, more flexible contract between APIs and clients, which is far more scalable than today’s approach.  
  </p>
  <p>
    Think of common API features such as pagination, filtering, synchronisation, or contract negotiation.  
    Instead of every ecosystem inventing its own way of doing these, we can standardise the patterns once and reuse them everywhere.  
    For example, the <a href="https://w3id.org/tree/specification">TREE hypermedia specification</a> provides a reusable interaction pattern for pagination and subset discovery.  
    The <a href="https://w3id.org/ldes/specification">SEMIC Linked Data Event Streams (LDES) specification</a> builds on that to describe how a client can stay in sync with a changing dataset.
    For contract negotiation, there is the <a href="https://eclipse-dataspace-protocol-base.github.io/DataspaceProtocol/2025-1-RC1/#state-machine">Dataspace protocol</a> that defines the interaction patterns as a state machine.
    These are concrete, reusable building blocks that any domain can adopt, whether you’re in the domain of cultural heritage, traffic measurements, or building public services.
    Domain specifications will in this way become smaller: a combination of reusable components rather than a reinvention of the same patterns.
  </p>
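  <p>
    As a minimal illustration of such a reusable feature, a paginated collection only needs to expose a relation to its next page; the tree: terms below are the real TREE ones, the API URLs are hypothetical.
  </p>
<pre><code>@prefix tree: &lt;https://w3id.org/tree#&gt; .

# A client that understands tree:relation can page through any collection,
# whether it holds artworks, sensor observations, or timetables.
&lt;https://api.example/observations?page=1&gt; a tree:Node ;
    tree:relation [
        a tree:Relation ;
        tree:node &lt;https://api.example/observations?page=2&gt;
    ] .
</code></pre>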
  <figure>
    <img src="https://www.w3.org/DesignIssues/diagrams/SemWave.png"/>
    <figcaption>The Semantic Web Wave, <a href="https://www.w3.org/2003/Talks/01-siia-tbl/slide19-0.html">as presented by prof. Tim Berners-Lee in a 2003 talk</a>. Back then, XML would be for interoperability within an app ecosystem, RDF would be for cross-app interoperability, and logic would be for inter-engine interoperability.</figcaption>
  </figure>
  <p>
    In a 2003 presentation, prof. Tim Berners-Lee pitched the “Semantic Web Wave” (see above).
    The idea was that there is a gradual approach to interoperability, where systems that already put in the effort to have interoperability within an ecosystem through tightly coupled specifications (interoperability within an app ecosystem), can evolve towards reusing Linked Data vocabularies and RDF serializations (cross-app interoperability).
    Once they do, they can take the next step: documenting and sharing the logic so machines can interpret the data across implementations.
    This last step goes beyond cross-app interoperability by suggesting that we won’t need to build any domain-specific software anymore.
    We will be able to ship a domain agnostic engine, together with the data.
    The engine will be able to, through procedures and logical rules described in the data, understand how to interact with various systems for you.
  </p>
  <h3>The three ambition levels for interoperability</h3>
  <p>
    All good things come in threes.
    Whenever I position conceptual levels, I believe they should convey where we are today, what we need to reach for in the short term, and where we are heading in the long term.
    Level 1 is what is already being adopted, level 2 is what is within reach but often not yet available off the shelf, and level 3 should be a future outlook we can prepare for today.
    The Semantic Web Wave by Tim Berners-Lee also captured the idea that you start with systems that only work within one tightly coupled ecosystem, then move towards shared vocabularies and patterns for interoperability across ecosystems, and finally reach a point where systems can even share the logic for interpreting the data.
    Let’s reiterate and modernise those levels for today’s use cases.
  </p> 
  <p>
    The three levels I propose follow the same idea and naming:
  </p>
  <ul>
    <li><h4>Level 1 — Interoperability within an app ecosystem</h4>
      <p>This is where most initiatives start. A group of actors in the same domain agrees on a format and a protocol. Developers can read the spec, build their implementation, and everything works smoothly — as long as you stay inside that ecosystem.
      </p><p>Think of the General Bikeshare Feed Specification (GBFS) for bike-sharing data, GeoJSON for geospatial points, MARC 21 in libraries, HL7 in healthcare, or NeTEx in public transport. Each technology has an undeniable impact within its own ecosystem, but if you want to use that data in another domain or platform, you’ll need to write extra code to bridge the gap.</p>

    <li><h4>Level 2 — Cross-ecosystem interoperability</h4>
    <p>Here, you go beyond your own domain and design with reuse in mind. Identifiers are global rather than local. Vocabularies, application profiles, and interaction patterns are separated so they can be mixed and matched.
    This makes it possible for cultural heritage data to be aggregated with the same interaction patterns as a transport API, or for healthcare systems to reuse the same contract negotiation protocol as an industrial dataspace. It’s where RDF, SHACL, and hypermedia APIs start to appear as they make this kind of reuse possible.</p>

    <li><h4>Level 3 — Inter-engine interoperability</h4>
    <p>The final step is when you also include the logic in your specifications, so a machine can interpret, verify and interact with your data and systems without being built for your specific domain.
      This is where AI and rule engines meet.</p>
  </ul>
  <p>
    Today, most projects stop at Level 1 because it’s faster to implement and delivers value quickly.
    Working our way upwards, we need to show evidence that making the abstractions necessary for Level 2 and even Level 3 will indeed save a lot of work later.
  </p>
  <figure>
    <img src="/img/levels.svg"></img>
    <figcaption>The 3 ambition levels for interoperability summarized</figcaption>
  </figure>
  <h3>P.S.</h3>
  <p>
    Are you interested in learning about interoperable dataspaces technology?
    At this moment the registrations are open for the course on <a href="https://www.ugain.ugent.be/linkeddatasolid2025.htm">Linked Data, Solid and interoperable dataspaces</a>. The course starts in September and runs until January.</p>
  <p>Thank you Jos De Roo for telling me about this initial vision of Tim Berners-Lee at the coffee machine. I decided to adopt these original terms for interoperability ambitions as coined by him.</p>
  <p>I am at this moment part of the expert team on the European Interoperability Framework (EIF). In this context I’m planning a couple of blog posts, for which this one is the first. These posts however do not reflect the position of the EIF.</p>
</div>]]></content><author><name>Pieter Colpaert</name></author><category term="interoperability" /><summary type="html"><![CDATA[Interoperability isn’t just about connecting two systems—it’s about building reusable data and interaction patterns that work across domains. This post explores three ambition levels, from ecosystem-specific integration to cross-domain reuse and AI-powered inter-engine communication, inspired by the European Interoperability Framework and the Semantic Web.]]></summary></entry><entry><title type="html">Escaping the false dichotomy of API vs. data dump with Linked Data Event Streams</title><link href="https://pietercolpaert.be/ldes/2021/09/03/ldes.html" rel="alternate" type="text/html" title="Escaping the false dichotomy of API vs. data dump with Linked Data Event Streams" /><published>2021-09-03T00:00:00+00:00</published><updated>2021-09-03T00:00:00+00:00</updated><id>https://pietercolpaert.be/ldes/2021/09/03/ldes</id><content type="html" xml:base="https://pietercolpaert.be/ldes/2021/09/03/ldes.html"><![CDATA[<div>
  <div class="teaser">
    <p>
      Data publishers often face a familiar but painful choice: should they provide a full data dump or a querying API?
      Each option comes with a hidden cost.
      The data dump leads to replication hell—multiple copies scattered across consumers, each potentially outdated.
      A querying API leads to maintenance hell—constant effort to keep endpoints up-to-date, scalable, and reliable.
      With Linked Data Event Streams (LDES), we provide a way out...
    </p>
  </div>
  <p>
    Data publishers are too often asked to pick between two unsatisfying options.
    The data dump looks harmless: publish a full export and let consumers do their thing.
    In practice it creates a <strong>replication hell</strong>: multiple uncoordinated copies drift out of sync; consumers juggle deltas and snapshots; provenance cannot be traced; history gets lost in overwrites; and every “fresh” download quietly rebuilds the same indexes in a hundred places.
  </p>
  <p>
    Take for example the address registry in Flanders for which <a href="https://basisregisters.vlaanderen.be/producten/grar#downloadbestandgrar">data dumps are available</a>.
    Probably every municipality in Flanders takes a copy of this file for use cases such as autocompleting the street names for the forms they use everywhere.
    When developers make such integrations, synchronization is an afterthought: this dataset doesn’t change that often anyway, right?
    Think again: there are minor changes every day, with new addresses coming into play and old ones becoming “historized”.
    When in <a href="https://en.wikipedia.org/wiki/Fusion_of_the_Belgian_municipalities">2016, 2019 and 2025, cities in Belgium decided to fuse</a>, a lot of street names also needed to be renamed to avoid duplicate names.
    Instead of <a href="https://interoperable-europe.ec.europa.eu/collection/semic-support-centre/base-registries">this base registry</a> being updated from the source, services started making the changes they needed manually in their local copies, leading to a replication hell. 
  </p>
  <p>
    On the other side sits the querying API.
    It promises precision—ask only for what you need—while quietly enrolling the publisher in <strong>maintenance hell</strong>.
    New use case? New endpoint.
    New query language that became popular? Again yet another API to be provided.
    The provider becomes an involuntary platform operator, while consumers still only have a limited processing possibility of the dataset as they can only query the API in the ways the data provider managed to set up.
    Each endpoint that has been set up comes with its own maintenance cost.
    After a while, when priorities shift or budgets shrink, it will have become impossible to turn off any of the existing APIs as there may still be an application that relies on it.
    The budget that once was used for innovation and creating a better public service, is now being used for maintaining legacy APIs.
  </p>
  <p>
    Take again the example of the address registry in Flanders for which next to data dumps, <a href="https://web.archive.org/web/20250823085609/https://vlaamseoverheid.atlassian.net/wiki/spaces/AGB/pages/6099766021/Raadpleegdiensten">also a plethora of API products are available</a>.
    Specific functionalities that were once brought online need to keep being maintained: there may always be that one service that is still relying on the API.
    Certainly for address registries, multiple functionalities of the dataset are expected, such as: finding the geolocation of an (or multiple) address(es), a geospatial interface to visualize the data, a historic view of what addresses existed in the past until today, an autocompletion interface for autocompleting streetnames, municipalities and addresses, a SPARQL, GQL and GraphQL API for graph-based access, a specific service that allows to calculate what addresses will be impacted by a road closure, etc.
    It doesn’t matter how many APIs you have: it will never be sufficient—there’s always going to be that other person that needs a functionality that does not yet exist.
  </p>
  <p>
    There are, however, other paths. Let’s start from the idea of taking full copies, cf. “dumps”, but change the intent and the name. Let’s call it a stream. This way, it sets the expectation that developers of consumption pipelines will code for history and future: they will re-interpret what happened and, with exactly the same code, stay in sync with what will happen.
    This is the mindset behind <a href="https://ldes.tech">Linked Data Event Streams (LDES)</a>.
    The <a href="https://pietercolpaert.be/interoperability/2025/08/22/levels-of-ambition">ambition level (update: see a 2025 blog post)</a> is to introduce semantic interoperability through Linked Data, and combine this idea with how developers interact with streams.
    LDES publishes the dataset as an append-only sequence of immutable members with stable identifiers, so any party can replicate once and then follow updates.
    Hence the straightforward name: the combination of having this ambition towards interoperability, and the ambition to always keep every copy up to date, becomes Linked Data + Event Streams: LDES in short.
  </p>
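  <p>
    As a rough sketch, assuming a hypothetical address stream (the ldes: and tree: terms are the real ones, the IRIs are not), two immutable versions of the same address can look like this:
  </p>
<pre><code>@prefix ldes: &lt;https://w3id.org/ldes#&gt; .
@prefix tree: &lt;https://w3id.org/tree#&gt; .
@prefix dct:  &lt;http://purl.org/dc/terms/&gt; .
@prefix xsd:  &lt;http://www.w3.org/2001/XMLSchema#&gt; .

&lt;https://registry.example/addresses/feed&gt; a ldes:EventStream ;
    ldes:timestampPath dct:created ;
    ldes:versionOfPath dct:isVersionOf ;
    tree:member &lt;https://registry.example/addresses/42#2024-05-01&gt; ,
                &lt;https://registry.example/addresses/42#2026-01-08&gt; .

# After a street rename, a new version is appended; the old one never changes.
&lt;https://registry.example/addresses/42#2026-01-08&gt;
    dct:isVersionOf &lt;https://registry.example/addresses/42&gt; ;
    dct:created "2026-01-08T09:00:00Z"^^xsd:dateTime .
</code></pre>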
  <figure>
    <img src="https://tree.linkeddatafragments.org/img/logo-ldes.svg" width="100%"></img>
    <figcaption>The LDES logo</figcaption>
  </figure>
  <p>
    This shift unlocks governance opportunities.
    With an authoritative event source online, the publisher can decide which higher-level interfaces to keep maintaining, and which to let the ecosystem carry.
    A SPARQL endpoint, an OGC API, or a GraphQL service may be useful today and optional tomorrow.
    If a GraphQL API stops aligning with the publisher’s priorities, the publisher can bring it offline, while the consumer that still needs it can spin up their own GraphQL server that replicates and synchronises from the event source, preserving functionality without forcing the publisher to keep every interface alive forever.
    If you maintain a base registry or any dataset that changes over time, start by publishing the LDES at the event source. Everything else can—and should—derive from there.
  </p>
  <figure>
    <img src="/img/eventual-iop.svg" width="100%"></img>
    <figcaption>With an authoritative event source online, the publisher can decide which higher-level interfaces to keep maintaining, and which to let the ecosystem carry.</figcaption>
  </figure>
  <p>
    For the technical details, see the LDES specification at <a href="https://w3id.org/ldes/specification">https://w3id.org/ldes/specification</a>.
    Various implementations of clients and servers are available. In order to interact with the community, visit <a href="https://ldes.tech">https://ldes.tech</a>.
  </p>
  <h3>The talk at ENDORSE 2021</h3>
  <p>
    At ENDORSE 2021 there was a talk in which I explain the contents of this blog post in a talk:
  </p>
  <iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/89UVTahjCvo?start=1096" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe>
  <h3>P.S.</h3>
  <p>This post was planned to be published in 2021, but remained in draft until 2025. I only took the time to finalize it in 2025, but have kept the initially planned publication date.</p>
</div>]]></content><author><name>Pieter Colpaert</name></author><category term="ldes" /><summary type="html"><![CDATA[Data publishers often face a familiar but painful choice — should they provide a full data dump or a querying API? With Linked Data Event Streams (LDES), we provide a way out.]]></summary></entry><entry><title type="html">What we did in 2019 and will be doing in 2020</title><link href="https://pietercolpaert.be/research/2019/12/31/hindsight.html" rel="alternate" type="text/html" title="What we did in 2019 and will be doing in 2020" /><published>2019-12-31T00:00:00+00:00</published><updated>2019-12-31T00:00:00+00:00</updated><id>https://pietercolpaert.be/research/2019/12/31/hindsight</id><content type="html" xml:base="https://pietercolpaert.be/research/2019/12/31/hindsight.html"><![CDATA[<blockquote>
  <p>It’s not research if you’re not learning in hindsight</p>
  <cite>I’m sure someone must have said it at some point</cite>
</blockquote>
<div class="teaser">
  <p>
    2019 was the second year I was a postdoctoral researcher at <a href="http://idlab.technology">IDLab</a>.
    In this blog post I want to reflect on the <em>research goals</em> I set one year ago, but also what we are going to do in 2020.
    It is curious to see that after all these years, I still underestimate certain steps, while I overestimate others.
    Will we be able to do better at the end of 2020?
    The blog post from one year ago is available here: <a href="https://pietercolpaert.be/research/2018/12/30/automating-reuse.html">our research in 2019</a>.
  </p>
</div>
<div id="introduction">
  <p>
    Our research focus was and will remain <em>designing Public Web APIs</em>.
    Last year, I put forward our main research approach for read-only Web APIs as: “How do you fragment datasets bigger than 50kb?”.
    Taking the fragmentation approach on a dataset helps to re-think and re-shape APIs for Open Datasets, yet putting forward an ideal size is certainly an oversimplification that we should not overuse.
    The ideal size of a page depends on many factors: the update frequency of the data in the page, what the data itself is used for, how compactly the data can be represented, how the data is requested by query engines, the compression rate, the type of compression, cacheability, etc.
    Nevertheless, for most use cases, 50kb after compression held up as a good initial guess.
  </p>
  <p>
    Thinking about dataset publishing as merely a fragmentation problem nevertheless helps a lot.
    I’ve started coining the idea of “<em>the Web as a hard disk</em>” to explain that no database expert in their right mind would suggest removing the <a href="https://en.wikipedia.org/wiki/Page_cache">page cache</a> from an operating system.
    It is this cache that is the enabler of the scalability of hard disk drives, powered by the <a href="https://en.wikipedia.org/wiki/Locality_of_reference">locality of reference principle</a>.
    If we could use existing caches that are already in everyone’s pocket, HTTP browser caches, then we could make the web of data much more efficient as well.
    A given kind of fragmentation will lower the number of fragments that need to be downloaded for one use case, but might not for another.
    We recommend always working with real query logs from an existing API in order to prove a point.
  </p>
  <p>
    Designing public Web APIs is not limited to just fragmenting.
    Quickly you notice also other aspects come into play, that again make hosting more expensive:
    supporting different serializations, allowing clients to request a version of a page from the archive, materializing data dumps for manual inspection, and doing metadata well for both dataset discovery (<abbr>DCAT</abbr>) and interface discovery (<abbr>Hydra</abbr>), as well as provenance.
    For non-fragment-based interfaces, however, we have not even started to think about these problems.
  </p>
</div>
<div id="insights">
  <h2>Insights from 2019</h2>
  <p>
    In the <a href="https://smart.flanders.be">Smart Flanders programme</a>, we outlined technical principles that data publishers should adhere to.
    The technical principles include adding a license to your dataset, enabling Cross Origin Resource Sharing, using JSON-LD over plain JSON, using the Flemish OSLO domain models, etc.
    We have been working three full years on getting these principles accepted at local governments, working on how this translates into paragraphs to be put in tendering documents.
    For the next years, it will be a challenge to translate these principles into architecture diagrams.
  </p>
  <p>
    Different use cases were studied.
    In <a href="https://pietercolpaert.be/research/2018/12/30/automating-reuse.html">last year’s blog post</a>, we outlined 3 focus topics: time series, text search and geospatial search and specific ideas on how to tackle them.
    The ideas we had on summarizing <em>time series</em> were too simplistic.
    There is no silver bullet when it comes to summarizing time series, although a novel technique called <a href="https://www.cs.ucr.edu/~eamonn/MatrixProfile.html">Matrix Profile</a> comes quite close.
    We are now studying that approach for compatibility with Linked Data and hope to publish this in 2020.
  </p>
  <p>
    For geospatial search, we are still in the process of developing different approaches.
    R-tree and tiling have been studied and described using hypermedia.
    In 2020 I hope we will be able to describe techniques like hexagonal tiling and geohashes too.
    There might be an interesting overlap with text search there, as something that is geospatially contained within another region will have an id that has the id of the larger area as its prefix.
    We abandoned the idea of Hilbert indexes in hypermedia APIs, however. They are an interesting idea for the back-end, but not for the hypermedia API itself.
  </p>
  <p>
    We are working on publishing the results of benchmarks we ran for time series, geospatial search and autocompletion services. Keep an eye on our publications!
  </p>
</div><div id="non-research">
  <h2>Goals in 2020</h2>
  <p>What would I love to look back on at the end of 2020? We are a team of computer scientists, so we should do two things well: write inspiring papers and deliver useful code.</p>
  <ul>
    <li><p>
        I want to get the <a href="https://github.com/pietercolpaert/TreeOntology">Tree Ontology</a> presented at an international conference and discuss its current design with experts in the field.
        The current specification needs to be implemented in Comunica.
        Linked Connections and Routable Tiles need to be updated to become interoperable with the Tree Ontology in a 2.0 version.
    </p></li>
    <li><p>
        <a href="https://github.com/openplannerteam/planner.js">Planner.js</a> will be further developed as a client for route planning purposes.
        The planner will be extended with geospatial, time-based and full text search queries based on the Comunica implementation.
    </p></li>
    <li><p>
        I want to work on <em>developer enablement</em> for autocompletion services. Today this relies on centralized services to which you send your entire query. This is, as such, a privacy nightmare, and it will always operate under a closed-world assumption, trying to fit all the world’s knowledge on one machine. I want to build a Comunica-based tool that enables developers to work with existing open datasets, without having to set up a server, and do autocompletion on the client side without loss of user-perceived performance.
    </p></li>
    <li><p>
        I will figure out how to integrate the Matrix Profile technique into a Web API specification for time series clients.
    </p></li>
    <li>
      <p>I want to dive deep into Read Write Data with SOLID (an ecosystem for personal data pods), implement a Mobility Profile into Planner.js, and figure out the parallels between SOLID shape descriptions and the Tree Ontology.</p>
    </li>
    
  </ul>
  <p>
    Want to add your data project to our goals? Our growing team is <a href="mailto:pieter.colpaert@imec.be">open to your challenges</a>!
  </p>
</div>]]></content><author><name>Pieter Colpaert</name></author><category term="research" /><summary type="html"><![CDATA[Report on our team’s research in 2019 and what we will be doing in 2020.]]></summary></entry><entry><title type="html">Real-time election results as Open Data</title><link href="https://pietercolpaert.be/opinion/2019/05/27/elections.html" rel="alternate" type="text/html" title="Real-time election results as Open Data" /><published>2019-05-27T00:00:00+00:00</published><updated>2019-05-27T00:00:00+00:00</updated><id>https://pietercolpaert.be/opinion/2019/05/27/elections</id><content type="html" xml:base="https://pietercolpaert.be/opinion/2019/05/27/elections.html"><![CDATA[<blockquote>
  <p>Is there a way to know the geospatial boundaries of the cantons?</p>
  <cite>Uhmm… Not sure…</cite>
</blockquote>
<div class="teaser">
  <p>
    The data of the 2019 elections in Belgium, as the counts were being registered, were Open Data.
    Anyone was (and still is now) able to create their own view of the candidates per election, their preference votes, the parties and their votes...
  </p>
</div>
<div id="introduction">
  <p>
    Two weeks before the elections, we got confirmation we would receive the same data the media companies would receive about the election results.
    We had to think and act quickly. What can we set up in what amount of time?
  </p>
  <p>
    We decided to host a hackathon, or an electathon, as we decided to call it.
    Thanks to the people of <a href="https://www.becentral.org/">BeCentral</a>, where Open Knowledge Belgium has its office, we could quickly reserve a meeting room on the Saturday before the elections.
    We set up a page at <a href="https://elections.openknowledge.be">elections.openknowledge.be</a> and we invited our network.
    About 10 people showed up, with one common goal: build something that would never have been possible if the data had not been open.
    Quickly, 3 main ideas came to exist:
    <ol>
      <li><strong>A one vote one dot map</strong>, showing a dot in a color per vote in a canton. We did not prioritize this idea however: it was not straightforward to get a map of the boundaries of all cantons, and the demo code we were working with only worked for static data. It would technically become a challenge to visualize this as more votes came in, although we will keep this idea in mind for the next election. (idea by <a href="https://twitter.com/LeenkeDeDonder">Leenke</a>)</li>
      <li><strong>A deeplink to the personal achievements of one candidate</strong>. The page would be the ultimate vanity metric for politicians.</li>
      <li><strong>A 3D print of a vase that grows bigger when new results come in. The vase gets skewed in the direction of the winning party.</strong> We would then try to sell the vase for €30.000, which is the amount the media companies have to pay for the data if the open data was not there.</li>
    </ol>
  </p>
  <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Group picture! <a href="https://twitter.com/hashtag/electathon?src=hash&amp;ref_src=twsrc%5Etfw">#electathon</a><br>Go team citizens! <a href="https://twitter.com/hashtag/Election2019?src=hash&amp;ref_src=twsrc%5Etfw">#Election2019</a> <a href="https://t.co/umaMvgR1YT">pic.twitter.com/umaMvgR1YT</a></p>&mdash; Xavier Damman (@xdamman) <a href="https://twitter.com/xdamman/status/1132409566908559360?ref_src=twsrc%5Etfw">May 25, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
  <p>
    We decided to focus on the second idea and create an overview of the votes per candidate.
    <a href="https://twitter.com/jbelien">Jonathan</a> created a <a href="https://api.elections.openknowledge.be/">best-effort API based on these files</a>.
    The API is publicly available without API keys, and has Cross-Origin Resource Sharing headers enabled.
    We also made sure the responses are compressed and the right caching headers were in place.
    Everyone was now able to start coding upon this.
    As an example, I coded up a <a href="https://codepen.io/pietercolpaert/pen/yWjORy">quick codepen displaying the list of candidates per election</a>.
    This was quickly picked up on Twitter by <a href="https://twitter.com/tgoelff">Thib</a>, who created a nice overview of who you can vote for, and the seat distribution in real-time.
  </p>
  <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Should be working now, not yet with evolution and currently only with test data <a href="https://t.co/elWYVdmkqx">https://t.co/elWYVdmkqx</a> (the checkbox adds/removes &quot;?test&quot; from the API call) <a href="https://t.co/YtaIuf3vcz">pic.twitter.com/YtaIuf3vcz</a></p>&mdash; Thib (@tgoelff) <a href="https://twitter.com/tgoelff/status/1132568270589091840?ref_src=twsrc%5Etfw">May 26, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
  <p>
    <a href="https://twitter.com/MichielLeyman">Michiel</a> is Open Knowledge’s knows all can do everything digital handyman.
    He decided to implement the wireframes made by Leenke using React.
    The app was ready just in time for the elections, although we had plenty of more ideas to go into the page.
    See for example the personal result page of former minister of the Digital Agenda <a href="https://elections.openknowledge.be/2019/CK/Open%20Vld/de-croo-alexander-28432">Alexander De Croo</a>.
  </p>
  <blockquote class="twitter-tweet"><p lang="en" dir="ltr">Achievement unlocked: created a website with election data that’s more up to date than the media. Check out your politician’s preference votes as they become available in real-time: <a href="https://t.co/TukDTUvbdf">https://t.co/TukDTUvbdf</a> <a href="https://twitter.com/hashtag/electathon?src=hash&amp;ref_src=twsrc%5Etfw">#electathon</a> <a href="https://twitter.com/hashtag/opendata?src=hash&amp;ref_src=twsrc%5Etfw">#opendata</a> <a href="https://twitter.com/hashtag/kies19?src=hash&amp;ref_src=twsrc%5Etfw">#kies19</a> <a href="https://twitter.com/hashtag/vk19?src=hash&amp;ref_src=twsrc%5Etfw">#vk19</a> <a href="https://t.co/zlFQ50j710">pic.twitter.com/zlFQ50j710</a></p>&mdash; Pieter Colpaert (@pietercolpaert) <a href="https://twitter.com/pietercolpaert/status/1132668126473138178?ref_src=twsrc%5Etfw">May 26, 2019</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
  <p>
    With more than 8000 unique visitors on Sunday of this webpage, we feel extremely proud that we have been able to pull this off.
    Next elections we will go for more and better data, and more and better visualizations.
    Playing with the data has given us a great insight in how Belgian elections actually work.
    We had 6 elections with a total of 6927 candidates (I just checked with Michiel, and by now he knows this number by heart). 
  </p>
  <p>
    For me as a Linked Open Data researcher, it became again painfully clear that we need better data management across silos in the government.
    We stumbled upon a dead end when trying to find geo boundaries for Belgian voting cantons: that should have been a basic dataset.
    We tried to find the logos of the parties on <a href="http://wikidata.org">Wikidata</a>.
    However, not all parties had a Wikidata page, and if they had one, their image was not always up to date.
    Luckily you can edit wikidata yourself, but for this hackathon we quickly decided to make a file containing all logos and parties.
    The I files in the election streams should become base registries, always available and not only during the elections.
    They should be available as Linked Open Data, so that all the identifiers for, e.g., candidates, parties, lists, … could be shared among different datasets.
    The R files could just as well use the same domain model as the UK elections with the Election Ontology available at <a href="https://ukparliament.github.io/ontologies/election/election-ontology.html">https://ukparliament.github.io/ontologies/election/election-ontology.html</a>.
    So many ideas to build a much more integrated data publishing strategy at the federal government.
    Let’s hope there will be a minister of the digital agenda soon: we have plenty of things to tell!
  </p>
</div>]]></content><author><name>Pieter Colpaert</name></author><category term="opinion" /><summary type="html"><![CDATA[Organizing and participating in the first electathon]]></summary></entry><entry><title type="html">The next steps for Open Data Portals? Data recipes!</title><link href="https://pietercolpaert.be/opinion/2019/03/21/data-portals-next-steps.html" rel="alternate" type="text/html" title="The next steps for Open Data Portals? Data recipes!" /><published>2019-03-21T00:00:00+00:00</published><updated>2019-03-21T00:00:00+00:00</updated><id>https://pietercolpaert.be/opinion/2019/03/21/data-portals-next-steps</id><content type="html" xml:base="https://pietercolpaert.be/opinion/2019/03/21/data-portals-next-steps.html"><![CDATA[<blockquote>
  <p>My dataset is on the data portal, why isn’t it added in every route planner now?</p>
  <cite>A city official.</cite>
</blockquote>
<div class="teaser">
  <p>
    We have been building Open Data portals and Open Data standards (see <a href="https://www.w3.org/TR/vocab-dcat/">DCAT</a>) for a while now.
    Yet, judging from the state of the art, still only humans get to understand what’s in an open data portal.
    We somehow need better metadata, in order for machines to make sense out of the big pile of data gathered on an Open Data portal.
    I believe the next challenges for Open Data portals are two-fold:
    (i) making sure industry players adopt “data recipes” (discovery algorithms) for finding datasets for a specific feature; and
    (ii) adding better metadata to existing datasets.
    I believe the latter can be achieved by innovating the user interfaces for adding metadata to your dataset.
  </p>
</div>
<div id="introduction">
  <figure>
    <iframe src="https://www.linkedin.com/embed/feed/update/urn:li:share:6514094019813924864" height="261" width="504" frameborder="0" allowfullscreen=""></iframe>
  </figure>
  <p>
    Stijn works for the city of Antwerp as a mobility specialist.
    The problem he experiences is a text-book Open Data challenge:
    <blockquote>
      <p>
        How do I get a dataset about a new local policy adopted in third party end-user interfaces?
      </p>
    </blockquote>
    It is not an act of philanthropy that leads him to publish this data; his data must be reused in order for his city to function properly.
    This stresses the importance of, on the one hand, intelligent bots that can integrate a dataset automatically,
    and, on the other hand, metadata of high quality to assist machines looking for data.
  </p>
</div>
<section>
  <h2 id="catalogs-to-recipes"><a name="catalogs-to-recipes" class="anchor" href="#catalogs-to-recipes"></a>From data catalogs to data recipes</h2>
  <p>
    Looking for a dataset is still a manual process.
    I had to personally ask Stijn, who knew where the dataset could be retrieved, whether there was already an open dataset about this.
    It appeared to be published in their geospatial portal (the metadata is so far <a href="https://opendata.antwerpen.be/zoeken?query=Low%20emission">not yet integrated</a> into the main open data portal) at <a href="https://portaal-stadantwerpen.opendata.arcgis.com/search?q=Lez">portaal-stadantwerpen.opendata.arcgis.com</a>.
    This only left me to wonder: <strong>if I cannot find this dataset manually, how would a script from Google, TomTom or HERE  be able to discover this dataset</strong>?
  </p>
  <p>
    Instead of having human oriented full-text search forms in Open Data portals, we need to think about data recipes.
    These are flow-charts or algorithms that a robot can execute in order to automatically discover certain datasets.
    Such a recipe could look like this:
  </p>
  <ol>
    <li>Request <a href="https://opendata.antwerpen.be/">opendata.antwerpen.be</a> (probably in DCAT)</li>
    <li>Study all next possible steps from the open data portal. For example, these “flow chart” blocks could be offered:
      <ul>
        <li>zoom in on a specific geographic region,</li>
        <li>follow links to an overview of datasets that were added the latest,</li>
        <li>read about the latest local council decisions (this is the first digital source that may trigger a change in route planning advice),</li>
        <li>or follow links to datasets in certain themes such as “traffic rules”.</li>
      </ul>
    </li>
    <li>Follow the right links until a dataset of interest is found.</li>
    <li>In the case of the Low Emission Zone (LEZ), it is a boundary shape. In the future we should make sure a robot can detect that “if you are inside this shape, some extra rules apply”. This way, any next set of extra rules, not only the LEZ, will become adopted automatically.</li>
  </ol>
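  <p>
    To make step 1 concrete: the catalogue a robot requests could describe the LEZ dataset roughly as follows. The DCAT terms are real; the dataset IRIs and descriptions are hypothetical simplifications.
  </p>
<pre><code>@prefix dcat: &lt;http://www.w3.org/ns/dcat#&gt; .
@prefix dct:  &lt;http://purl.org/dc/terms/&gt; .

# Hypothetical catalogue entry a robot could discover in step 1 and filter on in steps 2 and 3.
&lt;https://opendata.antwerpen.be/#catalog&gt; a dcat:Catalog ;
    dcat:dataset &lt;https://opendata.antwerpen.be/dataset/lez-zone&gt; .

&lt;https://opendata.antwerpen.be/dataset/lez-zone&gt; a dcat:Dataset ;
    dct:title "Low Emission Zone boundary"@en ;
    dcat:keyword "traffic rules"@en ;
    dcat:theme &lt;http://publications.europa.eu/resource/authority/data-theme/TRAN&gt; .  # EU transport theme
</code></pre>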
  <p>
    Every step in this recipe is close to how a human would discover a dataset, yet can be optimized for machines.
    This needs some new alignments with data reusers.
    On the one hand, companies such as TomTom, Google and HERE need to document what steps they take to understand data.
    And in some way, Google already does this with the <a href="https://search.google.com/structured-data/testing-tool">structured data testing tool</a> or <a href="https://storage.googleapis.com/pub-tools-public-publication-data/pdf/77547c8d2a7fba472e76c774028cf2b3c0afdb8a.pdf">in this paper by Natasha Noy, Matthew Burgess, and colleagues at Google AI</a> on creating a public dataset search engine.
    This way, when you want to publish a new dataset, you can try to make it work with how machines already interpret your data.
  </p>
  <p>
    On the other hand, you will have intended your data to be visited in certain ways.
    Document your building blocks that you expose on your website.
    At Informatie Vlaanderen, we took the first steps in this direction by creating a working group for <a href="https://github.com/Informatievlaanderen/generieke-hypermedia-api">a Generic Hypermedia API across Flanders</a>.
  </p>
</section>
<section>
  <h2 id="authoring"><a name="authoring" class="anchor" href="#authoring"></a>Authoring environments for metadata</h2>
  <p>
    Problematic today is the fact that an Open Data portal is supposed to be delivered by one company only.
    A machine, however, does not care about back-end systems: links are followed seamlessly, regardless of what services are behind them.
    The main page of opendata.antwerpen.be could be generated by the Drupal system,
    the links to the geospatial search could link to an arcgis system,
    while other links could be given to CKAN instances, The DataTank, OpenDataSoft, an IoT Data Broker, and so forth.
    The important task for the people in charge of the Open Data portal in the city is, however, to document the building blocks that are needed on every level, and to expose these in machine-readable hypermedia controls.
    It is up to the company to make sure these building blocks that can be used in a data recipe by a client, are fully functional.
  </p>
  <p>
    Yet, these building blocks today are invisible to the people who maintain the Open Data portal.
    How can we make these more visible?
    And can we come up with an authoring environment that automatically puts data correctly in this flowchart?
    Could this authoring environment for metadata also automatically suggest other building blocks to be added to your dataset?
    I think solving these questions will also automate a lot of steps for civil servants trying to publish data for maximum reuse.
    In any case, Open Data is a domain that still needs to mature a lot.
  </p>

</section>]]></content><author><name>Pieter Colpaert</name></author><category term="opinion" /><summary type="html"><![CDATA[How do machines find specific datasets useful for their use cases?]]></summary></entry></feed>