Interoperability isn’t about creating the one standard to rule them all. It’s about creating small reusable pieces of the puzzle. The world of Linked Data heads in the right direction, but doesn’t go far enough: despite containing the right ideas, it still suffers from poor adoption today. I argue we need four types of reusable artefacts in our specifications: vocabularies, application profiles, interaction patterns, and implementation guides. Together, they form a toolbox for connecting systems across domains: implementation guides provide the much-needed answers developers need to build their systems, while interaction patterns provide composable functionality on top of the application profiles. Allow me to elaborate…

It’s not a secret that I’m a big fan of the approach Linked Data takes when it comes to interoperability. The technology is built, as the name implies, around the idea of linking to each other’s concepts using global web identifiers, or simply web addresses (IRIs). For domain models, this already means a split into two types of specifications: vocabularies and application profiles. Application profiles are well-defined “schemas” that tell data providers what shape their data needs to adhere to. Yet it’s the vocabularies that make the terms used in these profiles reusable in others, as various schemas can reference the same concepts, and those concepts can in turn be interlinked when a project wants to align in a later phase.

It’s this kind of separation into types of specifications, or artefacts of a consensus, that made me wonder whether other types of specs should exist. Outside of the Linked Data world I often see specifications that try to cover everything by defining APIs and schemas in one go. This way, they skip the reuse of existing terms and patterns. Linked Data specifications, however, often don’t cover enough. For example, if you want to build a metadata catalog, there is a W3C standard for that: the DCAT vocabulary. The European application profile, SEMIC’s DCAT-AP, then defines what EU organizations must support in order to work with aggregators. However, when you actually want to do something with a data portal, all options are left open. E.g., how do you take a copy of the data portal and stay in sync with it? What is the procedure to add a new dataset to the portal? There is no answer to be found in either the DCAT vocabulary or the European application profile.

I was part of a SEMIC pilot on the question “how do I take a copy of a DCAT-compliant data catalog and stay in sync with it afterwards”. We had been working on a specification for an interaction pattern between clients and servers for replication and synchronization, and thought we could apply it directly to this domain. However, the pattern alone remains quite abstract: it does not tell you how to build this specifically for data catalogs. For that purpose, we built an implementation guide called DCAT-AP Feeds in collaboration with DIGG, the Swedish agency for digital government. The implementation guide reuses the DCAT-AP application profile, as well as the Linked Data Event Streams interaction patterns for staying synchronized with event streams. While it did not add a lot of normative text, it brings all of this together in a specification that is ready to be implemented for aggregating data catalogs across Europe.

How we approached DCAT-AP Feeds is also how I see this happening in other specifications. When these interaction patterns become composable into implementation guides, that’s when we’re going to see interesting reuse happen. Let’s go through the 4 types of specifications, and I’ll show what I think are good examples of this composability.

1. Vocabularies – agreeing on words

Vocabularies are simply lists of web addresses, each with a certain meaning, that one can reuse. Their meaning is explained in full text. The DCAT vocabulary, for example, contains definitions for terms like Dataset, Catalog or DataService. You can decide whether you agree with a definition and reuse the term, or you can decide to be more specific and create your own vocabulary instead. Vocabularies can contain classes and properties for a domain model you may want to instantiate, but they can equally well contain code lists or taxonomies. In more advanced projects, complex relations between terms can also be described, in something we would then start calling an ontology.

It’s hardly possible to “comply” with a vocabulary, even though this is of course something you would expect from a standard. You can reuse terms, or link up to them, and in this way ensure that semantic interoperability problems take less effort to solve. However, it is clear that this is only a first step towards solving interoperability. We will need more tools than just vocabularies…

Technologies
  • RDFS (RDF Schema—although I wouldn’t call this a schema anymore today) to describe classes and properties,
  • SKOS (The Simple Knowledge Organization System) to describe code lists and taxonomies,
  • OWL (The Web Ontology Language) to describe more complex relations between terms.

As these are quite established RDF technologies, Large Language Models are particularly good at helping you create them. Check out this example using Mistral.ai (prompt: “I’m creating an RDF vocabulary for father, mother, person, kid and the relations between them. Generate a turtle code example using RDFS.”).
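
For illustration, the answer to such a prompt could look roughly like the sketch below. The family namespace and term names are invented for the occasion; only the RDFS terms themselves are standardized.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/family#> .

ex:Person a rdfs:Class ;
    rdfs:label "Person" ;
    rdfs:comment "A human being." .

ex:Father a rdfs:Class ;
    rdfs:subClassOf ex:Person ;
    rdfs:label "Father" .

ex:Mother a rdfs:Class ;
    rdfs:subClassOf ex:Person ;
    rdfs:label "Mother" .

ex:Kid a rdfs:Class ;
    rdfs:subClassOf ex:Person ;
    rdfs:label "Kid" .

ex:hasParent a rdf:Property ;
    rdfs:label "has parent" ;
    rdfs:domain ex:Person ;
    rdfs:range ex:Person .
```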

Example: governmental vocabulary initiatives in Europe

In Europe, it is common to maintain vocabularies at different levels: regional, national, and European. Each of these initiatives publishes terms that can be reused across projects, lowering the cost of integration. At the European level, SEMIC maintains the so-called Core Vocabularies such as the Core Person Vocabulary and the Core Location Vocabulary. These are used in many application profiles, ensuring that a “person” or “address” is described in the same way across member states.

In Flanders, we have the Open Standards for Linking Organizations (OSLO) initiative. OSLO publishes RDF vocabularies for concepts such as addresses, mobility, culture and many others, which are then applied in local data exchanges and reused by cities and regions. The OSLO initiative is not in contradiction with the European vocabularies: it reuses IRIs where there’s a perfect match, and it links up to broader terms where relevant. Other countries have similar efforts, such as Finland, which maintains its own reusable vocabularies at finto.fi. I believe every country should have an entry point to its vocabularies like this.
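
As a hedged sketch of what such reuse and alignment can look like in Turtle (both namespaces below are placeholders, not the real OSLO or European IRIs):

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
# Placeholder namespaces for illustration only.
@prefix eu: <https://example.org/eu-core#> .
@prefix vl: <https://example.org/vl-ns#> .

# Perfect match: state that the regional term means the same as the European one...
vl:Adres owl:equivalentClass eu:Address .

# ...or link up to a broader term when the match is only partial.
vl:Gemeenteweg rdfs:subClassOf eu:Road .
```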

Example: ActivityStreams

ActivityStreams is a vocabulary originally developed at the W3C to describe social web activities. It defines concepts such as “Person”, “Note”, “Like” and “Follow”, but also more generic concepts such as “Create”, “Update” and “Delete”. This vocabulary is reused in the ActivityPub protocol, which powers federated social networks like Mastodon. Thanks to the vocabulary, a “Like” expressed in one system can be understood in another, even if the systems themselves were not built together. In the DCAT-AP Feeds implementation guide, we also reuse the semantics of “Create”, “Update” and “Delete” from this vocabulary. It’s a great illustration of how vocabularies allow reuse across completely different applications.
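
A rough sketch of that reuse in Turtle (the feed and dataset IRIs are invented; DCAT-AP Feeds itself defines the exact expectations):

```turtle
@prefix as:   <https://www.w3.org/ns/activitystreams#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .

# One entry in a catalog feed: "this dataset description was created".
<https://example.org/feed/activity/1> a as:Create ;
    as:published "2024-05-01T12:00:00Z"^^xsd:dateTime ;
    as:object <https://example.org/dataset/42> .

<https://example.org/dataset/42> a dcat:Dataset ;
    dct:title "Air quality measurements"@en .
```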

2. Application profiles – agreeing on shapes

We’ve established that a vocabulary alone is not enough. You also need to specify what shape of data an application actually expects. While RDF vocabularies have been around since the late nineties, application profiles are a more recent development. They gained traction when it became clear that applications also need to agree on which terms are required, which are optional, and how they fit together.

Profiles make these expectations explicit so that producers and consumers know exactly what to exchange, and this can be validated. It is thus possible to comply with an application profile, although this is still to be taken with a pinch of salt: which group of statements you decide to validate at which phase of a process also matters, and that is typically not part of an application profile. For that, we will have to refer to the next section.

Technologies

SHACL (the Shapes Constraint Language) describes shapes for RDF graphs and is widely used in European application profiles.

ShEx is an alternative shape language with a more compact syntax. I haven’t personally used it, but have encountered it in health care use cases.

Both technologies are discussed in the free online book “Validating RDF Data”.
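
To make the idea of a shape concrete, here is a minimal, illustrative SHACL shape (not an actual DCAT-AP artefact) requiring that every dcat:Dataset has at least a title and a description:

```turtle
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix ex:   <http://example.org/shapes#> .

# Illustrative only: the real DCAT-AP shapes are maintained by SEMIC.
ex:DatasetShape a sh:NodeShape ;
    sh:targetClass dcat:Dataset ;
    sh:property [
        sh:path dct:title ;
        sh:minCount 1
    ] , [
        sh:path dct:description ;
        sh:minCount 1
    ] .
```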

Example: governmental initiatives

At the European level, SEMIC and the Publications Office of the EU maintain a registry of official application profiles, such as DCAT-AP for EU data portals or CPSV-AP for describing public services. These profiles shape how vocabularies are applied for specific use cases. In contrast to the DCAT vocabulary, you can validate data against the DCAT-AP application profile, and thus comply with the specification. The European Commission makes a validator available for the current and previous versions of DCAT-AP. If your data input does not validate there, it is not going to make it onto data.europa.eu.
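
To give an idea of the data being validated, a minimal dataset description in the spirit of DCAT-AP could look as follows (a sketch only; the actual profile prescribes many more properties and controlled vocabularies):

```turtle
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix dct:  <http://purl.org/dc/terms/> .

<https://example.org/dataset/42> a dcat:Dataset ;
    dct:title "Air quality measurements"@en ;
    dct:description "Hourly air quality observations for Brussels."@en ;
    dcat:distribution [
        a dcat:Distribution ;
        dcat:downloadURL <https://example.org/dataset/42.csv>
    ] .
```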

In Flanders, the vocabularies initiative also comes with application profiles and SHACL artefacts that guide local governments in publishing interoperable datasets. Other countries host national schema catalogs. In France, schema.data.gouv.fr serves as a central registry for public-sector data schemas, allowing producers to discover, document, and align their data models to commonly used formats. Similarly, Italy and other EU member states have their own initiatives to document and publish application-specific schemas, supporting interoperability at the national level. Non-Linked Data schemas can also be found there, structuring CSV files or validating JSON structures with JSON Schema. While these initiatives have their merit, I believe we should be unambiguous about whether certain terms follow the same semantics or not. The decoupling of application profiles from vocabularies is an important one.

Example: the dataspace protocol

In the world of dataspaces, the Dataspace Protocol relies on JSON-LD combined with JSON Schema validation to ensure that contracts and dataset descriptions can be interpreted consistently across participants. This is a very pragmatic approach to bridge the gap between interoperability within an app ecosystem and cross-app interoperability. Instead of SHACL or ShEx, a serialization-specific validation method is used. The drawback is that the protocol is now unnecessarily coupled to JSON, and that it standardizes the structure of the JSON document instead of the shape of the graph. The benefit is that this looks very familiar to developers who have worked on client-server implementations in the past. All in all, these are just implementation choices that reach a similar goal.

Example: The European Union Agency for Railways (ERA)

In the railway sector, ERA has published SHACL shapes that specify how railway entities such as stations and tracks must be described. National railway companies may have richer internal models, but when they want to interoperate at the European level, these shapes define the minimum requirements. This way, a European Register of Infrastructure (RINF) Knowledge Graph can be created, which today is a fundamental tool for the railway sector to understand which vehicles are compatible with a certain railway route across the borders of EU member states.

3. Interaction patterns – agreeing on flows

Interoperability is not just about validating data or reusing vocabulary terms, but also about how systems or agents interact with each other. These interactions can be low-level, such as how to exchange messages over HTTP, or high-level, such as describing organizational procedures.

A non-technical story I like to tell is about changing your first name. You cannot take your identity card, scratch out the name and replace it with another using a permanent marker. Instead, you need to follow an official procedure, usually defined by your municipality. For example, here is the information page from Brussels documenting the steps you need to take. Once you’ve completed the process, you are issued a new identity card, and other systems, such as the population register, are updated as well.

Not the right way to change your name, yet this is often how we implement our web services today.

On the Web we often forget this procedural layer. We simply overwrite data, re-upload a dump, or push an updated knowledge graph, without any guarantee that the right process was followed. By making interaction patterns explicit, we can attach trust and compliance levels: when a system proves that a certain procedure was followed, consumers can rely on it.

Interaction patterns are reusable flows, comparable to state machines or flowcharts, to achieve a certain goal. They define not only the messages exchanged, but also the order in which they happen and the conditions under which they succeed. In order to do so, they can reuse terms from vocabularies and can define the shape of the data they expect in application profiles.

We can already see them in action across domains, from liking someone’s post on social media, to data synchronization, to contract negotiation in dataspaces. Such patterns should be composable: I might want to synchronize a dataset, but also tell someone I “liked” it after I negotiated access to it.

Technologies
  • Developer documentation explaining at a higher level how the process works using state machines, flowcharts or sequence diagrams, or lower-level documentation such as HTTP protocol bindings,
  • Hypermedia controls to describe the next possible steps in an interaction,
  • Rule languages, such as Notation3 or datalog, to formalize and automate such state transitions,
  • Procedure extensions to CPSV-AP or other workflow notations for higher-level organizational processes.

While the previous sections already have established tooling within and outside of the Linked Data domain, convergence on technologies for interaction patterns is still ongoing. The majority of specs that I consider good examples of interaction patterns today are written as developer documentation. Documentation aimed at developers (or LLMs?) means those patterns will still be hard-coded. It would be nice if we could build an abstraction for those patterns, so that engines can automatically understand the interaction pattern and do not need to have their code adapted (this is what I called ambition level 3, cross-engine interoperability, in my previous post).

There is, however, no consensus yet on this abstraction layer, and I doubt there will ever be one definitive one. The idea of hypermedia controls in APIs has not really reached the adoption one would have imagined, even though it was positioned as one of the constraints of the REST architectural style by Fielding in the early 2000s. However, I still believe this is the way to go: when fetching a page, you should also get descriptions, so-called hypermedia controls, of where you can go from there. Various Linked Data initiatives adopted this idea. E.g., the Linked Data Platform (LDP) is a vocabulary, application profile and set of interaction patterns with HTTP protocol bindings for read-write Linked Data information resources. When you implement these interaction patterns, a client will be able to understand how to read the contents of elements in a possibly paginated container, and how to change their representations. LDP is in turn adopted by the Solid project for building personal data vaults, which takes a subset of the interaction patterns within LDP and extends them with access control and user profiles (WebID). Other specifications like Hydra, TREE, Web of Things, or ActivityStreams collections also adopted hypermedia at the heart of their interaction patterns.
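
As a small illustration of such a hypermedia control, an LDP container response describes what it contains and thereby where a client can go next (the IRIs are invented):

```turtle
@prefix ldp: <http://www.w3.org/ns/ldp#> .

# The container enumerates its members; combined with LDP's HTTP bindings,
# a client knows it can GET each member and POST new ones to the container.
<https://example.org/catalog/> a ldp:BasicContainer ;
    ldp:contains <https://example.org/catalog/dataset-1> ,
                 <https://example.org/catalog/dataset-2> .
```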

CPSV-AP is an application profile to describe public services in Europe. It would be nice if CPSV-AP were extended to also contain a description of the procedures that are otherwise only described in full text (cf. the information page for changing your first name). This was already experimented with in Flanders back in 2021 with OSLO-steps.

Example: ActivityPub

In ActivityPub, the protocol behind Mastodon, interaction patterns are at the core. When you “like” a post, there’s a defined flow: your server creates a “Like” activity, delivers it to the author’s server, and that server then updates its counters. The same goes for following someone, posting, or resharing content. These flows are reusable: any implementation that supports the ActivityPub protocol understands what a “Like” or “Follow” means, even if the implementations were built by entirely different communities.
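
A hedged sketch of the activity that would travel between the two servers (actor, note and recipient IRIs invented):

```turtle
@prefix as: <https://www.w3.org/ns/activitystreams#> .

# Created on the liker's server, then delivered to the recipient's inbox.
<https://social.example/activities/987> a as:Like ;
    as:actor  <https://social.example/users/alice> ;
    as:object <https://other.example/notes/123> ;
    as:to     <https://other.example/users/bob> .
```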

Example: synchronization with LDES

Another example is Linked Data Event Streams (LDES). Here, the interaction pattern defines how clients can replicate a dataset and stay in sync with updates over time. Whether the source is a cultural heritage collection, traffic sensor data, or a national data portal, the replication flow remains the same: fetch the most recent view, then follow links to receive incremental updates. Only the vocabulary and application profile differ, which makes the replication pattern composable and reusable across domains.

LDES itself is also a good example of this separation of specs. It consists of a vocabulary with application profiles for validating the pages, as well as the interaction patterns. For that reason, the spec is written from a consumer perspective. The LDES application profile also reuses the TREE hypermedia vocabulary, making sure to reuse semantics where it makes sense.
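
A sketch of what a client encounters when following such a stream (IRIs invented): the TREE hypermedia controls advertise where further members can be fetched.

```turtle
@prefix ldes: <https://w3id.org/ldes#> .
@prefix tree: <https://w3id.org/tree#> .

<https://example.org/feed> a ldes:EventStream ;
    tree:view <https://example.org/feed/page-1> ;
    tree:member <https://example.org/feed/activity/1> .

# The hypermedia control on a page: where to go next for more members.
<https://example.org/feed/page-1> a tree:Node ;
    tree:relation [
        a tree:Relation ;
        tree:node <https://example.org/feed/page-2>
    ] .
```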

Example: contract negotiation in the Dataspace Protocol

In dataspaces, data exchange usually requires a contract that specifies terms of use. The Dataspace Protocol therefore defines an interaction pattern for contract negotiation. It specifies the sequence of messages (offer, counter-offer, agreement) as well as their bindings to HTTP. Participants in a dataspace can thus automate negotiations while still reusing existing vocabularies such as DCAT for dataset descriptions or ODRL for usage control policies. This is a prime example of combining vocabularies, application profiles, and interaction patterns into a coherent whole.
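
As an illustration of that vocabulary reuse, an offer in such a negotiation could carry an ODRL policy along these lines (IRIs invented; the Dataspace Protocol wraps this in its own message structures):

```turtle
@prefix odrl: <http://www.w3.org/ns/odrl/2/> .

# An offer from a data provider: permission to use a specific dataset.
<https://provider.example/offers/1> a odrl:Offer ;
    odrl:assigner <https://provider.example/participants/me> ;
    odrl:permission [
        odrl:target <https://provider.example/datasets/42> ;
        odrl:action odrl:use
    ] .
```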

Example: evaluating ODRL policies with N3 rules (FORCE)

ODRL gives us a shared language for usage control, but its evaluation semantics are still underspecified: different engines can interpret the same policy differently. The Framework for ODRL Rule Compliance through Evaluation (FORCE) tackles this by defining a repeatable interaction pattern for policy evaluation and by shipping a tested evaluator plus a common report model. Instead of having to hard-code the rules for evaluating such ODRL policies, they are described in Notation3 (N3). An engine runs those N3 rules to decide which permissions, obligations and prohibitions are active, and returns a machine- and human-readable compliance report.
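
The rules themselves look roughly like the following Notation3 formula, a simplified sketch rather than an actual FORCE rule (the report namespace is invented): whatever matches the antecedent graph produces new statements in the consequent.

```n3
@prefix odrl: <http://www.w3.org/ns/odrl/2/> .
@prefix ex:   <http://example.org/report#> .

# Simplified: whenever a policy grants a "use" permission on some asset,
# derive a statement that can end up in the compliance report.
{
    ?policy odrl:permission ?permission .
    ?permission odrl:action odrl:use ;
                odrl:target ?asset .
}
=>
{
    ?permission ex:grantsUseOf ?asset .
} .
```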

4. Implementation guides – agreeing on practice

Even with vocabularies, profiles, and interaction patterns, developers still need clear instructions to follow when implementing a specific use case. That’s where implementation guides come in: they combine all artefacts into an end-to-end recipe, lowering the entry barrier for developers.

For example, in the SEMIC pilots on LDES, the implementation guides walk implementers through how to publish a dataset as a stream. Instead of just defining vocabularies and patterns in the abstract, the guide gives concrete examples, step-by-step instructions, and reference implementations. This made it possible for multiple domains to reuse the same replication pattern with only small adjustments.

Another strong example is the set of Once-Only Technical System (OOTS) specifications. They provide the API descriptions for governmental procedures documented in the Single Digital Gateway Regulation (SDGR), such as how a citizen can change their address across borders. The guides describe the flow end-to-end: which vocabularies to use (e.g. Core Person, Core Location), which application profiles to validate against (e.g. CPSV-AP), and which interaction patterns to follow (e.g. verifiable credentials). They could go further in this vision, but they already show the power of an implementation guide as a binding document between law, policy, and technology.

Implementation guides complete the picture: they are the glue that ensures vocabularies, application profiles, and interaction patterns move from paper into running code. Without them, interoperability risks staying theoretical. With them, it becomes practice.

P.S.

Seeing interoperability through these four artefacts helps avoid both extremes: the chaos of everyone doing their own thing, and the rigidity of forcing one grand standard. Instead, we can identify what already exists at each level, reuse it, and only invent what’s missing. This perspective also makes it easier to carry lessons across domains: a museum and a mobility operator may not share vocabularies, but they can certainly reuse the same interaction patterns or learn from each other’s implementation guides.

With this vision, I don’t believe we should build software that is domain-specific anymore. No domain is so unique that it requires domain-specific data pipelines. There will always be opportunities to maximize the reuse of interaction patterns. If a suitable one does not yet exist, we can define it in such a way that others can reuse it too. That is how we move from isolated solutions to an ecosystem of reusable building blocks.