Ruben Dedecker, Julián Rojas Meléndez, and Pieter Colpaert: "The Vocabulary Hub as a Catalog for Semantic Artifacts for Discovery and Alignment of Datasets", Extended Semantic Web Conference (ESWC 2026): The Fourth International Workshop on Semantics in Dataspaces (2026).

Abstract

The EU Common Data Spaces initiative aims to enable secure, sovereign, and interoperable data sharing across organizational and national boundaries. However, the high heterogeneity of underlying data models and formats prevents semantic interoperability from being realized. Publishers can address this challenge by exposing their internal knowledge through continuous publishing models that reduce operational overhead for both publishers and consumers. Yet for data consumers, costly alignments remain necessary when the semantics of published datasets differ from their expected internal data models and schemas. Data spaces require mechanisms to define, discover, and govern such alignments throughout their entire lifecycle, enabling eventual interoperability. In this paper, we show that considering additional semantic artifacts as part of the vocabulary hub, namely dataset profiles defining structural and semantic constraints, and profile alignments (e.g., in the form of SPARQL CONSTRUCT queries), could provide consumers with a semantic entry point for dataset discovery and integration. We focus on the interaction patterns afforded by the addition of these semantic artifacts and provide a demonstrator implementation of a user interface that integrates this functionality. We validate our approach through a use case from the DeployEMDS project, focused on the automatic discovery and alignment of traffic measurements. The extended vocabulary hub enables clients to discover datasets based on profile characteristics such as shapes, ontologies, and publishing data models, while also identifying available alignment pathways toward target consumer data models. This shows how the technical barriers for creating and relying on semantic alignments are lowered, enabling the consumption of data using the desired vocabularies and schemas.
Future work will focus on integrating this component with existing data space connector implementations to further automate semantic interoperability by enabling semantic and profile-based content negotiation for data exchanges.