Arthur Vercruysse, Sitt Min Oo, Pieter Colpaert: "Describing a network of live datasets with the SDS vocabulary", Proceedings of the 8th Workshop on Managing the Evolution and Preservation of the Data Web (MEPDaW 2022) co-located with the 21st International Semantic Web Conference (ISWC 2022) (2022).
Biblio entry: 8771000.
Abstract
Data publishers can provide multiple interfaces per dataset. Each interface has its own merits and drawbacks: SPARQL endpoints are expensive to host, and clients find it difficult to work with static data dumps. Furthermore, query agents can only select the most fitting interface and dataset if provenance information is provided. In this paper, we introduce the Smart Data Specification for Semantically Describing Streams (SDS) to annotate dataset interfaces with provenance information, describing the consumed stream and the transformations applied to that stream. We focus on Linked Data Event Streams (LDES), which can publish the same dataset with different fragmentations, and demonstrate a pipeline that transforms an LDES and publishes the data with a different fragmentation, as described in the accompanying provenance information. The SDS vocabulary is built upon the DCAT-AP, LDES and P-Plan vocabularies. In future work, we will create a source selection strategy for federated query processors that takes this provenance information into account when selecting a dataset and an interface through which to query it.