Named graphs and RDF messages
Even all these years after RDF1.1, named graphs remain controversial in a small circle of people. The reason is quite simple: the fourth element added to a triple, the named graph, was left semantically undefined. We have to understand that named graphs were already a well-discussed idea back in 2005 (Carroll et al., 2005), but the RDF1.1 recommendation that introduced the fourth element was only finalized in 2014. By then, different technologies already had their own uses and implementations of named graphs, and the RDF1.1 group could not agree on a single specific meaning.
The RDF Working Group did not define a formal semantics for a multiple graph data model because none of the semantics proposed before could obtain consensus. Choosing one of the earlier propositions would have gone against some deployed implementations. Therefore, the Working Group discussed the possibility of defining several semantics, among which an implementation could choose, and of providing the means to declare which semantics is adopted.
Granted, a working group note did state that the precise interpretation could be provided explicitly in the metadata of a service you’re using. The only problem is that, nine years later, I have not seen this being done, and I don’t know of a vocabulary that would allow you to do so.
Pragmatically, I don’t believe this is as big a problem as some would dare to state. When importing quads from another source, however, you’ll need to know what you’re doing. In this post, I’ll gradually explain that either way, whether the source uses named graphs or not, you’ll have to interpret the quads. Only after a translation step, which we’ll call a “contextual assertion” process, will you be able to import them into your store.
```trig
<https://pietercolpaert.be/> {
  <https://pietercolpaert.be/#me> a foaf:Person .
}
```
We’ll first dive into 2 very different examples of how named graphs are used in practice today. We’ll then discuss the issues that arise from the open semantics, and I’ll give my advice on how to use named graphs in your next project. Finally, I’ll introduce the idea of RDF messages, which I believe is a necessary addition to RDF1.1.
An example: processing an RDF stream
```trig
<https://pietercolpaert.be/#2025-09-23>
  prov:generatedAtTime "2025-09-23T12:00:00Z"^^xsd:dateTime .
<https://pietercolpaert.be/#2025-09-23> {
  <https://pietercolpaert.be/#me> foaf:age 36 .
}
```
There is an RDF Stream Processing (RSP) community that creates statements as in the example above: it uses the named graph to put certain triples in context. This way you can select the ones that are useful for your system. You should not consider these triples unless you are specifically interested in the statement made at that specific time. SPARQL can then help you query for exactly those statements you are interested in, cf. the example below.
```sparql
SELECT ?time ?age
WHERE {
  GRAPH ?g {
    <https://pietercolpaert.be/#me> foaf:age ?age .
  }
  ?g prov:generatedAtTime ?time .
}
ORDER BY DESC(?time) LIMIT 1
```
Another example: Uniprot’s partitioning
One of the biggest RDF datasets everyone should know about is UniProt.
UniProt is a knowledge graph providing context about protein sequences and is maintained by a consortium of international bioinformatics institutes.
UniProt does not use named graphs for putting triples in context, but simply to organize them into partitions.
I.e. all statements about diseases can be found in the named graph https://sparql.uniprot.org/diseases.
The documentation of their SPARQL endpoint says there are 21 named graphs, although all named graphs are also available through the default graph, so consumers do not need to worry about them.
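Even then, a consumer can still scope a query to one partition explicitly. A sketch against the diseases graph mentioned above (`up:Disease` is UniProt’s disease class in its core vocabulary; treat the exact IRIs as an assumption for illustration):

```sparql
PREFIX up: <http://purl.uniprot.org/core/>

# Count disease resources, scoped to the diseases partition only
SELECT (COUNT(?disease) AS ?count)
WHERE {
  GRAPH <https://sparql.uniprot.org/diseases> {
    ?disease a up:Disease .
  }
}
```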
Making all triples in all named graphs also queryable from the default graph is the default behavior of popular triple stores such as GraphDB or RDF4J. Others, such as Apache Jena or the Comunica query engine, only query the statements that are explicitly in the default graph, or make it configurable. This is thus where you need to know what you’re doing. If we loaded the data from the RDF Stream Processing example into a default configuration of GraphDB or RDF4J, our default graph would contain multiple contradicting statements. For example, it would contain that I’m both 35 years old and 36 years old, without any remaining possibility to check the context of either statement.
From the vendor perspective, I understand the need for a way to organize the work. In the case of UniProt, although I didn’t check whether this is actually the case, I can imagine that when a new release of the database is published, they drop a named graph entirely and load the new dump for that part in its place. The company RedPencil even has a system that rewrites SPARQL queries based on role-based access control rules. The latter, however, is not just logical partitioning anymore: a named graph becomes available in the default graph only when the access conditions are met.
From open semantics to functional use
Issue 1: Logical partitions vs. contextual assertions
Since RDF 1.1 left the semantics of named graphs open, practitioners have no choice but to adopt a functional perspective: how do we actually use them? The first issue is recognizing the different interpretations. Named graphs can be seen as contextual assertions (the triples hold in a certain context), as quotations (someone said these triples), or as logical partitions (all triples are true globally, just organized into buckets).
Pragmatically, I believe we only need one semantics: contextual assertions in which a named graph means “the statements inside are true in this context”. Quoting is a special case, where the context (“Pieter said so”) is more important than the payload. Partitioning is then yet another special case, where the system implicitly asserts a specific context, or maybe all contexts, as globally true.
Take the RSP example above: the fact that I’m 36 at a particular moment is only useful once the client has validated the context.
Consumers decide which context matters, depending on their task. A time travel app might show my profile from 10 years ago, while another application may only want the latest context.
SPARQL makes this straightforward: you can simply add conditions on ?g to, for example, only keep the most recent graph.
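For instance, a consumer that only trusts recent contexts could filter on the graph metadata from the RSP example above (the cutoff date is an arbitrary assumption for illustration):

```sparql
SELECT ?age
WHERE {
  GRAPH ?g {
    <https://pietercolpaert.be/#me> foaf:age ?age .
  }
  ?g prov:generatedAtTime ?time .
  # Only keep contexts asserted after an arbitrary cutoff
  FILTER(?time > "2025-01-01T00:00:00Z"^^xsd:dateTime)
}
```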
Quoting follows the same design pattern. The triples go in a named graph, but now the context metadata drives the query. For example, you may want to count how many things Pieter has been saying, without needing to assert the statements in the named graphs.
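A sketch of such a query, counting quoted triples without asserting them (`ex:saidBy` is a hypothetical predicate standing in for whatever context metadata the source provides):

```sparql
# How many triples has Pieter been stating, across all his graphs?
SELECT (COUNT(*) AS ?statements)
WHERE {
  GRAPH ?g { ?s ?p ?o . }
  ?g ex:saidBy <https://pietercolpaert.be/#me> .
}
```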
Partitioning is yet another special case. Systems such as UniProt use named graphs for organizational purposes: all triples appear in the default graph, and for the consumer’s sake, the named graphs are hardly relevant. In RedPencil’s system, access control rules determine which named graphs are exposed through the default graph. From the consumer’s perspective, this still looks like contextual assertions—just that the server has already decided which contexts to include.
Problems arise, not because of the open semantics, but because some systems only support partitioning: they automatically merge all named graphs into the default graph. In such stores, importing contextual data collapses into global assertions, making it impossible to recover the original contexts. For example, you’ll find that Pieter is both 35 and 36 years old. If your store is specific to one context (e.g., all things in this store are true at this moment), this is fine. But if you want to preserve contextual assertions, you either need an import-time processing step that flattens the data consistently, or move to a different store. Some RDF stores let you configure which graphs contribute to the default graph, which can already be a first step toward treating partitions as explicit contexts.
> My advice: for your next project, treat named graphs as contextual assertions by default, and only treat them as partitions when that is explicitly desired behaviour.
Issue 2: The named graph identifier
The second issue relates to how you name your graphs, as the fact that there are no strict rules can lead to confusion. The way I used the named graph in the first example in this post—using my website’s identifier as a named graph IRI—is problematic. Someone might use that identifier for something else, such as saying that my webpage contains 42 triples (it actually does at the time of writing this post). However, if in my context I want it to say that the named graph on my system is a container for 1 triple, then there will be a semantic collision (I don’t think anyone has coined this term in Linked Data yet—it adds the right amount of drama).
```trig
<https://pietercolpaert.be/> {
  <https://pietercolpaert.be/#me> foaf:age 36 .
}
<https://pietercolpaert.be/> void:triples 42 . # 💣
<https://pietercolpaert.be/> void:triples 1 . # 💥
```
This simple example of a named graph IRI that is also an information resource is only the tip of the iceberg. We want to be able to talk about our specific context, so it’s a good idea to use an IRI that you control: not only to avoid this kind of collision, but also because the source may change the contents, or the definition of the context, later on. From the moment a named graph travels across systems, the context changes: it has become data that traveled from one system to another, so I believe it should get a new identifier. That identifier can then still point to the original context.
Renaming graphs when they travel across systems is, however, not default behaviour in RDF. At least, it is not for named graphs with an IRI. If you use blank nodes as graph names, these blank nodes will automatically get a new identifier when they travel across systems.
Blank node graphs immediately solve other problems as well, such as accidentally—or intentionally—overwriting someone else’s named graph. Furthermore, you cannot dereference a blank node, nor name it from another document, so you’re certain that the quads in this interaction are complete, and that you have the full context about this graph. Certainly when working with contextual assertions, the name itself becomes less important anyway: you select a graph based on a description of its context.
```trig
_:b0 {
  <https://pietercolpaert.be/#me> foaf:age 36 .
}
_:b0 void:triples 1 ;
  prov:generatedAtTime "2025-09-29T16:34:00Z"^^xsd:dateTime .
```
What do you do with inferred knowledge then, one might wonder? Inferring from this data that my approximate birth year is 1989 will spawn a new context: a context that derived insights from this one.
> My advice: You must treat all triples from an external source, even those in the default graph, as statements made in a specific context of that source. This means we always need an “assertion process” before loading data from somewhere. Some sources may explicitly have multiple contexts, for which they can use named graphs, and you can use the data about these named graphs in your assertion process (e.g., you may only be interested in statements made after a certain generation time). You should never blindly import named graph IRIs into your own system. As a data publisher, you can help consumers automatically rename their graphs by using blank nodes as graph names.
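As a sketch of such an assertion process, a consumer could promote only the statements from sufficiently recent contexts into its own default graph, dropping the source graph names in the process (the cutoff is an arbitrary assumption; a real pipeline would stage the incoming quads first):

```sparql
# Assertion process: copy statements from trusted contexts
# into our own default graph, dropping the source graph names
INSERT { ?s ?p ?o }
WHERE {
  GRAPH ?g { ?s ?p ?o . }
  ?g prov:generatedAtTime ?time .
  # Our application asserts only sufficiently recent contexts as true
  FILTER(?time > "2025-09-01T00:00:00Z"^^xsd:dateTime)
}
```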
Some stores, however, are not optimized for following my advice, as they don’t index on graph name. Such an index is certainly necessary when we want to query across many small graphs. Either way, the idea remains: when using named graphs, you need to know what you’re doing.
Publishing RDF messages and asserting data from them
In our work at the Knowledge on Web-Scale team, we keep running into the same challenge: how to package and transmit RDF in a way that carries not only the data itself, but also its context—provenance, trust, credential boundaries, or member extraction hints. We hit it when dealing with verifiable credentials in wallets, with trust-preserving event streams in writable Linked Data nodes, and when extracting the set of quads from what we call the members of a Linked Data Event Stream.
Consider the example below: I want to indicate that there’s an event stream for public access, with a message that contains context data—such as the fact that it is signed and generated at a specific time—as well as the statements containing my age. Using the context, a consumer can select the statements that are of interest to them, with the right trust level: in this example, the statement has been signed by an authority.
```trig
<stream1> a ldes:EventStream ;
  tree:member <message/2025-09-23/1> ;
  tree:view <> ;
  acl:accessRights "public" .

<message/2025-09-23/1> a ex:Message ;
  ex:received "2025-09-25T16:30:00Z"^^xsd:dateTime ;
  ex:content _:b0 ;
  ex:signature _:b1 .

# The exact content used for that context in a blank node graph
_:b0 {
  <https://pietercolpaert.be/#me> foaf:age 36 .
}

# The signature: a named graph is needed, otherwise we cannot
# keep the triples of the signature together with the message
_:b1 {
  <Sig1> ex:signatureValue "MEUCIQDh..." ;
    ex:signedBy <https://pietercolpaert.be/#me> ;
    ex:signsGraph _:b0 ;
    ex:signatureAlgorithm "RS256" .
}
```
There’s a problem for our use case, though: how do you select all the quads that are part of this event stream’s message? The quads are undeniably grouped together: adding them to this server was an atomic operation. In a streaming protocol like WebSockets, this would have been one message, sent and received by a listener. In the binary RDF serialization Jelly, we could have used its concept of frames. In the RDF1.1 specification, however, there’s nothing we can rely on.
Instead, we need a heuristic that will never be perfect. In a 2005 W3C Member Submission predating named graphs, the Concise Bounded Description (CBD) was coined as a unit of specific knowledge about a resource that could be interchanged between semantic web agents. I believe the author, Patrick Stickler, bumped into the same limitation we’re bumping into today. The only difference is that we now do have named graphs, so we can solve our problem slightly differently.
```turtle
## The Concise Bounded Description of entity1 is
## the subject-based star pattern, including the
## subject-based star patterns of the blank nodes it refers to
<entity1> a ex:Type ;
  ex:label "an entity" ;
  ex:otherEntity <entity2> ;
  ex:otherEntity [
    ex:label "Yet another entity"
  ] .

## Not part of the CBD of entity1
<entity2> a ex:Type ;
  ex:label "another entity" .
```
The CBD of a resource s is defined as the set of all triples with subject s, plus, recursively, the CBDs of all blank nodes that appear as the object of a triple already in that set.
It was a pragmatic solution, and a heuristic at best. The fact that it was limited to subject-based star patterns put a limit on its usefulness.
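For intuition, one level of this extraction can be sketched in SPARQL (full CBD needs recursion, which plain triple patterns cannot express; real implementations typically use DESCRIBE or custom code):

```sparql
# One level of CBD for <entity1>: its own star pattern,
# plus the star patterns of the blank nodes it points to
CONSTRUCT {
  <entity1> ?p ?o .
  ?bn ?p2 ?o2 .
}
WHERE {
  { <entity1> ?p ?o . }
  UNION
  { <entity1> ?link ?bn .
    FILTER(isBlank(?bn))
    ?bn ?p2 ?o2 . }
}
```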
Today, in Linked Data Event Streams, we actually use the CBD of the member IRI as the package for the triples in a message. We have, however, extended it to also include all triples in any blank node graph that is mentioned in the CBD. This way, we can include the signature triples in the message as well. While this works, it should maybe not have been necessary: the signature triples would otherwise have made perfect sense in the default graph.
RDF1.1 critique: serializations like TriG should have the concept of “RDF messages”
When using named graphs in a big file with quads, and you want to fetch data from one specific named graph, you’ll need to process the full file. RDF does not attach semantics to where in a file you mention a quad. For JSON-LD, there’s a working group note that specifies a recommended ordering of triples in a document for streaming parsing. However, you will still need to wait for the full file to be parsed to be certain that you got a full package of quads. By design, this comes with a performance cost.
The fix can be quite straightforward: add a small feature to the existing syntaxes—a pragma telling parsers that an RDF message has started, and when it has ended again. This has, for example, already been added to the recently proposed binary serialization Jelly RDF, where it is called “frames”. You could compare this idea to streaming JSON documents such as Newline Delimited JSON, but for RDF-based serializations. Maybe this idea of RDF messages could be standardized by the RDF Stream Processing Community Group. Something like newline-delimited JSON-LD has been discussed in the SPARQL specification issue tracker.
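As a sketch of what such a pragma could look like in a TriG-like syntax (this is entirely hypothetical and not part of any specification):

```trig
# Hypothetical message delimiters for a streaming TriG parser
@message begin .
_:b0 { <https://pietercolpaert.be/#me> foaf:age 36 . }
_:b0 prov:generatedAtTime "2025-09-23T12:00:00Z"^^xsd:dateTime .
@message end .
```

A streaming consumer could then emit a complete set of quads at every `@message end`, without waiting for the rest of the file.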
SPARQL1.1 critique: you cannot construct data into multiple named graphs
The idea of contextual assertions means that we need a query language to select the data we’re interested in, and then construct that data into our own context. For such a context, I advise using blank node graphs. Today, however, SPARQL does not allow you to CONSTRUCT data into a named graph. This is an open issue in the SPARQL issue tracker: #31. It does not break our ideas, of course, but it would be useful to attach a derived context to the data you’re constructing.
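What I’d like to be able to write is something along these lines (not valid SPARQL 1.1; a purely hypothetical sketch of constructing into a blank node graph):

```sparql
# Hypothetical: construct selected statements into a fresh
# blank node graph, together with its derived context
CONSTRUCT {
  GRAPH _:derived {
    <https://pietercolpaert.be/#me> foaf:age ?age .
  }
  _:derived prov:wasDerivedFrom ?g .
}
WHERE {
  GRAPH ?g {
    <https://pietercolpaert.be/#me> foaf:age ?age .
  }
}
```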
P.S.
Piotr Sowiński of Neverblink, who is the lead behind the Jelly RDF initiative, is going to give the keynote in the Linked Data Event Streams workshop at the SEMIC conference.
In TREE/LDES, we first tried to work around the open semantics of named graphs with a very complex member extraction algorithm. I regret having spent so much time trying to get it right, while in the end it’s just a heuristic. We even tried to position a concept called Shape Topologies to reconstruct the RDF message based on the SHACL shape of the member. It would have been much simpler if we had had the concept of RDF messages.
Having Jos De Roo—without whom my understanding of blank nodes and named graphs would never be where it is today—around in the office has its perks: you occasionally get a reminder of historic discussions during lunch. Pat Hayes would apparently call confusing use and mention the “mother of all bugs”: the single most fundamental mistake in Web and semantic web architecture is failing to distinguish between talking about the thing and talking about the name of the thing.
I’ve been told that I follow my own interpretation of the 1990s work on “formalizing context” by the logicians McCarthy and Buvač.
We can write this context proposition more formally as: istrue(context, φ).
The set of statements φ is true in a context; you can thus assert φ when your application’s state asserts this context as true, for example because the source is trusted:
istrue(SourceContext, φ) ∧ trustedByClient(SourceContext) ⇒ istrue(ClientTruth, φ).
It was Tobias Rebert who, through a LinkedIn post, initially triggered me to write this blog post: I was a bit frustrated that he would see labeling named graphs as something you’d do while drinking coffee, just like cleaning your home or your My Documents folder. I couldn’t just comment as one of the many who started commenting, because I needed more nuance than that. Tobias, if you read this, do let me know whether you agree with me!
EDIT 2025-10-01: I was asked what my view on RDF1.2 is, and whether it wouldn’t solve the problem discussed here. I argue that there is no problem where people see one (the open semantics). Where I do see a small problem (streaming RDF messages), RDF1.2 doesn’t provide a solution, and it’s too late to propose this to the group at this stage. The revived RSP group might be a better place to propose it, as it’s also a typical problem in streaming scenarios. Triple terms in RDF1.2 are a solution for something different yet complementary: they will help datasets like Wikidata come up with a more standardized alternative to the property-based triple annotation system they have today—not for contextual assertions, but for tracking the provenance of a resulting asserted statement.