While searching for papers, I found that most of them follow the same line of reasoning:
- Here is how reification is done in RDF.
- This form is too verbose to write.
- Let’s invent our own serialization format!
That is not interesting from an implementation perspective. But after reading that part of the RDF spec, I realized that we cannot claim RDF compatibility yet. One example of why:
ex:item10245 ex:weight "2.4"^^xsd:decimal .
# should be interpreted the same way as this:
ex:triple12345 rdf:type rdf:Statement .
ex:triple12345 rdf:subject ex:item10245 .
ex:triple12345 rdf:predicate ex:weight .
ex:triple12345 rdf:object "2.4"^^xsd:decimal .
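The expansion above can be sketched in Go. This is only an illustration: the `Triple` type and the `Reify` function are hypothetical names and are not part of any existing API. Terms are kept as already-serialized strings.

```go
package main

import "fmt"

// Triple is a hypothetical, simplified statement. Terms are stored as
// already-serialized strings (IRIs, bnodes or literals).
type Triple struct {
	Subject, Predicate, Object string
}

// Reify expands t into the four RDF reification statements that describe it,
// using id as the node that stands for the statement itself.
func Reify(id string, t Triple) []Triple {
	return []Triple{
		{id, "rdf:type", "rdf:Statement"},
		{id, "rdf:subject", t.Subject},
		{id, "rdf:predicate", t.Predicate},
		{id, "rdf:object", t.Object},
	}
}

func main() {
	t := Triple{"ex:item10245", "ex:weight", `"2.4"^^xsd:decimal`}
	for _, r := range Reify("ex:triple12345", t) {
		fmt.Printf("%s %s %s .\n", r.Subject, r.Predicate, r.Object)
	}
}
```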
This also means that queries with these abstract predicates should also work:
// Returns all quads
Further, there is a <rdf:value> predicate that works the same way for the node->value relation.
Thus, given a simple quad:
<alice> <follows> <bob> .
we should build this kind of data structure internally (type predicates omitted):
_:n1 <rdf:value> <alice> .
_:n2 <rdf:value> <follows> .
_:n3 <rdf:value> <bob> .
_:q1 <rdf:subject> _:n1 .
_:q1 <rdf:predicate> _:n2 .
_:q1 <rdf:object> _:n3 .
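A minimal sketch of how such a structure could be built, assuming sequential bnode labels and assuming that equal values share one intermediate node (neither is decided above). `Expander` and `Statement` are hypothetical names used only for illustration.

```go
package main

import "fmt"

// Statement is one internal link of the expanded structure.
type Statement struct {
	Subject, Predicate, Object string
}

// Expander hands out bnode labels for values and quads.
// Assumption: equal values are mapped to the same intermediate node.
type Expander struct {
	nodes map[string]string // value -> bnode label
	quads int               // number of quads expanded so far
}

func NewExpander() *Expander {
	return &Expander{nodes: make(map[string]string)}
}

// node returns the bnode for value, emitting an <rdf:value> link the first
// time the value is seen.
func (e *Expander) node(value string, out *[]Statement) string {
	if id, ok := e.nodes[value]; ok {
		return id
	}
	id := fmt.Sprintf("_:n%d", len(e.nodes)+1)
	e.nodes[value] = id
	*out = append(*out, Statement{id, "<rdf:value>", value})
	return id
}

// Expand converts one subject/predicate/object quad into internal statements.
func (e *Expander) Expand(s, p, o string) []Statement {
	var out []Statement
	sn := e.node(s, &out)
	pn := e.node(p, &out)
	on := e.node(o, &out)
	e.quads++
	q := fmt.Sprintf("_:q%d", e.quads)
	return append(out,
		Statement{q, "<rdf:subject>", sn},
		Statement{q, "<rdf:predicate>", pn},
		Statement{q, "<rdf:object>", on},
	)
}

func main() {
	e := NewExpander()
	for _, st := range e.Expand("<alice>", "<follows>", "<bob>") {
		fmt.Printf("%s %s %s .\n", st.Subject, st.Predicate, st.Object)
	}
}
```

With this assumption, a second quad that reuses <alice> or <bob> only adds the three quad links, because the value nodes already exist.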
Given all of the above, I want to discuss a few design decisions:
- Nodes and quads should have a separate unique ID that could be used to attach additional metadata, not related to actual values.
- HasA and LinksTo iterators are in fact special cases of a more general Traverse iterator with rdf:subject/… as the Via parameter, in the forward or reverse direction.
- A new ValueOf iterator can be introduced to replace the NameOf method on QuadStore. The same goes for the Quad method.
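To illustrate the second point, here is a rough sketch over a tiny in-memory edge list. It assumes quad->node links are stored as ordinary edges; `Store`, `edge` and `Traverse` are hypothetical and do not mirror the real iterator interfaces.

```go
package main

import "fmt"

// edge is one internal link, e.g. _:q1 --<rdf:subject>--> _:n1.
type edge struct {
	from, via, to string
}

// Store is a hypothetical in-memory set of internal links.
type Store struct {
	edges []edge
}

// Traverse follows edges labeled via, starting from every ID in in.
// Forward goes from -> to; reverse goes to -> from.
func (s *Store) Traverse(in []string, via string, reverse bool) []string {
	var out []string
	for _, id := range in {
		for _, e := range s.edges {
			if e.via != via {
				continue
			}
			if !reverse && e.from == id {
				out = append(out, e.to)
			}
			if reverse && e.to == id {
				out = append(out, e.from)
			}
		}
	}
	return out
}

// HasA maps quads to the node they hold in a direction: a forward traversal.
func (s *Store) HasA(quads []string, via string) []string {
	return s.Traverse(quads, via, false)
}

// LinksTo maps nodes to the quads that reference them: a reverse traversal.
func (s *Store) LinksTo(nodes []string, via string) []string {
	return s.Traverse(nodes, via, true)
}

func main() {
	s := &Store{edges: []edge{
		{"_:q1", "<rdf:subject>", "_:n1"},
		{"_:q1", "<rdf:predicate>", "_:n2"},
		{"_:q1", "<rdf:object>", "_:n3"},
	}}
	fmt.Println(s.LinksTo([]string{"_:n1"}, "<rdf:subject>"))
	fmt.Println(s.HasA([]string{"_:q1"}, "<rdf:object>"))
}
```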
IDs for nodes and quads
We already have a sort of ID for these internally: the hash. Let’s assume for now that we don’t want to introduce a unique intermediate blank node, as in RDF. So, what type should the hash have? I think the same blank node concept is a good fit.
Thus, the first proposed change is to make quad.BNode an interface and replace graph.Value with it. This allows these intermediate values (graph.Value) to be used in queries without calling the NameOf or Quad method on QuadStore. They become well-defined first-class objects without losing the flexibility they were made for.
Most implementations use hashes as IDs, so they can be represented as bnodes like _:n-900150983cd24fb0d6963f7d28e17f72, or with a q- prefix for quads.
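A small sketch of deriving such labels. MD5 is an assumption here, picked only because the example ID above has 32 hex digits; the actual hash depends on the backend. The way a quad is serialized before hashing is also hypothetical.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// hashLabel builds a bnode label of the form _:<prefix>-<hex digest>.
// Assumption: MD5 is used as the content hash.
func hashLabel(prefix, data string) string {
	sum := md5.Sum([]byte(data))
	return "_:" + prefix + "-" + hex.EncodeToString(sum[:])
}

// NodeID returns the bnode label for a value.
func NodeID(value string) string {
	return hashLabel("n", value)
}

// QuadID returns the bnode label for a quad. Joining the terms with spaces
// is a hypothetical serialization chosen for this sketch.
func QuadID(s, p, o string) string {
	return hashLabel("q", s+" "+p+" "+o)
}

func main() {
	fmt.Println(NodeID("abc")) // prints _:n-900150983cd24fb0d6963f7d28e17f72
	fmt.Println(QuadID("<alice>", "<follows>", "<bob>"))
}
```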
Now, the reasons why we need intermediate nodes for values and quads, in my opinion:
- They can be used to attach metadata. This is a problem right now: we cannot add metadata without affecting the value/quad hash.
- Values and quads can be updated. Transactions with delete-insert operations are nice, but they cannot replace values or fix typos efficiently.
- One True Graph. Any relation, builtin or not, works the same way as any other. At least from the user’s perspective. SPOL indexes are potentially the same as indexes on any other predicate.
- SameAs might be easier with this, because we can attach a few values to one node. Not sure if it’s a good idea or not.
There are a lot of things to discuss here, so I’ll keep it short and wait for other thoughts on this problem.
The LinksTo iterator was used to traverse from nodes to links via a certain direction. Virtually, it can be replaced with a Traverse iterator which follows the <rdf:subject> link in reverse, for example. The same is true for the HasA iterator, but in the forward direction. This will not affect the code too much right now, but at least the path lib will have to know that Out("<rdf:subject>") should be translated into
Right now there is no way for the optimizer to know whether the user just wants to enumerate nodes (like most iterators do) or wants to get a value via NameOf later. This is a possible optimization: introduce an iterator which can convert values from quad.Value, hiding the details of how they were retrieved. PG might use inline JOINs in this case; other backends might want to batch NameOf (materialize a page from the sub-iterator and resolve multiple names at once). Also, the path lib needs to know that Out("<rdf:value>") should be translated into ValueOf(it). The same goes for bnode->quad conversion.
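The batching idea could look roughly like this. It is a sketch only: `BatchResolver`, `NamesOf` and the slice-based `ValueOf` are hypothetical stand-ins, not existing QuadStore or iterator APIs.

```go
package main

import "fmt"

// BatchResolver is a hypothetical backend hook that resolves many node IDs
// to their values in a single round trip (a batched NameOf).
type BatchResolver interface {
	NamesOf(ids []string) []string
}

// mapResolver is an in-memory resolver that counts backend calls.
type mapResolver struct {
	names map[string]string
	calls int
}

func (m *mapResolver) NamesOf(ids []string) []string {
	m.calls++
	out := make([]string, len(ids))
	for i, id := range ids {
		out[i] = m.names[id]
	}
	return out
}

// ValueOf resolves ids page by page: it takes up to pageSize IDs from the
// input and resolves them with one NamesOf call, preserving input order.
func ValueOf(ids []string, r BatchResolver, pageSize int) []string {
	if pageSize < 1 {
		pageSize = 1
	}
	out := make([]string, 0, len(ids))
	for start := 0; start < len(ids); start += pageSize {
		end := start + pageSize
		if end > len(ids) {
			end = len(ids)
		}
		out = append(out, r.NamesOf(ids[start:end])...)
	}
	return out
}

func main() {
	r := &mapResolver{names: map[string]string{
		"_:n1": "<alice>", "_:n2": "<follows>", "_:n3": "<bob>",
	}}
	fmt.Println(ValueOf([]string{"_:n1", "_:n2", "_:n3"}, r, 2))
	fmt.Println("backend calls:", r.calls)
}
```

A SQL backend would not need this at all if it resolves values with an inline JOIN; the point is only that the caller sees ValueOf and not the retrieval strategy.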