Near-term project: RDF Toolkit for Cayley


#1

So I’ve been thinking about the discussions we had over the last month in threads:

RDF Pipeline:

Cayley public instance:

I know @carnalim has to put the larger RDF Pipeline project on hold for the time being, and @cayley.z, I’m not sure if you’ve worked at all on a front-end?

I’m thinking: why don’t we identify a small set of functionality that we will definitely need in order to create and modify RDF graphs in Cayley, and build a library of utility functions that can support all of these more ambitious projects? These utility functions would black-box the common operations performed when manipulating and querying an RDF graph. The library might have functions like:

Schema Manipulation

  1. CreateOrReplaceClass - Takes a class specification and creates all the associated quads.
  2. CreateOrReplacePredicate - Takes a predicate specification and creates all the associated quads.
  3. CreateOrReplaceProperty - Adds a property to a class and creates the necessary quads.

Data Manipulation

  1. CreateEntity - Creates instance of a class already defined in the schema
  2. SetEntityProperty - sets the property value of a specific subject
  3. DestroyEntity - Takes an IRI as an argument and destroys all quads in the graph containing that entity
  4. GetRelatedObjects - Takes a subject IRI and predicate and returns set of object IRIs
  5. GetRelatedSubjects - Takes an object IRI and predicate and returns set of subject IRIs
  6. GetSubjectJSON - Takes a subject and returns a JSON object with all predicate-object relationships for that subject. Note: we need a safeguard against unbounded objects caused by recursive relationships, perhaps a parameter for the max depth of the tree structure returned.
  7. GetObjectJSON - Similar to #6 but works in reverse. Returns a JSON object containing the list of subjects in quads where this entity is the object.
    8…
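To make the max-depth safeguard for GetSubjectJSON concrete, here is a minimal sketch over an in-memory slice of quads. The `Quad` type, `hasOutgoing`, and `subjectTree` are hypothetical names invented for illustration; nothing here uses Cayley's actual API, and a real version would query the quad store instead of scanning a slice.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Quad is a simplified, hypothetical stand-in for a Cayley quad (no label).
type Quad struct {
	Subject, Predicate, Object string
}

// hasOutgoing reports whether id appears as the subject of any quad.
func hasOutgoing(quads []Quad, id string) bool {
	for _, q := range quads {
		if q.Subject == id {
			return true
		}
	}
	return false
}

// subjectTree collects predicate -> objects for subject. Objects that are
// themselves subjects are expanded into nested maps while depth remains;
// at the last level only the object's IRI or literal string is kept, so
// cyclic graphs still produce bounded output. maxDepth below 1 acts like 1.
func subjectTree(quads []Quad, subject string, maxDepth int) map[string]any {
	node := map[string]any{"@id": subject}
	for _, q := range quads {
		if q.Subject != subject {
			continue
		}
		var val any = q.Object
		if maxDepth > 1 && hasOutgoing(quads, q.Object) {
			val = subjectTree(quads, q.Object, maxDepth-1)
		}
		vals, _ := node[q.Predicate].([]any)
		node[q.Predicate] = append(vals, val)
	}
	return node
}

func main() {
	quads := []Quad{
		{"ex:alice", "ex:knows", "ex:bob"},
		{"ex:bob", "ex:knows", "ex:alice"}, // cycle back to alice
		{"ex:bob", "rdfs:label", "Bob"},
	}
	out, err := json.MarshalIndent(subjectTree(quads, "ex:alice", 2), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

With a depth of 2, bob is expanded once and the link back to alice is emitted as a plain IRI rather than being followed again.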

Questions for you all:

  1. Does something like this exist already for Cayley?
  2. Do you think this is valuable?

I’d be willing to take the lead on the development of this library provided the veterans here can provide input on implementation questions and putting a box around the library feature set. Let me know what you all think.


#2

@robertmeta and @dennwc, I would like to have your input when you have a minute. TL;DR: do you see value in creating a new Go library with methods that simplify the manipulation of RDF graphs in Cayley? I would like to take a whack at coding this but want to make sure I wouldn’t be reinventing the wheel.


#3

Also @barakmich if you have a minute.


#4

Absolutely, the ORM is a step in that direction. But a sort of EasyCayley would be excellent. We actually have an issue somewhere for something similar, but I’m not in a place where I can look it up at the moment.


#5

Ok, awesome. To clarify - is there already work in progress on an ORM for Cayley? I searched the repo and didn’t see any references to it.


#6

We merged a simple ORM implementation recently, you might find it useful. It already can read/write Go structs and automatically converts them to quads and back.


#7

Awesome, I see that now in the schema package. Thanks.

I’ll get up to speed on everything it does. Especially if it does a lot of the stuff we need, I might put together a basics guide on how to use it. Correct me if I’m wrong, but I don’t see any docs out there for it currently.


#8

No docs yet, except for the docs on the package functions and a single example (hello_schema).


#9

@dennwc, I’ve been studying the schema (ORM) package, and I’m trying to reconcile the goals of this existing code with what I hope to accomplish with this helper library for easy graph manipulation. I’d like to leverage as much of your code as possible, but first I need a clear understanding of where the responsibilities of your code end and the responsibilities of the new functions begin.

A couple basic questions to start when you have time.

  1. I’d like to clearly delineate the concept of “properties” (where the object is a literal) from “relations” (where the object is an IRI). Does this delineation make sense with the model your package implements?
  2. One key way a “relation” would differ from a “property” is that a relation can have a cardinality of either One-to-One or One-to-Many between the subject class and the object class(es). One job of this graph manipulation library would be to enforce this cardinality as well as the classes that can be objects of the relation. Would any of your existing code support this enforcement of constraints, and if so, how would you achieve this? I know you specified in the docs that “This package is not a full schema library. It will not save or force any RDF schema constrains”, but it does seem like some of the functionality you wrote may be applicable here.

Thanks,
Jeff


#10

@tamethecomplex The idea behind this library was to make it both easier and faster to access structured subgraphs. Since we know the access pattern, we can optimize it inside the library. I expect to use the same functionality in GraphQL, thus we will have one common path to retrieve structured data and any code that has a similar access pattern will benefit from optimizations we make.

Regarding your questions:

  1. Right now it makes no distinction between properties and relations the way you define them. The reason is that the user can optionally pass a function to convert values retrieved from the database into the values they want to see in the result, so technically an IRI might become a plain string in the user’s code. But if we need to enforce a constraint that some relation/property has a certain value type, then that should become a responsibility of this library (since it may utilize type indexes, if we have them in QS).
  2. I can’t see the difference between “relation” and “property” here, since properties might also have One-to-One and One-to-Many cardinality. Am I missing something? Right now, the library enforces only a kind of “not-null” constraint on properties/relations, similar to how our Save/SaveOptional works: it will ignore the whole object if the constraint states that it should have at least one relation of a defined type. But there is a code path that checks how many results we got for a relation and how many of them we want to use in the user’s result set. Thus, that code path can optionally enforce cardinality as well.

To summarize, this is how I see a higher-level library might work:

  1. Define a model of well-known types from RDFS/OWL, and use these models to store the user-defined schema and constraints in the graph using the low-level ORM.
  2. Provide an API to create/update/delete these classes and properties in a user-friendly manner, using helpers from the ORM. For now there is no way to conditionally update part of an object, but we will implement it if needed.
  3. Allow loading/storing objects with defined classes and properties, and build constraints for the ORM to enforce on these objects. Some of these constraints will probably be outside the scope of the low-level library, but anything that affects loading (one/many values, existence of certain relations) and the types of values should be built into the low-level library. This way we will have a chance to optimize them later depending on what a specific QS can do for us.

#11

My motivation for this distinction comes from some wireframing I’ve been doing for a schema editing app. To a user I think there would be a logical distinction between pointing to a literal (which by definition is an endpoint in the graph) and an IRI (which may continue onto other vertices in the graph). That said, I can see that maybe this distinction belongs at the application level rather than the library level.

When you say “this library”, do you mean the high level library or the lower-level ORM? Also maybe I should know this, but what does “QS” mean?

I had been thinking that “properties” pointing to literals would always be one to one. If the object needed to have multiple values, it could be an array literal. But maybe this wouldn’t always make sense and one would want to express the same property in multiple statements rather than a single statement pointing to an array?

That sounds great. Which functions would I use for this?

I would like to help develop this high-level library. I will come up with a minimalist candidate design (function signatures and descriptions) for review and post back to this thread.

Question…what do you think makes sense to name this higher level library? Since the ORM / low level library is in the “schema” package, would the higher level functions be in the schema package as well, or in their own separate package?


#12

I meant low-level library, sorry.

QuadStore

Thought about this earlier, but still have doubt if it should behave this way or not.

I should expose them first :slight_smile:

It may live in the same package for now. And if we want to separate them, I’ll need some other name for low-level package.


#13

I’ve been thinking about this more, especially about coming up with a candidate minimalist set of RDF / RDFS / OWL terms we need in order to build the high-level schema library.

One term I have not been able to find is a standard property for cardinality…e.g. “RDF:cardinality” whose domain is the full set of properties, and whose range is two values: “one to one” and “one to many”. I found this W3C writeup about it: https://www.w3.org/TR/swbp-n-aryRelations, and maybe I’m blind but I don’t see a final concrete set of recommended terms along these lines. Do you know of a standard OWL or RDF property for this interpretation of cardinality?


#14

OWL definitely has a notion of cardinality:
https://www.w3.org/TR/2004/REC-owl-ref-20040210/#CardinalityRestriction


#15

Looks like OWL’s cardinality will allow expressing “one-to-one” and “one-to-many”, but with integer ranges.

owl:minCardinality a rdf:Property ;
    rdfs:label "minCardinality" ;
    rdfs:comment "The property that determines the cardinality of a minimum cardinality restriction." ;
    rdfs:domain owl:Restriction ;
    rdfs:isDefinedBy <http://www.w3.org/2002/07/owl#> ;
    rdfs:range xsd:nonNegativeInteger .

owl:qualifiedCardinality a rdf:Property ;
    rdfs:label "qualifiedCardinality" ;
    rdfs:comment "The property that determines the cardinality of an exact qualified cardinality restriction." ;
    rdfs:domain owl:Restriction ;
    rdfs:isDefinedBy <http://www.w3.org/2002/07/owl#> ;
    rdfs:range xsd:nonNegativeInteger .

owl:maxCardinality a rdf:Property ;
    rdfs:label "maxCardinality" ;
    rdfs:comment "The property that determines the cardinality of a maximum cardinality restriction." ;
    rdfs:domain owl:Restriction ;
    rdfs:isDefinedBy <http://www.w3.org/2002/07/owl#> ;
    rdfs:range xsd:nonNegativeInteger .

So I guess mapping “one-to-one” and “one-to-many” to the standard OWL properties would look like:

one-to-one (not nullable): owl:qualifiedCardinality = 1
one-to-one (nullable): owl:maxCardinality = 1
one-to-many (not nullable): owl:minCardinality = 1
one-to-many (nullable): owl:minCardinality = 0

I think working with integers is a little unintuitive for a high-level library, so maybe we can create local types “one-to-one” and “one-to-many” that when taken with the nullable property will map to the above?
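As a sketch of what such local types could look like, the snippet below encodes the four-row mapping above as a small Go function. `Cardinality`, `Restriction`, and `toOWL` are hypothetical names made up for this illustration, and the OWL terms are kept as plain strings exactly as listed in the table.

```go
package main

import "fmt"

// Cardinality is a hypothetical user-facing type for the two cases.
type Cardinality int

const (
	OneToOne Cardinality = iota
	OneToMany
)

// Restriction pairs an OWL cardinality term with its integer value.
type Restriction struct {
	Term  string
	Value int
}

// toOWL maps the friendly (cardinality, nullable) pair to the OWL term
// and value from the table above.
func toOWL(c Cardinality, nullable bool) Restriction {
	switch {
	case c == OneToOne && !nullable:
		return Restriction{"owl:qualifiedCardinality", 1}
	case c == OneToOne && nullable:
		return Restriction{"owl:maxCardinality", 1}
	case c == OneToMany && !nullable:
		return Restriction{"owl:minCardinality", 1}
	default: // one-to-many, nullable
		return Restriction{"owl:minCardinality", 0}
	}
}

func main() {
	for _, nullable := range []bool{false, true} {
		for _, c := range []Cardinality{OneToOne, OneToMany} {
			r := toOWL(c, nullable)
			fmt.Printf("cardinality=%d nullable=%v -> %s = %d\n", c, nullable, r.Term, r.Value)
		}
	}
}
```

The caller never touches integers directly; the integer-valued OWL terms only appear when the restriction is written to the graph.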


#16

@dennwc and @robertmeta,

I took a whack at a set of requirements for schema manipulation for this new high-level library. I’m trying to keep the library as small as possible while also accommodating the majority (90%+) of use cases for creating an RDF graph in Cayley. I have left the data manipulation and implementation sections blank for now…since it seems like the schema manipulation requirements are a precursor to those. Please let me know what feedback you may have so far.

@dennwc, I decided not to create a special concept for “relation” since those can be handled by the rdfs:range property. As previously defined, a “relation” will have an rdfs:range that is a class, while a “property” will have an rdfs:range that is a literal. Also, re: cardinality… instead of introducing the concepts “oneToOne” and “oneToMany” for now, I just included owl:minCardinality and owl:maxCardinality. These two are enough to describe whether a property is nullable (minCardinality = 0) and whether a property is one-to-one or one-to-many. This has the added benefit of allowing more than one occurrence of a property while putting an upper limit on it, rather than stating simply that it is “one-to-many”.

High level cayley schema library

Purpose

Provide a minimal, high level Go library for creating, modifying, and accessing RDF graphs in Cayley.

Requirements

Schema manipulation

  1. Create and modify RDF classes (rdfs:Class). All classes have the properties:
    1. rdfs:label - a human-readable version of the class’ name
    2. rdfs:comment - a human-readable description of the class
  2. Create and modify properties (rdf:Property), which can in turn be associated with classes. These are properties other than rdfs:label and rdfs:comment, which are automatically associated with all classes. Properties have the properties:
    1. rdfs:domain - the class with which the property is being associated. The domain is the set of classes (rdf:type values) whose instances can act as the subject in a statement (rdf:Statement) for which the property acts as the predicate.
    2. rdfs:range - the classes, or in the event the range is a literal (rdfs:Literal), the datatypes (rdfs:Datatype), whose instances can act as the object in a statement for which the property acts as the predicate.
    3. rdfs:label - a human-readable version of the property’s name
    4. rdfs:comment - a human-readable description of the property
    5. owl:maxCardinality - the maximum number of times the property can be used for a given subject. NOTE: may be mapped from a more user-friendly attribute.
    6. owl:minCardinality - the minimum number of times the property can be used for a given subject. NOTE: may be mapped from a more user-friendly attribute.
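A rough sketch of how a property specification with these six attributes might be turned into quads is shown below. `Quad`, `PropertySpec`, and its `Quads` method are hypothetical and exist only for illustration. One simplifying assumption to flag: the cardinality values are attached directly to the property, as the list above describes, whereas standard OWL declares owl:Restriction as the domain of these terms.

```go
package main

import (
	"fmt"
	"strconv"
)

// Quad is a simplified, hypothetical stand-in for a Cayley quad.
type Quad struct{ Subject, Predicate, Object string }

// PropertySpec is a hypothetical input to something like
// CreateOrReplaceProperty.
type PropertySpec struct {
	IRI, Label, Comment string
	Domain, Range       string
	Min                 int
	Max                 int // negative means no upper bound
}

// Quads returns the statements describing the property. Labels and
// comments are emitted as quoted literals; cardinalities as decimal strings.
func (p PropertySpec) Quads() []Quad {
	qs := []Quad{
		{p.IRI, "rdf:type", "rdf:Property"},
		{p.IRI, "rdfs:label", strconv.Quote(p.Label)},
		{p.IRI, "rdfs:comment", strconv.Quote(p.Comment)},
		{p.IRI, "rdfs:domain", p.Domain},
		{p.IRI, "rdfs:range", p.Range},
		{p.IRI, "owl:minCardinality", strconv.Itoa(p.Min)},
	}
	if p.Max >= 0 {
		qs = append(qs, Quad{p.IRI, "owl:maxCardinality", strconv.Itoa(p.Max)})
	}
	return qs
}

func main() {
	name := PropertySpec{
		IRI: "ex:name", Label: "name", Comment: "A person's name",
		Domain: "ex:Person", Range: "xsd:string", Min: 1, Max: 1,
	}
	for _, q := range name.Quads() {
		fmt.Println(q.Subject, q.Predicate, q.Object)
	}
}
```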

Data Manipulation

  1. Create and Modify Class Instances (Entities)
    1. Provide “create”, “update”, and “create or update” functions
    2. Decision point: allow user specification of IRIs, or automatically assign a UUID and allow the user to specify only the rdfs:label and rdfs:comment of the entity?
  2. Set and Modify Entity Property Values
    1. For properties with literal ranges, the user will pass a literal to the set prop function.
    2. For properties with class ranges, the user will pass an IRI to the set prop function.

Data Access

Note1: These high-level query functions should provide a set of methods that allow retrieving data without using Gizmo or GraphQL. In this way, these methods complement existing Cayley query functionality and simply provide high-level wrappers.

Note2: The minimal requirement for query results is to return JSON. More discussion is needed to lay out exactly how queries using this high-level library will return results that can be consumed as Go structs and used with the ORM library.

  1. Return complete list of classes defined in the schema
    1. Will include class rdfs:label and class rdfs:comment
    2. Will include only the IRI and rdfs:label value for each property associated with the class
  2. Return a complete list of properties defined in the schema
    1. Will include all of the property’s properties (not only rdfs:label)
  3. Return a complete list of all instances of a given class.
    1. To reduce the data size of the result set, only the IRI, label, and comment of each instance will be returned.
  4. Given an IRI, return a class instance
    1. Include all class properties
    2. Include all properties of properties up to a user-specified max level. At the last level, literal property values will be included in the result, and for non-literal property values, the IRI will be included in the result.
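As an illustration of the third access requirement (listing the instances of a class with a compact summary), here is a minimal sketch that scans an in-memory quad slice and returns JSON-ready values. `Quad`, `Summary`, and `instancesOf` are hypothetical names; a real implementation would run against the quad store rather than a slice.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Quad is a simplified, hypothetical stand-in for a Cayley quad.
type Quad struct{ Subject, Predicate, Object string }

// Summary is the compact per-instance record: IRI, label, and comment.
type Summary struct {
	IRI     string `json:"iri"`
	Label   string `json:"label,omitempty"`
	Comment string `json:"comment,omitempty"`
}

// instancesOf returns a summary for every subject typed as class,
// sorted by IRI so the output is deterministic.
func instancesOf(quads []Quad, class string) []Summary {
	byIRI := map[string]*Summary{}
	for _, q := range quads {
		if q.Predicate == "rdf:type" && q.Object == class {
			byIRI[q.Subject] = &Summary{IRI: q.Subject}
		}
	}
	for _, q := range quads {
		s, ok := byIRI[q.Subject]
		if !ok {
			continue
		}
		switch q.Predicate {
		case "rdfs:label":
			s.Label = q.Object
		case "rdfs:comment":
			s.Comment = q.Object
		}
	}
	out := make([]Summary, 0, len(byIRI))
	for _, s := range byIRI {
		out = append(out, *s)
	}
	sort.Slice(out, func(i, j int) bool { return out[i].IRI < out[j].IRI })
	return out
}

func main() {
	quads := []Quad{
		{"ex:bob", "rdf:type", "ex:Person"},
		{"ex:bob", "rdfs:label", "Bob"},
		{"ex:alice", "rdf:type", "ex:Person"},
		{"ex:alice", "ex:knows", "ex:bob"}, // not part of the summary
		{"ex:acme", "rdf:type", "ex:Company"},
	}
	out, err := json.Marshal(instancesOf(quads, "ex:Person"))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```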

Implementation

Library Functions

(To be drafted)

Edit Dates

  • 0.1 - Dec 31, 2016
  • 0.2 - Jan 3, 2017

#17

you guys want to take a look at this -> https://tinkerpop.apache.org/providers.html