Cayley issues


#1

We are starting this topic as a community effort to identify the issues that current and new users face while using Cayley, and to prioritize them.

Everyone is encouraged to participate!

So far, we have the following list:

  • Full support for a few production storage engines: GAE, DynamoDB, ElasticSearch, SQL, etc.
  • Better beginner docs, including a link from cayley.io and a blog post that walks through a few examples.
  • Faster batch loader for nquads and other formats.
  • Visualization. Allow users to visualize nodes and query plans, create and clone nodes, link them, etc.
  • We need at least one standard query language (or a subset of one).

Anything else we want to add?

Once we have collected all current issues, I will update this post and create a poll to prioritize them. To get a rough estimate, please order your proposals from highest to lowest priority.


#2

Great post @dennwc. I wanted to suggest full support for production storage engines, so I vote for that. Documentation and a query language too, of course.

The way I see it, Bolt as a storage backend is super nice, but when you want people to have it up and running in no time, it might be easier to do that in the cloud, knowing that it scales more easily than on a bare-metal server. That also gives Cayley more room for adoption and growth. The more companies use Cayley in production, the more community support and documentation there will be (an assumption, I know, but I’m willing to work on that once I use it on a daily basis).

If the new abstraction layer for KV is done, I think it is best for the project to implement Google Datastore and DynamoDB so that it runs on bare-metal and the cloud.

From there we make documentation and examples for those architectures and then scale up visualisation, documentation and other backends.

I’m willing to put time into Google Datastore and documentation. So once you have the KV backend, let me know and maybe we can work together on the first AWS and GCP backends?


#3

It would be great to have a bulk data loader for existing big data sets.


#4

Totally agree!

KV is tracked in #626, and should be merged quite soon. I optimized it to work with remote backends as well, but there is more work to be done.

And I hope to start working on a new Mongo/AWS/GCP abstraction in the near term. It will probably be rewritten from scratch, with support for custom quad metadata, links-on-links and custom indexes in mind. Any help and/or testing is welcome, as always :slight_smile:


#5

Any specific formats in mind? Or just a fast batch loader for nquads?


#6

A fast batch loader for nquads would be super! I think people like me would like to start with their existing big data by just converting it to nquads and loading it to see how it looks.


#7

I vote for documenting how to use BoltDB in production. Maybe it’s not just about documentation but also features that should be part of Cayley? I am not sure.

Examples of such docs: how to back it up, how to restore it as quickly as possible, etc.

My second vote would be for using PostgreSQL - docs/code sample etc.


#8

I’m surprised that things like aggregation and sorting have not been mentioned yet. But if it’s not an issue for anyone, that’s great :grin:


#9

I have an alternative use-case for Cayley that I would like to see adopted into the main project.

I’m using Cayley entirely in the browser by transpiling to JS using GopherJS, storing the graph in “memstore”.

It works surprisingly well, with my largest graph so far having 325k quads and using 1.85 GB of memory (including browser overhead) on Safari.

The generated minified JS for the app I’m developing is 2.8 MB (482 KB gzipped). To get it that “small” I had to remove the “net” dependency from “pborman/uuid”. This was trivial, see node.go/node_js.go at https://github.com/elliott5/uuid/tree/gopherjs .

A future GopherJS-specific version of Cayley could make use of the standard browser data-storage options, or have an online/offline capability by using https://pouchdb.com/ for example.

I hope extending the use of Cayley into the browser environment will be of interest to the community.

In terms of building examples to recruit new users, using this approach has the great advantage of not requiring a server.

It might also lead to better graph visualisation capabilities (maybe using D3).

To better demonstrate the possibilities, I’ve adapted the trivial example code for “using Cayley as a library” to show it being used in JS. Actually, I only needed to add 2 lines, and those only to produce a pop-up “Alert” when you visit the page! You can see it at https://elliott5.github.io/cayleyjs/ - the Go code lines marked ***** are those of interest. I’ve also included the generated JS directly in the HTML file, so that you can see what is going on.


Running Cayley in the browser
#10

This is indeed quite interesting! How fast are the queries in this environment?

It would be great if you could contribute the build/minification part to the main project. And we can switch away from “pborman/uuid” to avoid the dependency on the “net” package, if it makes things easier.

As one possible application, we could stream the replication log from a Cayley server to the browser, so that it remains in sync.

Data can also be stored in Local Storage or IndexedDB in the browser, so that it persists between sessions. It is now easy to implement a new KV-like backend, so it will not require too much time or effort to make a JS-specific implementation under the “gopherjs” build tag.
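
To give a feel for how small a KV-like backend can be, here is a minimal sketch. Note that the `KV` interface, `ErrNotFound` and `mapKV` are hypothetical names made up for illustration and are not Cayley's actual KV abstraction; the in-memory map merely stands in for a browser store, which a real JS-specific implementation would call instead.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrNotFound is returned when a key is absent (hypothetical name).
var ErrNotFound = errors.New("key not found")

// KV is a hypothetical minimal key-value interface, used here only to
// illustrate the shape of a backend. It is NOT Cayley's real API.
type KV interface {
	Get(key string) ([]byte, error)
	Put(key string, value []byte) error
	Delete(key string) error
}

// mapKV is an in-memory stand-in for a browser store. A JS-specific
// implementation would talk to the browser's storage API instead.
type mapKV struct {
	data map[string][]byte
}

func newMapKV() *mapKV {
	return &mapKV{data: make(map[string][]byte)}
}

// Get returns a copy of the stored value so callers cannot mutate it.
func (m *mapKV) Get(key string) ([]byte, error) {
	v, ok := m.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return append([]byte(nil), v...), nil
}

// Put stores a copy of the value under the given key.
func (m *mapKV) Put(key string, value []byte) error {
	m.data[key] = append([]byte(nil), value...)
	return nil
}

// Delete removes the key; deleting a missing key is not an error.
func (m *mapKV) Delete(key string) error {
	delete(m.data, key)
	return nil
}

func main() {
	var store KV = newMapKV()
	if err := store.Put("quad:1", []byte("alice follows bob")); err != nil {
		fmt.Println("put failed:", err)
		return
	}
	v, err := store.Get("quad:1")
	fmt.Println(string(v), err)
}
```

Keeping the interface this narrow is what would make swapping the map for a persistent browser store cheap; anything richer (iteration, transactions) would have to follow whatever the real KV layer requires.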

So yeah, if you are interested in continuing this, let’s make a separate thread. There might be even more ideas over time :slight_smile:


#11

I’m afraid I don’t have any query benchmarks for JS; that work would still need to be done.

I’m happy to contribute whatever code may help to get things going.

Using the replication log to keep browser and server graphs in sync sounds like a very interesting idea indeed. To enable offline operation, it would need to work both ways. And, as you say, there are already KV-like stores in the browser environment. For information, GopherJS uses the “js” build tag, and you can test whether you’re running in JS by checking runtime.GOARCH == "js".

So yes, please start a separate thread… :slight_smile: