Quickstart: Cayley as an Application


Getting Started

This guide will take you through starting a persistent graph based on the provided data, with some hints for each backend.

Grab the latest release binary and extract it wherever you like.

If you prefer to build from source, see Contributing.md, which has instructions.

Initialize A Graph

Now that Cayley is downloaded (or built), let’s create our database. init is the subcommand to set up a database and the right indices.

You can set up a full configuration file if you’d prefer, but it will also work from the command line.

Examples for each backend:

  • leveldb: ./cayley init --db=leveldb --dbpath=/tmp/moviedb – where /tmp/moviedb is the path you’d like to store your data.
  • bolt: ./cayley init --db=bolt --dbpath=/tmp/moviedb – where /tmp/moviedb is the filename where you’d like to store your data.
  • mongo: ./cayley init --db=mongo --dbpath="<HOSTNAME>:<PORT>" – where HOSTNAME and PORT point to your Mongo instance.
  • sql: ./cayley init --db=sql --dbpath="postgres://[USERNAME:PASSWORD@]HOST[:PORT]/DATABASE_NAME?sslmode=disable" – where HOST, PORT and DATABASE_NAME point to your PostgreSQL database and USERNAME and PASSWORD belong to a user with write access. The value of the dbpath flag is a connection string in the format accepted by pq. See https://godoc.org/github.com/lib/pq for more information.

These two options (db and dbpath) are always required. If you’d rather not repeat them on every command, now is a good time to set up a configuration file for your backend. There’s an example file, cayley.cfg.example, in the root directory.

You can repeat the --db and --dbpath flags from here forward instead of the config flag, but let’s assume you created cayley.cfg.overview.

Load Data Into A Graph

First we load the data.

./cayley load --config=cayley.cfg.overview --quads=data/testdata.nq

Then wait while it loads. If you’d like to watch the progress, you can run:

./cayley load --config=cayley.cfg.overview --quads=data/testdata.nq --alsologtostderr

And watch the log output go by.

Connect a REPL To Your Graph

Now that the data is loaded, we can use Cayley to connect to the graph. As you might have guessed, that command is:

./cayley repl --config=cayley.cfg.overview

You’ll be given a cayley> prompt. It expects Gremlin/JS, but the query language can also be configured with a flag.

New nodes and links can be added with the following command:

cayley> :a subject predicate object label .

Removing links works similarly:

cayley> :d subject predicate object .

This is great for testing, and ultimately also for scripting, but the real workhorse is the next step.

Go ahead and give it a try:

// Simple math
cayley> 2 + 2

// JavaScript syntax
cayley> x = 2 * 8
cayley> x

// See all the entities in this small follow graph.
cayley> graph.Vertex().All()

// See only dani.
cayley> graph.Vertex("<dani>").All()

// See who dani follows.
cayley> graph.Vertex("<dani>").Out("<follows>").All()

Serve Your Graph

Just as before:

./cayley http --config=cayley.cfg.overview

And you’ll see a message not unlike

Cayley now listening on 127.0.0.1:64210

If you visit that address (often, http://localhost:64210) you’ll see the full web interface and also have a graph ready to serve queries via the HTTP API.
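
As a rough illustration of querying over HTTP, the sketch below builds a request for a Gremlin query. The endpoint path /api/v1/query/<language> and the plain-text request body are assumptions here, so check the HTTP API documentation for your Cayley version. The helper names buildQueryRequest and runQuery are hypothetical.

```javascript
// Assumption: queries are sent as POST /api/v1/query/<language> with the
// query text as the request body. Verify this against your version's docs.
function buildQueryRequest(baseUrl, language, query) {
  return {
    url: `${baseUrl}/api/v1/query/${language}`,
    init: { method: "POST", body: query },
  };
}

// Sends the query and returns the parsed JSON response. Needs a runtime
// with a global fetch (for example Node.js 18+ or a browser).
async function runQuery(baseUrl, language, query) {
  const { url, init } = buildQueryRequest(baseUrl, language, query);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`query failed: HTTP ${response.status}`);
  }
  return response.json();
}

const request = buildQueryRequest(
  "http://localhost:64210",
  "gremlin",
  'g.V("<dani>").Out("<follows>").All()'
);
console.log(request.url); // http://localhost:64210/api/v1/query/gremlin
```

With the server from the previous step running, calling runQuery with the same three arguments would send the query.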

UI Overview

Sidebar

Along the side are the various actions or views you can take. From the top, these are:

  • Run Query (run the query)
  • Gremlin (a dropdown to pick your query language; MQL is the other option)
    • GremlinAPI.md: One of the two query languages, usable via either the REPL or the HTTP interface.
    • MQL.md: The other query language the interfaces support.

  • Query (a request/response editor for the query language)
  • Query Shape (a visualization of the shape of the final query. Does not execute the query.)
  • Visualize (runs a query and, if tagged correctly, gives a sigmajs view of the results)
  • Write (an interface to write or remove individual quads or quad files)

  • Documentation (this documentation)

Visualize

To use the visualize function, emit, either through tags or JS post-processing, a set of JSON objects containing the keys source and target. These will be the links, and nodes will automatically be detected.

For example:

[
  {
    "source": "node1",
    "target": "node2"
  },
  {
    "source": "node1",
    "target": "node3"
  }
]

Other keys are ignored. The upshot is that if you use the “Tag” functionality to add “source” and “target” tags, you can extract and quickly view subgraphs.

// Visualize who dani follows.
g.V("<dani>").Tag("source").Out("<follows>").Tag("target").All()

The visualizer expects nodes to be tagged as either “source” or “target.” A source is drawn as a blue node and a target as an orange node, so each relationship runs from blue to orange (source to target).
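
To make the link format concrete, here is a minimal sketch of how a node list could be derived from a set of source/target links. It is not the visualizer’s actual code. The function name nodesFromLinks and the extra "weight" key are made up for illustration.

```javascript
// Hypothetical helper: collect the distinct nodes named by a list of links.
// Only the "source" and "target" keys are read; any other key is ignored.
function nodesFromLinks(links) {
  const nodes = new Set();
  for (const link of links) {
    nodes.add(link.source);
    nodes.add(link.target);
  }
  return Array.from(nodes);
}

const links = [
  { source: "node1", target: "node2" },
  { source: "node1", target: "node3", weight: 7 }, // "weight" is ignored
];

console.log(nodesFromLinks(links)); // [ 'node1', 'node2', 'node3' ]
```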


Sample Data

For more interesting test data, follow the same loading procedure as outlined above, but with “data/30kmoviedata.nq.gz”.

Running some more interesting queries

The simplest query is merely to return a single vertex. Using the 30kmoviedata.nq dataset from above, let’s walk through some simple queries:

// Query all vertices in the graph, limit to the first 5 vertices found.
graph.Vertex().GetLimit(5)

// Start with only one vertex, the literal name "Humphrey Bogart", and retrieve it.
graph.Vertex("Humphrey Bogart").All()

// `g` and `V` are synonyms for `graph` and `Vertex` respectively, as they are quite common.
g.V("Humphrey Bogart").All()

// "Humphrey Bogart" is a name, but not an entity. Let's find the entities with this name in our dataset.
// Follow links that are pointing In to our "Humphrey Bogart" node with the predicate "name".
g.V("Humphrey Bogart").In("<name>").All()

// Notice that "name" is a generic predicate in our dataset.
// Starting with a movie gives a similar effect.
g.V("Casablanca").In("<name>").All()

// Relatedly, we can ask the reverse; all ids with the name "Casablanca"
g.V().Has("<name>", "Casablanca").All()

You may start to notice a pattern here: with Gremlin, the query lines tend to:

Start somewhere in the graph | Follow a path | Run the query with “All” or “GetLimit”

g.V("Casablanca") | .In("<name>") | .All()
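
The start, follow, run pattern can be mimicked with a toy in-memory graph. The sketch below is not how Cayley works internally. The triples are made up for illustration, and unlike the real V(), this toy version needs explicit starting ids.

```javascript
// Toy in-memory model of the query shape, NOT Cayley's implementation.
// Each triple is [subject, predicate, object]; the data is invented.
const triples = [
  ["<dani>", "<follows>", "<bob>"],
  ["<dani>", "<follows>", "<greg>"],
  ["<bob>", "<follows>", "<fred>"],
];

class Path {
  constructor(nodes) {
    this.nodes = nodes;
  }
  // Follow a path: move forward along edges with the given predicate.
  Out(predicate) {
    const next = triples
      .filter(([s, p]) => p === predicate && this.nodes.includes(s))
      .map(([, , o]) => o);
    return new Path(next);
  }
  // Follow a path: move backward along edges with the given predicate.
  In(predicate) {
    const prev = triples
      .filter(([, p, o]) => p === predicate && this.nodes.includes(o))
      .map(([s]) => s);
    return new Path(prev);
  }
  // Run the query: return every node reached.
  All() {
    return this.nodes;
  }
}

// Start somewhere in the graph.
const g = { V: (...ids) => new Path(ids) };

console.log(g.V("<dani>").Out("<follows>").All()); // [ '<bob>', '<greg>' ]
console.log(g.V("<fred>").In("<follows>").All()); // [ '<bob>' ]
```

Each call returns a new Path, which is why the steps chain in the same left-to-right order as the queries in this guide.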

And these pipelines continue…

// Let's get the list of actors in the film
g.V().Has("<name>","Casablanca")
  .Out("</film/film/starring>").Out("</film/performance/actor>")
  .Out("<name>").All()

// But this is starting to get long. Let's use a morphism -- a pre-defined path stored in a variable -- as our linkage

var filmToActor = g.Morphism().Out("</film/film/starring>").Out("</film/performance/actor>")

g.V().Has("<name>", "Casablanca").Follow(filmToActor).Out("<name>").All()
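
One way to think about a morphism is as a recorded list of steps that can be replayed from any starting point. The sketch below models that idea on a toy graph. It is an illustration only, not Cayley’s implementation. The ids <film1>, <perf1> and <actor1> and the helpers out and follow are hypothetical.

```javascript
// Toy model of a morphism as a reusable, recorded path.
// The triples and ids are invented for illustration.
const triples = [
  ["<film1>", "<name>", "Casablanca"],
  ["<film1>", "</film/film/starring>", "<perf1>"],
  ["<perf1>", "</film/performance/actor>", "<actor1>"],
  ["<actor1>", "<name>", "Humphrey Bogart"],
];

// Follow one predicate forward from a set of nodes.
const out = (nodes, predicate) =>
  triples
    .filter(([s, p]) => p === predicate && nodes.includes(s))
    .map(([, , o]) => o);

// Build a morphism: each Out() call records a step instead of running it.
function Morphism(steps = []) {
  return {
    steps,
    Out: (predicate) => Morphism([...steps, predicate]),
  };
}

// Replay the recorded steps, in order, from the given starting nodes.
const follow = (nodes, morphism) =>
  morphism.steps.reduce((current, predicate) => out(current, predicate), nodes);

const filmToActor = Morphism()
  .Out("</film/film/starring>")
  .Out("</film/performance/actor>");

const actors = follow(["<film1>"], filmToActor);
console.log(out(actors, "<name>")); // [ 'Humphrey Bogart' ]
```

Because the morphism stores steps rather than results, the same filmToActor value can be applied to any film node.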

There’s more in the JavaScript API Documentation, but that should give you a feel for how to walk around the graph.