Cayley vs. Dgraph benchmarks


#1

Has anyone checked out the Dgraph vs Cayley benchmarks published here?:

I saw it in this post:

I am wondering if the choice of Cayley back-end or other factors would affect the numbers.


#2

The branch with reification will be much faster. Also, if we finish our GraphQL implementation, it will be a bit faster as well.


#3

Also, @mrjn, I’m not sure I understand the results of the query benchmark (below). I see the ratio you computed is based on the ns/op metric, but were these two queries identical? I am wondering why the Cayley query returned in 1.3s vs Dgraph’s 2s:

Cayley
BenchmarkQueryFilmByDirector-4 20 64747834 ns/op
PASS
ok _/Users/ankuryadav/dev/benchmark/graphdb-benchmark/cayley 1.376s

Dgraph
BenchmarkQueryFilmByDirector-4 1000 1764900 ns/op
PASS
ok _/Users/ankuryadav/dev/benchmark/graphdb-benchmark/dgraph 2.067s


#4

This is because the Go test framework runs each benchmark in several steps. It first runs the function once to get an initial timing, and then increases the iteration count. As you can see, the Cayley benchmark was run for up to 20 iterations and the Dgraph one for up to 1000. The higher total time for Dgraph only means that Go spent more time re-running the benchmark at higher iteration counts. The values to compare are the ns/op figures.


#5

Ok, makes sense, thanks!


#6

Wow, I’m really looking forward to this! I’d love to have a query language that is cleaner and closer to other languages’ data structures, such as JSON or arrays. That is much easier to work with than writing JavaScript strings in my code.
Would GraphQL also mean that Cayley will no longer require a JavaScript parser? If so, I consider this a huge improvement.


#7

Yes, we will not require a JS VM for these queries. And it works already. We just need to discuss a few spec changes and implement them.


#8

I’ll also point out that it’s hardly an apples-to-apples query comparison. The benchmark code is doing a lot in the JS environment that it doesn’t need to do.

For a much closer comparison that you could run today (before GraphQL), this query is actually a great fit for MQL.

{
	me(_xid_: m.06pj8) {
		type.object.name.en
		film.director.film {
			film.film.genre {
				type.object.name.en
			}
			type.object.name.en
			film.film.initial_release_date
		}
	}
}

Would become (roughly; there’s a fix that we need):

[{
  "id": "<m.06pj8>",
  "<type.object.name>": null,
  "<film.director.film>": [{
    "<film.film.genre>": {
      "<type.object.name>": null
    },
    "<type.object.name>": null,
    "<film.film.initial_release_date>": null
  }]
}]

You’ll notice the similarity. It’s why I claim that MQL was ahead of its time (but hamstrung by the restriction that everything must be JSON).

EDIT: And there’s a reproducible optimization bug. :)


#9

As I mentioned on the Dgraph Discourse, I’m more than happy to accept PRs that optimize the Cayley query. Note that Dgraph contributors aren’t the best people to build an optimized Cayley query. They could use your help to represent Cayley in its best form.

So, feel free to tell us what’s the best language to use for querying Cayley (Gremlin vs MQL), and how best to represent these queries in the chosen language.

Also, the ones here: https://wiki.dgraph.io/Get_Started