WIP: Bolt 2: Reification Boogaloo


#1

Happy holidays, all! While sitting at home with my folks, I broke ground on a complete rewrite of the Bolt backend that could support our reification plan.

It’s looking like a happy holiday indeed.

First, thanks to smarter indexing, here are the init time, load time, and final on-disk size for the 30kmovie data set with bolt2 (b2) and the current bolt backend (b1):

```
$ time ./cayley init -db bolt2 -dbpath b2
./cayley init -db bolt2 -dbpath b2  0.25s user 0.04s system 10% cpu 2.802 total
$ time ./cayley load -db bolt2 -dbpath b2 -quads data/30kmoviedata.nq.gz
./cayley load -db bolt2 -dbpath b2 -quads   12.66s user 0.67s system 47% cpu 28.104 total
$ time ./cayley init -db bolt -dbpath b1
./cayley init -db bolt -dbpath b1 0.00s user 0.01s system 8% cpu 0.074 total
$ time ./cayley load -db bolt -dbpath b1 -quads data/30kmoviedata.nq.gz
./cayley load -db bolt -dbpath b1 -quads   25.90s user 2.58s system 31% cpu 1:29.11 total
$ du b1 b2
418520K	b1
144200K	b2
```
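A quick check of what those numbers mean, as a minimal Go sketch: the wall-clock totals and `du` sizes are copied from the output above, and the program only computes ratios from them (it measures nothing itself). The `ratio` helper is a made-up name for illustration.

```go
package main

import "fmt"

// ratio returns how many times larger before is than after: how many times
// faster (for times) or smaller (for sizes) bolt2 is than bolt.
func ratio(before, after float64) float64 {
	return before / after
}

func main() {
	// Wall-clock "total" times reported by `time ./cayley load`, in seconds.
	const boltLoad = 89.11 // 1:29.11
	const bolt2Load = 28.104

	// Database sizes reported by `du`, in KiB.
	const boltSize = 418520.0
	const bolt2Size = 144200.0

	fmt.Printf("load: %.2fx faster\n", ratio(boltLoad, bolt2Load))
	fmt.Printf("size: %.2fx smaller (%.1f%% less disk)\n",
		ratio(boltSize, bolt2Size), (1-bolt2Size/boltSize)*100)
}
```

That works out to roughly 3.2x faster loading and a database about a third the size of the bolt1 one.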

As if that weren’t awesome enough, this morning I got the iterators working, so we can run the integration benchmarks (old = bolt, new = bolt2):

```
$ benchx bolt.txt bolt2.txt
benchmark                                     old ns/op     new ns/op     delta      mult
BenchmarkNamePredicate-4                      189259        183955        -2.80%     -1.03x
BenchmarkLargeSetsNoIntersection-4            8451570       499883        -94.09%    -16.92x
BenchmarkVeryLargeSetsSmallIntersection-4     20655310      827089        -96.00%    -25.00x
BenchmarkHelplessContainsChecker-4            30609564      251305        -99.18%    -121.95x
BenchmarkHelplessNotContainsFilms-4           31724938      488023        -98.46%    -64.94x
BenchmarkHelplessNotContainsActors-4          33014335      470609        -98.57%    -69.93x
BenchmarkNetAndSpeed-4                        1204336       729711        -39.41%    -1.65x
BenchmarkKeanuAndNet-4                        784386        624080        -20.44%    -1.26x
BenchmarkKeanuAndSpeed-4                      977683        623977        -36.18%    -1.57x
BenchmarkKeanuOther-4                         5291139       825297        -84.40%    -6.41x
BenchmarkKeanuBullockOther-4                  9021841       1086818       -87.95%    -8.30x
BenchmarkSaveBogartPerformances-4             383852        243993        -36.44%    -1.57x
```
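For reference, the delta and mult columns can be reproduced from the two ns/op columns. A minimal Go sketch, assuming delta = (new - old) / old and that mult is the speedup old / new (the table prints speedups with a leading minus sign; this sketch prints them as positive factors). The `result` type and the helper names are made up for illustration; the rows are copied from the table.

```go
package main

import "fmt"

// result holds one row of the comparison: ns/op for the old and new runs.
type result struct {
	name  string
	oldNs float64
	newNs float64
}

// delta is the relative change in ns/op, in percent; negative means faster.
func delta(r result) float64 {
	return (r.newNs - r.oldNs) / r.oldNs * 100
}

// speedup is how many times faster the new run is (old time / new time).
func speedup(r result) float64 {
	return r.oldNs / r.newNs
}

func main() {
	// A few rows copied from the bolt vs. bolt2 table above.
	rows := []result{
		{"NamePredicate", 189259, 183955},
		{"LargeSetsNoIntersection", 8451570, 499883},
		{"HelplessContainsChecker", 30609564, 251305},
		{"KeanuAndNet", 784386, 624080},
	}
	for _, r := range rows {
		fmt.Printf("%-25s %+7.2f%% %7.2fx\n", r.name, delta(r), speedup(r))
	}
}
```

Computed directly, LargeSetsNoIntersection comes out at about 16.91x rather than the table's 16.92x, so benchx's mult column seems to be derived from the rounded delta (1 / (1 - 0.9409) ≈ 16.92); either way the speedups agree to within rounding.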

That can’t be right, you might think… so let’s compare it to the memstore (old = memstore, new = bolt2):

```
$ benchx memstore.txt bolt2.txt
benchmark                                     old ns/op     new ns/op     delta      mult
BenchmarkNamePredicate-4                      198414        183955        -7.29%     -1.08x
BenchmarkLargeSetsNoIntersection-4            3656702       499883        -86.33%    -7.32x
BenchmarkVeryLargeSetsSmallIntersection-4     7128319       827089        -88.40%    -8.62x
BenchmarkHelplessContainsChecker-4            2384079       251305        -89.46%    -9.49x
BenchmarkHelplessNotContainsFilms-4           750280        488023        -34.95%    -1.54x
BenchmarkHelplessNotContainsActors-4          9374474       470609        -94.98%    -19.92x
BenchmarkNetAndSpeed-4                        857375        729711        -14.89%    -1.17x
BenchmarkKeanuAndNet-4                        631091        624080        -1.11%     -1.01x
BenchmarkKeanuAndSpeed-4                      747707        623977        -16.55%    -1.20x
BenchmarkKeanuOther-4                         2514868       825297        -67.18%    -3.05x
BenchmarkKeanuBullockOther-4                  4276342       1086818       -74.59%    -3.94x
BenchmarkSaveBogartPerformances-4             333554        243993        -26.85%    -1.37x
```

So, good news and bad news. The good news: it’s believably faster than bolt1. The bad news: it appears we have a query plan bug (exercised by HelplessNotContainsActors), because there’s no way it’s 20x faster than memory. That said, I have no concrete reason to doubt the KeanuAndSpeed result, which is puzzling.
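The reasoning here is a plausibility check: a disk-backed store shouldn't beat the in-memory memstore by an order of magnitude. A minimal Go sketch of that check, using ns/op values copied from the memstore table; the `suspicious` helper and the 10x cutoff are hypothetical choices for illustration, not part of Cayley's benchmark tooling.

```go
package main

import (
	"fmt"
	"sort"
)

// suspicious returns, sorted by name, the benchmarks where the disk-backed
// backend beats the in-memory baseline by more than maxSpeedup times.
// maxSpeedup is a hypothetical cutoff chosen for illustration.
func suspicious(memNs, diskNs map[string]float64, maxSpeedup float64) []string {
	var out []string
	for name, mem := range memNs {
		disk, ok := diskNs[name]
		if !ok || disk <= 0 {
			continue // no matching (or no valid) disk-backed result
		}
		if mem/disk > maxSpeedup {
			out = append(out, name)
		}
	}
	sort.Strings(out) // map iteration order is random; make output stable
	return out
}

func main() {
	// ns/op copied from the memstore vs. bolt2 table above.
	memstore := map[string]float64{
		"HelplessContainsChecker":   2384079,
		"HelplessNotContainsFilms":  750280,
		"HelplessNotContainsActors": 9374474,
		"KeanuAndSpeed":             747707,
	}
	bolt2 := map[string]float64{
		"HelplessContainsChecker":   251305,
		"HelplessNotContainsFilms":  488023,
		"HelplessNotContainsActors": 470609,
		"KeanuAndSpeed":             623977,
	}
	fmt.Println(suspicious(memstore, bolt2, 10)) // the 10x cutoff is arbitrary
}
```

With a 10x cutoff only HelplessNotContainsActors (about 19.9x) is flagged; HelplessContainsChecker at about 9.5x sits just under it, a reminder that any fixed threshold is a judgment call.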

I’ll finish up deletion and (some) sameAs indexing before PRing it, but it’s shaping up nicely.


#2

Opinion: I expect the 20-40% bumps to be accurate. I highlighted the init time because it’s far longer, but that (short) one-time cost buys us much faster loads.
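To put the one-time cost in numbers, here is a minimal Go sketch using the wall-clock totals from the first post (init: 0.074s for bolt vs. 2.802s for bolt2; load: 1:29.11 vs. 28.104s). The `initTradeoff` helper is a made-up name, and it only does arithmetic on those reported values for a single 30kmovie load.

```go
package main

import "fmt"

// initTradeoff compares a slower one-time init against a faster load and
// returns the extra init time, the load time saved, and the net saving.
func initTradeoff(oldInit, oldLoad, newInit, newLoad float64) (extraInit, loadSaved, net float64) {
	extraInit = newInit - oldInit
	loadSaved = oldLoad - newLoad
	return extraInit, loadSaved, loadSaved - extraInit
}

func main() {
	// Wall-clock totals (seconds) from the first post: bolt vs. bolt2.
	extra, saved, net := initTradeoff(0.074, 89.11, 2.802, 28.104)
	fmt.Printf("extra init: %.2fs, load saved: %.2fs, net: %.2fs\n", extra, saved, net)
}
```

On this data set the extra ~2.7s of init is repaid many times over by the roughly 61s shaved off the load.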


#3

/me cracks the whip, yes, yes, faster bolt stuff.