Performance of adding quads to Bolt db vs Memory


Hi there,
I am looking for performance metrics for inserts into the different storage backends. Using the API, inserting into the memory store is orders of magnitude faster than inserting into boltdb. Is there any way to make this faster? Can I insert into the memory store and then serialize the result into bolt? Is there some other backend that is faster on inserts?


Are you batching to Bolt using a transaction?


Also, I know one person who has done exactly that: built up a set of quads in the memory store, then pushed those quads into boltdb for a big speed improvement. I will try to get them on to talk about it.


This is very basic code from the examples.

Adding it here:

Create a bolt quad store:

Or create a memory graph:

In my test it takes one minute to write to memory, and the bolt run has been going for hours.
I did not use any explicit transactions; let me check.


Maybe I spoke too soon. It appears faster, but we will see what the total time is. I think we need to profile this a bit.


So now creating the transaction was fast, as was finding duplicates against existing data, but the commit is very slow. So it seems we are back at the beginning. I guess I will break this into chunks of some kind.


There is going to be a sweet spot for transaction size: 10k, 50k, or ($X), depending on the data. You might consider just ignoring duplicates (a cayley option) so you can insert blindly.
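
The chunked-commit idea could be sketched roughly as follows. This is only a sketch under assumptions, not Cayley's API: `Quad`, `commitInBatches`, and the `commit` callback are hypothetical names, and in a real program the callback would wrap whatever "apply one transaction" call the store provides. The batch size is the knob to tune when looking for the sweet spot.

```go
package main

import "fmt"

// Quad is a hypothetical stand-in for the store's quad type.
type Quad struct {
	Subject, Predicate, Object, Label string
}

// commitInBatches hands quads to commit in slices of at most batchSize.
// commit is a placeholder for "apply one transaction to the store".
// It returns the number of commits that succeeded.
func commitInBatches(quads []Quad, batchSize int, commit func([]Quad) error) (int, error) {
	if batchSize <= 0 {
		return 0, fmt.Errorf("batchSize must be positive, got %d", batchSize)
	}
	commits := 0
	for start := 0; start < len(quads); start += batchSize {
		end := start + batchSize
		if end > len(quads) {
			end = len(quads) // last batch may be smaller
		}
		if err := commit(quads[start:end]); err != nil {
			return commits, fmt.Errorf("commit of quads [%d,%d) failed: %w", start, end, err)
		}
		commits++
	}
	return commits, nil
}

func main() {
	quads := make([]Quad, 25)
	n, err := commitInBatches(quads, 10, func(batch []Quad) error {
		fmt.Println("committing", len(batch), "quads")
		return nil
	})
	fmt.Println("commits:", n, "err:", err)
}
```

Timing each call to the callback for a few different batch sizes is one way to find where the per-commit cost stops paying off for a given data set.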


Committing every 100,000, with the file split into parallel chunks of 1 million (ten commits per chunk), shows some promise.

Here is an example profile of time spent:

go tool pprof parser /tmp/profile869529416/cpu.pprof                                                                 
Entering interactive mode (type "help" for commands)                                                                 
(pprof) top                                                                                                          
2240ms of 3570ms total (62.75%)                                                                                      
Dropped 79 nodes (cum <= 17.85ms)                                                                                    
Showing top 10 nodes out of 150 (cum >= 110ms)                                                                       
      flat  flat%   sum%        cum   cum%                                                                           
      1220ms 34.17% 34.17%     1220ms 34.17%  runtime.memmove                                                        
       210ms  5.88% 40.06%      380ms 10.64%  runtime.mallocgc                                                       
       180ms  5.04% 45.10%      340ms  9.52%  runtime.scanobject                                                     
       120ms  3.36% 48.46%      120ms  3.36%  runtime.heapBitsForObject                                              
       100ms  2.80% 51.26%      100ms  2.80%  crypto/sha1.blockAMD64                                                 
       100ms  2.80% 54.06%      100ms  2.80%  runtime.cmpbody                                                        
       100ms  2.80% 56.86%      100ms  2.80%  runtime.procyield                                                      
        80ms  2.24% 59.10%       80ms  2.24%  runtime.memclr                                                         
        70ms  1.96% 61.06%      120ms  3.36%  runtime.writebarrierptr_nostore1                                       
        60ms  1.68% 62.75%      110ms  3.08%  runtime.greyobject


I wonder if the memmove is just arrays being resized. If so, preallocating the space should make it faster.
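
To illustrate the general effect being guessed at here, the following sketch counts how often a Go slice's backing array is reallocated (and therefore copied) while appending, with and without a preallocated capacity. `countReallocs` is a hypothetical helper written for this illustration. Whether slice growth is actually the source of the memmove time in the profile above is not established; copying inside the storage layer would show up the same way.

```go
package main

import "fmt"

// countReallocs appends n ints to a slice that starts with the given
// capacity and reports how many times the backing array was replaced.
// Each replacement implies copying the existing elements.
func countReallocs(n, initialCap int) int {
	s := make([]int, 0, initialCap)
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	const n = 100000
	fmt.Println("grown on demand:", countReallocs(n, 0))
	fmt.Println("preallocated:   ", countReallocs(n, n))
}
```

With the capacity set up front, the append loop never has to move the data; grown on demand, it reallocates repeatedly as the slice fills.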


Look at these commit times: they keep growing as more data comes in. And I am creating a new transaction each time. The latest code is pushed. I will let this run now.

Committed post:100000 token: state:5 min:0.004628 seconds:0.277662 nanoseconds:%!f(int64=277661706)
Committed post:200000 token: state:9 min:0.004071 seconds:0.244244 nanoseconds:%!f(int64=244243771)
Committed post:300000 token: state:5 min:0.004899 seconds:0.293924 nanoseconds:%!f(int64=293924116)
Committed post:400000 token:377 state:2 min:0.004693 seconds:0.281585 nanoseconds:%!f(int64=281584525)
Committed post:500000 token: state:10 min:0.006419 seconds:0.385155 nanoseconds:%!f(int64=385154933)
Committed post:600000 token: state:11 min:0.007676 seconds:0.460543 nanoseconds:%!f(int64=460543037)
Committed post:700000 token: state:9 min:0.012592 seconds:0.755538 nanoseconds:%!f(int64=755538207)
Committed post:800000 token:@47 state:8 min:0.010412 seconds:0.624735 nanoseconds:%!f(int64=624735014)
Committed post:900000 token: state:9 min:0.012110 seconds:0.726615 nanoseconds:%!f(int64=726614785)
Committed post:1000000 token:@21 state:8 min:0.013138 seconds:0.788253 nanoseconds:%!f(int64=788253188)
Committed post:1100000 token: state:5 min:0.017031 seconds:1.021857 nanoseconds:%!f(int64=1021856930)
Committed post:1200000 token: state:9 min:0.020229 seconds:1.213753 nanoseconds:%!f(int64=1213752574)
Committed post:1300000 token:t state:6 min:0.022588 seconds:1.355263 nanoseconds:%!f(int64=1355263032)
Committed post:1400000 token:functi state:4 min:0.022581 seconds:1.354865 nanoseconds:%!f(int64=1354864522)
Committed post:1500000 token:@1 state:8 min:0.024120 seconds:1.447174 nanoseconds:%!f(int64=1447174418)
Committed post:1600000 token: state:5 min:0.028905 seconds:1.734271 nanoseconds:%!f(int64=1734270620)

Committed post:8400000 token:sc state:6 min:0.093813 seconds:5.628797 nanoseconds:%!f(int64=5628796854)
Committed post:8500000 token:con state:8 min:0.093200 seconds:5.592018 nanoseconds:%!f(int64=5592018297)
Committed post:8600000 token:s state:6 min:0.095387 seconds:5.723198 nanoseconds:%!f(int64=5723197611)
Committed post:8700000 token:@761 state:8 min:0.094035 seconds:5.642126 nanoseconds:%!f(int64=5642126380)
Committed post:8800000 token:@7694 state:8 min:0.095714 seconds:5.742823 nanoseconds:%!f(int64=5742822936)
Committed post:8900000 token: state:9 min:0.099002 seconds:5.940112 nanoseconds:%!f(int64=5940112031)
Committed post:9000000 token: state:10 min:0.099004 seconds:5.940249 nanoseconds:%!f(int64=5940248969)
Committed post:9100000 token: state:5 min:0.097196 seconds:5.831788 nanoseconds:%!f(int64=5831788413)

Committed post:10000000 token: state:10 min:0.103209 seconds:6.192569 nanoseconds:%!f(int64=6192569107)
Committed post:10100000 token: state:11 min:0.102143 seconds:6.128606 nanoseconds:%!f(int64=6128606285)
Committed post:10200000 token:@88181 state:8 min:0.100541 seconds:6.032477 nanoseconds:%!f(int64=6032477463)
Committed post:10300000 token: state:5 min:0.101405 seconds:6.084328 nanoseconds:%!f(int64=6084327928)
Committed post:10400000 token:@820 state:8 min:0.101846 seconds:6.110784 nanoseconds:%!f(int64=6110783675)
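
A side note on the log format: the `%!f(int64=...)` fragments are what Go's `fmt` package prints when a `%f` verb is given an integer argument, which suggests the nanosecond count was passed as an `int64`. A small sketch of one way to format such a line cleanly is below. `formatCommit` is a hypothetical helper, not the code that produced the log, and it leaves out the `token` and `state` fields.

```go
package main

import (
	"fmt"
	"time"
)

// formatCommit is a hypothetical helper that renders one timing line.
// Minutes() and Seconds() return float64, so %f is correct for them;
// Nanoseconds() returns int64, so it is printed with %d instead.
func formatCommit(count int, d time.Duration) string {
	return fmt.Sprintf("Committed post:%d min:%f seconds:%f nanoseconds:%d",
		count, d.Minutes(), d.Seconds(), d.Nanoseconds())
}

func main() {
	// Duration taken from the first line of the log above.
	d := 277661706 * time.Nanosecond
	fmt.Println(formatCommit(100000, d))
}
```

Printing the `time.Duration` itself with `%v` would be another option, since it formats itself with a unit.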

After it finished:

(pprof) top
12690ms of 27710ms total (45.80%)
Dropped 245 nodes (cum <= 138.55ms)
Showing top 10 nodes out of 168 (cum >= 600ms)
flat flat% sum% cum cum%
1920ms 6.93% 6.93% 2620ms 9.46% runtime.scanobject
1580ms 5.70% 12.63% 1580ms 5.70% runtime.memmove
1520ms 5.49% 18.12% 1520ms 5.49% runtime.cmpbody
1410ms 5.09% 23.20% 4640ms 16.74% runtime.mallocgc
1380ms 4.98% 28.18% 1460ms 5.27% syscall.Syscall6
1260ms 4.55% 32.73% 1270ms 4.58% runtime.heapBitsSetType
1130ms 4.08% 36.81% 1130ms 4.08% runtime.memclr
990ms 3.57% 40.38% 990ms 3.57% syscall.Syscall
970ms 3.50% 43.88% 970ms 3.50% crypto/sha1.blockAMD64
530ms 1.91% 45.80% 600ms 2.17% runtime.mapaccess1_fast64


It is worth remembering a few things about boltdb: it is focused on read over write performance, it has a single writer, and its write performance gets worse as the DB size grows.


Also, there is work on a new bolt backend that improves on performance.

#14 has a graph that gives an idea of how much boltdb lags behind some other backends in write performance. It is also very safe against crashes and very simple, but if your workload is write-heavy, be aware. Also, with a B-tree, it mostly needs SSDs.


So what is the fastest way to import data into cayley? Do I preformat the data and load it into memory? What is recommended?


Oh, I see I can load from the nq file into memory. That works for me. Thanks!


I am raising this with boltdb itself for further work


You should probably create a reproducer withOUT Cayley for bolt, as we do a lot of other work. is much faster currently.


So I have just generated a quadstore from my program, and I read the entire thing in when needed. This solves my problem. It should be possible to create a gob dump of the cayley graph, or some other binary representation, for faster loading. Right now I don't need to query anything larger than memory. is an example file. So first I want to get better with cayley and get to know the code; then I can work on improving it.
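
As a rough idea of what a gob dump could look like, here is a minimal sketch using Go's standard `encoding/gob` package. It assumes the quads are available as a plain slice; `Quad`, `encodeQuads`, and `decodeQuads` are hypothetical names for this illustration and do not reflect how cayley represents its graph internally.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Quad is a hypothetical stand-in for the store's quad type.
// Fields must be exported for gob to encode them.
type Quad struct {
	Subject, Predicate, Object, Label string
}

// encodeQuads serializes the quads into a gob byte stream.
func encodeQuads(quads []Quad) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(quads); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeQuads reads back a stream produced by encodeQuads.
func decodeQuads(data []byte) ([]Quad, error) {
	var quads []Quad
	if err := gob.NewDecoder(bytes.NewReader(data)).Decode(&quads); err != nil {
		return nil, err
	}
	return quads, nil
}

func main() {
	in := []Quad{
		{Subject: "alice", Predicate: "follows", Object: "bob"},
		{Subject: "bob", Predicate: "follows", Object: "carol", Label: "demo"},
	}
	data, err := encodeQuads(in)
	if err != nil {
		panic(err)
	}
	out, err := decodeQuads(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(len(data), "bytes,", len(out), "quads decoded")
}
```

In practice the encoder would write to a file instead of a buffer. Whether this loads faster than parsing an nq file would need to be measured.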


There is a protobuf-based format for faster loading called pquads. It does a simple compression of ordered quads as well.


That is a good suggestion; the loading time should be faster with that.