Performance of the Bandicoot v1

Published by Ostap Cherkashin on 2011 02 13.

This is a very brief update on the v1 performance. Here are the figures from exactly the same laptop as in the previous post (and again with the -Os gcc option):

$ cd bandicoot
$ ctl pack
$ bin/test/perf/relation
   write 1000 tuples in 1ms
    read 1000 tuples in 0ms
   write 10000 tuples in 2ms
    read 10000 tuples in 2ms
   write 100000 tuples in 18ms
    read 100000 tuples in 21ms
   write 1000000 tuples in 175ms
    read 1000000 tuples in 212ms
    join 1000x1000 tuples in 2ms
    join 10000x10000 tuples in 16ms
    join 100000x100000 tuples in 296ms
    join 1000000x1000000 tuples in 4983ms
semidiff 1000x1000 tuples (res=500 tuples) in 1ms
semidiff 10000x10000 tuples (res=5000 tuples) in 13ms
semidiff 100000x100000 tuples (res=50000 tuples) in 263ms
semidiff 1000000x1000000 tuples (res=500000 tuples) in 4743ms
 project 1000 tuples from 1000 in 1ms
 project 10000 tuples from 10000 in 13ms
 project 100000 tuples from 100000 in 222ms
 project 1000000 tuples from 1000000 in 3821ms
  select 1 tuples from 1000 in 0ms
  select 1 tuples from 10000 in 2ms
  select 1 tuples from 100000 in 15ms
  select 1 tuples from 1000000 in 154ms
   union 1000x1000 tuples (res=1500 tuples) in 1ms
   union 10000x10000 tuples (res=15000 tuples) in 17ms
   union 100000x100000 tuples (res=150000 tuples) in 338ms
   union 1000000x1000000 tuples (res=1500000 tuples) in 5835ms
  extend 1000 tuples in 0ms
  extend 10000 tuples in 4ms
  extend 100000 tuples in 43ms
  extend 1000000 tuples in 502ms
     sum 1000x1000 tuples in 1ms
     sum 10000x10000 tuples in 23ms
     sum 100000x100000 tuples in 362ms
     sum 1000000x1000000 tuples in 5710ms
      eq 1000x1000 tuples in 0ms
      eq 10000x10000 tuples in 14ms
      eq 100000x100000 tuples in 247ms
      eq 1000000x1000000 tuples in 4551ms

The test cases also got extended a little bit, and finally the Bandicoot grew up from 10000 tuples per argument ;-) Of course there are still many things which could be improved further, but it is a big step forward from where we were before. Here are the new formulas (#n stands for the number of tuples in the relation body):

Operator	Sample Expression	Complexity(*)
join	m * n	#m log #m + #n log #m
minus (semidiff)	m - n	#n log #n + #m log #n
union	m + n	#n log #n + #m log #n + #m
select	n select(x < 0)	#n
extend	n extend(x = 1)	#n
project	n project(x, y)	2 * #n log #n
summary	(m, n) summary(x = max(y, 0))	#m log #m + #n log #m

(*) I tried to keep the formulas closer to the actual execution effort and hence I do not use the asymptotic notation here.