Published by Ostap Cherkashin on 2011 02 13.
This is a very brief update on the v1 performance. Here are the figures from exactly the same laptop as in the previous post (and again with the -Os gcc option):
$ cd bandicoot
$ ctl pack
$ bin/test/perf/relation
write 1000 tuples in 1ms
read 1000 tuples in 0ms
write 10000 tuples in 2ms
read 10000 tuples in 2ms
write 100000 tuples in 18ms
read 100000 tuples in 21ms
write 1000000 tuples in 175ms
read 1000000 tuples in 212ms
join 1000x1000 tuples in 2ms
join 10000x10000 tuples in 16ms
join 100000x100000 tuples in 296ms
join 1000000x1000000 tuples in 4983ms
semidiff 1000x1000 tuples (res=500 tuples) in 1ms
semidiff 10000x10000 tuples (res=5000 tuples) in 13ms
semidiff 100000x100000 tuples (res=50000 tuples) in 263ms
semidiff 1000000x1000000 tuples (res=500000 tuples) in 4743ms
project 1000 tuples from 1000 in 1ms
project 10000 tuples from 10000 in 13ms
project 100000 tuples from 100000 in 222ms
project 1000000 tuples from 1000000 in 3821ms
select 1 tuples from 1000 in 0ms
select 1 tuples from 10000 in 2ms
select 1 tuples from 100000 in 15ms
select 1 tuples from 1000000 in 154ms
union 1000x1000 tuples (res=1500 tuples) in 1ms
union 10000x10000 tuples (res=15000 tuples) in 17ms
union 100000x100000 tuples (res=150000 tuples) in 338ms
union 1000000x1000000 tuples (res=1500000 tuples) in 5835ms
extend 1000 tuples in 0ms
extend 10000 tuples in 4ms
extend 100000 tuples in 43ms
extend 1000000 tuples in 502ms
sum 1000x1000 tuples in 1ms
sum 10000x10000 tuples in 23ms
sum 100000x100000 tuples in 362ms
sum 1000000x1000000 tuples in 5710ms
eq 1000x1000 tuples in 0ms
eq 10000x10000 tuples in 14ms
eq 100000x100000 tuples in 247ms
eq 1000000x1000000 tuples in 4551ms
The test cases also got extended a little bit, and finally the Bandicoot grew up from 10000 tuples per argument ;-) Of course there are still many things which could be improved further, but it is a big step forward from where we were before. Here are the new formulas (#n stands for the number of tuples in the relation body):
Operator | Sample Expression | Complexity(*) |
---|---|---|
join | m * n | #m log #m + #n log #m |
minus (semidiff) | m - n | #n log #n + #m log #n |
union | m + n | #n log #n + #m log #n + #m |
select | n select(x < 0) | #n |
extend | n extend(x = 1) | #n |
project | n project(x, y) | 2 * #n log #n |
summary | (m, n) summary(x = max(y, 0)) | #m log #m + #n log #m |
(*)
I tried to keep the formulas closer to the actual execution effort and
hence I do not use the asymptotic notation here.