Vectorized laziness inside
bearophileHUGS at lycos.com
Thu Sep 10 17:44:14 CEST 2009
I've just seen this good Google Talk video from Jan 2008, "MonetDB/
X100: a (very) fast column-store", about a more efficient column-
The efficiency of this DBMS (something like 50 times faster) is
produced by few things:
- It's column-wise, this helps in several other successive
optimizations too (row-oriented DBMS have their purpose still, the
column-oriented are good if you want to perform statistics on most of
your data, certain kinds of data mining, etc).
- At 13.57 it shows that instead of yielding single tuples (or single
items, it's column-oriented), it yields arrays of about 100 tuples/
fields. This allows to create primitives that are much more efficient.
And the CPU can process them better, using SSE instruction too. Such
small arrays are designed to fit in the CPU cache (probably L2). Such
vectorized operations are also pipelined in some way.
- The filtering operations often don't produce new vectors, they just
mark the tuples as not present any more inside an array. This helps
avoid many copies of such arrays.
- Data is kept compressed on disk, the compression is column-wise, and
decompression is done only just-in-time to save data transfers. The
number compression takes in account that often data is sorted, so it's
delta-compressed, and then such delta is compressed only in a range of
the Gaussian-like residual distribution (outliers are encoded
directly). This compression also allows to keep large indexes in RAM,
that speeds up things more.
- They even shown vectorized hashing, but I have not understood how
- The reading from the disks is done in a merged way, to avoid reading
the same things many times for similar queries.
(The good thing is that it's not hard to understand most things shown
in this video. But I'd like to see the C code they use as "reference",
that's just 3 times faster than their DBMS).
DBMS inside work as the lazy operations that are getting more common
in Python code (see itertools), and common in Haskell.
So to reduce the Python interpreter overhead of lazy iterations it may
be used a vectorized strategy (that's meant to be almost transparent
for the Python programmer, so this is an implementation thing), so
items can be produced and moved in groups, inside arrays, like in that
video. (The DBMS too has an interpreter overhead, it's made of fast
primitives used by lazy interpreted code, it looks a lot like a Python/
Ruby :-) ). Beside CPython This idea can be useful also for Psyco/
Lazy filtering is a bit less common in Python code, so I don't know if
it can also be useful the idea of not copying new arrays but just
marking items as absent (for example with a bit array attached to the
As you may know I have introduced laziness in the D language, D2 now
has a standard library that contains several of the things present in
the itertools module. I'll do some experiments to see if the ideas of
such vectorized laziness can improve lazy generators in D. I may even
test the idea of keeping arrays with holes.
More information about the Python-list