pickle->zlib->shelve

Wed Oct 20 06:55:34 EDT 1999

Thomas Weholt writes:

 > I was thinking maybe I could use pickle and compress the pickled object
 > too, before storing it into the database and save some space. Pickled
 > objects also seem to have lots of repetitive data.

If you zlib-compress each pickled object, you will save some space
thanks to the "internal redundancy" (i.e. low entropy) within each
object.  It seems to me, though, that there is also redundancy
*between* the objects, and it is not at all clear to me how to
exploit that.
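
As a rough sketch of the per-object approach, something like the
following might do.  The names pack, unpack, store and load are made up
for illustration.  The optional zdict argument is one possible way to
get at redundancy between objects, not something I have measured: it
primes zlib with bytes that typical records are assumed to share, using
the preset-dictionary support in Python 3.3 and later, and the same
bytes must be supplied again when reading.

```python
import os
import pickle
import shelve
import tempfile
import zlib


def pack(obj, zdict=b""):
    """Pickle obj, then deflate it; zdict optionally primes the compressor."""
    comp = zlib.compressobj(zdict=zdict) if zdict else zlib.compressobj()
    return comp.compress(pickle.dumps(obj)) + comp.flush()


def unpack(blob, zdict=b""):
    """Inverse of pack(); zdict must match the one used when packing."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return pickle.loads(decomp.decompress(blob) + decomp.flush())


def store(db, key, obj, zdict=b""):
    # db is any mapping with string keys, e.g. an open shelf.
    db[key] = pack(obj, zdict)


def load(db, key, zdict=b""):
    return unpack(db[key], zdict)


if __name__ == "__main__":
    record = {"name": "Thomas", "tags": ["python"] * 50}  # made-up data
    with tempfile.TemporaryDirectory() as tmp:
        with shelve.open(os.path.join(tmp, "objects")) as db:
            store(db, "thomas", record)
            assert load(db, "thomas") == record
```

Note that shelve pickles whatever you hand it, so the compressed bytes
end up wrapped in a second, small pickle.  Writing the bytes to a dbm
file directly would avoid that.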

It would be nifty if you could dynamically compress/uncompress the
whole database file, e.g. using some kind of "compressed file system",
but this presents difficulties - AFAIK there is some limited
"compressed file system" support for Linux but it's read-only, due to
the obvious difficulties of positioning and writing into the middle of
an existing compressed block.  Also "mmap" support is tricky...

I also want to point out that your database files may not be as big as
you think they are.  Gdbm files are typically sparse files, that is,
they don't take up as much space on disk as you would assume from "ls
-l", because these files contain "holes".
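
To see how much room a file really occupies, you can compare its
apparent size with the blocks allocated to it.  Here is a small sketch,
assuming a Unix-like system where os.stat() reports st_blocks in
512-byte units; disk_usage and make_sparse_file are names invented for
this example, and whether the test file actually comes out sparse
depends on the filesystem.

```python
import os
import tempfile


def disk_usage(path):
    """Return (apparent_size, allocated_bytes) for path."""
    st = os.stat(path)
    # st_blocks is Unix-only and counts 512-byte blocks.
    return st.st_size, st.st_blocks * 512


def make_sparse_file(path, size):
    """Create a file of the given size by seeking past a hole."""
    with open(path, "wb") as f:
        f.seek(size - 1)
        f.write(b"\0")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "holes.db")
        make_sparse_file(path, 1 << 20)
        apparent, allocated = disk_usage(path)
        print("apparent size (what ls -l shows):", apparent)
        print("allocated on disk:", allocated)
```

Running the same disk_usage() on a gdbm file should show whether it has
holes: allocated will be noticeably smaller than apparent if it does.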

 > Awaiting flames and harsh words of discouragement,

Sorry to disappoint you!