Why does shelve make such large files?

Johan Wouters johanw at easics.be
Fri Jul 2 11:45:13 EDT 1999


<snip>
> Ahem, with all due respect to everyone... this is humbug.
> 

Why so?

> Pickle serializes your data in one sweep to/from disk.  It's compact.

So? I don't recall ever taking issue with that statement. As a matter
of fact, that is just what I stated.
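To make that concrete, here is a minimal sketch (the data is made up)
of what "one sweep" means: the whole object graph is turned into a
single compact byte stream, and restored in one go.

```python
import pickle

# Serialize an object graph in one sweep to a compact byte stream.
record = {"name": "Gerrit", "scores": [1, 2, 3]}
blob = pickle.dumps(record)

# Deserialize the whole stream in one sweep.
restored = pickle.loads(blob)
```

There is no per-key structure in the stream; that is exactly why it is
small, and why you cannot fetch one value without reading it all.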


> But the last claim is horrendously misleading, I'm afraid: data storage
> is not a space/time tradeoff.  It's about throughput (I/O bottlenecks)
> and overhead in managing the supported data and indexing schemes.  There
> are order-of-magnitude performance differences in how several solutions
> work, because of this.

The general idea was a typical engineering tradeoff. Larger database
environments *might* give you better performance for some criteria, like
searching or combining data. Say you have an amount of data you want to
search. You could keep it as a serial stream (minimal space), or you could
use some clever hash method and fixed-size records to gain speed. That
way you can bypass the overhead of scanning the whole database (and the
slow disks, and ...) by simply calculating where the data should be. On
the other hand, this structuring of the data will cost some extra space.
That looks like just the space/time tradeoff I was talking about ...
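The "calculate where the data should be" step can be sketched like this
(the record size, table size, and hash function below are all toy
assumptions, not how any particular dbm backend actually works):

```python
# Toy sketch of the "hash plus fixed-size records" idea: when every
# slot on disk has the same size, a key's location can be computed
# directly, so lookup is one seek instead of a linear scan.
RECORD_SIZE = 64      # bytes per slot, padding included (assumed)
NUM_SLOTS = 1024      # size of the hash table (assumed)

def slot_offset(key: bytes) -> int:
    """Compute the byte offset of the slot this key hashes to."""
    bucket = sum(key) % NUM_SLOTS   # deliberately simple toy hash
    return bucket * RECORD_SIZE
```

The cost is that every slot occupies RECORD_SIZE bytes on disk even
when its value is tiny, and most slots may be empty padding. That is
the space side of the tradeoff, and it is also why a shelve file (a
keyed database underneath) tends to be much larger than the pickle of
the same data.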


> 
> Shudder.  The notion that a large database package, or a large datafile,
> is faster, is so far from reality that it has to be corrected, even in
> this Python-oriented newsgroup.  My apologies for the S/N ratio drop.

I was talking about the datafile, not the package! Also, I never stated
that bigger files WILL give you better performance.

Maybe the whole story was a little simplistic, but with all due respect:
how does your contribution add to the S?

So Gerrit, I hope you still got some insight from all this!

Kind regards,
Johan Wouters
