Why does shelve make such large files?

Thu Jul 1 18:47:57 EDT 1999

From: Ovidiu Predescu <ovidiu at cup.hp.com>

Gerrit Holl wrote:

> Is it really necessary for shelve to make such large files?
> Have a look at this:
> /tmp> python
> Python 1.5.2 (#1, Apr 18 1999, 00:16:12)  [GCC 2.7.2.3] on linux2
> Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> >>> import shelve
> >>> d = shelve.open('database')
> >>> d['key'] = 'value'
> >>>
> /tmp> ls -l database
> -rw-rw-r--   1 gerrit   gerrit      16384 Jul  1 21:13 database
>                                     ^^^^^
> 
> 16 KB for only one key!!?
> pickle seems to make _much_ smaller files!
> 
> Why is this?

The shelve module is built on DBM, which is like a small database: you
store objects under keys and look them up by key later. DBM can hold
millions of objects and retrieve any one of them without first loading
the whole file into memory.
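
As a rough sketch of that keyed access (assuming a current Python 3
rather than the 1.5.2 shown above; the temporary directory and the
function name are only there for illustration):

```python
import os
import shelve
import tempfile

def demo_keyed_access():
    """Store two entries in a shelf, reopen it, and fetch them by key."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "database")

        # Each assignment is written through to the underlying DBM file.
        with shelve.open(path) as d:
            d["key"] = "value"
            d["other"] = [1, 2, 3]

        # Reopening the shelf does not unpickle every entry; each value
        # is read and unpickled when its key is accessed.
        with shelve.open(path) as d:
            return d["key"], d["other"]

result = demo_keyed_access()
print(result)
```

Which DBM implementation ends up underneath depends on what your
Python installation has available.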

The pickle module, on the other hand, serializes objects so that they
can _all_ be deserialized from the file later. Pickle gives you no way
to search the file for data by key; you have to do that yourself once
the objects have been re-created from the file. Shelve works
differently: every key access or insertion on a shelve object is
actually a read from or a write to the DBM file.
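
To illustrate the pickle side, here is a minimal sketch (again assuming
Python 3; lookup_in_pickle is a made-up helper, not part of the pickle
module). The whole dictionary has to be rebuilt before a single key can
be looked up:

```python
import pickle

def lookup_in_pickle(data, key):
    """Hypothetical helper: find one key in pickled dictionary data."""
    # pickle.loads() re-creates the entire dictionary in memory;
    # there is no way to ask the pickle for just one entry.
    table = pickle.loads(data)
    return table[key]

# The serialized form of the whole dictionary, as it would sit in a file.
blob = pickle.dumps({"key": "value", "other": [1, 2, 3]})

found = lookup_in_pickle(blob, "key")
print(found)
```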

To answer your question: DBM creates these big files because of the
way it manages the database. The database file is organized for keyed
lookup and for updates in place, so it can contain unused space and
gaps left by insertions and deletions. A pickle file, in contrast, is a
plain representation of the objects that were written, and the only
way to update it is to rewrite it entirely.
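
If you want to measure the difference yourself, something along these
lines should do (a sketch assuming Python 3; compare_sizes is a name I
made up). Some DBM implementations create more than one file or add a
suffix to the name, so it adds up everything the shelf leaves in its
directory. The numbers you get depend entirely on which DBM
implementation your Python uses:

```python
import os
import pickle
import shelve
import tempfile

def compare_sizes(record):
    """Return (shelve_bytes, pickle_bytes) for the same key/value data."""
    with tempfile.TemporaryDirectory() as tmp:
        # Write the data through shelve and close the shelf so that
        # everything is flushed to disk.
        with shelve.open(os.path.join(tmp, "database")) as d:
            d.update(record)

        # Sum every file the DBM backend created in the directory.
        shelve_bytes = sum(
            os.path.getsize(os.path.join(tmp, name))
            for name in os.listdir(tmp)
        )

    # A pickle of the same data is just the serialized bytes.
    pickle_bytes = len(pickle.dumps(record))
    return shelve_bytes, pickle_bytes

shelve_bytes, pickle_bytes = compare_sizes({"key": "value"})
print("shelve:", shelve_bytes, "bytes; pickle:", pickle_bytes, "bytes")
```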

-- 
Ovidiu Predescu <ovidiu at cup.hp.com>
http://www.geocities.com/SiliconValley/Monitor/7464/