Why does shelve make such large files?

Gerrit Holl Gerrit.Holl at p98.f112.n480.z2.fidonet.org
Fri Jul 2 10:31:52 EDT 1999


From: Gerrit Holl <gerrit.holl at pobox.com>

On Thu, Jul 01, 1999 at 10:47:57PM +0000, Ovidiu Predescu wrote:
> Gerrit Holl wrote:
> 
> > Is it really necessary for shelve to make such large files?
> > Have a look at this:
> > /tmp> python
> > Python 1.5.2 (#1, Apr 18 1999, 00:16:12)  [GCC 2.7.2.3] on linux2
> > Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam
> > >>> import shelve
> > >>> d = shelve.open('database')
> > >>> d['key'] = 'value'
> > >>>
> > /tmp> ls -l database
> > -rw-rw-r--   1 gerrit   gerrit      16384 Jul  1 21:13 database
> >                                     ^^^^^
> > 
> > 16 KB for only one key!!?
> > pickle seems to make _much_ smaller files!
> > 
> > Why is this?
> 
> The shelve module uses DBM which is like a small database that allows
> you to store objects and search it for a given key. DBM allows you to
> store millions of objects and search for them later without requiring
> you to load all the file in memory first.
> 

Interesting...
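
A minimal sketch of that keyed access, assuming a modern Python 3 (the
`with` form of shelve.open is not available in the 1.5.2 shown above).
The key names, the record count and the helper name demo_keyed_access
are made up for illustration:

```python
import os
import shelve
import tempfile


def demo_keyed_access():
    """Store many records in a shelf, then fetch one of them by key."""
    # Hypothetical scratch location; any writable path would do.
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "database")

    # Each assignment is written through to the underlying DBM file.
    with shelve.open(path) as db:
        for i in range(1000):
            db["key%d" % i] = {"n": i}

    # Reopen and read a single entry; only that value is unpickled,
    # the other 999 records stay on disk.
    with shelve.open(path) as db:
        return db["key500"]


if __name__ == "__main__":
    print(demo_keyed_access())
```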

> The pickle module on the other hand is serializing the objects with the
> purpose of deserializing them _all_ from the file later. Pickle does not
> offer you any way to search for data based on a key, you have to do this
> yourself after the objects have been created from the file. This is
> opposed to the way shelve handles this, all the key accesses and
> insertions in a shelve object are actually reads or writes to or from
> the DBM file.
> 
> And to answer your question, DBM is creating these big files because of
> the way it manages the database. The data in the database file could
> have gaps as a result of multiple insertions and deletions. Pickle's
> data in files is a simple representation of the objects that were
> written and there is no way to update the file other than rewriting it
> entirely.
> 

Ah, I understand.
So pickle is useful for very small datasets, but when they're really huge, one
should use shelve. Is that right?
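
For anyone who wants to repeat the size comparison, here is a rough
sketch, again assuming a modern Python 3. The file names and the
helpers dir_size and compare_sizes are my own invention. Because the
DBM backend may add an extension or create several files, the sketch
sums everything in the directory instead of looking at one file. The
actual numbers depend on which DBM backend is installed, so none are
given here:

```python
import os
import pickle
import shelve
import tempfile


def dir_size(directory):
    """Total size in bytes of all files directly inside directory."""
    return sum(
        os.path.getsize(os.path.join(directory, name))
        for name in os.listdir(directory)
    )


def compare_sizes():
    """Store the same single key/value pair with pickle and with shelve."""
    # Separate scratch directories so the sizes can be measured apart.
    pickle_dir = tempfile.mkdtemp()
    shelve_dir = tempfile.mkdtemp()

    # pickle: one flat serialization of the whole dictionary.
    with open(os.path.join(pickle_dir, "database.pickle"), "wb") as f:
        pickle.dump({"key": "value"}, f)

    # shelve: the same pair stored in a DBM database file.
    with shelve.open(os.path.join(shelve_dir, "database")) as db:
        db["key"] = "value"

    return dir_size(pickle_dir), dir_size(shelve_dir)


if __name__ == "__main__":
    pickle_bytes, shelve_bytes = compare_sizes()
    print("pickle:", pickle_bytes, "bytes")
    print("shelve:", shelve_bytes, "bytes")
```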

regards,
Gerrit.

-- 
The Dutch Linuxgames homepage:	http://linuxgames.nl.linux.org
Personal homepage:		http://www.nl.linux.org/~gerrit/

Discoverb is a python program (in several languages) which tests the words you
learned by asking it. Homepage: http://www.nl.linux.org/~gerrit/discoverb/
Oh my god! They killed init! You bastards!




