Why does shelve make such large files?

Thomas S. Strinnhed Thomas.S..Strinnhed at p98.f112.n480.z2.fidonet.org
Fri Jul 2 13:13:24 EDT 1999


From: "Thomas S. Strinnhed" <thstr at serop.abb.se>

Hi

Ovidiu Predescu wrote:
>[...]
> 
> The shelve module uses DBM, which is like a small database that lets
> you store objects and look them up by a given key. DBM allows you to
> store millions of objects and search for them later without requiring
> you to load the whole file into memory first.
> 
> The pickle module, on the other hand, serializes objects with the
> purpose of deserializing them _all_ from the file later. Pickle does not
> offer you any way to search for data based on a key; you have to do this
> yourself after the objects have been recreated from the file. Shelve
> works differently: all key accesses and insertions on a shelve object
> are actually reads from or writes to the DBM file.
> 
> And to answer your question, DBM is creating these big files because of
> the way it manages the database. The data in the database file could
> have gaps as a result of multiple insertions and deletions. Pickle's
> data in files is a simple representation of the objects that were
> written and there is no way to update the file other than rewriting it
> entirely.
> 
> --
> Ovidiu Predescu <ovidiu at cup.hp.com>
> http://www.geocities.com/SiliconValley/Monitor/7464/
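
For illustration, a minimal sketch of the difference described above,
assuming the standard pickle and shelve modules of a current Python 3.
The file names and the sample records are made up for this example and
are not from the original thread.

```python
import os
import pickle
import shelve
import tempfile

# Hypothetical sample data, used only for this illustration.
records = {"alice": {"age": 30}, "bob": {"age": 25}, "carol": {"age": 41}}

workdir = tempfile.mkdtemp()
pickle_path = os.path.join(workdir, "records.pkl")
shelf_path = os.path.join(workdir, "records_shelf")

# pickle: the whole dictionary is written as one serialized stream ...
with open(pickle_path, "wb") as f:
    pickle.dump(records, f)

# ... and has to be read back in full before any key can be looked up.
with open(pickle_path, "rb") as f:
    loaded = pickle.load(f)
bob_from_pickle = loaded["bob"]

# shelve: each key/value pair is stored as its own entry in the DBM file.
with shelve.open(shelf_path) as shelf:
    for name, info in records.items():
        shelf[name] = info

# Later, single entries can be read, added, or deleted by key
# without loading the other entries.
with shelve.open(shelf_path) as shelf:
    bob_from_shelf = shelf["bob"]
    shelf["dave"] = {"age": 19}
    del shelf["alice"]
    remaining_keys = sorted(shelf.keys())
```

Updating the pickle file, by contrast, would mean dumping the whole
dictionary again.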

So, just to make sure I follow, in short:
 * pickle makes "persistent objects" flushed to a file
 * shelve + dbm is in fact a (relational) database storing
   (arbitrary**) objects in some hash-searchable order

(**) By arbitrary I mean _any_ kind of object, or do they need to be
     of the same class? (In order for searching to work)
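
As far as the standard shelve documentation goes, keys must be strings
and values can be any picklable object, so a quick way to check the
question above is a sketch like the following. It assumes a current
Python 3; the key names and the stored values are made up for this
example.

```python
import datetime
import os
import shelve
import tempfile
from fractions import Fraction

# Hypothetical shelf location, used only for this illustration.
shelf_path = os.path.join(tempfile.mkdtemp(), "mixed_shelf")

# Values of unrelated types are stored side by side under string keys.
with shelve.open(shelf_path) as shelf:
    shelf["a_list"] = [1, 2, 3]
    shelf["a_dict"] = {"x": 1.5}
    shelf["a_date"] = datetime.date(1999, 7, 2)
    shelf["a_fraction"] = Fraction(3, 4)

# Lookup goes by key only; the type of the stored value plays no part.
with shelve.open(shelf_path) as shelf:
    stored_types = {key: type(shelf[key]).__name__ for key in shelf}
    has_date = "a_date" in shelf
```

Note that "searching" here means lookup by key; finding entries by the
contents of the stored objects would still require reading them.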

About usage: shelve when I need to search, and pickle when I just need
to save my objects? Are there other considerations based on the number
and size of objects?

Best regards
 -- Thomas S. Strinnhed, thstr at serop.abb.se



