shelve vs pickle
hungjunglu at yahoo.com
Fri Sep 14 16:28:50 EDT 2001
Hi,
I've read a few articles on the mailing list, but the advantages of
shelve over pickle are not apparent to me. (Both are Python standard
library modules.)
(1) Shelve does not write to disk immediately, at least on Windows.
So when you store things in a shelve, the changes often sit in RAM
rather than on disk, and if your program crashes before you close
the shelve, all changes are gone. That may or may not be what you
want. With a good database, you choose when and how you commit
transactions. With shelve, can you choose the transactional
behavior (so that it saves things more often)?
(2) Shelve, then, very often stores things just in RAM. I don't know
the details, but I guess that if the file is large (10GB? 100GB?), it
may be smart not to load everything into RAM.
(3) Or is the advantage of Shelve in the open() and close()
statements? Is it smart enough to save only those items that have
been modified? So that open() and close() are fast enough if only a
few items have been touched?
(4) In another test, I closed the Python program while it was in a
loop writing items to a shelve. The database file got corrupted badly
enough that the majority of the items were missing afterwards.
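As a rough illustration of points (1) and (3), here is a minimal
sketch using the Shelf.sync() method and plain key assignment. The
function name store_with_sync, the `every` parameter and the temporary
path are made up for the example. Note that sync() is only a request
to the underlying dbm backend to flush; it is not a transaction
commit, and how durable the result is depends on the backend.

```python
import os
import shelve
import tempfile

def store_with_sync(path, items, every=100):
    """Write (key, value) pairs to a shelf, calling sync() every `every` writes."""
    with shelve.open(path) as db:
        for count, (key, value) in enumerate(items, start=1):
            db[key] = value          # pickles and stores this one value only
            if count % every == 0:
                db.sync()            # ask the backend to flush, if it can

# Hypothetical demo location in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "demo_shelf")
store_with_sync(path, ((str(i), [i]) for i in range(250)))

with shelve.open(path) as db:        # writeback=False is the default
    db["1"].append(99)               # changes a temporary copy; not saved
    value = db["2"]
    value.append(99)
    db["2"] = value                  # reassigning the key saves just this entry

with shelve.open(path) as db:
    unchanged, changed = db["1"], db["2"]
```

With the default writeback=False, only keys that are explicitly
assigned get re-pickled, so touching a few items does not rewrite the
rest. Opening with writeback=True instead caches every accessed entry
in memory and writes the cached entries back on sync() or close().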
All in all, I am just wondering, as some other people have asked
before in the newsgroup (without getting any real answer): when
is it good to use shelve?
I guess shelve can only be used safely if extra precautions are
taken:
(1) Before opening a shelve file, make a backup copy.
(This can be done at the moment of saving, too, depending on each
person's preference.)
(2) Delete the backup copy only if the shelve has been closed
successfully.
(3) Close the shelve file often, to make sure that your changes are
recorded.
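The three precautions above could be sketched roughly as follows. The
helper names backup_shelf_files and update_with_backup and the
".backup" suffix are hypothetical. The sketch assumes the shelf may be
stored as one or several files whose names start with the given path,
since that depends on which dbm backend is in use.

```python
import glob
import os
import shelve
import shutil

def backup_shelf_files(path, suffix=".backup"):
    """Copy every file belonging to the shelf at `path`; return the copies."""
    copies = []
    for name in glob.glob(glob.escape(path) + "*"):
        if name.endswith(suffix):
            continue                      # skip leftovers from earlier runs
        shutil.copy2(name, name + suffix)
        copies.append(name + suffix)
    return copies

def update_with_backup(path, changes):
    """Apply `changes` (a dict) to the shelf, keeping backups until close."""
    copies = backup_shelf_files(path)     # (1) back up before opening
    with shelve.open(path) as db:
        db.update(changes)
    # (2) reached only if the shelf was closed without raising
    for name in copies:
        os.remove(name)
```

Calling update_with_backup() once per small batch of changes covers
precaution (3), at the price of copying the files on every call.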
However, you folks can see the contradiction here: in order to use
shelve safely, you have to copy the big external file every time you
want to open (or save) it. The benefit of fast opening and fast
closing is gone.
So, I still don't see any benefit from shelve. I can only see that to
do things safely I would use:
(1) a transactional database, or
(2) pickle or XML-serialize my Python objects in small units, so that
I can safely access them. ("Safely accessing" means the standard
procedure of keeping a backup copy before overwriting the file, and
using file renaming schemes to minimize the window of possible data
inconsistency if the program crashes unexpectedly, for instance due
to a power failure.)
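A minimal sketch of the backup-and-rename procedure described in (2),
for a single pickle file. The names save_pickle_safely and load_pickle
and the ".bak" suffix are invented for the example. It relies on
os.replace(), which swaps the new file into place in one step when
source and destination are on the same filesystem.

```python
import os
import pickle
import shutil
import tempfile

def save_pickle_safely(obj, path):
    """Pickle `obj` to `path` via a temporary file, keeping a .bak copy."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f)
            f.flush()
            os.fsync(f.fileno())          # push the bytes to disk first
        if os.path.exists(path):
            shutil.copy2(path, path + ".bak")   # keep the previous version
        os.replace(tmp, path)             # rename the new file into place
    except BaseException:
        if os.path.exists(tmp):
            os.remove(tmp)                # do not leave a stray temp file
        raise

def load_pickle(path):
    """Load and return the object pickled at `path`."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

If the program dies while writing, the old file is still intact,
because only the temporary file was being written. If it dies after
the rename, the new file is complete.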
And I just don't see when to use shelve. If I have 1 million items to
store, I wouldn't use shelve: I'd use a transactional database. If I
have 100 items to save, I wouldn't use shelve: I'd just use pickle.
There is a small range of applicability for shelve: situations where
you (1) have many items, (2) need to modify many things, but in a
transactional fashion, typically spending 5 minutes to an hour in
the process, and (3) don't mind losing the changes if the transaction
is not completed. I guess you can use shelve for report logging, for
non-critical web session data, etc.
regards,
Hung Jung