Serious problem with Shelve
Rami A. Kishek
ramiak2000 at yahoo.com
Sun Aug 17 19:04:57 CEST 2003
Hi - this mysterious behavior with shelve is just about to kill me. I
hope someone here can shed some light. First of all, I have this piece
of code which uses shelve to save instances of some class I define. It
works perfectly on an old machine (PII-400) running Python 2.2.1 under
RedHat Linux 8.0. When I try to run it under Python for Windows ME on a
P-4 1.4 GHz, however, it keeps crashing on reading from the shelved file
the second time I try to access it. The Windows machine was originally
running python 1.5.2, so I upgraded to 2.2.3, thinking that would solve
the problem, but it didn't!
This is what the error looks like:
    tmprec = myrecs[key]
  File "D:\PROGRAMS\PYTHON22\lib\shelve.py", line 70, in __getitem__
    f = StringIO(self.dict[key])
Here's what my program does (it is too much code to include here).
I have 4 related modules: one containing the class definitions (in all
other modules I use from classfile import ___); the second module builds
the shelve file by parsing a large text file containing the data,
building classes; the third re-opens the file later to do reading and
writing operations; and the 4th module is a GUI controller that simply
calls the appropriate functions from the other 2 modules.
The main breakdown occurs in module 3. Significantly, I initially had
this module set up as a script in which everything was done on the
module level, and it was working fine (apparently). The problems
started appearing when I wrapped code inside functions (I need to do
that since I want to call it from other modules, and I have about 4000
lines of code altogether!). I spent painstaking hours trying to isolate
the problem - I pass the open shelve file as a parameter to all the
functions that need it, and I close it properly using try/finally
statements after every use. I also make sure all the keys that go in
there are unique.
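
A minimal sketch of the open-once, pass-around, always-close pattern described above. It is written for current Python 3 rather than 2.2, and the Record class, the helper names, and the file name are hypothetical stand-ins, not the actual code:

```python
import os
import shelve
import tempfile


class Record:
    """Hypothetical stand-in for one of the shelved classes."""

    def __init__(self, name):
        self.name = name
        self.field = 0


def add_record(myrecs, key, name):
    """Store a new Record under key in the already-open shelf."""
    myrecs[key] = Record(name)


def read_name(myrecs, key):
    """Read one record back from the already-open shelf."""
    return myrecs[key].name


def run(path):
    """Open the shelf once, pass it to the helpers, and always close it."""
    myrecs = shelve.open(path)
    try:
        add_record(myrecs, "A_G_0863161618", "example")
        return read_name(myrecs, "A_G_0863161618")
    finally:
        myrecs.close()  # runs even if a helper raises


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        print(run(os.path.join(tmpdir, "myshelvefile")))
```

The point of the single open/close pair is that only one shelf object ever has the file open for writing at a time.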
What module 3 does is a series of short reads and writes to the shelve
file. First I test if a particular key is in there - if it is not, I
add an item, if it is, I read the existing item, update it, then write
it back like this:
    tmprec = myrecs[key]    # I read a particular instance from the shelve
    tmprec.field = 1        # I update one field
    #del myrecs[key]        # Commented lines are things I tried while
    myrecs[key] = tmprec    # Then I write it back to the shelve file
This one function appears to be the guilty party. When I comment it
out the crash stops. However it is a vital function for my program and
I need to do it. Note that deleting the original item before rewriting
it helped reduce the frequency of crashes, but didn't eliminate it
completely. The other possibility (which is why I unsuccessfully tried
the .sync() lines) is that it has to do with the timing of writing to
disk. The library reference is vague about this, saying that shelve is
incapable of simultaneous reads and writes, so the file shouldn't be
opened twice for write. However it does not say whether this implies we
cannot read and write like this in quick succession.
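
The add-or-update step with an explicit sync can be sketched as follows, again for current Python 3. The Record class and the function names are hypothetical, and whether sync() actually forces a write to disk depends on the underlying dbm backend, so treat this as an illustration of the sequence rather than a fix:

```python
import os
import shelve
import tempfile


class Record:
    """Hypothetical stand-in for one of the shelved classes."""

    def __init__(self):
        self.field = 0


def mark_record(myrecs, key):
    """Add the record if the key is missing, otherwise update and rewrite it."""
    if key not in myrecs:
        myrecs[key] = Record()  # first sighting: store a fresh instance
    else:
        tmprec = myrecs[key]    # unpickles a copy of the stored instance
        tmprec.field = 1        # changes only the in-memory copy
        myrecs[key] = tmprec    # re-pickles and stores the updated copy
    myrecs.sync()               # ask the backend to flush, where supported


def demo(path):
    """Write through one open shelf, then reopen read-only and read back."""
    myrecs = shelve.open(path)
    try:
        mark_record(myrecs, "A_G_0863161618")  # adds the record
        mark_record(myrecs, "A_G_0863161618")  # updates the record
    finally:
        myrecs.close()

    check = shelve.open(path, flag="r")
    try:
        return check["A_G_0863161618"].field
    finally:
        check.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmpdir:
        print(demo(os.path.join(tmpdir, "myshelvefile")))
```

Reopening the file read-only after the close is a cheap way to confirm that what was written is what comes back.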
* The first run of module 3 after creating the shelve file doesn't
crash, although I suspect it is doing something funny.
* On the second run I get the error above. Keep in mind that I am
supposed to have a key in there called "A_G_0863161618" (without the
extra '8' at the end), so the database is already corrupted: the key
'A_G_08631616188' is in myshelvefile.keys(), the original is no more,
yet NEITHER can be accessed using myshelvefile[key]!
* After creation, the shelve file size is only 71 kB. After running
module 3 - which is supposed to mostly read and not really change the
file much - the size jumps to 110 kB!
* If I open the file in a text editor, I notice all sorts of things that
are not supposed to be there (like directory paths, etc), indicating it
is corrupted. I do not see those things when I open the file on the
good (Linux) machine.
* I did a scandisk to ensure the disk is OK and it is.
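
One way to check the symptoms listed above on each machine is to report which dbm backend the file uses and which keys cannot be read back. This is a sketch for current Python 3, where dbm.whichdb lives in the dbm package; check_shelf is a hypothetical helper name:

```python
import dbm
import shelve


def check_shelf(path):
    """Return (backend, unreadable) for the shelf stored at path.

    backend is whatever dbm.whichdb reports for the file; unreadable is a
    list of (key, error) pairs for keys that are listed but cannot be read.
    """
    backend = dbm.whichdb(path)
    unreadable = []
    myrecs = shelve.open(path, flag="r")  # read-only: do not modify the file
    try:
        for key in list(myrecs.keys()):
            try:
                myrecs[key]               # forces a lookup and an unpickle
            except Exception as exc:      # record the failure and keep going
                unreadable.append((key, repr(exc)))
    finally:
        myrecs.close()
    return backend, unreadable
```

Running it right after the file is built, and again after the first pass of module 3, would show at which step keys stop being readable and whether the two machines use the same backend.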