[Patches] [ python-Patches-553171 ] optionally make shelve less surprising

SourceForge.net noreply@sourceforge.net
Sat, 19 Apr 2003 12:54:55 -0700


Patches item #553171, was opened at 2002-05-07 03:13
Message generated for change (Settings changed) made by rhettinger
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: Accepted
Priority: 7
Submitted By: Alex Martelli (aleax)
>Assigned to: Raymond Hettinger (rhettinger)
Summary: optionally make shelve less surprising

Initial Comment:
shelve has highly surprising behavior wrt modifiable
values:
    s = shelve.open('she.dat','c')
    s['ciao'] = range(3)
    s['ciao'].append(4)   # doesn't "TAKE"!

Explaining to beginners that s['ciao'] is returning a
temporary object and the modification is done on the
temporary thus "silently ignored" is hard indeed.  It
also makes shelve far less convenient than it could 
be (whenever modifiable values must be shelved).

Having s keep track of all values it has returned may
perhaps break some existing program (due to extra 
memory consumption and/or to lack of "implicit 
copy"/"snapshot" behavior) so I've made the 'caching' 
change optional and by default off.  However it's now 
at least possible to obtain nonsurprising behavior:
    s = shelve.open('she.dat','c',smart=1)
    s['ciao'] = range(3)
    s['ciao'].append(4)   # no surprises any more

I suspect the 'smart=1' should be made the default, 
but, if we at least put it in now, then perhaps we 
can migrate to having it as the default very slowly 
and gradually.


Alex


----------------------------------------------------------------------

Comment By: Guido van Rossum (gvanrossum)
Date: 2003-04-19 13:51

Message:
Logged In: YES 
user_id=6380

Looks good to me. Martin or Raymond, can you check it in?
(Or should we just give Alex commit perms?)

One unrelated note: the 'binary' parameter should really be
upgraded to 'protocol' to match changes to the pickle module.

----------------------------------------------------------------------

Comment By: Alex Martelli (aleax)
Date: 2003-04-19 13:38

Message:
Logged In: YES 
user_id=60314

done -- uploading the patch to code, test and docs, with the 
optional parameter renamed to writeback; docs specifying the 
issue, possible workaround w/o writeback, and writeback's 
performance-hit risks; and following raymond's suggestions.
Not sure what "a NEWS entry" as martin requires should be a 
patch TO, though...?  So I didn't do that yet -- lemme know!!!


----------------------------------------------------------------------

Comment By: Martin v. L÷wis (loewis)
Date: 2003-03-31 00:15

Message:
Logged In: YES 
user_id=21627

dannu's code is currently unavailable... I see no reason to
add yet another layer of indirection, and no other
application of such a wrapper within Python.

The trickiest aspect of this educational: If the default
behaviour does not change (as it shouldn't), how can
unsuspecting users avoid running into the trap? So this is
much more a documentation problem than a code problem.

----------------------------------------------------------------------

Comment By: Alex Martelli (aleax)
Date: 2003-03-30 16:59

Message:
Logged In: YES 
user_id=60314

sure, but along what lines -- my previous patch's, or dannu's?  let 
me know, and I'll get to work on it as soon as I'm back from 
Python-UK & short following trip (i..e around Apr 12)


----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2003-03-30 16:56

Message:
Logged In: YES 
user_id=80475

The issue has arisen a couple of times
of comp.lang.python.
I think this patch would be helpful.


----------------------------------------------------------------------

Comment By: Martin v. L÷wis (loewis)
Date: 2003-03-30 16:43

Message:
Logged In: YES 
user_id=21627

Would you then be willing to provide a complete patch
(documentation, NEWS entry, test case)?

----------------------------------------------------------------------

Comment By: Alex Martelli (aleax)
Date: 2003-03-30 16:26

Message:
Logged In: YES 
user_id=60314

Yes, Martin, I'm still quite convinced shelve's behavior is
generally surprising and often problematic, and even though
the fixed suggested by both me and dannu are each imperfect
(given the impossibility to find out, in general, whether an object
has been modified), I think one or the other would still be better
than the current situation.

----------------------------------------------------------------------

Comment By: Martin v. L÷wis (loewis)
Date: 2003-03-30 16:18

Message:
Logged In: YES 
user_id=21627

Alex, do you still think this should be implemented, in some
form or  other?

----------------------------------------------------------------------

Comment By: Holger P. Krekel (dannu)
Date: 2002-05-09 19:47

Message:
Logged In: YES 
user_id=83092

I'd suggest not changing shelve at all but providing 
a "cache-commit" dictionary (ccdict) which can wrap a
shelf-instance (or any other simple dictish instance)
and provides the 'non-surprising' behaviour. 

Some proof of concept code for the following
properties is provided here

http://home.trillke.net/~hpk/ccdict.py

Current properties are:

- ccdict wraps a dictionary-like object which
  in turn only needs to provide
  __getitem__, __setitem__, __delitem__,has_key

- on first access of an element
  ccdict makes a lookup on the underlying
  dict and caches the item.
- the next accesses work with the cached thing.
  Unsurprising dict-semantics are provided.

- deleting an item is deferred and actually happens
  on commit() time. deleting an item and later on
  assigning to it works as expected (i.e. the assignment
  takes preference).

- commit() transfers the items in the
  cache to the underlying dict and clears
  the cache.Prior to issuing commit
  no writeback to the underlying dict happens.

- deleting an ccdict-instance does *not* commit any  
changes. You have to explicitely call commit().
If you want to work readonly, don't call commit.

- clear() only cleares the cache and not the underlying
  dict 

- you can explicitely prune the cache (via cache.keys()
etc.) before calling commit(). This lets you
avoid writing back unmodified objects if this
is an issue.

It seems quite impossible to figure out automagically
which objects have been modified 
and so the solution is to do it explicitely 
(or don't commit for readonly).

holger

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-09 15:55

Message:
Logged In: YES 
user_id=80475

A few more thoughts:

Please change the "except:" lines to specify the exception 
being caught.

Also, if GvR shows interest in the patch, we should update 
the library reference and add unittests.

The docstring should also mention that the cache is kept in 
memory -- besides persistence, one of the forces for 
shelving is memory conservation.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-09 13:43

Message:
Logged In: YES 
user_id=80475

Nicely done!  The code is clean and runs in the smart mode 
without problems on my existing programs. I agree that the 
patch solves a real world problem.  The solution is clean, 
but a little expensive.

If there were a way to be able to tell if an entry had been 
altered, it would save the 100% writeback.  Unfortunately, 
I can't think of a way.

The docstring could read more smoothly and plainly.  Also, 
it should be clear that the cost of setting smart=1 is that 
100% of the entries get rewritten on close.

Two microscopically minor thoughts on the coding (feel free 
to disregard). Can some of the try/except blocks be 
replaced by something akin to 'if self.smart:'?  For the 
writeback loop, consider 'for k,v in cache.iteritems()' as 
it takes less memory and saves a lookup.

----------------------------------------------------------------------

Comment By: Martin v. L÷wis (loewis)
Date: 2002-05-07 11:38

Message:
Logged In: YES 
user_id=21627

Even more important than the backwards compatibility might
be the issue that it writes back all accessed objects on
close, which might be expensive if there have been many
read-only accesses.

So I think the option name could be also 'slow'; although
'writeback' might be more technical.

Also, I wonder whether write-back should be attempted if the
shelve was opened read-only.

----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470