[Patches] [ python-Patches-553171 ] optionally make shelve less surprising

noreply@sourceforge.net noreply@sourceforge.net
Thu, 09 May 2002 17:47:40 -0700


Patches item #553171, was opened at 2002-05-07 08:13
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=305470&aid=553171&group_id=5470

Category: Library (Lib)
Group: Python 2.2.x
Status: Open
Resolution: None
Priority: 5
Submitted By: Alex Martelli (aleax)
Assigned to: Nobody/Anonymous (nobody)
Summary: optionally make shelve less surprising

Initial Comment:
shelve has highly surprising behavior wrt modifiable
values:
    import shelve
    s = shelve.open('she.dat','c')
    s['ciao'] = range(3)
    s['ciao'].append(4)   # doesn't "TAKE"!

Explaining to beginners that s['ciao'] is returning a
temporary object and the modification is done on the
temporary thus "silently ignored" is hard indeed.  It
also makes shelve far less convenient than it could 
be (whenever modifiable values must be shelved).
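
A minimal sketch of the behavior described above, written for current
Python 3 rather than the 2.2 of this report (so range(3) becomes
list(range(3))). The temporary directory and the function name
demo_lost_update are assumptions made for illustration only.

```python
import os
import shelve
import tempfile


def demo_lost_update():
    """Mutate a value fetched from a plain shelf and return what is stored."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "she.dat")
        with shelve.open(path, "c") as s:
            s["ciao"] = list(range(3))   # value is pickled and stored
            s["ciao"].append(4)          # changes a freshly unpickled copy only
            stored = s["ciao"]           # unpickled again from the database
    return stored


print(demo_lost_update())   # [0, 1, 2]
```

Each s['ciao'] lookup unpickles a new object, so the append never reaches
the stored pickle.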

Having s keep track of all values it has returned may
perhaps break some existing program (due to extra 
memory consumption and/or to lack of "implicit 
copy"/"snapshot" behavior) so I've made the 'caching' 
change optional and by default off.  However it's now 
at least possible to obtain nonsurprising behavior:
    s = shelve.open('she.dat','c',smart=1)
    s['ciao'] = range(3)
    s['ciao'].append(4)   # no surprises any more

I suspect the 'smart=1' should be made the default, 
but, if we at least put it in now, then perhaps we 
can migrate to having it as the default very slowly 
and gradually.
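
For comparison, a hedged sketch of the cached behavior. The 'smart'
keyword is this patch's proposal; the example instead assumes the
writeback=True argument that shelve.open accepts in current Python,
which likewise caches accessed entries in memory and writes them back
on close. The function name and temporary path are illustrative.

```python
import os
import shelve
import tempfile


def demo_writeback():
    """Mutate a cached shelf value, close the shelf, and re-read it."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "she.dat")
        with shelve.open(path, "c", writeback=True) as s:
            s["ciao"] = list(range(3))
            s["ciao"].append(4)      # mutates the object held in the cache
        # Leaving the with-block closes the shelf and writes the cache back.
        with shelve.open(path, "r") as s:
            return s["ciao"]


print(demo_writeback())   # [0, 1, 2, 4]
```

The cost is the one raised in the comments below: every entry accessed
stays in memory and is rewritten on close.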


Alex


----------------------------------------------------------------------

Comment By: H.P.K. (dannu)
Date: 2002-05-10 00:47

Message:
Logged In: YES 
user_id=83092

I'd suggest not changing shelve at all but providing 
a "cache-commit" dictionary (ccdict) which can wrap a
shelf-instance (or any other simple dictish instance)
and provides the 'non-surprising' behaviour. 

Proof-of-concept code with the properties listed
below is available here:

http://home.trillke.net/~hpk/ccdict.py

Current properties are:

- ccdict wraps a dictionary-like object which
  in turn only needs to provide
  __getitem__, __setitem__, __delitem__, has_key

- on first access of an element
  ccdict makes a lookup on the underlying
  dict and caches the item.
- the next accesses work with the cached thing.
  Unsurprising dict-semantics are provided.

- deleting an item is deferred and actually happens
  at commit() time. Deleting an item and later
  assigning to it works as expected (i.e. the assignment
  takes precedence).

- commit() transfers the items in the
  cache to the underlying dict and clears
  the cache. Prior to calling commit(),
  no writeback to the underlying dict happens.

- deleting a ccdict instance does *not* commit any
  changes. You have to explicitly call commit().
  If you want to work read-only, don't call commit().

- clear() only clears the cache and not the underlying
  dict

- you can explicitly prune the cache (via cache.keys()
  etc.) before calling commit(). This lets you
  avoid writing back unmodified objects if this
  is an issue.

It seems quite impossible to figure out automagically
which objects have been modified,
so the solution is to do it explicitly
(or not to commit at all for read-only use).
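
The properties above can be sketched roughly as follows. This is an
independent illustration, not the code from ccdict.py: the class name
CCDict, its attribute names, and the choice that clear() also forgets
pending deletions are all assumptions. It targets Python 3, where
has_key is replaced by the "in" operator.

```python
class CCDict:
    """Cache-commit wrapper around a dict-like object (illustrative sketch)."""

    def __init__(self, backing):
        self.backing = backing   # needs item get/set/delete and "in"
        self.cache = {}          # objects handed out or assigned so far
        self.deleted = set()     # keys whose deletion waits for commit()

    def __getitem__(self, key):
        if key in self.deleted:
            raise KeyError(key)
        if key not in self.cache:
            # First access: look up the underlying dict and cache the item.
            self.cache[key] = self.backing[key]
        return self.cache[key]

    def __setitem__(self, key, value):
        self.deleted.discard(key)    # assignment after a delete wins
        self.cache[key] = value

    def __delitem__(self, key):
        known = key in self.cache or key in self.backing
        if key in self.deleted or not known:
            raise KeyError(key)
        self.cache.pop(key, None)
        self.deleted.add(key)        # deferred until commit()

    def commit(self):
        """Apply deferred deletions, write the cache back, then empty it."""
        for key in self.deleted:
            if key in self.backing:
                del self.backing[key]
        for key, value in self.cache.items():
            self.backing[key] = value
        self.deleted.clear()
        self.cache.clear()

    def clear(self):
        """Drop cached state only; the underlying dict is untouched."""
        self.cache.clear()
        self.deleted.clear()


store = {"a": 1, "b": 2}
cc = CCDict(store)
cc["a"] = 10
del cc["b"]
cc["b"] = 20              # assignment after delete takes precedence
unchanged = dict(store)   # still {"a": 1, "b": 2}: nothing written yet
cc.commit()               # store is now {"a": 10, "b": 20}
```

Pruning before commit() amounts to deleting entries from cc.cache
directly, which keeps unmodified objects from being written back.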

holger

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-09 20:55

Message:
Logged In: YES 
user_id=80475

A few more thoughts:

Please change the "except:" lines to specify the exception 
being caught.

Also, if GvR shows interest in the patch, we should update 
the library reference and add unittests.

The docstring should also mention that the cache is kept in 
memory -- besides persistence, one of the forces for 
shelving is memory conservation.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-09 18:43

Message:
Logged In: YES 
user_id=80475

Nicely done!  The code is clean and runs in the smart mode 
without problems on my existing programs. I agree that the 
patch solves a real world problem.  The solution is clean, 
but a little expensive.

If there were a way to be able to tell if an entry had been 
altered, it would save the 100% writeback.  Unfortunately, 
I can't think of a way.

The docstring could read more smoothly and plainly.  Also, 
it should be clear that the cost of setting smart=1 is that 
100% of the entries get rewritten on close.

Two microscopically minor thoughts on the coding (feel free 
to disregard). Can some of the try/except blocks be 
replaced by something akin to 'if self.smart:'?  For the 
writeback loop, consider 'for k,v in cache.iteritems()' as 
it takes less memory and saves a lookup.
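
A rough sketch of what those two suggestions might look like. It is not
the code from the patch: the class SmartShelf, its attributes, and the
use of a plain mapping of pickled bytes as the database are assumptions
for illustration. In Python 3, dict.items() returns a view and plays the
role iteritems() had in Python 2.

```python
import pickle


class SmartShelf:
    """Hypothetical shelf-like wrapper with an optional in-memory cache."""

    def __init__(self, db, smart=False):
        self.db = db         # any mapping of key -> pickled bytes
        self.smart = smart
        self.cache = {}

    def __getitem__(self, key):
        # Explicit flag test instead of try/except around the cache.
        if self.smart and key in self.cache:
            return self.cache[key]
        value = pickle.loads(self.db[key])
        if self.smart:
            self.cache[key] = value
        return value

    def __setitem__(self, key, value):
        if self.smart:
            self.cache[key] = value
        self.db[key] = pickle.dumps(value)

    def close(self):
        if self.smart:
            # Iterating over items() avoids building a key list and a
            # second lookup per key; every cached entry is rewritten.
            for key, value in self.cache.items():
                self.db[key] = pickle.dumps(value)
            self.cache.clear()
```

The close() loop makes the cost visible: with smart set, every cached
entry is pickled and stored again, modified or not.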

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-05-07 16:38

Message:
Logged In: YES 
user_id=21627

Even more important than the backwards compatibility might
be the issue that it writes back all accessed objects on
close, which might be expensive if there have been many
read-only accesses.

So I think the option name could also be 'slow', although
'writeback' might be more technical.

Also, I wonder whether write-back should be attempted if the
shelve was opened read-only.

----------------------------------------------------------------------
