[Persistence-sig] "Straw Baby" Persistence API

Guido van Rossum guido@python.org
Fri, 19 Jul 2002 16:03:54 -0400


> * I do think we should keep PersistentList and PersistentMapping in the
> core; they're useful for almost any kind of application, and any kind of
> back-end storage.  They don't introduce policy or data format dependencies
> into users' code, either.

But perhaps these should be rewritten to derive from dict and list
instead of UserDict and UserList?  Also, the module names are
inconsistent -- PersistentMapping is defined in _persistentMapping.py
but PersistentList is defined in PersistentList.py.  Both are then
"pulled up" one level by __init__.py and their __module__ attribute
modified.  I find all that hideous and tricky, and I propose to clean
this up before making it a standard Python package.
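
For illustration, a minimal sketch of what deriving from list could
look like in present-day Python.  The `_p_changed` class attribute here
is only a stand-in for the real Persistent machinery, and `_mutated` is
a made-up helper name; a real version would have to wrap every mutating
method (insert, pop, sort, slice assignment, ...), not just these four.

```python
class PersistentList(list):
    """Hypothetical sketch: a list subclass that flags itself on mutation."""

    _p_changed = False  # stand-in for the flag kept by the Persistent base

    def _mutated(self):
        # Flag the change only after the underlying list has been updated.
        self._p_changed = True

    def append(self, item):
        super().append(item)
        self._mutated()

    def extend(self, items):
        super().extend(items)
        self._mutated()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._mutated()

    def __delitem__(self, index):
        super().__delitem__(index)
        self._mutated()


pl = PersistentList([1, 2])  # construction does not count as a change
pl.append(3)
```

The same pattern would apply to a dict-derived PersistentMapping.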

> * Make _p_dm a synonym for _p_jar, and deprecate _p_jar.  This could be
> done by making a _p_jar descriptor that read/wrote through to _p_dm, and
> issued a deprecation warning.  I don't personally have a problem with
> _p_jar, but I've heard rumblings from other people (ZC folks?) that it's
> confusing or that they want to get rid of it.  So if we're doing it, now
> seems like the time.

It's just that "jar" makes no sense (except in the "cutesy" sense of a
jar full of pickles).  But "dm" is a little obscure too.  Maybe write
it out in full as _p_datamanager?
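
A rough sketch of the read/write-through descriptor described in the
quoted proposal, using the spelled-out name.  `DeprecatedAlias` is a
made-up helper, and the `Persistent` class here is a toy stand-in, not
the real base class.

```python
import warnings


class DeprecatedAlias:
    """Descriptor that forwards an old attribute name to a new one."""

    def __init__(self, old_name, new_name):
        self.old_name = old_name
        self.new_name = new_name

    def _warn(self):
        # stacklevel=3 points the warning at the code touching the alias.
        warnings.warn(
            f"{self.old_name} is deprecated; use {self.new_name}",
            DeprecationWarning,
            stacklevel=3,
        )

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        self._warn()
        return getattr(obj, self.new_name)

    def __set__(self, obj, value):
        self._warn()
        setattr(obj, self.new_name, value)


class Persistent:
    _p_datamanager = None
    _p_jar = DeprecatedAlias("_p_jar", "_p_datamanager")
```

Because the descriptor defines `__set__`, it takes precedence over the
instance dictionary, so old code assigning to `_p_jar` still ends up
writing `_p_datamanager`.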

> * Flag _p_changed *after* __setattr__, not before!  This will help
> co-operative transaction participants play nicely together, since they
> can't "write through" a change if they're getting notified *before* the
> change takes place!  Docs should also clarify that when set in other code,
> _p_changed should be set at the latest possible moment, *after* the object
> is in its new, stable state.

+1
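
To make the ordering concrete, here is a toy sketch (pure Python, not
the C implementation).  It borrows the objectChanged() name from the
proposal quoted further down; `WriteThroughManager` and the `_p_`
prefix check are assumptions made for the example.

```python
class Persistent:
    """Toy stand-in for the real base class."""

    _p_datamanager = None
    _p_changed = False

    def __setattr__(self, name, value):
        # 1. Apply the change first ...
        object.__setattr__(self, name, value)
        if name.startswith("_p_"):
            return  # bookkeeping attributes do not count as changes
        # 2. ... then flag it and notify, so observers see the new state.
        object.__setattr__(self, "_p_changed", True)
        dm = self._p_datamanager
        if dm is not None:
            dm.objectChanged(self)


class WriteThroughManager:
    """Hypothetical data manager that copies state as soon as it is told."""

    def __init__(self):
        self.snapshots = []

    def objectChanged(self, obj):
        state = {k: v for k, v in vars(obj).items() if not k.startswith("_p_")}
        self.snapshots.append(state)


dm = WriteThroughManager()
acct = Persistent()
acct._p_datamanager = dm
acct.balance = 10
```

If the notification were sent before the assignment, the snapshot taken
in objectChanged() would not yet contain `balance`.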

> * Keep the _p_atime slot, but don't fill it with anything by default.
> Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot
> at C level that's called after the getattribute completes.  A data manager
> can then set the hook to point to a _p_atime update function, *or* it can
> introduce postprocessing for "proxy" attributes.  That is, a data manager
> could set the hook to handle "lazy" loading of certain attributes which
> would otherwise be costly to retrieve, by placing a dummy value in the
> object's dictionary, and then having the post-call hook return a
> replacement value.
> 
> For speed, this will generally want to be a C function; let the base
> package include a simple hook that updates _p_atime, and another which
> checks whether the retrievedValue is an instance of a LazyValue base class,
> and if so, calls the object.  This will probably cover the basics.  A data
> manager that uses ZODB caching will use the atime function, and non-ZODB
> data managers will probably want the other hook.  I also have an idea about
> using the transaction's timestamp() plus a counter to supply a "time" value
> that minimizes system calls, but I'm not sure it would actually improve
> performance any, so I'm fine with not trying to push that into the initial
> package.  As long as the hook slot is present in the base package, anyone
> (me included) is free to write and try out their own hooks for it.

Shouldn't there be a setattr hook too?
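
A pure-Python sketch of the two hooks from the quoted proposal, plus a
hypothetical `_p_setattr_hook` counterpart.  The proposal puts the slot
at C level; everything below (class layout, the prefix checks, storing
the hook per instance) is an assumption made to keep the example short.

```python
import time


class LazyValue:
    """Placeholder stored in the instance dict; called on first read."""

    def __init__(self, loader):
        self.loader = loader

    def __call__(self):
        return self.loader()


def atime_hook(obj, name, value):
    """Record the access time and pass the value through unchanged."""
    object.__setattr__(obj, "_p_atime", time.time())
    return value


def lazy_hook(obj, name, value):
    """Replace a LazyValue placeholder with the value it loads."""
    if isinstance(value, LazyValue):
        value = value()
        object.__setattr__(obj, name, value)  # cache: the load runs once
    return value


class Persistent:
    _p_atime = None
    _p_getattr_hook = None  # set per instance by the data manager
    _p_setattr_hook = None  # hypothetical counterpart for writes

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name.startswith("_p_") or name.startswith("__"):
            return value
        hook = object.__getattribute__(self, "_p_getattr_hook")
        return value if hook is None else hook(self, name, value)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name.startswith("_p_"):
            return
        hook = object.__getattribute__(self, "_p_setattr_hook")
        if hook is not None:
            hook(self, name, value)


loads = []

def load_report():
    loads.append(1)
    return "big report"

doc = Persistent()
doc._p_getattr_hook = lazy_hook
doc.report = LazyValue(load_report)
first = doc.report   # triggers the load
second = doc.report  # served from the cached value
```

The hooks are assigned on the instance here; a plain function stored on
the class would be bound as a method and receive the object twice.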

> * Get rid of the term "register", since objects won't "register" with the
> transaction, and neither should they with their data manager.  They should
> "inform their data manager" that they have changed.  Something like an
> objectChanged() message is appropriate in place of register().  I believe
> this would clarify the API.
> 
> * Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
> way such that it works whether you have Interface or not", but the reality
> is that a dependency in the standard library on something outside the
> standard library is a big no-no, and just begging for breakage as soon as
> there *is* an Interface package (with a new API) in the standard library.

Of course.

> Whew!  I think that about covers it, as far as what I'd like to see, and
> what I think would be needed to make it acceptable for the core.  Comments?
> 
> By the way, my rationale for not taking any radical new approaches to
> persistence, observation, or notification in this proposal is that the
> existing Persistence package is "transparent" enough, and has the benefit
> of lots of field experience.  I spent a lot of time trying to come up with
> "better" ways before writing this; mostly I found that trying to make it
> more "transparent" to the object being persisted just pushes the
> complexity into either the app or the backend, without really helping
> anything.  It's really not a big deal to:
> 
> 1. Subclass Persistent
> 
> 2. Use PersistentList and PersistentMapping or other Persistent objects for
> your attributes, or set self._p_changed when you change a non-persistent
> mutable.
> 
> 3. Use transactions
> 
> Especially if that's all you need to do in order to have persistence to any
> number of backends, including the current ZODB and all the wonderful SQL or
> other mappings that will be creatable by everybody on this list using their
> own techniques.  The key is not so much "transparency" per se, as
> *uniformity* across backends.  I think the existing API is transparent
> enough; let's work on having uniform and universal access to it, as a
> Python core package.

I've often thought that it's ugly that you have to set _p_state and
_p_changed, rather than do these things with method calls.  What do
you think about that?  Especially the conventions for _p_state look
confusing to me.
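
As a strawman for what "method calls" might look like: the method names
and the three state names below are invented for illustration and are
not taken from the existing package.

```python
from enum import Enum


class State(Enum):
    # Placeholder names for illustration, not the package's constants.
    GHOST = "ghost"        # state not loaded from storage
    UPTODATE = "uptodate"  # loaded and unmodified
    CHANGED = "changed"    # loaded and modified


class Persistent:
    def __init__(self):
        self._state = State.UPTODATE

    def is_changed(self):
        return self._state is State.CHANGED

    def mark_changed(self):
        """Method-call spelling of setting _p_changed by hand."""
        self._state = State.CHANGED

    def deactivate(self):
        """Turn the object into a ghost unless it has unsaved changes."""
        if self._state is State.CHANGED:
            raise RuntimeError("unsaved changes; commit or abort first")
        self._state = State.GHOST


obj = Persistent()
obj.mark_changed()
```

With methods, an invalid transition (such as ghosting a modified
object) can raise an error instead of depending on callers remembering
the conventions for the raw attributes.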

--Guido van Rossum (home page: http://www.python.org/~guido/)