[Persistence-sig] "Straw Baby" Persistence API
Jim Fulton
jim@zope.com
Mon, 22 Jul 2002 13:16:46 -0400
Phillip J. Eby wrote:
> Following on the unparalleled success of the "Straw Man" transaction API
> (he said, with tongue in cheek),
It seemed pretty successful to me.
> I thought it might be good to make a
> proposal for persistence as well.
Thanks. This is very helpful.
> Since I won't be at the BOF,
We'll miss you.
> I figure I
> should get my two cents in now, while the getting's good.
>
> Here's my proposal, such as it is... Deliver a Persistence package based on
> the one at http://cvs.zope.org/Zope3/lib/python/Persistence/ but with the
> following changes:
>
> * Remove the BTrees subpackage, and the Class, Cache, Function, and Module
> modules, along with the ICache interface. Rationale: The BTrees package is
> only useful for a relatively small subset of possible persistence backends,
> and is subject to periodic data structure changes which affect applications
> using it.
I'm OK with taking out BTrees, however, BTrees were included in ZODB by
very popular demand.
You haven't given a rationale for not including the caching framework.
The caching framework is closely tied to persistence and, I think,
largely independent of data managers.
> It's probably best kept out of the Python core. Similar
> arguments apply to the Cache system, although not quite as strongly.
> Class, Function, and Module are very recent developments which have not had
> the extended usage that most of the rest of the code has.
Fair enough.
> (Note: I don't
> mean to say that the persistence C code has been thoroughly exercised
> either, in the sense that much of it is completely new for Python 2.2. But
> its *design* has a long history, and previous implementations have had much
> testing of the kind of edge and corner issues that the Class, Function, and
> Module modules haven't been exposed to yet.)
>
> * I do think we should keep PersistentList and PersistentMapping in the
> core; they're useful for almost any kind of application, and any kind of
> back-end storage. They don't introduce policy or data format dependencies
> into users' code, either.
I *never* use persistent list and almost never use persistent mapping.
I find BTrees far more useful. :)
> * Make _p_dm a synonym for _p_jar, and deprecate _p_jar. This could be
> done by making a _p_jar descriptor that read/wrote through to _p_dm, and
> issued a deprecation warning. I don't personally have a problem with
> _p_jar, but I've heard rumblings from other people (ZC folks?) that it's
> confusing or that they want to get rid of it. So if we're doing it, now
> seems like the time.
I wouldn't worry about backward compatibility. Ditch '_p_jar' and pick
a better name, like '_p_manager' as you suggested.
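For reference, the read/write-through descriptor described in the quoted proposal might look roughly like the sketch below. It is written in current Python syntax rather than 2.2, the Persistent class here is only a stand-in, and nothing about it is taken from the actual package. The attribute names _p_jar and _p_dm come from the proposal.

```python
import warnings


class Persistent:
    """Stand-in for the real Persistent base class (hypothetical)."""

    _p_dm = None  # the data manager, under the name used in the proposal

    @property
    def _p_jar(self):
        # Old name: warn, then read through to the new attribute.
        warnings.warn("_p_jar is deprecated; use _p_dm",
                      DeprecationWarning, stacklevel=2)
        return self._p_dm

    @_p_jar.setter
    def _p_jar(self, value):
        # Old name: warn, then write through to the new attribute.
        warnings.warn("_p_jar is deprecated; use _p_dm",
                      DeprecationWarning, stacklevel=2)
        self._p_dm = value
```

Dropping '_p_jar' outright, as suggested above, would simply mean leaving the property out.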
> * Flag _p_changed *after* __setattr__, not before! This will help
> co-operative transaction participants play nicely together, since they
> can't "write through" a change if they're getting notified *before* the
> change takes place!
It would be helpful if you could provide an illustrative example in a separate
dedicated message.
> Docs should also clarify that when set in other code,
> _p_changed should be set at the latest possible moment, *after* the object
> is in its new, stable state.
I'm with Guido in wanting a set of API calls to replace the baroque
'_p_changed' semantics.
Note to both you and Guido, you (Phillip) are right, _p_state is an internal
implementation detail.
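A minimal sketch of the ordering issue raised in the quoted text, assuming a data manager that copies an object's state the moment it is notified. All class names here are hypothetical; only _p_dm and objectChanged() are borrowed from the proposal, and this is not how the existing C code is structured.

```python
class WriteThroughManager:
    """Hypothetical data manager that snapshots state on notification."""

    def __init__(self):
        self.stored = {}

    def objectChanged(self, obj):
        # Record whatever the object looks like right now,
        # skipping the _p_ bookkeeping attributes.
        self.stored[id(obj)] = {
            k: v for k, v in vars(obj).items() if not k.startswith("_p_")
        }


class NotifyBefore:
    """Flags the change before the attribute is actually set."""

    def __init__(self, dm):
        object.__setattr__(self, "_p_dm", dm)

    def __setattr__(self, name, value):
        self._p_dm.objectChanged(self)          # manager sees the old state
        object.__setattr__(self, name, value)


class NotifyAfter(NotifyBefore):
    """Flags the change after the attribute is set, as proposed."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        self._p_dm.objectChanged(self)          # manager sees the new state
```

With NotifyBefore, assigning obj.x = 1 leaves the manager holding a snapshot without x; with NotifyAfter the snapshot contains the new value, which is what a write-through participant would need.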
> * Keep the _p_atime slot, but don't fill it with anything by default.
> Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot
> at C level that's called after the getattribute completes. A data manager
> can then set the hook to point to a _p_atime update function, *or* it can
> introduce postprocessing for "proxy" attributes. That is, a data manager
> could set the hook to handle "lazy" loading of certain attributes which
> would otherwise be costly to retrieve, by placing a dummy value in the
> object's dictionary, and then having the post-call hook return a
> replacement value.
I suggest we step back a bit and think of the API in terms of events.
I suggest we think about what events are generated and who they are
sent to. Your API change is consistent with that.
> For speed, this will generally want to be a C function; let the base
> package include a simple hook that updates _p_atime, and another which
> checks whether the retrievedValue is an instance of a LazyValue base class,
> and if so, calls the object. This will probably cover the basics. A data
> manager that uses ZODB caching will use the atime function, and non-ZODB
> data managers will probably want the other hook. I also have an idea about
> using the transaction's timestamp() plus a counter to supply a "time" value
> that minimizes system calls, but I'm not sure it would actually improve
> performance any, so I'm fine with not trying to push that into the initial
> package. As long as the hook slot is present in the base package, I or
> anyone else are free to make up and try our own hooks to put in it.
I'd like to get rid of _p_atime, as it is totally dependent on a particular
cache implementation, which we happen to be phasing out.
Persistent objects should have *no*
> * Get rid of the term "register", since objects won't "register" with the
> transaction, and neither should they with their data manager. They should
> "inform their data manager" that they have changed. Something like an
> objectChanged() message is appropriate in place of register(). I believe
> this would clarify the API.
That's fine.
> * Take out the interfaces. :( I'd rather this were, "leave this in, in a
> way such that it works whether you have Interface or not", but the reality
> is that a dependency in the standard library on something outside the
> standard library is a big no-no, and just begging for breakage as soon as
> there *is* an Interface package (with a new API) in the standard library.
I think that this is a very bad idea. I think the interfaces clarify things
quite a bit.
> Whew! I think that about covers it, as far as what I'd like to see, and
> what I think would be needed to make it acceptable for the core. Comments?
>
> By the way, my rationale for not taking any radical new approaches to
> persistence, observation, or notification in this proposal is that the
> existing Persistence package is "transparent" enough, and has the benefit
> of lots of field experience. I spent a lot of time trying to come up with
> "better" ways before writing this; mostly I found that trying to make it
> more "transparent" to the object being persisted, just pushes the
> complexity into either the app or the backend, without really helping
> anything. It's not a really big deal to:
>
> 1. Subclass Persistent
>
> 2. Use PersistentList and PersistentMapping or other Persistent objects for
> your attributes, or set self._p_changed when you change a non-persistent
> mutable.
These are not a big deal to you, because you have a deep understanding and
interest in the machinery. They are a big deal to most people. It would
be *wonderful* if we could avoid this. Maybe if we had a standard persistence
framework, we could motivate language changes that made this cleaner. :)
> 3. Use transactions
>
> Especially if that's all you need to do in order to have persistence to any
> number of backends, including the current ZODB and all the wonderful SQL or
> other mappings that will be creatable by everybody on this list using their
> own techniques. The key is not so much "transparency" per se, as
> *uniformity* across backends. I think the existing API is transparent
> enough; let's work on having uniform and universal access to it, as a
> Python core package.
Transactions are a huge benefit, as opposed to something that is "not
really a big deal". :)
Here are some additional points:
- While we should provide a standard implementation of a persistence
*interface*, we should allow other implementations. For example, the
data manager or cache should not depend on internal details of the
persistence implementation. We should not require a specific C layout
for persistent objects, for example.
- The persistence interface and implementations should be independent of
the cache implementations (e.g. no _p_atime). We *do* need to provide
a better API for handling objects that are unwilling to be deactivated.
Perhaps _p_deactivate should return a value indicating whether the object
was deactivated, and, if not, perhaps why.
- We need to define the state model for persistent objects. I'd like to include
the notion of a persistent refcount. Possible states are:
o Unsaved
o Up to date
o changed
o ghost
In addition, there is a persistent reference count. This is used by C code
to indicate that the object is being used outside of Python. An object
can't be turned into a ghost if its persistent reference count is > 0.
We'll model the reference count as a "sticky" state. We transition to the sticky
state when the reference count becomes non-zero and from the sticky state
when the reference count drops to zero. This state is largely independent of the other
states.
- I'd like to spend some time thinking through persistence-related events.
Here's a start:
o When a persistent object is modified while in the up-to-date state,
it should notify its data manager and transition to the changed state.
o When the object is accessed, it should notify its data manager. Perhaps it
should pass its current state.
o The persistent object calls a method on the data manager when its state
needs to be loaded.
o The persistent object should probably notify the data manager of any state
changes.
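To make the state model and events above a bit more concrete, here is a rough sketch in current Python. Every name in it (State, RecordingManager, PersistentSketch, and the load/accessed/changed methods) is made up for illustration, and refusing to deactivate unsaved or changed objects is an extra assumption beyond the sticky rule described above.

```python
from enum import Enum


class State(Enum):
    UNSAVED = "unsaved"
    UP_TO_DATE = "up to date"
    CHANGED = "changed"
    GHOST = "ghost"


class RecordingManager:
    """Hypothetical data manager that just records the events it receives."""

    def __init__(self):
        self.events = []

    def load(self, obj):
        self.events.append(("load", obj))
        obj.data = {"loaded": True}       # pretend to fetch state from storage

    def accessed(self, obj, state):
        self.events.append(("accessed", state))

    def changed(self, obj):
        self.events.append(("changed", obj))


class PersistentSketch:
    def __init__(self, dm):
        self.dm = dm
        self.state = State.GHOST
        self.data = {}
        self.refcount = 0                 # persistent reference count

    @property
    def sticky(self):
        # "Sticky" is independent of the four main states.
        return self.refcount > 0

    def _activate(self):
        if self.state is State.GHOST:
            self.dm.load(self)            # ask the manager to load our state
            self.state = State.UP_TO_DATE

    def get(self, key):
        self._activate()
        self.dm.accessed(self, self.state)  # access event carries the state
        return self.data.get(key)

    def set(self, key, value):
        self._activate()
        self.data[key] = value
        if self.state is State.UP_TO_DATE:
            self.state = State.CHANGED
            self.dm.changed(self)         # notify only on the first change

    def deactivate(self):
        """Try to become a ghost; return (deactivated, reason_if_not)."""
        if self.sticky:
            return (False, "sticky")
        if self.state in (State.UNSAVED, State.CHANGED):
            return (False, self.state.value)   # assumption, see above
        self.data = {}
        self.state = State.GHOST
        return (True, None)
```

A cache could call deactivate() and use the returned reason to decide whether to retry later, without knowing anything about the object's layout.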
Jim
--
Jim Fulton mailto:jim@zope.com Python Powered!
CTO (888) 344-4332 http://www.python.org
Zope Corporation http://www.zope.com http://www.zope.org