[Persistence-sig] getting started

Phillip J. Eby pje@telecommunity.com
Wed, 10 Jul 2002 19:38:00 -0400


At 05:10 PM 7/10/02 -0400, Guido van Rossum wrote:
>> Phillip J. Eby wrote:
>[...]
>> > Anyway, the fact that these things can be done based solely on the existing
>> > ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB"
>> > package itself, means that what's available from ZODB is actually pretty
>> > close to what's needed as a base mechanism.  It's more a question (to me,
>> > anyway) of what could/should be improved, particularly in the form of how
>> > the API calls and interfaces are phrased.
>> > 
>> > For my requirements, I'd be fine with it if we just put Persistent and
>> > Transaction in the standard library, with better docs.  :)  But it'd be
>> > nice if certain things were spelled differently, or a bit more flexible.
>
>This is a goal I can agree with.  Care to start a list of what
>spellings you'd like to change?
>

As I said, I can pretty much live with it all as it is now.  Some minor
annoyances:

* There's no way to be notified that a "transaction is over".  You have to
trap different messages from Transaction, while perhaps registering a dummy
object, just to figure out transaction boundaries.  This is a pain when
creating transactional caches, i.e., ones which want to clear themselves
whenever a transaction commits *or* aborts.

* A similar, related pain is that you have to re-register on *every*
transaction, and keep track of whether you've registered yet, any time you
do something that might mean you *should* be registered.  A way to
"permanently" (i.e. until app termination or otherwise requested) subscribe
to transaction begin/end messages would be very handy.  Or even to the
whole tpc_begin/vote/finish message sequence.
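
As a rough sketch of what the two points above are asking for (this is not
ZODB4's API; `SketchTransaction`, `subscribe`, `unsubscribe`,
`transaction_finished` and `TransactionalCache` are all hypothetical names
made up for illustration), a one-time subscription that clears a cache on
either outcome might look like:

```python
class SketchTransaction:
    """Hypothetical transaction object; NOT the ZODB4 Transaction class."""

    def __init__(self):
        # Subscribers survive across transactions until they unsubscribe.
        self._subscribers = []

    def subscribe(self, listener):
        if not any(s is listener for s in self._subscribers):
            self._subscribers.append(listener)

    def unsubscribe(self, listener):
        self._subscribers = [s for s in self._subscribers if s is not listener]

    def _notify(self, outcome):
        for listener in list(self._subscribers):
            listener.transaction_finished(outcome)

    def commit(self):
        # A real implementation would run two-phase commit here.
        self._notify("commit")

    def abort(self):
        self._notify("abort")


class TransactionalCache:
    """Cache that empties itself whenever a transaction ends."""

    def __init__(self):
        self.data = {}

    def transaction_finished(self, outcome):
        # Same reaction to commit and abort: forget everything.
        self.data.clear()


txn = SketchTransaction()
cache = TransactionalCache()
txn.subscribe(cache)          # done once, not once per transaction

cache.data["a"] = 1
txn.commit()
after_commit = dict(cache.data)

cache.data["b"] = 2
txn.abort()
after_abort = dict(cache.data)
```

The cache never has to ask "have I registered yet?"; it registers once and
hears about every boundary.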

* While on the subject of such messages, why should the Transaction object
have to be the one to keep track of changed objects?  Why shouldn't data
managers do that themselves?  In the case of my "storage jars" model, I
have to track "currently dirty objects" separately from the transaction's
list of objects needing to be committed, because I may "pre-flush" certain
changes to, say, an RDBMS, in order to ensure that queries within the same
transaction will see the updated data.  Since the "jar" has to track this
anyway, why does the Transaction need to do the same?  Why not just send
the jars a set of begin/vote/finish messages?

In my current framework, my "jars" automatically detect when they're being
asked to commit something that's already flushed to the back-end, and
ignore it.  If the Transaction didn't bother tracking stuff and telling me
to commit it, I'd just have tpc_begin cause a flush of all dirty objects,
and I'd be ready for tpc_vote.  Not only that, but the Transaction object
itself would get a lot simpler, and wouldn't need to have complex logic to
manage data managers' objects for them!  (Granted, data managers would need
to know which items they "committed" during tpc_begin->tpc_vote, in order
to roll them back, but I suspect that many data managers are already
tracking this in some form, if only to do invalidation messages.)
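
To make the idea concrete, here is a minimal sketch of a jar that tracks its
own dirty objects and only receives begin/vote/finish/abort messages.  It is
an illustration under assumptions, not my actual framework or ZODB code:
`SketchJar`, its `set`/`flush` methods, the `commit` helper, and the use of
a plain dict as a stand-in for the RDBMS are all hypothetical.

```python
_MISSING = object()   # marks "key did not exist before this transaction"


class SketchJar:
    """Hypothetical data manager that tracks its own dirty objects."""

    def __init__(self, backend):
        self.backend = backend   # stand-in for an RDBMS: just a dict
        self.dirty = {}          # key -> value changed but not yet flushed
        self.flushed = {}        # key -> pre-transaction value, for rollback

    def set(self, key, value):
        self.dirty[key] = value

    def flush(self):
        """Push pending changes so queries in this transaction see them."""
        for key, value in self.dirty.items():
            # Remember only the first (pre-transaction) value of each key.
            self.flushed.setdefault(key, self.backend.get(key, _MISSING))
            self.backend[key] = value
        self.dirty.clear()

    def tpc_begin(self):
        self.flush()             # anything already pre-flushed is skipped

    def tpc_vote(self):
        return True

    def tpc_finish(self):
        self.flushed.clear()

    def tpc_abort(self):
        self.dirty.clear()
        for key, old in self.flushed.items():
            if old is _MISSING:
                self.backend.pop(key, None)
            else:
                self.backend[key] = old
        self.flushed.clear()


def commit(jars):
    """Hypothetical coordinator: it only sends messages, tracks no objects."""
    for jar in jars:
        jar.tpc_begin()
    if all(jar.tpc_vote() for jar in jars):
        for jar in jars:
            jar.tpc_finish()
        return True
    for jar in jars:
        jar.tpc_abort()
    return False


backend = {"x": 1}
jar = SketchJar(backend)
jar.set("x", 2)
jar.flush()                      # pre-flush so a query sees the new value
seen_mid_txn = backend["x"]
jar.set("y", 3)
jar.tpc_begin()
jar.tpc_abort()                  # rolls back both the pre-flush and "y"
```

Note that the coordinator never learns which objects changed; the rollback
bookkeeping lives entirely in the jar.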


* "Ghosting" attributes.  Right now, persistent objects are either loaded,
or not.  There's no way to designate an object as "loaded except for
attributes X, Y, and Z".  Why do I need that?  Because I may have data
stored for that object in different back-ends (LDAP and SQL is a combo that
comes up often for me) and don't want to incur a possibly large load-time
penalty to get all the (non-object) attributes, which may not even get read
during a particular transaction.  So, if we're talking about redoing
Persistence.Persistent, I'd like to see attribute-specific read/write
monitoring, if it doesn't add so much performance overhead as to remove the
benefits of having it.
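
One way to approximate per-attribute ghosting in plain Python is a
descriptor that loads a single attribute on first read.  This is only a
sketch of the lazy-loading half (it does not monitor writes), and
`LazyAttribute`, `load_from_ldap` and `Person` are hypothetical names, with
a fake loader standing in for a real LDAP query.

```python
class LazyAttribute:
    """Hypothetical descriptor that loads one attribute on first read."""

    def __init__(self, loader):
        self.loader = loader          # callable(obj) -> attribute value

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.loader(obj)
        # No __set__ is defined, so this is a non-data descriptor: once the
        # value is in the instance dict, later reads bypass the loader.
        obj.__dict__[self.name] = value
        return value


load_calls = []


def load_from_ldap(person):
    # Stand-in for an expensive back-end query.
    load_calls.append(person.uid)
    return "%s@example.com" % person.uid


class Person:
    mail = LazyAttribute(load_from_ldap)

    def __init__(self, uid):
        self.uid = uid


p = Person("alice")
calls_before_read = len(load_calls)   # nothing loaded at construction
first = p.mail                        # triggers the loader
second = p.mail                       # served from the instance dict
```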

By the way, it would be an acceptable solution for this if we had extremely
lightweight proxies that could stand-in for an arbitrary Python object, and
call something to load the "real" object upon access.  Of course, if we had
such an animal, it could replace the need for subclassing
Persistence.Persistent in the first place!  It could also trap all the
"modifying" methods like __setitem__, __setslice__, etc.
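
A minimal pure-Python sketch of such a proxy follows.  It is an
illustration only: `LazyProxy`, its `loader` and `on_change` parameters are
hypothetical, it forwards just a few special methods, and a serious version
would presumably be written in C and cover many more operations.

```python
_UNLOADED = object()   # sentinel: the real object has not been fetched yet


class LazyProxy:
    """Hypothetical stand-in that loads the real object on first use."""

    __slots__ = ("_loader", "_target", "_on_change")

    def __init__(self, loader, on_change=lambda: None):
        # Bypass our own __setattr__, which forwards to the real object.
        object.__setattr__(self, "_loader", loader)
        object.__setattr__(self, "_target", _UNLOADED)
        object.__setattr__(self, "_on_change", on_change)

    def _real(self):
        if self._target is _UNLOADED:
            object.__setattr__(self, "_target", self._loader())
        return self._target

    def __getattr__(self, name):
        # Only reached for names the proxy itself does not define.
        return getattr(self._real(), name)

    def __setattr__(self, name, value):
        setattr(self._real(), name, value)
        self._on_change()

    def __getitem__(self, key):
        return self._real()[key]

    def __setitem__(self, key, value):
        self._real()[key] = value
        self._on_change()

    def __len__(self):
        return len(self._real())


loads = []
changes = []


def load():
    loads.append(1)
    return {"a": 1}


proxy = LazyProxy(load, on_change=lambda: changes.append(1))
loads_before_use = len(loads)    # creating the proxy loads nothing
value = proxy["a"]               # first access triggers the load
proxy["b"] = 2                   # trapped as a modification
keys = sorted(proxy.keys())      # ordinary methods go through __getattr__
```

Changes the real object makes to itself never pass through the proxy, so
this sketch would not notice them.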

(Interestingly, the Zope 3 security proxy objects, written in C, look to me
to have sufficient generality to perform these functions, in that they
monitor all attribute and method accesses.  I may be missing, though,
whether they also cover operations that the object performs upon
*itself*.  It may be that such accesses are not checked, but they would
need to be for a persistence proxy.)

Anyway, the above pretty much sums up my principal annoyances/peeves with
Persistent and Transaction.  I can pretty much do everything I want with
the existing systems, but the above things would make them easier to do.
(Right now, to support state that's loaded from multiple back-ends, I have to
have some kind of support added into the object, or change its class on the
fly to add descriptors for lazily-loaded attributes.)