[Persistence-sig] "Straw Man" transaction API

Mon, 15 Jul 2002 11:57:56 -0400

At 03:06 PM 7/15/02 +0200, Sebastien Bigaret wrote:
>
>  - about info/setInfo(): maybe we need a setInfo() different from an
>    updateInfo() or addToInfo(). I also suspect that a 'ResourceManager'
>    writing info. to other participants might use such a metadictionary to
>    pass additional information for use in the current transaction (warning:
>    name collision); if this is *not* the place for that, it should
>    perhaps be stated in doc.

I probably should've called it updateInfo(), since a dict.update() was the
semantics I had in mind.  I was deliberately leaving it vague as to what
information might be passed in it, since it was primarily a mechanism to
allow for extensions, not to mention support of Zope's need to save "user"
and "note" metadata on transactions.
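
To make the dict.update() semantics concrete, here is a minimal sketch.
The Transaction class, its _info attribute, and the keyword-argument
signature are assumptions for illustration only; the straw man does not
specify them.

```python
class Transaction:
    """Hypothetical stand-in for the straw-man transaction object."""

    def __init__(self):
        self._info = {}

    def updateInfo(self, **metadata):
        # Merge new keys into the existing metadata, as dict.update() would:
        # keys already present are overwritten, all others are left alone.
        self._info.update(metadata)

    def info(self):
        # Hand back a copy so callers cannot mutate the stored metadata.
        return dict(self._info)


txn = Transaction()
txn.updateInfo(user="alice")
txn.updateInfo(note="reindex catalog")
print(txn.info())
```

A second call adds to the metadata rather than replacing it, which is the
difference from a plain setInfo().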

>  - registration of Participants:
>
>      We might need a unique identifier for a given participant ; e.g., we
>      might wish that only one participant for a given 'postgresql' DB
>      connection is registered (in that case, the id. could be the DB backend
>      name+the connectionDictionary).
>
>      Obviously participants could still register without an id.

I think an identifier is a YAGNI.  I'm almost positive that my application
model won't need it.  But if you just want register() to guarantee that the
participant is registered once and only once, that's fine by me and a
sensible thing, IMHO.  Although I might just as soon have it raise
ParticipantAlreadyRegistered if you register it again, as that might help
expose a bug in your code.  :)  Of course, if it does that, then I suppose
exposing an isRegistered(participant) method would allow you to work around
that.
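
A rough sketch of the "once and only once" behaviour, assuming that
participants are compared by identity and kept in a simple list; neither
detail is decided anywhere in this thread.

```python
class ParticipantAlreadyRegistered(Exception):
    """Raised when the same participant object is registered twice."""


class Transaction:
    """Hypothetical transaction that tracks its participants."""

    def __init__(self):
        self._participants = []

    def isRegistered(self, participant):
        # Identity check: the very same object, not merely an equal one.
        return any(p is participant for p in self._participants)

    def register(self, participant):
        if self.isRegistered(participant):
            # Fail loudly so that a double registration exposes the bug.
            raise ParticipantAlreadyRegistered(participant)
        self._participants.append(participant)


txn = Transaction()
conn = object()  # placeholder for a real participant
txn.register(conn)
if not txn.isRegistered(conn):
    txn.register(conn)  # skipped: conn is already registered
```

Code that cannot know whether it has registered already can guard the
call with isRegistered(), as in the last two lines.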

>  - revert(): I expected an 'undo()' ; 'revert' sounds like 'abort' to me,
>    but this can just be a language problem --the documentation made it
>    clear.

I tried to use common RDBMS terminology; the few examples of "checkpoint"
or "savepoint" I found (e.g. Sybase) used "revert" as the terminology for
going back to the last checkpoint or savepoint.

>  - about commit(): I see this basically like a vote_commit() on each
>    participants, followed by a commit_txn()

Actually it's begin_commit() on each, vote_commit() on each, and then
commit_txn() on each.
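
The ordering can be sketched as three full passes over the participants.
LoggingParticipant and the free-standing commit() function are
hypothetical; only the three message names come from the discussion.

```python
class LoggingParticipant:
    """Hypothetical participant that records each message it receives."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def begin_commit(self, txn):
        self.log.append((self.name, "begin_commit"))

    def vote_commit(self, txn):
        self.log.append((self.name, "vote_commit"))

    def commit_txn(self, txn):
        self.log.append((self.name, "commit_txn"))


def commit(txn, participants):
    # Each message goes to every participant before the next message starts.
    for message in ("begin_commit", "vote_commit", "commit_txn"):
        for participant in participants:
            getattr(participant, message)(txn)


log = []
commit(None, [LoggingParticipant("A", log), LoggingParticipant("B", log)])
for entry in log:
    print(entry)
```

No participant sees commit_txn() until all of them have voted.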

>    I have the feeling that what will be done during the commit() phase
>    should be explicitly stated, along with the goals we are going after.
>    Here is a little example: suppose a transaction has to commit changes
>    against two different DB storages, DB1 which supports multi-phase
>    commit, DB2 which does not.
>
>    Then they get vote_commit(): DB1 will be able to answer OK or KO, but DB2
>    will not because it is not capable of saying whether a transaction will
>    succeed, hence: it answers 'OK' to the 'vote_commit' message.
>
>    Now the participants get the commit_txn() ; since we do not assume any
>    particular ordering for participants, suppose that DB1 gets it first. DB1
>    commits the changes, then DB2 attempts to commit its changes but fails:
>    what can we do? We can stop committing and start sending 'abort_txn' to
>    all participants, however, DB1 is likely to be unable to revert the
>    already committed changes --and this will definitely be the case if both
>    DB1 and DB2 do not support nested transactions.
>    
>    My opinion here is that we shouldn't try to handle multi-backends commits
>    as a whole -- some backends simply make it almost impossible. But: this
>    should be clearly stated.

Actually, I think we should just document what will happen if you mix
voting and non-voting participants.  Also, we may wish to have some way to
declare a participant non-voting, so that such participants can receive
commit_txn() first.  ZODB Transactions can survive the failure of *one*
commit_txn() message, and StrawMan can too.  The most common use case for a
non-voting participant would be an RDBMS connection, and the most common use
case of such is to have only one, even if there will be other participants
writing to it.

ZODB declares itself "hosed" when a failure occurs past the first
tpc_finish() (its equivalent of commit_txn()).  We will need to be similarly
cautious, if there is more than one non-voting participant.
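
One way this could look, as a sketch under stated assumptions: each
participant carries a hypothetical 'votes' flag, non-voting ones are
sorted to the front, and a hypothetical TransactionHosed error marks a
failure that arrives after some commit_txn() has already succeeded.

```python
class TransactionHosed(Exception):
    """Hypothetical error: a commit failed after another already succeeded."""


class Participant:
    """Hypothetical participant; 'votes' and 'fail' are illustration flags."""

    def __init__(self, name, votes=True, fail=False):
        self.name = name
        self.votes = votes
        self.fail = fail
        self.state = "pending"

    def commit_txn(self):
        if self.fail:
            raise RuntimeError(f"{self.name} could not commit")
        self.state = "committed"

    def abort_txn(self):
        self.state = "aborted"


def finish_commit(participants):
    # sorted() is stable and False < True, so non-voting participants move
    # to the front while keeping their relative order.
    ordered = sorted(participants, key=lambda p: p.votes)
    committed = []
    for participant in ordered:
        try:
            participant.commit_txn()
        except Exception as exc:
            if committed:
                # Earlier commits cannot be taken back.
                raise TransactionHosed(participant.name) from exc
            # Nothing is committed yet, so everyone can still abort cleanly.
            for other in ordered:
                other.abort_txn()
            raise
        committed.append(participant)
    return [p.name for p in ordered]


print(finish_commit([Participant("zodb"), Participant("rdbms", votes=False)]))
```

With a single non-voting participant, its failure is always the first one
and so remains recoverable; with two or more, a failure in the second
leaves the first already committed.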

>  - last on this: it may be useful for observers to get events such as
>    transaction_did_commit() (committing is a Transaction's message for which
>    we cannot guarantee it will come to its normal end, for the reasons
>    written above) ; I'm thinking here of some DB-caches that would be
>    participants/observers for the Transaction machinery, that would take the
>    opportunity to update their caches, etc.

That's a good point; perhaps adding a 'commit_finished()' message would do
the trick, although there are already quite a lot of messages.

>  - I have some problems about the begin/end_savepoint(): again this might be
>    a language problem, but I would prefer something like
>    'prepareToSavepoint()' and 'markSavepoint()'

Those aren't bad.

>  - same for begin_commit()

I could see prepareToCommit or prepare_for_commit, certainly.

>  - vote_for_commit: as far as I understand participants using other
>    participants can simply ignore it, but should not raise (exception to be
>    named, BTW). To my understanding, a raise here is understood as a veto.
>    Is that it?

'vote_on_commit' seems more natural to me, phrasing-wise.  Yes, a raise is
a veto; that's an assumption from ZODB transactions that I failed to document.
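
A small sketch of the veto rule, assuming the voting message keeps the
name vote_commit() (the name is still under discussion) and that any
exception raised while voting aborts every participant. The Veto class is
hypothetical, since the exception has not been named yet.

```python
class Veto(Exception):
    """Hypothetical exception a participant raises to veto the commit."""


class Participant:
    """Hypothetical participant that either accepts or vetoes the commit."""

    def __init__(self, name, ok=True):
        self.name = name
        self.ok = ok
        self.outcome = None

    def vote_commit(self):
        if not self.ok:
            raise Veto(self.name)
        # Returning normally means "no objection".

    def commit_txn(self):
        self.outcome = "committed"

    def abort_txn(self):
        self.outcome = "aborted"


def commit(participants):
    try:
        for participant in participants:
            participant.vote_commit()
    except Exception:
        # A raise during voting is a veto: abort everyone, then re-raise
        # so the caller learns that the commit did not happen.
        for participant in participants:
            participant.abort_txn()
        raise
    for participant in participants:
        participant.commit_txn()


group = [Participant("cache"), Participant("db", ok=False)]
try:
    commit(group)
except Veto as veto:
    print("vetoed by", veto)
print([p.outcome for p in group])
```

A participant that has nothing to say about the vote simply returns.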

>Last: do we need to specify a TransactionManager or TransactionFactory API?

I don't think so, really, other than what I mentioned about providing some
simple thread-specific associations.

>Some ideas about what could be done there: (hmm, this could be made class
>method as well)
>
>  - registering participants' factories, so that Transactions can be
>    initialized with a default set of participants, since applications often
>    use the same configuration for their Transactions. Something like:
>
>      def buildDefaultTransaction(self)

YAGNI.  The code that sets up the participants should know their
transactional scope, and thus is capable of registering them with the
appropriate transaction.

>  It seems to me that the points stressed in the sig-charter are taken into
>account here --except for the 'Effective Memory Usage' which, by the way,
>cannot be addressed at the transaction level --and I do not really see how
>this particular point can be made anything else but a ``compulsory
>recommendation'' ?!

Actually, as was noted in the savepoint-related docstrings, one purpose of
the savepoint API is to indicate a "good time to write things out", which
can free up memory used by queued updates.  Also, in ZODB's persistence
model, dirty objects can't be dropped from the cache (since they contain
state that needs to be written).  So if their writes can be flushed, they
become eligible to be "ghosted" out of the cache and the memory made
available as well.  This can be an issue in large ZODB transactions,
especially those done by full-text indexing operations.
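
As an illustration of the "good time to write things out" idea, here is a
sketch of a participant that buffers writes and releases them at a
savepoint. QueueingParticipant, on_savepoint(), and the list used as
storage are all hypothetical; the real savepoint messages are the
begin/end pair discussed above.

```python
class QueueingParticipant:
    """Hypothetical participant that buffers writes in memory."""

    def __init__(self, storage):
        self.storage = storage  # assumed: any object with an append() method
        self.pending = []

    def write(self, record):
        # Queue the update instead of writing it through immediately.
        self.pending.append(record)

    def on_savepoint(self):
        # Push the queued updates to storage, then drop them so the memory
        # they occupied can be reclaimed.
        flushed = len(self.pending)
        for record in self.pending:
            self.storage.append(record)
        self.pending.clear()
        return flushed


storage = []
indexer = QueueingParticipant(storage)
for word in ("alpha", "beta", "gamma"):
    indexer.write(word)
print(indexer.on_savepoint(), len(indexer.pending), storage)
```

A long-running job such as an indexing run could request a savepoint
every so often to keep the queue, and so its memory use, bounded.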

So actually the transaction API *does* have some contact points with memory
usage.  And the main reason I put savepoint() in was to accommodate this
requirement for ZODB.  I don't really expect to have much use for it in my
primary applications development.