[Persistence-sig] Re: ACID, savepoints, and exceptions (was re: "Straw Man" transaction API)

Wed, 21 Aug 2002 19:02:48 -0400

At 12:32 AM 08/20/2002 -0400, Jeremy Hylton wrote:
> >>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
>   PJE> In my API I've standardized on a 'CannotRevertException' when
>   PJE> rollback to a savepoint is not possible, and added a
>   PJE> 'NullSavepoint' object which can be returned by an object that
>   PJE> has nothing to do on rollback.
>
>NullSavepoint is just an implementation convenience, right?

Yep.
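
For readers who want the two names made concrete, here is a minimal sketch. The class bodies, the rollback() method name, and the OneWaySavepoint class are assumptions for illustration only, not the actual PEAK code.

```python
class CannotRevertException(Exception):
    """Raised when rollback to a savepoint is not possible."""


class NullSavepoint:
    """Savepoint for a participant that has nothing to do on rollback."""

    def rollback(self):
        # Nothing was changed since the savepoint, so nothing to revert.
        pass


class OneWaySavepoint:
    """Hypothetical savepoint for a participant that cannot revert."""

    def rollback(self):
        raise CannotRevertException("this participant cannot roll back")
```

The point of the null object is that an aggregator can call rollback() on every savepoint it holds without special-casing participants that had no work to undo.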

>   PJE> An open issue that needs to be addressed, however, is the
>   PJE> question of rolling back more than once to the same savepoint.
>   PJE> In some ways, it's a very handy capability, but I'm not sure
>   PJE> which databases support this.
>
>Let me ask the question the other way:  Of the databases that support
>savepoints, which ones don't support this?

An interesting question.  The reason I'm iffy about it is that the ones I 
looked at (Sybase, Oracle, and SleepyCat/BerkeleyDB) weren't very precise 
in their docs, at least the docs I looked at.  They simply didn't mention 
what happens to a savepoint once you roll back to it.  SleepyCat offers 
nested transactions, which I *believe* are terminated upon rollback, just 
like top-level transactions.  So anything implemented on a SleepyCat 
back-end might need to work around this issue.

>   PJE>                                I'm therefore inclined to say we
>   PJE> should explicitly say that a savepoint can be rolled back at
>   PJE> most once (since some savepoints may not be able to be rolled
>   PJE> back).
>
>I want savepoints that can be returned to multiple times.  If a
>database supports savepoints at all, I don't see why it wouldn't
>support multiple rollbacks.  (If it didn't, an adapter could
>just call savepoint() as part of finishing each rollback().)  Multiple
>rollbacks is necessary to support nested transactions.

I don't think that rollback to the *same* savepoint is necessary, but I 
suppose the point is moot, since even a DB that didn't allow multiple 
rollbacks would logically support creating a second savepoint at the 
location you got to after rolling back the first.  It's a little more work 
to implement in that case, but I think I agree with your logic.
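
The adapter idea discussed above (take a fresh savepoint as part of finishing each rollback) might be sketched roughly as follows. Backend and OneShotSavepoint are toy stand-ins invented for this example, not any real database API, and ReusableSavepoint is a hypothetical name.

```python
class OneShotSavepoint:
    """Toy backend savepoint that becomes invalid after one rollback."""

    def __init__(self, backend):
        self.backend = backend
        self.snapshot = list(backend.data)
        self.used = False

    def rollback(self):
        if self.used:
            raise RuntimeError("savepoint already rolled back")
        self.backend.data[:] = self.snapshot
        self.used = True


class Backend:
    """Toy data store (an assumption) whose savepoints work only once."""

    def __init__(self):
        self.data = []

    def savepoint(self):
        return OneShotSavepoint(self)


class ReusableSavepoint:
    """Adapter: re-create the backend savepoint after each rollback."""

    def __init__(self, backend):
        self.backend = backend
        self.current = backend.savepoint()

    def rollback(self):
        self.current.rollback()
        # We are back at the saved state, so a savepoint taken now
        # marks the same logical location as the one just consumed.
        self.current = self.backend.savepoint()
```

With this wrapper, client code can call rollback() on the same object repeatedly even though each underlying savepoint is consumed by its first rollback.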

But...  there is a difference in implementation burden that applies 
here.  How many applications will use savepoints as part of their natural 
flow, and is it too much to ask to have them do:

while 1:
     sp = txn.savepoint()

     try:
         pass  # do something that might fail...
     except:
         sp.rollback()
         continue
     else:
         break  # it worked, so stop retrying

The only difference here, as far as I can see, is that the savepoint() call 
is in the loop (in my suggested approach) instead of just above and outside 
it (as it would be with reusable savepoints).

Perhaps there's something else you're using savepoints for that doesn't 
look like this sort of loop, in which case it would be interesting to learn 
about that use case.

>   PJE>                                      This suggests adding a
>   PJE> 'canRollback()' method to the interface, such that a rollback
>   PJE> aggregator can check that its aggregated savepoints can
>   PJE> actually be rolled back, so that "CannotRevert" errors don't
>   PJE> cause the transaction to be hosed.
>
>It's probably good to have some way to query this, although I feel
>like the predicate methods for testing features haven't worked out all
>that well in the ZODB3 storage API.  What object that client code has
>access to would support the canRollback() method?  It seems like it
>depends on which objects are participating in the transaction.
>
>I tend more towards an ask for forgiveness (AFF) than a look before
>you leap (LBYL).  If savepoint() returned None when it wasn't possible
>to rollback, that would be good enough, no?  The clients know, for
>their specific transaction, whether rollback is going to work.  The
>savepoint() presumably hasn't caused too much extra work in those
>cases.

Okay.  So what you're saying is, document that savepoint() returns an 
IRollback or None, and None means you can't roll back to the 
savepoint.  And if any participant returns None for the savepoint() call, 
the transaction must return None from its savepoint() call.  I'm good with 
that; my primary goal here is just to remove the ambiguity of what happens 
when something can savepoint() but not rollback().
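
A rough sketch of that rule, under stated assumptions: the Transaction and AggregateRollback classes are hypothetical, participants are assumed to expose savepoint(), and rolling back in reverse participant order is a choice made for the example rather than something settled in this thread.

```python
class AggregateRollback:
    """Rolls back each participant's savepoint, last participant first."""

    def __init__(self, savepoints):
        self.savepoints = savepoints

    def rollback(self):
        for sp in reversed(self.savepoints):
            sp.rollback()


class Transaction:
    """Hypothetical transaction that aggregates participant savepoints."""

    def __init__(self, participants):
        self.participants = participants

    def savepoint(self):
        # Ask every participant, so each one sees the savepoint() call.
        savepoints = [p.savepoint() for p in self.participants]
        # If any participant cannot roll back, neither can the whole.
        if any(sp is None for sp in savepoints):
            return None
        return AggregateRollback(savepoints)
```

A client then checks the result once, e.g. "sp = txn.savepoint()" followed by "if sp is None:", instead of calling a separate canRollback() predicate.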

>   PJE> Another issue here is clean aborts.  If an error is raised by a
>   PJE> data manager during abort, what should the semantics be?  Older
>   PJE> ZODB transaction classes wrap every data manager abort call in
>   PJE> a try-except that ensures that *all* the abort methods get
>   PJE> called, even if several of them raise errors.  The new ZODB4
>   PJE> transaction API doesn't do this, and thus can fail to
>   PJE> completely roll back a transaction.
>
>I tried to do as little as possible within the commit() implementation
>to deal with errors.  I figured if an error occurs, the client had
>better abort the transaction explicitly.  The documentation for ZODB3
>said that clients needed to do this, but the implementation didn't
>work that way.

Er, the paragraph I wrote above is about the abort() method; the word 
"commit" isn't even in the paragraph. :)  I'm fine with the idea of 
requiring an explicit abort() by the application upon exception during 
commit().  It's the fact that ZODB4 doesn't trap errors during *abort()* 
that's an issue for me, relative to older ZODB versions.
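
The older behavior described here, where every data manager's abort gets called even if some of them raise, could look roughly like the sketch below. The abort_all name, the abort() method on data managers, and the choice to collect and return the errors are assumptions for illustration; this is not the actual ZODB code.

```python
def abort_all(data_managers):
    """Call abort() on every data manager, even if some of them fail.

    Returns a list of (data_manager, exception) pairs for the failures.
    """
    errors = []
    for dm in data_managers:
        try:
            dm.abort()
        except Exception as exc:
            # Remember the failure, but keep going so that the
            # remaining data managers still get their abort() call.
            errors.append((dm, exc))
    return errors
```

What to do with the collected errors afterwards (log them, re-raise the first one, and so on) is a separate policy question that this sketch leaves open.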

When I get back from the Enterprise Architecture summit, I plan to redo 
some things in my own "straw man" transactions for PEAK.  I realized on the 
trip up here that I haven't really thought through some of the 
ramifications of Shane's "multi-pass commit" counter-proposal to my 
"write-through cascade" architecture.  For example, durable subscriptions 
make less sense in the multi-pass commit model, because there are more 
objects to call, more times, up to O(n^2) in the degenerate case, for 
fairly large "n" (I expect to have dozens of data managers per app, 
although relatively few will have active involvement in a given 
transaction).  I also need to think through how the re-pass protocol will 
work, given the absence of durable subscriptions.

I have some hope that these re-thinks will make the API leaner and meaner 
than I currently have it, while retaining "Zopeward 
compatibility".  Ideally, we should be able to each present our somewhat 
different transaction models to the SIG, as a jumping-off point for future 
discussion.

I have lowered my expectations somewhat, however, with respect to the SIG's 
goal of a transaction API.  Previously I hoped to use the to-be-decided API 
as PEAK's core transaction API, but now I'm aspiring merely to have in PEAK 
an API that can be adapted to that of the SIG. Or, if I turn out to be 
really lucky, the PEAK API may merely end up being a slight superset 
relative to the SIG API.  Unfortunately, I have too much code in too many 
projects which need the PEAK transaction API to exist already, and so I 
need to move forward with *something*, even if I end up having to do some 
refactoring later.

Luckily, however, my first draft at an actual PEAK implementation, both of 
a standalone transaction service and as a transaction service layered over 
the ZODB4 transaction API, verified for me that it's possible to do this 
kind of layering, as long as the underlying transaction API is at least as 
rich as that of ZODB4.  And I'm guessing the SIG isn't going to endorse any 
transaction model that isn't at least that rich.  :)