[Persistence-sig]
ACID, savepoints, and exceptions (was re: "Straw Man" transaction API)
Phillip J. Eby
pje@telecommunity.com
Sun, 18 Aug 2002 11:46:53 -0400
At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote:
>Last week, I worked out a revised transaction API for user code and
>for data managers. It's implemented in ZODB4, but is fairly
>preliminary code. I imagine we'll revise it further, but I'd like to
>describe the changes briefly.
During the past week, I've been writing a TransactionService for PEAK,
specifically designing it to allow interaction/adaptation to the new ZODB4
transaction API, and extending it to support the multi-prepare and durable
subscription models that I need for my applications and framework
projects. I believe I've largely been successful, but in the process it
has highlighted for me some open issues/ambiguities in the ZODB4
transaction API as it sits right now, relating to error handling and also
savepoints.
>class IRollback(Interface):
>
>    def rollback():
>        """Rollback changes since savepoint."""
>
>I think the rollback mechanism will work well enough. Gray and Reuter
>explain that it can be used to simulate a nested transaction
>architecture. Thus, I think it's a reasonable building block for the
>nested transaction API.
In my API I've standardized on a 'CannotRevertException' when rollback to a
savepoint is not possible, and added a 'NullSavepoint' object which can be
returned by an object that has nothing to do on rollback.
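A minimal sketch of how these two pieces might look in Python. Only the names 'CannotRevertException' and 'NullSavepoint' come from the description above, and 'rollback()' follows the quoted IRollback interface; 'WriteOnlySavepoint' is a hypothetical example of a resource that cannot undo its work, and none of this is the actual PEAK code.

```python
class CannotRevertException(Exception):
    """Raised when rollback to a savepoint is not possible."""


class NullSavepoint:
    """Savepoint for an object that has nothing to do on rollback."""

    def rollback(self):
        # Nothing was changed since the savepoint, so there is nothing to undo.
        pass


class WriteOnlySavepoint:
    """Hypothetical savepoint for a resource that cannot undo its writes."""

    def rollback(self):
        # Signal the standard error instead of silently doing nothing.
        raise CannotRevertException("writes already sent; cannot revert")
```

A data manager with no pending changes could return a shared NullSavepoint instance, so callers never need to special-case "no savepoint".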
An open issue that needs to be addressed, however, is the question of
rolling back more than once to the same savepoint. In some ways, it's a
very handy capability, but I'm not sure which databases support it. I'm
therefore inclined to have the API state explicitly that a savepoint can be
rolled back at most once ("at most" because some savepoints cannot be
rolled back at all).
Another open issue: what happens if a rollback fails? Is the transaction
"hosed" at that point? What if five data managers roll back, and the sixth
one fails? This suggests adding a 'canRollback()' method to the interface,
such that a rollback aggregator can check that its aggregated savepoints
can actually be rolled back, so that "CannotRevert" errors don't cause the
transaction to be hosed. However, the issue of another type of exception
occurring during rollback still must be addressed.
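The 'canRollback()' idea could be sketched roughly as follows. 'AggregateSavepoint' and its attribute names are hypothetical, and the sketch assumes every aggregated savepoint offers both 'canRollback()' and 'rollback()'.

```python
class CannotRevertException(Exception):
    """Raised when rollback to a savepoint is not possible."""


class AggregateSavepoint:
    """Hypothetical aggregator over the savepoints of several data managers."""

    def __init__(self, savepoints):
        self.savepoints = list(savepoints)

    def canRollback(self):
        # True only if every aggregated savepoint reports it can roll back.
        return all(sp.canRollback() for sp in self.savepoints)

    def rollback(self):
        # Check everything first, so a refusal happens before any
        # savepoint has actually been rolled back.
        if not self.canRollback():
            raise CannotRevertException("a savepoint cannot be rolled back")
        for sp in self.savepoints:
            # Any *other* exception raised here still leaves a partial
            # rollback; the pre-check does nothing about that case.
            sp.rollback()
```

The pre-check only removes the "CannotRevert" case; as the comment notes, other exceptions during the loop remain an open problem.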
>I think I'm also in favor of the new abort semantics. ZODB3 would
>abort the transactions -- call abort() on all the data managers -- if
>an error occurred during a commit. The new code requires that the
>user do this instead. I think that's better, because it leaves the
>state of the objects intact if the code wants to analyze what went
>wrong before retrying the transaction.
The interesting question here again is, is the transaction "hosed"? Should
there be a flag that says, "you can't do anything to this transaction but
abort it"?
To put it in broader terms, if *any* exception is thrown during execution
of a transaction-related method, should we consider the transaction
unrecoverable?
I'm inclined to say yes, because I can think of too many code paths in both
my own transaction code and ZODB4's where it becomes nearly impossible to
guarantee a "clean" state when an exception occurs. By definition, if code
called by the transaction system raises an exception, it is announcing that
it cannot satisfy its contract with the transaction. Therefore, the
transaction cannot be certain of satisfying its contract with the
application for a clean commit.
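One way to express the "you can't do anything but abort" flag is sketched below. This is not the ZODB4 or PEAK implementation: 'TransactionDoomed', the 'doomed' attribute, and the data manager methods 'prepare(txn)', 'commit(txn)' and 'abort(txn)' are all assumptions made for illustration.

```python
class TransactionDoomed(Exception):
    """Hypothetical error: the transaction can only be aborted."""


class Transaction:
    def __init__(self, managers):
        self.managers = list(managers)
        self.doomed = False  # the "hosed" flag

    def commit(self):
        if self.doomed:
            raise TransactionDoomed("only abort() is allowed now")
        try:
            for dm in self.managers:
                dm.prepare(self)
            for dm in self.managers:
                dm.commit(self)
        except Exception:
            # A data manager broke its contract, so the transaction can no
            # longer promise a clean commit. Mark it and re-raise.
            self.doomed = True
            raise

    def abort(self):
        # Abort is the one operation still permitted on a doomed transaction.
        for dm in self.managers:
            dm.abort(self)
```

With this shape, the caller still sees the original exception, and any later attempt to commit fails fast instead of running against an uncertain state.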
Another issue here is clean aborts. If an error is raised by a data
manager during abort, what should the semantics be? Older ZODB transaction
classes wrap every data manager abort call in a try-except that ensures
that *all* the abort methods get called, even if several of them raise
errors. The new ZODB4 transaction API doesn't do this, and thus can fail
to completely roll back a transaction.
Of course, the tradeoff is that the old code only gave you information
about the first exception that occurred, and not any of the later
ones. Perhaps the answer is to make the transaction keep track of which
data managers have received which messages, and to require the caller to
keep 'abort()'-ing until all data managers have been aborted, even if each
one raises errors?
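The "keep aborting until everyone has been told" idea might look something like this sketch. 'RetryingAbort', 'fully_aborted()' and 'abort_completely()' are hypothetical names, and the data managers are assumed to expose a no-argument 'abort()'.

```python
class RetryingAbort:
    """Hypothetical bookkeeping: remembers which managers were aborted."""

    def __init__(self, managers):
        self.pending = list(managers)  # data managers not yet sent abort()

    def abort(self):
        # Send abort() to each pending manager in order. A manager is
        # removed from the pending list *before* its abort() runs, so if it
        # raises, the next call resumes with the manager after it.
        while self.pending:
            dm = self.pending.pop(0)
            dm.abort()

    def fully_aborted(self):
        return not self.pending


def abort_completely(txn):
    """Caller-side loop: call abort() until every manager has received it."""
    errors = []
    while not txn.fully_aborted():
        try:
            txn.abort()
        except Exception as exc:
            errors.append(exc)
    return errors
```

Each call to 'abort()' removes at least one manager from the pending list, so the caller's loop always terminates, and every error is seen rather than only the first.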
I don't really know what's "right" here. If the first data manager's
failure causes subsequent DMs to fail, what then? How much retry and
recovery logic code must somebody put into their application, in order to
guarantee correctness and recovery? Isn't that what the transaction API is
*for*?
I guess my inclination at this point is to think that maybe the transaction
needs to have some kind of log - not in the 'logging' module sense, but in
the sense of a list of actions performed and errors that occurred. These errors
could then be wrapped up in another exception or a return value upon
completion of operations like abort() and commit(). Then, if somebody
wants to analyze it, they have all the data.
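As a rough sketch of the log idea applied to abort: every data manager is called, each outcome is recorded, and the whole record is wrapped in a single exception if anything failed. 'TransactionErrors' and 'abort_all()' are hypothetical names, and the data managers are assumed to expose a no-argument 'abort()'.

```python
class TransactionErrors(Exception):
    """Hypothetical wrapper carrying the full log of a failed operation."""

    def __init__(self, log):
        failures = [entry for entry in log if entry[2] is not None]
        super().__init__("%d error(s) during abort" % len(failures))
        self.log = log


def abort_all(managers):
    """Abort every data manager, logging each action and any error."""
    log = []  # entries are (action, data manager, error or None)
    for dm in managers:
        try:
            dm.abort()
        except Exception as exc:
            # Record the failure and keep going, so all managers are called.
            log.append(("abort", dm, exc))
        else:
            log.append(("abort", dm, None))
    if any(err is not None for _, _, err in log):
        raise TransactionErrors(log)
    return log
```

This keeps the older behavior of calling every abort method while no longer discarding all but the first error.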
But I don't believe it makes sense for the application to try to correct
errors "under the hood" of the transaction. Data managers should handle
their own errors, if there's any handling to be done. Any analysis of the
errors after the fact is going to be by a human being, to figure out how to
fix the application or the data managers so they don't do whatever it is in
the first place, or so that they catch the problem before it becomes an
error in a commit or abort operation.