[DB-SIG] Two-phase commit API proposal (was Re: Any standard for two phase commit APIs?)

James Henstridge james at jamesh.id.au
Thu Jan 24 09:50:32 CET 2008


On 24/01/2008, Stuart Bishop <stuart at stuartbishop.net> wrote:
> James Henstridge wrote:
>
> > Proposal 1:
> > * Plain string IDs should work fine as transaction identifiers for
> >   applications built from scratch with that assumption: they would
> >   need to identify the global and branch parts in their own way.
> >
> > * A plain string can be stuffed inside an XA style transaction
> >   identifier, even if it isn't making use of all the different
> >   components.
> >
> > * Therefore, all methods accepting transaction IDs should accept
> >   strings.
> >
> > * As some transaction IDs in the database might not match this simple
> >   form, there are two options for the recover() method:
> >     1. return a special object that represents the transaction, which
> >        will be accepted by commit()/rollback().  How string-like must
> >        these objects be?
> >     2. omit such transaction IDs from the result.
> >
> > * For databases that support more structured transaction IDs (such as
> >   those used by XA), the 2PC methods may accept objects other than
> >   strings.
> >
> > Proposal 2:
> >
> > * Many databases follow the XA specification, so it makes sense to use
> >   transaction identifiers structured in the same way.
> >
> > * For databases that do not use XA-style transaction IDs, it is
> >   usually possible to serialise such an ID into a form that it can
> >   work with.
> >
> > * Therefore, all methods accepting transaction IDs should accept
> >   3-sequences of the form (formatID, gtrid, bqual).
> >
> > * For databases using non-XA transaction IDs, it is possible that some
> >   transaction IDs might exist that do not match the serialised form.
> >   The recover() method has two options:
> >     1. return a special object representing the ID that will be
> >        accepted by commit()/rollback().  Such an object should act
> >        like a 3-sequence.
> >     2. omit such transaction IDs from the result.
> >
> > * For databases not using XA-style transactions, the 2PC methods may
> >   accept objects other than 3-sequences as transaction IDs.
> >
> >
> > Both of these proposals seem to get rid of the main points of contention:
> > * removes the xid() constructor from the spec.
> > * allows use of simple objects (strings or tuples) as transaction IDs
> > * provides an obvious way to expose database-specific transaction IDs.
>
> I wouldn't call any of these a point of contention. They were points of
> discussion. Attempting to remove the xid() constructor from the spec is
> premature when people were just considering if tuples can be used instead.
>
> I don't think omitting transaction ids from tpc_recover() is acceptable.
> Doing so means you can't write a transaction manager that plays nicely in a
> more complex environment where components may not be under our direct
> control, let alone written in Python and using this API. My use case here is
> a reaper script that detects and handles or reports lost transactions.
>
> Here is an edge case with proposal 1. Here, con happens to be a connection
> to a MySQL database. Which Xid represents the prepared transaction?
>
> >>> con.tpc_begin('foo')
> >>> con.tpc_prepare()
> >>> con.tpc_recover()
> [<Xid 'foo', '', 1>, <Xid 'foo', '', 0>, <Xid 'foo', 'None', 1>]

If we were going with proposal 1 (defaulting to strings as transaction
IDs), it would be the one that compares equal to "foo".  The exact
answer would depend on how the database adapter was implemented.
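
As an illustration of the "compares equal" rule, here is a minimal
sketch.  RecoveredXid is a hypothetical class, not part of any adapter,
and it assumes that a plain string 'foo' is stored with an empty branch
qualifier and a format ID of 1 (the MySQL defaults).  A different
adapter could map strings differently.

```python
class RecoveredXid:
    """Hypothetical stand-in for an object returned by tpc_recover()."""

    def __init__(self, gtrid, bqual="", format_id=1):
        self.gtrid = gtrid
        self.bqual = bqual
        self.format_id = format_id

    def _key(self):
        return (self.gtrid, self.bqual, self.format_id)

    def __eq__(self, other):
        if isinstance(other, str):
            # Assumption: tpc_begin('foo') is stored as ('foo', '', 1).
            return self._key() == (other, "", 1)
        if isinstance(other, RecoveredXid):
            return self._key() == other._key()
        return NotImplemented

    def __hash__(self):
        # Keep hashing consistent with equality against plain strings.
        if self.bqual == "" and self.format_id == 1:
            return hash(self.gtrid)
        return hash(self._key())

    def __repr__(self):
        return "<Xid %r, %r, %r>" % self._key()


recovered = [
    RecoveredXid("foo", "", 1),
    RecoveredXid("foo", "", 0),
    RecoveredXid("foo", "None", 1),
]
# Pick out the entries that compare equal to the plain string.
matches = [xid for xid in recovered if xid == "foo"]
print(matches)
```

Under that assumption only the first of the three recovered IDs is
selected.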


> You could try fixing this by returning a heterogeneous list, but I think
> this is just making the hole deeper:
>
> >>> con.tpc_begin('foo')
> >>> con.tpc_prepare()
> >>> con.tpc_recover()
> ['foo', <Xid 'foo', '', 0>, <Xid 'foo', 'None', 1>]

In this case, the answer is still "the one that compares equal to 'foo'".


> Proposal 2 seems the better option. I think we need to specify that the
> 3-tuple cannot contain None values.

I suppose working with transaction IDs that couldn't be deserialised
might be easier with proposal 2.  For example, it could provide the
raw ID in one part and leave the other two None.
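
A sketch of how that fallback might look.  The JSON serialisation, the
function names, and the choice of the gtrid slot for the raw ID are all
assumptions made for illustration; a real adapter would pick its own
encoding.

```python
import json


def serialise_xid(xid):
    """Pack (formatID, gtrid, bqual) into one string (hypothetical format)."""
    format_id, gtrid, bqual = xid
    return json.dumps([format_id, gtrid, bqual])


def deserialise_xid(raw):
    """Unpack a stored ID, falling back to (None, raw, None)."""
    try:
        parsed = json.loads(raw)
    except ValueError:
        parsed = None
    if (
        isinstance(parsed, list)
        and len(parsed) == 3
        and isinstance(parsed[0], int)
        and isinstance(parsed[1], str)
        and isinstance(parsed[2], str)
    ):
        return tuple(parsed)
    # Not in our serialised form: expose the raw ID in the gtrid slot.
    return (None, raw, None)


print(deserialise_xid(serialise_xid((1, "foo", ""))))
print(deserialise_xid("created-by-another-tool"))
```

An ID created elsewhere still shows up in the recover() result, and the
None values mark it as one the adapter could not decode.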

For proposal 2, I think we should stick to XA-compatible IDs.  That
is, formatID is an integer >= 0, and the global ID and branch qualifier
are strings no longer than 64 characters each.
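
These constraints could be checked along the following lines.  This is
only a sketch: check_xid and MAX_XID_PART are hypothetical names, and
the length is counted in characters as stated above, whereas XA itself
counts bytes, so a real adapter would probably measure the encoded
form.

```python
MAX_XID_PART = 64  # limit for gtrid and bqual, as stated above


def check_xid(xid):
    """Validate a (formatID, gtrid, bqual) 3-sequence and return it as a tuple."""
    format_id, gtrid, bqual = xid
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(format_id, bool) or not isinstance(format_id, int):
        raise TypeError("formatID must be an integer")
    if format_id < 0:
        raise ValueError("formatID must be >= 0")
    for name, part in (("gtrid", gtrid), ("bqual", bqual)):
        if not isinstance(part, str):
            raise TypeError("%s must be a string" % name)
        if len(part) > MAX_XID_PART:
            raise ValueError("%s is longer than %d characters" % (name, MAX_XID_PART))
    return (format_id, gtrid, bqual)


print(check_xid((1, "foo", "")))
```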


> I personally feel that an Xid() constructor makes things more readable. It
> also means we can have driver specific defaults for the format id rather
> than no default.
>
> tpc_begin(Xid('foo', 'bar', 1))         vs.     tpc_begin(('foo', 'bar', 1))
> tpc_begin(Xid('foo', 'bar'))            vs.     tpc_begin(('foo', 'bar', 1))
> tpc_begin(Xid('foo'))                   vs.     tpc_begin(('foo', '', 1))

I don't know if adapter-specific defaults make sense.  Perhaps pick
the defaults from MySQL?

"""
As indicated by the syntax, bqual and formatID are optional. The
default bqual value is '' if not given. The default formatID value is
1 if not given.
"""

If we do have a transaction ID constructor, I think it should be a
method on the connection.  You can make use of pretty much the entire
DB-API using just a connection as an entry point (especially if the
exceptions are provided as connection attributes).  It seems sensible
to do the same here.
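
A sketch of that arrangement, using the MySQL defaults quoted above.
The Connection class is a stand-in rather than any real adapter, and
the method name xid() and the helper start_branch() are assumptions;
the tuple order follows proposal 2.

```python
class Connection:
    """Stand-in for an adapter's connection; not a real driver class."""

    def __init__(self):
        self.current_xid = None

    def xid(self, gtrid, bqual="", format_id=1):
        # Hypothetical constructor method; defaults borrowed from MySQL.
        return (format_id, gtrid, bqual)

    def tpc_begin(self, xid):
        # A real adapter would start the transaction branch here.
        self.current_xid = xid


def start_branch(con, name):
    # Only the connection is needed; the adapter module is never imported.
    con.tpc_begin(con.xid(name))


con = Connection()
start_branch(con, "foo")
print(con.current_xid)
```

Code like start_branch() can then work with any adapter that follows
the same convention, since it never refers to the adapter module.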

James.

