[DB-SIG] Two-phase commit API proposal (was Re: Any standard for two phase commit APIs?)

Wed Jan 23 14:11:53 CET 2008

M.-A. Lemburg wrote:
> On 2008-01-23 02:18, James Henstridge wrote:
>>> [XID format used in XA]
>>> So, essentially, only the global transaction id and the branch id
>>> are relevant and both are represented in the data string.
>> One interesting part of that is the "If OSI CCR naming is used, then
>> the XID's formatID element should be set to 0; if some other format is
>> used, then the formatID element should be greater than 0."
>>
>> I took a quick look at a few J2EE servers (which use XA), to see what
>> they do for transaction managers.  Neither JBoss nor Geronimo seems to
>> use formatID=0, but instead use magic numbers that I presume are
>> intended to determine if they created the transaction ID.
>>
>> That said, the selection of format identifiers seems a bit ad-hoc:
>> Geronimo uses 0x4765526f, which has a byte representation of "GeRo".
>>
>> It seems that you could do pretty much the same thing by getting TMs
>> to check the global ID itself ...
> 
> So we do need to store the "formatID" as well ?

It looks like we do. MySQL's syntax for xids allows an optional format id,
and XA RECOVER returns it. In MySQL it is a number rather than a string.
Assuming that any system that uses more than a simple string for the xid is
doing so to map onto the XA specification, we could safely represent xids as
a 3-tuple of (unicode, unicode, integer).

How to deal with Nones and empty strings needs to be thought out, though, to
avoid round-trip edge cases:

>>> con = connect('')
>>> xid = ('g', '', None)
>>> con.tpc_begin(xid)
>>> con.tpc_prepare()
>>> con.tpc_recover()
[('g', None, 1)]
>>> con.tpc_recover()[0] == xid
False

'' and None would be equivalent for the global transaction id and the branch
id, and 1 and None would be equivalent for the format_id (1 is the default
format id in MySQL). To avoid round-trip issues with tuples, only one of each
pair of equivalent values should be allowed.
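
A rough sketch of what "only one value allowed" could look like for plain
tuples. The function name check_canonical_xid and the choice of None and 1
as the canonical spellings are assumptions made for illustration (they
follow the tpc_recover() output shown above), not part of any existing
driver. It is written for current Python, so str stands in for unicode.

```python
# Hypothetical helper, not part of any DB-API module.
# Assumption: 1 is the canonical format id, as in the MySQL example above.
DEFAULT_FORMAT_ID = 1


def check_canonical_xid(xid):
    """Return xid unchanged if it is canonical, else raise ValueError."""
    if not isinstance(xid, tuple) or len(xid) != 3:
        raise ValueError("xid must be a 3-tuple")
    gtrid, bqual, format_id = xid
    # Reject '' so that None is the only way to spell "not given".
    for name, part in (("global transaction id", gtrid),
                       ("branch id", bqual)):
        if part == '':
            raise ValueError("%s: use None rather than ''" % name)
        if part is not None and not isinstance(part, str):
            raise ValueError("%s must be a string or None" % name)
    # Reject None so that the default format id is always spelled out.
    if format_id is None:
        raise ValueError("format id: use %d rather than None"
                         % DEFAULT_FORMAT_ID)
    if not isinstance(format_id, int):
        raise ValueError("format id must be an integer")
    return xid
```

With this, ('g', None, 1) is accepted while ('g', '', None) raises
ValueError, so whatever comes back from tpc_recover() compares equal to what
was passed in. The cost is that callers have to spell out all three parts.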

If we use an object, these issues go away:

>>> con = connect('')
>>> xid = Xid('g', '')
>>> tuple(xid)
('g', None, 1)
>>> con.tpc_begin(xid)
>>> con.tpc_prepare()
>>> con.tpc_recover()
[<Xid 'g', None, 1>]
>>> con.tpc_recover()[0] == xid
True
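
A minimal sketch of such an object, to show where the normalisation would
live. The attribute names (gtrid, bqual, format_id), the default format id
of 1 and the repr format are assumptions taken from the examples in this
mail, not an agreed API. It is written for current Python.

```python
class Xid(object):
    """Illustrative transaction id that behaves like a 3-sequence."""

    # Assumption: 1 is the default format id, as in the MySQL example.
    DEFAULT_FORMAT_ID = 1

    def __init__(self, gtrid, bqual=None, format_id=None):
        # Normalise on the way in: '' becomes None, and a missing
        # format id becomes the default.
        self.gtrid = gtrid or None
        self.bqual = bqual or None
        if format_id is None:
            format_id = self.DEFAULT_FORMAT_ID
        self.format_id = format_id

    def __iter__(self):
        return iter((self.gtrid, self.bqual, self.format_id))

    def __len__(self):
        return 3

    def __getitem__(self, index):
        return tuple(self)[index]

    def __eq__(self, other):
        # Only compare against other Xid objects; wrap a tuple with
        # Xid(*t) first if it needs to be compared.
        if isinstance(other, Xid):
            return tuple(self) == tuple(other)
        return NotImplemented

    def __ne__(self, other):
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        return hash(tuple(self))

    def __repr__(self):
        return '<Xid %r, %r, %r>' % tuple(self)
```

Because the constructor normalises its arguments, Xid('g', '') and
Xid('g', None, 1) compare equal, and a driver that rebuilds Xid objects in
tpc_recover() gets the round trip for free.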

> Given that the formatID is used for some purpose as well (probably
> just as identification of the TM itself), I guess we'd have
> to use a 3-tuple (format id, global transaction id, branch id).
> 
> Modules should only expect to find an object that behaves like
> a 3-sequence, they should accept whatever object is passed to
> them and return it for the recover method.
> 
> This leaves the door open for extensions used by the TM for XID
> objects.

I don't see a technical problem with the tuple apart from the round-tripping
issue above, and someone might have a nice solution to that. Subjectively,
though, I think an object reads better, particularly as in many cases you
will only want to bother specifying one or maybe two of the three parts:
compare Xid('foo') with ('foo', None, None).

Is the CamelCase of xid 'Xid', 'XID' or 'XId'?

-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/
