[DB-SIG] Emulating transaction management using sagas
nigel-c@clear.net.nz
Fri, 3 Jan 2003 23:00:00 +1300 (NZDT)
[Cross-posted to db-sig as persistence-sig appears to be fairly dead,
and also to SAPDB-general]
I've been doing a bit of frigging around and have developed a prototype
application server based off the server frameworks in the python libraries.
It uses ThreadingMixIn, which gives it a thread-per-request model. The
basic feature set is:
o Configuration file using ConfigParser and overrides off the CL
it's invoked from
o A simple thread safe DB connection pool (more on this later) that
uses SAPDB. It would be fairly simple to generalise it but I haven't
yet.
o A more elaborate framework to register components in a hierarchical
API (it uses the XML-RPC libraries)
o Expanded logging functionality to support error, call and debug
logging
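The thread-safe pool mentioned above can be sketched roughly as below. This is my guess at the shape of such a pool, not the actual code; the class and method names are hypothetical, and sqlite3 stands in for the SAPDB driver (any DB-API 2.0 module would slot in the same way):

```python
import queue
import sqlite3  # stand-in for sapdb; any DB-API 2.0 module would do


class ConnectionPool:
    """A simple thread-safe connection pool (a sketch, not the post's code)."""

    def __init__(self, connect, size=5):
        # queue.Queue does the locking for us, so acquire/release are
        # safe to call from many request threads at once.
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(connect())

    def acquire(self, timeout=None):
        # Blocks until a connection is free, so excess requests queue up.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)


# Hand the pool a zero-argument connect callable.
pool = ConnectionPool(
    lambda: sqlite3.connect(":memory:", check_same_thread=False), size=2)
conn = pool.acquire()
conn.execute("CREATE TABLE t (x INTEGER)")
pool.release(conn)
```

In a thread-per-request server, each handler would acquire at the top of the request and release in a finally block.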
Now comes the fun part: adding transaction management. Note that I
haven't done this yet - I'm just pontificating on a couple of
approaches.
Transaction management (where the middle-tier can control the commit
or rollback of transactions) is useful in two cases:
o When the transaction should participate in a global distributed
transaction that has other database managers (or even other TP
monitors or app servers) participating in it (i.e. all the databases
either commit or roll back together and this consistency is
guaranteed by the system software), or
o When a transaction should span more than one call to the middle-tier
Examples of this are complex financial data entry (covering more
than one screen) when you don't want to allow inconsistent records
to be committed, and workflow systems where more than one user might
participate in the same transaction.
Remember the connection pool? In a client-server application, each
client has its own connection, which has an implicit transaction context.
However, the transaction context is tied to the connection. This isn't
much use on pooled connections, as something else could have used the
connection in between the calls to the middle tier - and a subsequent
call might not (in fact probably won't) get the same database connection
that the first one used.
Connection pooling (which is the typical approach in a 3-tier application)
bollixes this up. Subsequent calls that are a part of the transaction
may not use the same connection, and there may have been another transaction
using it in the meantime. To do any sort of transaction management in
this setting, you need to be able to pick up and put down transaction
contexts.
In an XA-compatible DBMS, the transaction context has its own ID, which
is separate from the connection. Thus any connection can be used with
any transaction context. None of the OSS database management systems
appear to be able to do this. If anyone knows something to the contrary,
I'd be interested to hear.
For OSS databases, we now get to find a way to fake it. I can think of
two strategies, each of which has pros and cons.
i. Treat a database connection as a transaction context. On a server
box like a Compaq ML370, one can put 4GB of RAM onto the machine
fairly cheaply these days. This would reasonably allow us to open
(at a guess) something like 5,000 connections. If anyone knows
how much memory Interbase, SAPDB or Postgres use per connection,
I'd be interested to hear.
We could therefore have 5k or so transactions open at any given
time, which is a definite limit on scalability. It's a bit of
a hack, but does let us use arbitrary SQL in the context of the
transaction and doesn't require any i-dotting and t-crossing from
the application. It's also fairly simple to implement.
Another minus is that the context is tied to a single middle-tier
node, so clustering is out. Transactions would have at-most-once
semantics. If the middle-tier or database died, uncommitted
transactions would be rolled back.
This approach would work OK for applications that use predominantly
single-call transactions but have a small number of conversational
transactions.
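Strategy (i) might look something like the sketch below: the middle tier hands the client an opaque transaction ID, and each later call presents that ID to get the same dedicated connection back. All names here are hypothetical, and a shared in-memory SQLite database stands in for the real DBMS:

```python
import sqlite3
import threading
import uuid


class ConnectionTxnManager:
    """Strategy (i) sketch: one dedicated connection *is* the transaction
    context. connect is any zero-argument DB-API connect callable."""

    def __init__(self, connect):
        self._connect = connect
        self._txns = {}               # txn_id -> dedicated connection
        self._lock = threading.Lock()

    def begin(self):
        txn_id = uuid.uuid4().hex     # the context ID handed to the client
        with self._lock:
            self._txns[txn_id] = self._connect()
        return txn_id

    def connection(self, txn_id):
        # Later middle-tier calls present the ID to reuse the connection.
        return self._txns[txn_id]

    def commit(self, txn_id):
        with self._lock:
            conn = self._txns.pop(txn_id)
        conn.commit()
        conn.close()                  # connection returns to the OS, not a pool

    def rollback(self, txn_id):
        with self._lock:
            conn = self._txns.pop(txn_id)
        conn.rollback()
        conn.close()


# Demo against a shared in-memory SQLite database.
URI = "file:txndemo?mode=memory&cache=shared"
keeper = sqlite3.connect(URI, uri=True)   # keeps the shared DB alive
keeper.execute("CREATE TABLE accounts (bal INTEGER)")
keeper.commit()

mgr = ConnectionTxnManager(lambda: sqlite3.connect(URI, uri=True))
txn = mgr.begin()
mgr.connection(txn).execute("INSERT INTO accounts VALUES (100)")
mgr.commit(txn)
```

The 5k-connection ceiling shows up directly here: every open txn_id pins a whole connection until commit or rollback.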
ii. Use a technique called 'sagas'. This is essentially a command
pattern. Original versions of all of the records affected are
stored for the duration of the transaction; if the transaction
is rolled back, the old records are written back to the database
in a compensating transaction. Note that this has an implied
read-uncommitted isolation level for data committed in the database
but not actually yet 'committed' in the transaction manager.
This requires some co-operation from the application, but could
be substantially automated if all transactions went through
a persistence layer. Berkeley DB on the application server might
be a good persistence mechanism for the rollback data for in-
progress transactions.
This solution is more complex, and perhaps best implemented as part
of a persistence mechanism to abstract away the boilerplate code
that it would require. It can emulate nested and chained transactions.
For a more detailed treatment of Sagas, see the Gray and Reuter
transaction processing book.
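To make the saga idea concrete, here is a minimal sketch using a plain dict as a stand-in for a table (in practice the undo log would live in something like Berkeley DB, as suggested above). Each step records the original record before overwriting it, so 'rollback' is a compensating pass that writes the old records back in reverse order:

```python
import copy


class Saga:
    """Minimal saga sketch: before each change, log the original record;
    rollback replays the originals newest-first as a compensating txn."""

    def __init__(self, db):
        self.db = db            # a plain dict standing in for a table
        self.undo_log = []      # (key, original_record), newest last

    def update(self, key, value):
        # Each step is applied (and, in a real DB, committed) immediately,
        # which is why other readers see it before the saga 'commits' -
        # the implied read-uncommitted isolation mentioned above.
        self.undo_log.append((key, copy.deepcopy(self.db.get(key))))
        self.db[key] = value

    def commit(self):
        # Nothing to undo any more; just forget the log.
        self.undo_log.clear()

    def rollback(self):
        # Compensating transaction: restore old records, newest first.
        for key, old in reversed(self.undo_log):
            if old is None:
                self.db.pop(key, None)
            else:
                self.db[key] = old
        self.undo_log.clear()


db = {"a": 1}
saga = Saga(db)
saga.update("a", 2)
saga.update("b", 9)   # both changes are visible to other readers here
saga.rollback()       # db is back to {"a": 1}
```

Chained transactions fall out naturally: committing one saga and starting the next against the same db gives a chain of independently compensatable steps.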
I'm interested to hear comments.
Nigel.