[Persistence-sig] A simple Observation API

Mon, 29 Jul 2002 20:23:29 -0400

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  [GvR asks the question that puzzles me too:]
  >> What exactly is the point of collapsing multiple setattr() ops
  >> together?  Just performance?  Or is there a semantic reason?  If
  >> just performance, where is the time going that you're trying to
  >> save?

  PJE> Semantics plus performance.  The semantic part is that some
  PJE> "database" systems (e.g. LDAP) inherently don't support
  PJE> transactions, AND must receive a semantically valid set of
  PJE> attributes in a single update operation.  I may be
  PJE> overgeneralizing this aspect, however.

  PJE> The performance save is for situations like Tim Peters'
  PJE> distributed cache example.  If a change notification is going
  PJE> to cause network traffic, it would be a good idea to minimize
  PJE> the number of such notifications.  It's a common situation
  PJE> (IMHO) to change multiple attributes in a set of related
  PJE> methods, so this supports that scenario while ensuring a
  PJE> minimal set of update events are issued.

I remain convinced that the current mechanism ought to work.  Perhaps
I just need to be convinced otherwise, but I don't think these cases
are worked out in enough detail to be convincing.

I also think the semantics of the proposed alternative make things
harder for users, presumably in order to make the infrastructure's job
easier.  I'm thinking about a complex data structure implemented using
many helper methods.  If the data structure is modified inside a
helper method, it can't mark the object changed; it needs to wait for
the top-level operation to finish.  As a result, the data structure
would need to keep a separate flag to indicate whether it should be
marked as changed later.  Then the methods that are "top-level" would
need to be edited to check that flag and set _p_changed.  It's worse,
though, because you might want to implement one "top-level" operation
by calling another top-level operation.  That would require the
introduction of extra wrappers around the public versions of methods
that just do bookkeeping, so that the internal routines could call
other internal routines.
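
A minimal sketch of the bookkeeping described above, under stated
assumptions: both classes are plain Python objects rather than real
persistent ones, and the names MarkNow, MarkLater, _dirty, _flush and
notifications are hypothetical, invented only for illustration.  The
first class follows the current convention, where any method may set
_p_changed.  The second defers the mark to the outermost call, which
forces the extra flag and the public/internal method split.

```python
class MarkNow:
    """Current convention: any method, helper or not, sets _p_changed."""

    def __init__(self):
        self._items = []
        self._p_changed = False

    def _append(self, item):
        # Helper method: free to mark the object changed immediately.
        self._items.append(item)
        self._p_changed = True

    def add(self, item):
        self._append(item)

    def add_many(self, items):
        # One top-level operation can simply call another.
        for item in items:
            self.add(item)


class MarkLater:
    """Hypothetical alternative: only the outermost call may mark."""

    def __init__(self):
        self._items = []
        self._p_changed = False
        self._dirty = False        # extra flag: "mark me changed later"
        self.notifications = 0     # counts change marks, for illustration

    def _append(self, item):
        # Helper method: may only record that a mark is owed.
        self._items.append(item)
        self._dirty = True

    def _flush(self):
        # Bookkeeping that every top-level method must remember to run.
        if self._dirty:
            self._dirty = False
            self._p_changed = True
            self.notifications += 1

    def _add(self, item):
        # Internal version, callable from other internal routines.
        self._append(item)

    def add(self, item):
        # Public wrapper that exists only to do the bookkeeping.
        self._add(item)
        self._flush()

    def add_many(self, items):
        # Must call _add, not add, or it would mark once per item.
        for item in items:
            self._add(item)
        self._flush()
```

The second class does collapse several modifications into one mark,
but only because every public method was split in two by hand.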

The complexity aside, I don't understand why the transaction framework
isn't sufficient to handle the two examples you mention above.  LDAP
does not support transactions, but does expect to get consistent
updates.  A transaction provides, among other things, consistency.
It should be possible to delay updates to the LDAP
database until the transaction commits.  The fact that LDAP does not
participate in two-phase commit limits its robustness, but should not
affect consistency.  (Specifically, I mean that a transaction may fail
in the final stage of the two-phase commit with this sort of data
manager.)

The distributed cache example seems to be the same.  If there are
multiple updates, delay sending any of the updates until the
transaction commits.  It might abort, after all, and then no updates
need to be sent; this is just the atomicity property of transactions.

The two examples seem to need the A and C of ACID transactions, so why
not use them?
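
A rough sketch of how delaying updates until commit could look for
both examples.  Everything here is an assumption made for
illustration: Entry, BufferingDataManager, register and the send
callable are hypothetical names, not the API of any existing
transaction framework or LDAP library.  The object sets the boolean
_p_changed on its first modification, and the manager sends one
complete update per changed object at commit time, or nothing at all
on abort.

```python
class BufferingDataManager:
    """Hypothetical data manager that holds updates until commit."""

    def __init__(self, send):
        # send(key, attrs) stands in for an LDAP modify or a cache
        # invalidation message; it is an assumption of this sketch.
        self._send = send
        self._changed = []

    def register(self, obj):
        # Called once per object, the first time it is modified.
        self._changed.append(obj)

    def commit(self):
        # One update per changed object, carrying its full final state.
        changed, self._changed = self._changed, []
        for obj in changed:
            self._send(obj.key, dict(obj.attrs))
            obj._p_changed = False

    def abort(self):
        # Nothing was sent, so there is nothing to undo remotely.  A
        # real data manager would also discard the modified state.
        for obj in self._changed:
            obj._p_changed = False
        self._changed = []


class Entry:
    """Toy object whose only change notification is _p_changed."""

    def __init__(self, key, manager, **attrs):
        self.key = key
        self.attrs = dict(attrs)
        self._manager = manager
        self._p_changed = False

    def set(self, name, value):
        self.attrs[name] = value
        if not self._p_changed:
            # Later modifications in the same transaction are
            # collapsed, because the flag is already set.
            self._p_changed = True
            self._manager.register(self)
```

With this arrangement, two calls to set() followed by commit() result
in a single send carrying both attributes, and set() followed by
abort() results in no send at all.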

Proper nested transactions should make the current mechanism even
cleaner.  Some methods of an object may want to have ACID semantics.
They can operate as a subtransaction, with all-or-nothing updates to
the object state provided that the top-level transaction commits.
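
As a loose illustration of all-or-nothing updates to a single
object's state, here is a sketch built on a snapshot-and-restore
context manager.  The names subtransaction and Ledger are
hypothetical, and this is not how any particular transaction
framework implements nested transactions; in particular it only
models the rollback of the inner operation, not the dependence on the
top-level transaction committing.

```python
import copy
from contextlib import contextmanager


@contextmanager
def subtransaction(obj):
    """Restore obj's attributes if the enclosed block raises."""
    saved = copy.deepcopy(obj.__dict__)
    try:
        yield obj
    except Exception:
        # Roll back every attribute, including _p_changed.
        obj.__dict__.clear()
        obj.__dict__.update(saved)
        raise


class Ledger:
    """Toy object with one method that wants all-or-nothing updates."""

    def __init__(self):
        self.balances = {"a": 10, "b": 0}
        self._p_changed = False

    def transfer(self, src, dst, amount):
        with subtransaction(self):
            # Modify first, validate second, to show the rollback.
            self.balances[dst] += amount
            self._p_changed = True
            if self.balances[src] < amount:
                raise ValueError("insufficient funds")
            self.balances[src] -= amount
```

A failed transfer leaves both the balances and _p_changed exactly as
they were before the call, so the enclosing transaction sees either
the whole update or none of it.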

I think a simple boolean flag, _p_changed, is all the change
notification we need when combined with transactions.

Jeremy