From jacobs@penguin.theopalgroup.com  Mon Jul  8 21:09:25 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Mon, 8 Jul 2002 16:09:25 -0400 (EDT)
Subject: [Persistence-sig] Is anyone here yet?
Message-ID: <Pine.LNX.4.44.0207081608530.31995-100000@penguin.theopalgroup.com>

Is anyone here yet?

For a moment, I am king. ;)

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From marklists@mceahern.com  Mon Jul  8 21:20:43 2002
From: marklists@mceahern.com (Mark McEahern)
Date: Mon, 8 Jul 2002 15:20:43 -0500
Subject: [Persistence-sig] Is anyone here yet?
In-Reply-To: <Pine.LNX.4.44.0207081608530.31995-100000@penguin.theopalgroup.com>
Message-ID: <JHEOKEOOLIGLDHCMAHMOAEGFDAAA.marklists@mceahern.com>

> Is anyone here yet?

it's a small world so far:

  http://mail.python.org/mailman-21/roster/persistence-sig

// m
-



From jeremy@zope.com  Tue Jul  9 14:15:32 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Tue, 9 Jul 2002 09:15:32 -0400
Subject: [Persistence-sig] getting started
Message-ID: <15658.57844.17239.668311@slothrop.zope.com>

It looks like many of the people who expressed interest in the SIG
have subscribed to the list, so we ought to get started.  I think we
should begin with some introductions and a review of the SIG charter.

Introductions: Please tell us about your interest in the persistence
SIG, what personal/professional goals you have for it, and how much
time & energy you have.  (Feel free to lurk if that's your preference.)

Charter: Jim Fulton wrote the SIG charter.  A very brief summary is
that we should:

  - focus on transparency, transactions, and memory-caching issues;

  - put off concurrency control, queries, and constraints;

  - produce PEPs and, if there is consensus, code for the std library.

Does that sound like the right set of initial constraints?  Are there
other issues to consider or avoid?

In the brief discussion on the meta-sig, several related projects were
mentioned.  It would be helpful to capture a brief summary of each on
the SIG web pages.

I'll follow up with my intro soon.

Jeremy



From jacobs@penguin.theopalgroup.com  Tue Jul  9 20:02:22 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Tue, 9 Jul 2002 15:02:22 -0400 (EDT)
Subject: [Persistence-sig] getting started
In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <Pine.LNX.4.44.0207091457360.7841-100000@penguin.theopalgroup.com>

Hi Jeremy and other persistent folk,

My primary interest has to do with developing high performance
enterprise-objects and object-relational mapping systems using new-style
Python class features.  A secondary interest involves distributed
transaction management frameworks, and heterogeneous backing stores.

I plan to devote a significant amount of my own time, as well as that of my
development team, to propose standards and produce reference implementations
of ideas developed here.

Looking forward to the future,
-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From pobrien@orbtech.com  Tue Jul  9 20:15:54 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Tue, 9 Jul 2002 14:15:54 -0500
Subject: [Persistence-sig] getting started
In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBEEGANHAA.pobrien@orbtech.com>

[Jeremy Hylton]
>
> It looks like many of the people who expressed interest in the SIG
> have subscribed to the list, so we ought to get started.  I think we
> should begin with some introductions and a review of the SIG charter.
>
> Introductions: Please tell us about your interest in the persistence
> SIG, what personal/professional goals you have for it, and how much
> time & energy you have.  (Feel free to lurk if that's your preference.)

Sounds good to me. I've been programming in Python for about a year and a
half now, and various other languages for the past 15 years. I've also done
a lot of work with relational databases and data modeling. I'm the author of
PyCrust (a Python shell written in wxPython) and a developer on the
PythonCard project (an app building framework for wxPython). I've created a
couple of websites with Quixote and various utilities with Python. I'm also
in the middle of creating an xhtml-compliant html generator similar to
htmlgen.

My interest in this SIG is directly related to my interest in using ZODB
outside of Zope to create medium-sized applications with persistent objects
instead of a traditional relational database approach using SQL. I think
ZODB is very good, but more could be done to make it easier to use by
someone familiar with relational databases. Along those lines I started a
project to make ZODB easier, called Bulldozer, and you can get the code from
SourceForge at http://sourceforge.net/projects/bdoz. There isn't any
documentation so the only clues to my intent are in the source and unit
tests and this wiki page at http://www.orbtech.com/wiki/BullDozer.
Unfortunately, I haven't had the time or energy to make any progress on this
project for the past couple of months.

I think there are lots of applications that need persistent data but don't
necessarily need or benefit from a relational database. There is also much
to be gained from not having to translate objects into relational tuples and
back again. And I think having good persistence support in the Python core
would be a really good selling point for Python. Transparent persistence is
also a hot item on the PythonCard project right now.

I've got a lot of interest in this topic so I'll do my best to make time
available. I'm on vacation all next week and then I'll be at OSCON. It would
be great to discuss this topic in person there. Does anyone plan to set up a
Birds-of-a-feather session at OSCON?

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From guido@python.org  Tue Jul  9 20:40:34 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 09 Jul 2002 15:40:34 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: Your message of "Tue, 09 Jul 2002 14:15:54 CDT."
             <NBBBIOJPGKJEKIECEMCBEEGANHAA.pobrien@orbtech.com> 
References: <NBBBIOJPGKJEKIECEMCBEEGANHAA.pobrien@orbtech.com> 
Message-ID: <200207091940.g69JeYw03746@odiug.zope.com>

> I've got a lot of interest in this topic so I'll do my best to make time
> available. I'm on vacation all next week and then I'll be at OSCON. It would
> be great to discuss this topic in person there. Does anyone plan to set up a
> Birds-of-a-feather session at OSCON?

Could you set up this BOF?  I think it would be a good idea.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From pje@telecommunity.com  Tue Jul  9 21:18:13 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 09 Jul 2002 16:18:13 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <Pine.LNX.4.44.0207091457360.7841-100000@penguin.theopalgroup.com>
References: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <3.0.5.32.20020709161813.01aa9d10@telecommunity.com>

At 03:02 PM 7/9/02 -0400, Kevin Jacobs wrote:
>
>My primary interest has to do with developing high performance
>enterprise-objects and object-relational mapping systems using new-style
>Python class features.  A secondary interest involves distributed
>transaction management frameworks, and heterogeneous backing stores.
>
>I plan to devote a significant amount of my own time, as well as that of my
>development team, to propose standards and produce reference implementations
>of ideas developed here.
>

I think I can safely say, "me too", on all of the above.  :)



From pobrien@orbtech.com  Tue Jul  9 21:41:17 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Tue, 9 Jul 2002 15:41:17 -0500
Subject: [Persistence-sig] getting started
In-Reply-To: <200207091940.g69JeYw03746@odiug.zope.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBEEGHNHAA.pobrien@orbtech.com>

> > Birds-of-a-feather session at OSCON?
> 
> Could you set up this BOF?  I think it would be a good idea.
> 
> --Guido van Rossum (home page: http://www.python.org/~guido/)

Done. I'll let you know when I hear back from Gretchen at O'Reilly.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/ 
Blog: http://www.orbtech.com/blog/pobrien/ 
Wiki: http://www.orbtech.com/wiki/PatrickOBrien 
-----------------------------------------------



From bzimmer@ziclix.com  Wed Jul 10 04:16:37 2002
From: bzimmer@ziclix.com (brian zimmer)
Date: Tue, 9 Jul 2002 22:16:37 -0500
Subject: [Persistence-sig] getting started
In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <002001c227c0$36ed7150$6401a8c0@mountain>

Hi all,

I am primarily interested in relational databases, OR mappings and
distributed transactions.  As the author of zxJDBC (the Jython
implementation of the DB API) I'm curious to see if anything proposed
here has large ramifications on Jython development.

thanks,

brian



From Sebastien.Bigaret@inqual.com  Wed Jul 10 08:17:11 2002
From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret)
Date: 10 Jul 2002 09:17:11 +0200
Subject: [Persistence-sig] getting started
In-Reply-To: "Phillip J. Eby"'s message of "Tue, 09 Jul 2002 16:18:13 -0400"
References: <15658.57844.17239.668311@slothrop.zope.com>
	<3.0.5.32.20020709161813.01aa9d10@telecommunity.com>
Message-ID: <87eleckq6g.fsf@bidibule.brest.inqual.bzh>


"Phillip J. Eby" <pje@telecommunity.com> writes:
> At 03:02 PM 7/9/02 -0400, Kevin Jacobs wrote:
> >
> >My primary interest has to do with developing high performance
> >enterprise-objects and object-relational mapping systems using new-style
> >Python class features.  A secondary interest involves distributed
> >transaction management frameworks, and heterogeneous backing stores.
> >
> >I plan to devote a significant amount of my own time, as well as that of my
> >development team, to propose standards and produce reference implementations
> >of ideas developed here.
> >
>
> I think I can safely say, "me too", on all of the above.  :)

So do I ;)

I would also add that I'm primarily interested in OR mapping, that part of my
working time is actually dedicated to that subject ; I work on a project, a
framework dedicated to object/relational mapping [1], so I am already devoting
time on the subject and will be pleased to participate in elaborating
standards, etc.


Jeremy> [SIG charter] Does that sound like the right set of initial
Jeremy> constraints?  Are there other issues to consider or avoid?

Ok for me.

-- Sébastien.


[1] soon to be open-sourced, when the last legal problems are eliminated. I'll
    announce it here then --hopefully next week.



From pobrien@orbtech.com  Wed Jul 10 13:43:47 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Wed, 10 Jul 2002 07:43:47 -0500
Subject: [Persistence-sig] 
 FW: OSCON Birds of a Feather Session - confirmation
Message-ID: <NBBBIOJPGKJEKIECEMCBOEHLNHAA.pobrien@orbtech.com>

The OSCON BOF information appears below. Let me know if there are any
problems with the date and time that I requested. Otherwise, I'll see you
all there.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------

-----Original Message-----
From: Gretchen Bartholomew [mailto:gretchen@oreilly.com]
Sent: Tuesday, July 09, 2002 9:28 PM
To: pobrien@orbtech.com
Cc: Vee McMillen; gretchen@oreilly.com
Subject: OSCON Birds of a Feather Session - confirmation


Dear Mr. O'Brien:

Thank you for submitting a proposal to moderate a Birds of a Feather session
(BOF) at the upcoming O'Reilly Open Source Convention  --  July 22 - 26,
2002 in San Diego, CA.

Your BOF proposal has been accepted and I would like to schedule your BOF
for the following date/times. Please let me know if you have any conflicts
with this itinerary.

======

Title:          Python Persistence
Date:           Thursday, July 25
Time:           8:00 - 10:00 pm
Location:       Grande Ballroom C
Moderator:      Patrick O'Brien, Orbtech
Summary:        A Python Persistence Special Interest Group was recently
formed to explore ways to add basic persistence and transaction mechanisms
into the core of Python to avoid duplication of effort by a variety of
projects that have similar issues. This BOF will permit participants to
ponder Python persistence in person.

========

The BOF session information, as seen above, will be posted on the conference
BOF page:
http://conferences.oreillynet.com/pub/w/15/bof.html

Audio/visual equipment and A/V support is not supplied by O'Reilly &
Associates for BOF sessions.

If you have any questions or concerns, please do not hesitate to contact me.
We look forward to seeing you in San Diego.

Kind Regards,

Gretchen



Gretchen Bartholomew
Conf. Planning Coordinator
O'Reilly & Associates

Phone: 707-827-7186
Fax: 707-823-9746


============
O'Reilly Open Source Convention
Sheraton San Diego Hotel & Marina
July 22 - 26, 2002  --  San Diego, CA

http://conferences.oreilly.com/oscon/
============

============
O'Reilly Mac OS X Conference
Westin Santa Clara
Sept. 30 - Oct. 3, 2002  --  Santa Clara, CA

http://conferences.oreillynet.com/macosx2002/
============



From jim@zope.com  Wed Jul 10 13:55:03 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 08:55:03 -0400
Subject: [Persistence-sig] getting started
References: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <3D2C2EA7.6050502@zope.com>

Jeremy Hylton wrote:

...
> Introductions: Please tell us about your interest in the persistence
> SIG, what personal/professional goals you have for it,

This is pretty much covered in the SIG charter. :)

 > and how much
> time & energy you have.  (Feel free to lurk if that's your preference.)

Not as much as I'd like, but I will try to make time. Fortunately,
Jeremy and others at Python Labs are involved.

Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From jim@zope.com  Wed Jul 10 14:05:57 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 09:05:57 -0400
Subject: [Persistence-sig]  FW: OSCON Birds of a Feather Session -
	confirmation
References: <NBBBIOJPGKJEKIECEMCBOEHLNHAA.pobrien@orbtech.com>
Message-ID: <3D2C3135.4070401@zope.com>

Is there any chance we could move this to Wednesday?

I'm leaving Thursday morning. :(

Jim

Patrick K. O'Brien wrote:
> The OSCON BOF information appears below. Let me know if there are any
> problems with the date and time that I requested. Otherwise, I'll see you
> all there.



-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From pobrien@orbtech.com  Wed Jul 10 14:19:32 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Wed, 10 Jul 2002 08:19:32 -0500
Subject: [Persistence-sig]  FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: <3D2C3135.4070401@zope.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBGEHNNHAA.pobrien@orbtech.com>

[Jim Fulton]
>
> Is there any chance we could move this to Wednesday?
>
> I'm leaving Thursday morning. :(

The only timeslot on Wednesday is from 8 to 10 and that is taken up by:

Python Software Foundation
Date: 07/24/2002
Time: 8:00pm - 10:00pm
Location: Marina II in the East Tower
Moderated by: Guido van Rossum

The only timeslot on Tuesday is from 6 to 7 and that is taken up by:

What is Python?
Date: 07/23/2002
Time: 6:00pm - 7:00pm
Location: Harbor Island I in the East Tower
Moderated by: Wesley J. Chun, CyberWeb Consulting

Monday has no conflicts for the entire timeslot from 6 to 10, but I don't
fly in until Tuesday and I wasn't sure if many people would be there on
Monday. That's why I picked Thursday.

I'm open to suggestions. It would be a shame not to have you there.
Thoughts?

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From guido@python.org  Wed Jul 10 14:20:55 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jul 2002 09:20:55 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: Your message of "Wed, 10 Jul 2002 09:05:57 EDT."
             <3D2C3135.4070401@zope.com> 
References: <NBBBIOJPGKJEKIECEMCBOEHLNHAA.pobrien@orbtech.com>  
            <3D2C3135.4070401@zope.com> 
Message-ID: <200207101320.g6ADKtH25999@pcp02138704pcs.reston01.va.comcast.net>

> Is there any chance we could move this to Wednesday?
> 
> I'm leaving Thursday morning. :(

I'd prefer Wednesday too -- Thursday night I have an OSI board meeting
to attend.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Wed Jul 10 14:22:37 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jul 2002 09:22:37 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: Your message of "Wed, 10 Jul 2002 08:19:32 CDT."
             <NBBBIOJPGKJEKIECEMCBGEHNNHAA.pobrien@orbtech.com> 
References: <NBBBIOJPGKJEKIECEMCBGEHNNHAA.pobrien@orbtech.com> 
Message-ID: <200207101322.g6ADMbn26017@pcp02138704pcs.reston01.va.comcast.net>

> The only timeslot on Wednesday is from 8 to 10 and that is taken up by:
> 
> Python Software Foundation
> Date: 07/24/2002
> Time: 8:00pm - 10:00pm
> Location: Marina II in the East Tower
> Moderated by: Guido van Rossum

Oops, I forgot.  Strike Wednesday, too.

> The only timeslot on Tuesday is from 6 to 7 and that is taken up by:
> 
> What is Python?
> Date: 07/23/2002
> Time: 6:00pm - 7:00pm
> Location: Harbor Island I in the East Tower
> Moderated by: Wesley J. Chun, CyberWeb Consulting

We can overlap with this -- Wesley's BOF is for absolute beginners,
ours for dyed-in-the-wool developers.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From pobrien@orbtech.com  Wed Jul 10 14:27:46 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Wed, 10 Jul 2002 08:27:46 -0500
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: <200207101322.g6ADMbn26017@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <NBBBIOJPGKJEKIECEMCBAEHONHAA.pobrien@orbtech.com>

[Guido van Rossum]
> > The only timeslot on Tuesday is from 6 to 7 and that is taken up by:
> > 
> > What is Python?
> > Date: 07/23/2002
> > Time: 6:00pm - 7:00pm
> > Location: Harbor Island I in the East Tower
> > Moderated by: Wesley J. Chun, CyberWeb Consulting
> 
> We can overlap with this -- Wesley's BOF is for absolute beginners,
> ours for dyed-in-the-wool developers.

Does Tuesday work for you, Jim?

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/ 
Blog: http://www.orbtech.com/blog/pobrien/ 
Wiki: http://www.orbtech.com/wiki/PatrickOBrien 
-----------------------------------------------



From jim@zope.com  Wed Jul 10 15:05:31 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 10:05:31 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
References: <NBBBIOJPGKJEKIECEMCBGEHNNHAA.pobrien@orbtech.com>
	<200207101322.g6ADMbn26017@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D2C3F2B.8060808@zope.com>

Guido van Rossum wrote:
>>The only timeslot on Wednesday is from 8 to 10 and that is taken up by:
>>
>>Python Software Foundation
>>Date: 07/24/2002
>>Time: 8:00pm - 10:00pm
>>Location: Marina II in the East Tower
>>Moderated by: Guido van Rossum
>>
> 
> Oops, I forgot.  Strike Wednesday, too.
> 
> 
>>The only timeslot on Tuesday is from 6 to 7 and that is taken up by:
>>
>>What is Python?
>>Date: 07/23/2002
>>Time: 6:00pm - 7:00pm
>>Location: Harbor Island I in the East Tower
>>Moderated by: Wesley J. Chun, CyberWeb Consulting
>>
> 
> We can overlap with this -- Wesley's BOF is for absolute beginners,
> ours for dyed-in-the-wool developers.

Unfortunately, I don't arrive at the SD airport till 8pm.

I chose to keep my time at OSCON short this year. :(

I guess I'll just miss the BOF. Dang.

I suggest you go ahead with Thursday evening.

I'll see how hard it would be to extend my stay another day.

Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From donnalcwalter@yahoo.com  Wed Jul 10 15:24:24 2002
From: donnalcwalter@yahoo.com (Donnal Walter)
Date: Wed, 10 Jul 2002 07:24:24 -0700 (PDT)
Subject: [Persistence-sig] getting started
In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <20020710142424.32998.qmail@web13901.mail.yahoo.com>

[Jeremy Hylton]
> Introductions: Please tell us about your interest in the
> persistence SIG, what personal/professional goals you have for
> it, and how much time & energy you have.  (Feel free to lurk if
> that's your preference.)

Programming is an avocation for me (I'm an academic physician) so I
am sure I will be lurking mostly. But the custom clinical apps on
which I have been working all require persistent data, so I will be
following these proceedings with interest. I have little expertise,
some time, and lots of energy. :-)

=====
Donnal Walter
Arkansas Children's Hospital

__________________________________________________
Do You Yahoo!?
Sign up for SBC Yahoo! Dial - First Month Free
http://sbc.yahoo.com


From sdrees@sdrees2.de  Wed Jul 10 15:40:54 2002
From: sdrees@sdrees2.de (Stefan Drees)
Date: Wed, 10 Jul 2002 16:40:54 +0200
Subject: [Persistence-sig] getting started
In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com>;
	from jeremy@zope.com on Tue, Jul 09, 2002 at 09:15:32AM -0400
References: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <20020710164054.B14438@sdrees2.de>

On Tue, Jul 09, 2002 at 09:15:32AM -0400 - a wonderful day - 
					Jeremy Hylton wrote:
> ... 
> Introductions: Please tell us about your interest in the
> persistence SIG, what personal/professional goals you have for
> it, and how much time & energy you have.  (Feel free to lurk if
> that's your preference.)

I've been programming and consulting since 1989.  For now I am
sure I will be lurking first.  But a standardized persistency 
layer in python seems - at least to me - to be an important 
feature for python to stay competitive.  So I will be following 
these discussions and hopefully the coding with interest and 
some participation, I guess.  I do have some expertise, well at 
least some time, and energy.

All the best,
		s t e f a n.
-- 
Stefan Drees, sdrees@acm.org.


From guido@python.org  Wed Jul 10 15:56:41 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jul 2002 10:56:41 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: Your message of "Wed, 10 Jul 2002 16:40:54 +0200."
             <20020710164054.B14438@sdrees2.de> 
References: <15658.57844.17239.668311@slothrop.zope.com>  
            <20020710164054.B14438@sdrees2.de> 
Message-ID: <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>

> But a standardized persistency layer in python seems - at least to
> me - to be an important feature for python to stay competitive.

What is the competition doing in this area?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From sdrees@sdrees2.de  Wed Jul 10 16:54:43 2002
From: sdrees@sdrees2.de (Stefan Drees)
Date: Wed, 10 Jul 2002 17:54:43 +0200
Subject: [Persistence-sig] getting started
In-Reply-To: 
	<200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>; from
	guido@python.org on Wed, Jul 10, 2002 at 10:56:41AM -0400
References: <15658.57844.17239.668311@slothrop.zope.com>
	<20020710164054.B14438@sdrees2.de>
	<200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020710175443.A15365@sdrees2.de>

On Wed, Jul 10, 2002 at 10:56:41AM -0400 - a wonderful day - 
					Guido van Rossum wrote:
> > But a standardized persistency layer in python seems - at 
> > least to me - to be an important feature for python to stay 
> > competitive.
> What is the competition doing in this area?
Hm, nothing I'm aware of, but that's the point: staying ahead 
in some important areas just helps, doesn't it?
 
All the best,
		s t e f a n.
-- 
Stefan Drees, sdrees@acm.org.


From guido@python.org  Wed Jul 10 17:00:42 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jul 2002 12:00:42 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: Your message of "Wed, 10 Jul 2002 17:54:43 +0200."
             <20020710175443.A15365@sdrees2.de> 
References: <15658.57844.17239.668311@slothrop.zope.com>
	<20020710164054.B14438@sdrees2.de>
	<200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>  
            <20020710175443.A15365@sdrees2.de> 
Message-ID: <200207101600.g6AG0gY26597@pcp02138704pcs.reston01.va.comcast.net>

> > > But a standardized persistency layer in python seems - at 
> > > least to me - to be an important feature for python to stay 
> > > competitive.
> > What is the competition doing in this area?
> Hm, nothing I'm aware of, but that's the point: staying ahead 
> in some important areas just helps, doesn't it?

I dunno.  I personally believe there's a reason why few languages
standardize persistence, and why languages that do include persistence
have remained at the fringe at best.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From smenard@bigfoot.com  Wed Jul 10 17:31:31 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Wed, 10 Jul 2002 12:31:31 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <200207101600.g6AG0gY26597@pcp02138704pcs.reston01.va.comcast.net>
References: <Your message of "Wed, 10 Jul 2002 17:54:43 +0200."
	<20020710175443.A15365@sdrees2.de>
 <15658.57844.17239.668311@slothrop.zope.com>
 <20020710164054.B14438@sdrees2.de>
 <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>
 <20020710175443.A15365@sdrees2.de>
Message-ID: <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca>

At 12:00 PM 7/10/2002 -0400, Guido van Rossum wrote:
> > > > But a standardized persistency layer in python seems - at
> > > > least to me - to be an important feature for python to stay
> > > > competitive.
> > > What is the competition doing in this area?
> > Hm, nothing I'm aware of, but that's the point: staying ahead
> > in some important areas just helps, doesn't it?
>
>I dunno.  I personally believe there's a reason why few languages
>standardize persistence, and why languages that do include persistence
>have remained at the fringe at best.
>
>--Guido van Rossum (home page: http://www.python.org/~guido/)

Could you elaborate on why you believe so?

I know the technical hurdles will not be insignificant, and we have to be 
careful not to try to come up with "THE ONE TRUE SOLUTION" that would be 
supposed to solve everyone's problems. Personally, something like ZOPE, 
with a few enhancements and guaranteed to work on any platform (read 
pure-python), would go a LONG way in the right direction.

More static languages like C++, Java, Eiffel etc. will naturally have a
harder time creating versatile and easy to use persistence. That's where 
python's dynamic nature should help us.


         Steve



From jmillr@umich.edu  Wed Jul 10 17:42:13 2002
From: jmillr@umich.edu (John Miller)
Date: Wed, 10 Jul 2002 12:42:13 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <E17SJsr-00066s-00@mail.python.org>
Message-ID: <FCC3578B-9423-11D6-A637-00039303967A@umich.edu>

Like others, I expect mainly to lurk. I would appreciate it if someone 
were willing to explain how the goals of this sig go beyond pickling and 
shelving. I know that this sounds like a newbie question, (which, in 
most respects, I am) but it would help to make explicit the context for 
the ensuing discussion. Since Python already incorporates persistence 
via pickling and shelving, what is currently lacking? (I know that the 
answer is probably obvious to most people on this list.) In other words, 
quickly describe the difference between pickling and shelving, describe 
how ZODB incorporates one or the other or both, and why or why not 
extending pickling and/or shelving themselves is a wise move to achieve 
the goals of this sig. Thanks in advance to anyone willing to lay the 
groundwork in this or a similar fashion for us developers-in-training!

John Miller
School of Education
University of Michigan

>>>> But a standardized persistency layer in python seems - at
>>>> least to me - to be an important feature for python to stay
>>>> competitive.
>>> What is the competition doing in this area?
>> Hm, nothing I'm aware of, but that's the point: staying ahead
>> in some important areas just helps, doesn't it?
>
> I dunno.  I personally believe there's a reason why few languages
> standardize persistence, and why languages that do include persistence
> have remained at the fringe at best.
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)



From jim@zope.com  Wed Jul 10 18:01:00 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 13:01:00 -0400
Subject: [Persistence-sig] getting started
References: <FCC3578B-9423-11D6-A637-00039303967A@umich.edu>
Message-ID: <3D2C684C.9060307@zope.com>

John Miller wrote:
> Like others, I expect mainly to lurk. I would appreciate it if someone 
> were willing to explain how the goals of this sig go beyond pickling and 
> shelving. I know that this sounds like a newbie question, (which, in 
> most respects, I am) but it would help to make explicit the context for 
> the ensuing discussion. Since Python already incorporates persistence 
> via pickling and shelving, what is currently lacking? (I know that the 
> answer is probably obvious to most people on this list.) In other words, 
> quickly describe the difference between pickling and shelving, describe 
> how ZODB incorporates one or the other or both, and why or why not 
> extending pickling and/or shelving themselves is a wise move to achieve 
> the goals of this sig. Thanks in advance to anyone willing to lay the 
> groundwork in this or a similar fashion for us developers-in-training!

Pickling and shelving:

- Are not transactional

- Are not transparent

   The application must explicitly load and save objects, must track
   object changes, and must manage when objects are and are not in memory.

- Do not work with relational data.

The proposed frameworks would provide a common *basis* (not
solution) for transparent transactional persistence, both for object
databases, like ZODB and for object-relational mapping frameworks.
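
To make the "not transparent" point concrete, here is a minimal sketch
(not from the original message; written against the current standard
library `shelve` module, with a made-up key and values). With the default
settings, changing an object fetched from a shelf does nothing to the
stored copy until the application explicitly assigns it back:

```python
import os
import shelve
import tempfile


def demo_shelve_not_transparent():
    """Show that shelve needs an explicit save after an in-place change."""
    # Hypothetical scratch location for the demo shelf.
    path = os.path.join(tempfile.mkdtemp(), "demo_shelf")
    with shelve.open(path) as db:
        db["todo"] = ["write charter"]

        # This mutates a freshly unpickled temporary copy; the shelf
        # never sees the change.
        db["todo"].append("review PEP")
        lost = db["todo"]

        # The application has to load, modify, and store explicitly.
        items = db["todo"]
        items.append("review PEP")
        db["todo"] = items
        kept = db["todo"]
    return lost, kept


lost, kept = demo_shelve_not_transparent()
```

Here `lost` still holds only the original entry, while `kept` holds both.
A transparent framework would aim to notice the first `append` by itself.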

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From jacobs@penguin.theopalgroup.com  Wed Jul 10 18:27:53 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Wed, 10 Jul 2002 13:27:53 -0400 (EDT)
Subject: [Persistence-sig] getting started
In-Reply-To: <FCC3578B-9423-11D6-A637-00039303967A@umich.edu>
Message-ID: <Pine.LNX.4.44.0207101301120.21470-100000@penguin.theopalgroup.com>

On Wed, 10 Jul 2002, John Miller wrote:
> Like others, I expect mainly to lurk. I would appreciate it if someone 
> were willing to explain how the goals of this sig go beyond pickling and 
> shelving.

Things are more complex than shelve or pickle mainly because of the
requirements of more sophisticated back-end data storage mechanisms.
For one, the backend data store may need to be interoperable with other
systems; i.e., relational or object database backends.  Also, the backend
store may be very large, so that loading and updating objects need to be
done incrementally, efficiently, and safely.

Here are some articles/debates about Java's persistent data objects that are
useful, even if you don't agree with them:

  http://www.onjava.com/pub/a/onjava/2002/05/29/jdo.html
  http://www.onjava.com/pub/a/onjava/2002/04/10/jdbc.html

Here is a partial taxonomy of issues I'd like to see addressed.  You'll
notice that many of them are somewhat specific to fixed schema persistent
backends, like some object-relational (OR) systems:

  1) Extensible bi-directional type mapping

     i.e., systems for mapping types from Python to an RDBMS and back in a
           lossless fashion.

  2) Manual Schema specification vs. automatic schema introspection

     i.e., the ability to construct sensible objects from relations depends
           on the schema, but also other information like foreign keys,
           constraints, and possibly other meta-data not available from the
           backend.  Some OR systems require differing amounts of
           user-specified schema information to build appropriate objects.

  3) Foreign key referenced object instantiation

     i.e., how and when to instantiate new objects from the attributes of
           another object that can act as foreign keys.

  4) Transactional scoping of object updates

     i.e., multiple OR-mapped objects can be queried from distinct
           transactions, then referentially linked together.  This opens the
           door to several rather nasty situations, some of which can be
           handled, others must be explicitly disallowed.

  5) Systems for tracking of uncommitted object updates.

  6) Query language abstraction for building OR frameworks.
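
As an illustration of item 1 only, the following is a rough sketch of what
an extensible, bi-directional type mapping could look like. The `TypeMap`
class, its method names, and the column type strings are all hypothetical;
they are not taken from any existing OR framework:

```python
import datetime
import decimal


class TypeMap:
    """Hypothetical registry of lossless Python <-> column converters."""

    def __init__(self):
        self._by_type = {}

    def register(self, py_type, sql_type, to_db, from_db):
        # Extensible: applications can add their own types at run time.
        self._by_type[py_type] = (sql_type, to_db, from_db)

    def to_db(self, value):
        sql_type, encode, _ = self._by_type[type(value)]
        return sql_type, encode(value)

    def from_db(self, py_type, raw):
        _, _, decode = self._by_type[py_type]
        return decode(raw)


type_map = TypeMap()
type_map.register(datetime.date, "DATE",
                  datetime.date.isoformat, datetime.date.fromisoformat)
type_map.register(decimal.Decimal, "NUMERIC", str, decimal.Decimal)

# Round trip: Python value -> column value -> Python value.
sql_type, raw = type_map.to_db(datetime.date(2002, 7, 10))
restored = type_map.from_db(datetime.date, raw)
```

The mapping is only lossless if every registered pair of converters
round-trips exactly, which is the property item 1 asks for.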

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From guido@python.org  Wed Jul 10 19:31:12 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jul 2002 14:31:12 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: Your message of "Wed, 10 Jul 2002 12:31:31 EDT."
             <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca> 
References: <Your message of "Wed, 10 Jul 2002 17:54:43 +0200."
	<20020710175443.A15365@sdrees2.de>
	<15658.57844.17239.668311@slothrop.zope.com>
	<20020710164054.B14438@sdrees2.de>
	<200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>
	<20020710175443.A15365@sdrees2.de>  
            <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca> 
Message-ID: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comcast.net>

> >I dunno.  I personally believe there's a reason why few languages
> >standardize persistence, and why languages that do include persistence
> >have remained at the fringe at best.
> >
> >--Guido van Rossum (home page: http://www.python.org/~guido/)
> 
> Could you elaborate on why you believe so?
> 
> I know the technical hurdles will not be insignificant, and we have to be 
> careful not to try to come up with "THE ONE TRUE SOLUTION" that would be 
> supposed to solve everyone's problems. Personally, something like ZOPE, 
> with a few enhancements and guaranteed to work on any platform (read 
> pure-python), would go a LONG way in the right direction.

Kevin Jacobs's posts here are an example of what I mean.  He wants to
map objects to relational databases, which is very different from
Zope.  Coming up with something that supports both sounds hard.

> More static languages like C++, Java, Eiffel, etc. will naturally have a 
> harder time creating versatile and easy to use persistence. That's where 
> python's dynamic nature should help us.

I don't know why you think that.  As long as a language has the
metadata describing its types at run-time, it should have no problem.
At least Java and (modern) C++ satisfy this condition; I don't know
enough about Eiffel but I'd bet that it also has considerable run-time
accessible meta-data.

--Guido van Rossum (home page: http://www.python.org/~guido/)



From jacobs@penguin.theopalgroup.com  Wed Jul 10 19:35:03 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Wed, 10 Jul 2002 14:35:03 -0400 (EDT)
Subject: [Persistence-sig] getting started
In-Reply-To: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <Pine.LNX.4.44.0207101433130.1661-100000@penguin.theopalgroup.com>

On Wed, 10 Jul 2002, Guido van Rossum wrote:
> Kevin Jacobs's posts here are an example of what I mean.  He wants to
> map objects to relational databases, which is very different from
> Zope.  Coming up with something that supports both sounds hard.

It is different than ZODB, and it will be hard to come up with an
implementation that supports both paradigms.  However, I am much more
concerned with interface at the moment, so I still think there is much
useful work that can be done here that applies to both.

-Kevin ;)

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From smenard@bigfoot.com  Wed Jul 10 19:50:41 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Wed, 10 Jul 2002 14:50:41 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comca
 st.net>
References: <Your message of "Wed, 10 Jul 2002 12:31:31 EDT."
	<5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca>
 <Your message of "Wed, 10 Jul 2002 17:54:43 +0200."
	<20020710175443.A15365@sdrees2.de>
 <15658.57844.17239.668311@slothrop.zope.com>
 <20020710164054.B14438@sdrees2.de>
 <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>
 <20020710175443.A15365@sdrees2.de>
 <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca>
Message-ID: <5.1.0.14.0.20020710144606.02aa5130@pop.videotron.ca>

At 02:31 PM 7/10/2002 -0400, Guido van Rossum wrote:
> > >I dunno.  I personally believe there's a reason why few languages
> > >standardize persistence, and why languages that do include persistence
> > >have remained at the fringe at best.
> > >
> > >--Guido van Rossum (home page: http://www.python.org/~guido/)
> >
> > Could you elaborate on why you believe so?
> >
> > I know the technical hurdles will not be insignificant, and we have to be
> > careful not to try to come up with "THE ONE TRUE SOLUTION" that would be
> > supposed to solve everyone's problems. Personally, something like ZOPE,
> > with a few enhancements and guaranteed to work on any platform (read
> > pure-python), would go a LONG way in the right direction.
>
>Kevin Jacobs's posts here are an example of what I mean.  He wants to
>map objects to relational databases, which is very different from
>Zope.  Coming up with something that supports both sounds hard.

Yep. I can't help but agree on this. I think it's possible to come up with 
a common public interface for both mechanisms. However, I doubt an object 
built for one model can be reused as-is in a different model.

My personal interest is more in ZODB-like functionality becoming 
standard than in other, more enterprise-oriented solutions (as OR mappings 
seem to be).

> > More static languages like C++, Java, Eiffel, etc. will naturally have a
> > harder time creating versatile and easy to use persistence. That's where
> > python's dynamic nature should help us.
>
>I don't know why you think that.  As long as a language has the
>metadata describing its types at run-time, it should have no problem.
>At least Java and (modern) C++ satisfy this condition; I don't know
>enough about Eiffel but I'd bet that it also has considerable run-time
>accessible meta-data.

It's the cost of accessing that information that makes it harder. I have 
worked on a few Java persistence prototypes, and I have never come up with 
something satisfactory. Tracking changes is hard (because we can't trap the 
setattr), getting/setting values is hard (access protection being the 
chief culprit), etc.


         Steve



From smenard@bigfoot.com  Wed Jul 10 19:58:00 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Wed, 10 Jul 2002 14:58:00 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <15658.57844.17239.668311@slothrop.zope.com>
Message-ID: <5.1.0.14.0.20020710145101.080e0dd0@pop.videotron.ca>

At 09:15 AM 7/9/2002 -0400, Jeremy Hylton wrote:
>It looks like many of the people who expressed interest in the SIG
>have subscribed to the list, so we ought to get started.  I think we
>should begin with some introductions and a review of the SIG charter.
>
>Introductions: Please tell us about your interest in the persistence
>SIG, what personal/professional goals you have for it, and how much
>time & energy you have.  (Feel free to lurk if that's your preference.)

Well, better late than never.

I am a professional programmer. I've been using Python on and off for a few 
years now. My main interest in this is so I can get ZODB-like functionality 
without too much fuss.

I currently do not have a lot of time to devote, but as I will be in 
recovery all of September, I can put some time into coding/testing. 
Additionally, since I have a few projects waiting on just such functionality 
(they are why I started POD http://www.sourceforge.net/projects/pypod ), I 
will certainly be in a position to use whatever comes out of the SIG.


>Charter: Jim Fulton wrote the SIG charter.  A very brief summary is
>that we should:
>
>   - focus on transparency, transactions, and memory-caching issues;
>
>   - put off concurrency control, queries, and constraints;
>
>   - produce PEPs and, if there is consensus, code for the std library.

One small comment. Perhaps the single missing feature of ZODB (besides not 
running on Python 2.2) is a query language. Furthermore, without adequate 
support from the Storage classes, such a language will be very difficult to 
tack on afterward.

         Steve




From jim@zope.com  Wed Jul 10 20:16:10 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 15:16:10 -0400
Subject: [Persistence-sig] getting started
References: <Your message of "Wed, 10 Jul 2002 17:54:43 +0200."
	<20020710175443.A15365@sdrees2.de>
	<15658.57844.17239.668311@slothrop.zope.com>
	<20020710164054.B14438@sdrees2.de>
	<200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>
	<20020710175443.A15365@sdrees2.de>
	<5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca>
	<200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D2C87FA.301@zope.com>

Guido van Rossum wrote:
>>>I dunno.  I personally believe there's a reason why few languages
>>>standardize persistence, and why languages that do include persistence
>>>have remained at the fringe at best.
>>>
>>>--Guido van Rossum (home page: http://www.python.org/~guido/)
>>>
>>Could you elaborate on why you believe so?
>>
>>I know the technical hurdles will not be insignificant, and we have to be 
>>careful not to try to come up with "THE ONE TRUE SOLUTION" that would be 
>>supposed to solve everyone's problems. Personally, something like ZOPE, 
>>with a few enhancements and guaranteed to work on any platform (read 
>>pure-python), would go a LONG way in the right direction.
>>
> 
> Kevin Jacobs's posts here are an example of what I mean.  He wants to
> map objects to relational databases, which is very different from
> Zope.  Coming up with something that supports both sounds hard.

Maybe, but understand that O-R mapping is not in the scope of the SIG.
Rather, basic persistence and transaction frameworks, that one could build
O-R mappings or object databases on top of are in scope. I'm hopeful that
we could come up with low-level frameworks that could serve both.

Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From altis@semi-retired.com  Wed Jul 10 21:30:34 2002
From: altis@semi-retired.com (Kevin Altis)
Date: Wed, 10 Jul 2002 13:30:34 -0700
Subject: [Persistence-sig] getting started
Message-ID: <KJEOLDOPMIDKCMJDCNDPKEMDCCAA.altis@semi-retired.com>

I'm the lead for the PythonCard project
http://pythoncard.sourceforge.net/

I'll mostly be lurking and don't expect that I will contribute any code. I'm
not a database guy and I've only been using Python for a little over a year,
so all you data gurus are much more qualified than I to say what is good and
proper. However, if there is something usable that comes out of this SIG
then it is likely a PythonCard sample or two will get created that utilizes
the API/package. I'll go ahead and give a long introduction, to get it out
of the way and hopefully bring up some relevant topics.

Persistence in the context of PythonCard is probably a bit different than
what most people have in mind for this SIG. We don't even have complete
agreement among the main PythonCard developers on this topic.

I would like to have a storage solution that is built into the Python
standard distribution and that won't change in the next few years, and
preferably not for 5-10 years or longer, so that there is little
risk of stored data becoming unusable as Python is updated. The data format
must also be cross-platform, at least for the major desktop platforms in
use, so that data created on one platform can be easily exchanged with a
user on another platform without the need for an explicit import/export.
This is where shelve falls down, unless you use dumbdbm.
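
A minimal sketch of that workaround, written against the current standard
library (where `dumbdbm` now lives at `dbm.dumb`). The helper name, file
name, and record contents are made up for illustration:

```python
import dbm.dumb
import os
import shelve
import tempfile


def open_portable_shelf(path):
    """Open a shelf backed by the pure-Python dbm.dumb module.

    Because dbm.dumb is written in Python, the files it produces should
    not depend on which C dbm library a given platform happens to have.
    """
    return shelve.Shelf(dbm.dumb.open(path, "c"))


# Hypothetical scratch location for the demo.
path = os.path.join(tempfile.mkdtemp(), "cards")

shelf = open_portable_shelf(path)
shelf["card1"] = {"title": "Hello", "tags": ["demo"]}
shelf.close()

# Reopen and read the record back.
shelf = open_portable_shelf(path)
restored = shelf["card1"]
shelf.close()
```

The trade-off is speed: the pure-Python backend is generally slower than
the C-based dbm variants, which may or may not matter for small documents.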

The storage format we end up using for PythonCard will be a basic document
type that any PythonCard app/Python script should be able to open and make
some sense of. Other storage formats will always be an option, but there
will be at least one well-defined format that any and all apps should
understand regardless of whether they are running on Windows, Mac OS X,
Linux/Unix. I'm mostly thinking of storing "dumb data" or simple types,
lists and dictionaries, so I'm not particularly concerned about being able
to store instances of complex classes and their member relations. Storing
class instances worries me because I expect some classes to change over time
and potentially break the loading of old data files created with different
versions of the classes.

Plain pickles seem to fit my requirements as long as you only use native
Python types, so that there are no dependencies on external classes and
modules when loading the pickle. A conversion of the data to a newer format
might be acceptable, but this implies some kind of versioning or other
smarts in the data file. The PythonCard flatfileDatabase sample uses a
simple list of dictionaries for storing data, keeping the entire data set in
memory while the app is running. The data can be loaded and stored as a
single pickle file (version in cvs, not release 0.6.7). I would have
preferred a solution where all the data didn't need to be in memory and the
access to each record in the list was transparent, but I ran into issues
trying to use shelve for this and we haven't gotten far enough along with
ZODB to know whether it will do the job.
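
As a rough sketch of the "versioning in the data file" idea, assuming only
native Python types are stored: the wrapper layout, the version numbers,
and the `fullname`/`name` rename below are invented for illustration and
are not the actual flatfileDatabase format.

```python
import pickle

FORMAT_VERSION = 2


def dump_records(records):
    """Pickle a list of plain dicts together with a format version."""
    document = {"version": FORMAT_VERSION, "records": records}
    return pickle.dumps(document, protocol=2)


def load_records(blob):
    """Unpickle records, upgrading older layouts when possible."""
    document = pickle.loads(blob)
    version = document["version"]
    records = document["records"]
    if version == 1:
        # Hypothetical change: version 1 used 'fullname', version 2 'name'.
        records = [
            {("name" if key == "fullname" else key): value
             for key, value in record.items()}
            for record in records
        ]
    elif version != FORMAT_VERSION:
        raise ValueError("unsupported data file version: %r" % (version,))
    return records


current = load_records(dump_records([{"name": "Ann", "age": 30}]))
old_blob = pickle.dumps({"version": 1, "records": [{"fullname": "Bob"}]})
upgraded = load_records(old_blob)
```

Because only dicts, lists, strings, and numbers are pickled, loading does
not need to import any application classes.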

A number of people working on PythonCard apps would be very happy if the
simple lists and dictionaries could be mapped to underlying SQL data stores
without the user of the storage needing to know anything about SQL.
Concurrency and transactions would be nice too.
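
One possible shape for this, sketched with the `sqlite3` module from the
current standard library (it was not available when this was written).
The `SQLDict` class, the `kv` table, and the choice to pickle values are
all assumptions made for illustration:

```python
import pickle
import sqlite3
from collections.abc import MutableMapping


class SQLDict(MutableMapping):
    """Hypothetical dict-like wrapper; callers never write SQL."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value BLOB)")

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)).fetchone()
        if row is None:
            raise KeyError(key)
        return pickle.loads(row[0])

    def __setitem__(self, key, value):
        # 'with' commits on success and rolls back on error.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)",
                (key, pickle.dumps(value)))

    def __delitem__(self, key):
        with self._conn:
            cursor = self._conn.execute(
                "DELETE FROM kv WHERE key = ?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)

    def __iter__(self):
        rows = self._conn.execute(
            "SELECT key FROM kv ORDER BY key").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]


cards = SQLDict()
cards["card1"] = {"title": "Hello", "tags": ["demo"]}
title = cards["card1"]["title"]
```

Like shelve, this stores whole values, so changing a fetched dict in place
still requires assigning it back to the key.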

I posted a message to the PythonCard-users mailing list about shelve at the
end of June that covers some of the issues I ran into with shelve.

"why we probably don't want to use shelve"
http://aspn.activestate.com/ASPN/Mail/Message/1259977

There are even more messages in the PythonCard-users archive about
persistence and pickle, but most of them only touch on issues this SIG will
address.

http://aspn.activestate.com/ASPN/Mail/Browse/Threaded/PythonCard

ka
---
Kevin Altis
altis@semi-retired.com
http://www.pythoncard.org/



From pje@telecommunity.com  Wed Jul 10 21:29:42 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 10 Jul 2002 16:29:42 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <Pine.LNX.4.44.0207101301120.21470-100000@penguin.theopalgr
 oup.com>
References: <FCC3578B-9423-11D6-A637-00039303967A@umich.edu>
Message-ID: <3.0.5.32.20020710162942.00868100@telecommunity.com>

At 01:27 PM 7/10/02 -0400, Kevin Jacobs wrote:
>
>Here is a partial taxonomy of issues I'd like to see addressed.  You'll
>notice that many of them are somewhat specific to fixed schema persistent
>backends, like some object-relational (OR) systems:
>
>  1) Extensible bi-directional type mapping
>  2) Manual Schema specification vs. automatic schema introspection
>  3) Foreign key referenced object instantiation
>  4) Transactional scoping of object updates
>  5) Systems for tracking of uncommitted object updates.
>  6) Query language abstraction for building OR frameworks.

These are all good points, but actually solving them is (IMHO) outside
scope for the SIG's mission.  What we want is basic support for:

1) "Transparent" persistence, for some value of "transparent".  A mechanism
to either specify that a class is intended to be persistent, or to
otherwise provide proxy or observer-style support to allow a persistence
*mechanism* to know when object states are changed or accessed, or about to
be changed or accessed, in order to do their thing.

2) Transaction framework/API, for some value of "framework/API".  Again,
this is also about mechanisms for registering, observing, or otherwise
notifying objects (or persistence mechanisms) about transaction participation.
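
To give these two points a concrete shape, here is a deliberately tiny
sketch of a transaction that participants register with and that commits
in two phases. The class and method names (`join`, `prepare`, `finish`,
`abort`) are hypothetical and are not the ZODB4 Transaction API:

```python
class Transaction:
    """Hypothetical two-phase transaction coordinator."""

    def __init__(self):
        self._participants = []

    def join(self, participant):
        # A persistence mechanism registers itself the first time it is
        # told that one of its objects has changed.
        if participant not in self._participants:
            self._participants.append(participant)

    def commit(self):
        participants, self._participants = self._participants, []
        try:
            for participant in participants:
                participant.prepare()   # phase 1: may raise to veto
        except Exception:
            for participant in participants:
                participant.abort()
            raise
        for participant in participants:
            participant.finish()        # phase 2: make changes durable


class MemoryStore:
    """Hypothetical backend that buffers writes until commit."""

    def __init__(self):
        self.committed = {}
        self._pending = {}

    def set(self, txn, key, value):
        txn.join(self)
        self._pending[key] = value

    def prepare(self):
        for key in self._pending:
            if not isinstance(key, str):
                raise TypeError("keys must be strings")

    def finish(self):
        self.committed.update(self._pending)
        self._pending.clear()

    def abort(self):
        self._pending.clear()


store_a, store_b = MemoryStore(), MemoryStore()
txn = Transaction()
store_a.set(txn, "x", 1)
store_b.set(txn, "y", 2)
txn.commit()
```

Either every registered store applies its pending changes or none does,
which is the property that lets several persistence mechanisms share one
transaction.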

This is actually a very narrow set of goals, ones that I think we have a
high degree of ability to achieve, if we stay focused on them, and how our
individual high-level requirements (such as you've described) are reflected
in these introspection/notification aspects.  If Python has common idioms
for dealing with these issues, then many persistence *mechanisms* can
co-exist and compete in their respective niches for what they can handle.
Perhaps multiple mechanisms might even be able to share the management of a
single object.

One reason that I believe these narrow goals are attainable, is that the
existing Zope "Persistence" and "Transaction" packages from ZODB4 *can* be
leveraged to build O-R and other mappings.  I know this, because I've
designed a framework atop the existing ZODB4 code base that can map
*anything* to or from *anything*.  Specific examples:

* Relational database
* XML/XMI in a file
* XML/XMI, persisted to a relational database
* XML/XMI, persisted in ZODB
* A relational database written using persistent objects for tables and
rows, stored in ZODB.  :)

Indeed, the design I have is of sufficient generality to persist any object
to any backend (where said backend may actually be another persistent
object, stored in yet another backend!), as long as:

1. All objects to be persisted subclass Persistence.Persistent.
2. All backends participate in the Transactions.Transaction framework.

(There is an additional restriction when dealing with backends which
themselves are stored in other backends, which is that the "outermost"
backends must support potentially committing the same object more than once
during the tpc_begin->tpc_vote phase of transaction commit.)

(I should also note that when dealing with relational databases, my design
work addressed such matters as cache consistency, relational integrity
constraint ordering, multi-row queries, foreign key and inverse foreign key
relationships, etc., etc., ad nauseam.)

Anyway, the fact that these things can be done based solely on the existing
ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB"
package itself, means that what's available from ZODB is actually pretty
close to what's needed as a base mechanism.  It's more a question (to me,
anyway) of what could/should be improved, particularly in the form of how
the API calls and interfaces are phrased.

For my requirements, I'd be fine with it if we just put Persistent and
Transaction in the standard library, with better docs.  :)  But it'd be
nice if certain things were spelled differently, or a bit more flexible.



From pobrien@orbtech.com  Wed Jul 10 21:38:15 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Wed, 10 Jul 2002 15:38:15 -0500
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: <3D2C3F2B.8060808@zope.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBKEJBNHAA.pobrien@orbtech.com>

[Jim Fulton]
>
> Unfortunately, I don't arrive at the SD airport till 8pm.
>
> I chose to keep my time at OSCON short this year. :(
>
> I guess I'll just miss the BOF. Dang.
>
> I suggest you go ahead with Thursday evening.
>
> I'll see how hard it would be to extend my stay another day.

Unfortunately, Guido has another meeting Thursday evening. Would moving the
time up to 6:00 pm on Thursday help? I would think this BOF would be most
productive if we had the Pope, the BDFL and the SIG Coordinator all together
at the same time. But I'll take whatever I can get. :-)

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From jim@zope.com  Wed Jul 10 21:47:21 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 16:47:21 -0400
Subject: [Persistence-sig] getting started
References: <FCC3578B-9423-11D6-A637-00039303967A@umich.edu>
	<3.0.5.32.20020710162942.00868100@telecommunity.com>
Message-ID: <3D2C9D59.2000203@zope.com>

Very well said.

Jim

Phillip J. Eby wrote:
> At 01:27 PM 7/10/02 -0400, Kevin Jacobs wrote:
> 
>>Here is a partial taxonomy of issues I'd like to see addressed.  You'll
>>notice that many of them are somewhat specific to fixed schema persistent
>>backends, like some object-relational (OR) systems:
>>
>> 1) Extensible bi-directional type mapping
>> 2) Manual Schema specification vs. automatic schema introspection
>> 3) Foreign key referenced object instantiation
>> 4) Transactional scoping of object updates
>> 5) Systems for tracking of uncommitted object updates.
>> 6) Query language abstraction for building OR frameworks.
>>
> 
> These are all good points, but actually solving them is (IMHO) outside
> scope for the SIG's mission.  What we want is basic support for:
> 
> 1) "Transparent" persistence, for some value of "transparent".  A mechanism
> to either specify that a class is intended to be persistent, or to
> otherwise provide proxy or observer-style support to allow a persistence
> *mechanism* to know when object states are changed or accessed, or about to
> be changed or accessed, in order to do their thing.
> 
> 2) Transaction framework/API, for some value of "framework/API".  Again,
> this is also about mechanisms for registering, observing, or otherwise
> notifying objects (or persistence mechanisms) about transaction participation.
> 
> This is actually a very narrow set of goals, ones that I think we have a
> high degree of ability to achieve, if we stay focused on them, and how our
> individual high-level requirements (such as you've described) are reflected
> in these introspection/notification aspects.  If Python has common idioms
> for dealing with these issues, then many persistence *mechanisms* can
> co-exist and compete in their respective niches for what they can handle.
> Perhaps multiple mechanisms might even be able to share the management of a
> single object.
> 
> One reason that I believe these narrow goals are attainable, is that the
> existing Zope "Persistence" and "Transaction" packages from ZODB4 *can* be
> leveraged to build O-R and other mappings.  I know this, because I've
> designed a framework atop the existing ZODB4 code base that can map
> *anything* to or from *anything*.  Specific examples:
> 
> * Relational database
> * XML/XMI in a file
> * XML/XMI, persisted to a relational database
> * XML/XMI, persisted in ZODB
> * A relational database written using persistent objects for tables and
> rows, stored in ZODB.  :)
> 
> Indeed, the design I have is of sufficient generality to persist any object
> to any backend (where said backend may actually be another persistent
> object, stored in yet another backend!), as long as:
> 
> 1. All objects to be persisted subclass Persistence.Persistent.
> 2. All backends participate in the Transactions.Transaction framework.
> 
> (There is an additional restriction when dealing with backends which
> themselves are stored in other backends, which is that the "outermost"
> backends must support potentially committing the same object more than once
> during the tpc_begin->tpc_vote phase of transaction commit.)
> 
> (I should also note that when dealing with relational databases, my design
> work addressed such matters as cache consistency, relational integrity
> constraint ordering, multi-row queries, foreign key and inverse foreign key
> relationships, etc., etc., ad nauseam.)
> 
> Anyway, the fact that these things can be done based solely on the existing
> ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB"
> package itself, means that what's available from ZODB is actually pretty
> close to what's needed as a base mechanism.  It's more a question (to me,
> anyway) of what could/should be improved, particularly in the form of how
> the API calls and interfaces are phrased.
> 
> For my requirements, I'd be fine with it if we just put Persistent and
> Transaction in the standard library, with better docs.  :)  But it'd be
> nice if certain things were spelled differently, or a bit more flexible.



-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From guido@python.org  Wed Jul 10 22:00:33 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jul 2002 17:00:33 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: Your message of "Wed, 10 Jul 2002 15:38:15 CDT."
             <NBBBIOJPGKJEKIECEMCBKEJBNHAA.pobrien@orbtech.com> 
References: <NBBBIOJPGKJEKIECEMCBKEJBNHAA.pobrien@orbtech.com> 
Message-ID: <200207102100.g6AL0Xw27884@pcp02138704pcs.reston01.va.comcast.net>

> Unfortunately, Guido has another meeting Thursday evening. Would
> moving the time up to 6:00 pm on Thursday help? I would think this
> BOF would be most productive if we had the Pope, the BDFL and the
> SIG Coordinator all together at the same time. But I'll take
> whatever I can get. :-)

Alas, the OSI board has a dinner meeting preceding the (open) board
meeting starting at 5:30 on Thu.  Perhaps we could try to do this over
lunch on Wed?  (Lunch Wed is booked too for me...)

Or we could pick a slot in the Python track that is unlikely to be of
interest for the persistence crowd.  I could miss the two Thursday
morning talks (weave and WRDLpy -- no offense intended).

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jim@zope.com  Wed Jul 10 22:08:41 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 17:08:41 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
References: <NBBBIOJPGKJEKIECEMCBKEJBNHAA.pobrien@orbtech.com>
	<200207102100.g6AL0Xw27884@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3D2CA259.8040900@zope.com>

Guido van Rossum wrote:
>>Unfortunately, Guido has another meeting Thursday evening. Would
>>moving the time up to 6:00 pm on Thursday help? I would think this
>>BOF would be most productive if we had the Pope, the BDFL and the
>>SIG Coordinator all together at the same time. But I'll take
>>whatever I can get. :-)
>>
> 
> Alas, the OSI board has a dinner meeting preceding the (open) board
> meeting starting at 5:30 on Thu.  Perhaps we could try to do this over
> lunch on Wed?  (Lunch Wed is booked too for me...)
> 
> Or we could pick a slot in the Python track that is unlikely to be of
> interest for the persistence crowd.  I could miss the two Thursday
> morning talks (weave and WRDLpy -- no offense intended).

This last idea sounds good to me.

Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From guido@python.org  Wed Jul 10 22:10:13 2002
From: guido@python.org (Guido van Rossum)
Date: Wed, 10 Jul 2002 17:10:13 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: Your message of "Wed, 10 Jul 2002 16:47:21 EDT."
             <3D2C9D59.2000203@zope.com> 
References: <FCC3578B-9423-11D6-A637-00039303967A@umich.edu>
	<3.0.5.32.20020710162942.00868100@telecommunity.com>  
            <3D2C9D59.2000203@zope.com> 
Message-ID: <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comcast.net>

> Phillip J. Eby wrote:
[...]
> > Anyway, the fact that these things can be done based solely on the existing
> > ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB"
> > package itself, means that what's available from ZODB is actually pretty
> > close to what's needed as a base mechanism.  It's more a question (to me,
> > anyway) of what could/should be improved, particularly in the form of how
> > the API calls and interfaces are phrased.
> > 
> > For my requirements, I'd be fine with it if we just put Persistent and
> > Transaction in the standard library, with better docs.  :)  But it'd be
> > nice if certain things were spelled differently, or a bit more flexible.

This is a goal I can agree with.  Care to start a list of what
spellings you'd like to change?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From jim@zope.com  Wed Jul 10 21:52:39 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 10 Jul 2002 16:52:39 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
References: <NBBBIOJPGKJEKIECEMCBKEJBNHAA.pobrien@orbtech.com>
Message-ID: <3D2C9E97.8020608@zope.com>

Patrick K. O'Brien wrote:
> [Jim Fulton]
> 
>>Unfortunately, I don't arrive at the SD airport till 8pm.
>>
>>I chose to keep my time at OSCON short this year. :(
>>
>>I guess I'll just miss the BOF. Dang.
>>
>>I suggest you go ahead with Thursday evening.
>>
>>I'll see how hard it would be to extend my stay another day.
>>
> 
> Unfortunately, Guido has another meeting Thursday evening. Would moving the
> time up to 6:00 pm on Thursday help? I would think this BOF would be most
> productive if we had the Pope, the BDFL and the SIG Coordinator all together
> at the same time. But I'll take whatever I can get. :-)

I'm pretty sure that Jeremy isn't going to be there.

If earlier on Thursday can work, then I'll try to change my reservation.

Jim



-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From pobrien@orbtech.com  Wed Jul 10 23:12:45 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Wed, 10 Jul 2002 17:12:45 -0500
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: <3D2CA259.8040900@zope.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBIEJHNHAA.pobrien@orbtech.com>

[Jim Fulton]
> > 
> > Or we could pick a slot in the Python track that is unlikely to be of
> > interest for the persistence crowd.  I could miss the two Thursday
> > morning talks (weave and WRDLpy -- no offense intended).
> 
> This last idea sounds good to me.

I'm waiting to hear back from O'Reilly to see if we can make this happen.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/ 
Blog: http://www.orbtech.com/blog/pobrien/ 
Wiki: http://www.orbtech.com/wiki/PatrickOBrien 
-----------------------------------------------



From jcw@equi4.com  Thu Jul 11 00:16:37 2002
From: jcw@equi4.com (Jean-Claude Wippler)
Date: Thu, 11 Jul 2002 01:16:37 +0200
Subject: [Persistence-sig] getting started
In-Reply-To: 
 <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comcast.net>
References: 
 <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <20020710231637.10844@triqs.com>

Jeremy Hylton wrote:
>Introductions: Please tell us about your interest in the persistence

I have a long-standing interest in persistence and scripting.  Finding a
middle ground between the relational data model, object storage,
structured storage, and plain serialization is a key area of focus for
me.  I'm self-employed, and have been so for well over a decade, with a
mix of working on commissioned projects and doing research on persistence
and scripting (more and more so).

What I would hope to see happen here, is a generalization away from being
purely OO (which has no intrinsic connection to persistence), purely
single-language (because data often lives *far* longer than language
technologies do), or even purely relational (which provides insufficient
expressiveness for algorithmic optimizations).  I think the main focus
needs to be on data representation, in such a way that language access
and memory-mapped files can effectively interface to each other. 
Ultimately, it may affect the very core, e.g. PyObject changes.

As designer of the MetaKit embedded database, which binds to several
languages, has many years of production use (maintaining full datafile
compatibility), and is finding its way into Roundup (Python), Starkits
(Tcl), and the AddressBook of every Mac (C++), I can't help but think
that there has to be something to an approach which focuses on
generality-through-simplicity.

So much for the blurb.  If this forum is about finding a genuine common
ground for persistence and scripting, not just ZODB and/or Python, then I
would love to help make things happen, and contribute serious time and
code (FWIW).

Jean-Claude Wippler <jcw@equi4.com>
Equi4 Software - http://www.equi4.com



From pje@telecommunity.com  Thu Jul 11 00:38:00 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 10 Jul 2002 19:38:00 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <200207102110.g6ALAEC27920@pcp02138704pcs.reston01.va.comca
 st.net>
References: <Your message of "Wed, 10 Jul 2002 16:47:21 EDT."
	<3D2C9D59.2000203@zope.com>
 <FCC3578B-9423-11D6-A637-00039303967A@umich.edu>
 <3.0.5.32.20020710162942.00868100@telecommunity.com>
 <3D2C9D59.2000203@zope.com>
Message-ID: <3.0.5.32.20020710193800.00893350@telecommunity.com>

At 05:10 PM 7/10/02 -0400, Guido van Rossum wrote:
>> Phillip J. Eby wrote:
>[...]
>> > Anyway, the fact that these things can be done based solely on the existing
>> > ZODB4 Persistent and Transaction classes, entirely ignoring the "ZODB"
>> > package itself, means that what's available from ZODB is actually pretty
>> > close to what's needed as a base mechanism.  It's more a question (to me,
>> > anyway) of what could/should be improved, particularly in the form of how
>> > the API calls and interfaces are phrased.
>> > 
>> > For my requirements, I'd be fine with it if we just put Persistent and
>> > Transaction in the standard library, with better docs.  :)  But it'd be
>> > nice if certain things were spelled differently, or a bit more flexible.
>
>This is a goal I can agree with.  Care to start a list of what
>spellings you'd like to change?
>

As I said, I can pretty much live with it all as it is now.  Some minor
annoyances:

* There's no way to be notified that a "transaction is over".  You have to
trap different messages from Transaction, while perhaps registering a dummy
object, just to figure out transaction boundaries.  This is a pain when
creating transactional caches, i.e., ones which want to clear themselves
whenever a transaction commits *or* aborts.

* A similar, related pain, is that you have to re-register on *every*
transaction, and keep track of whether you've registered yet, any time you
do something that might mean you *should* be registered.  A way to
"permanently" (i.e. until app termination or otherwise requested) subscribe
to transaction begin/end messages would be very handy.  Or even to the
whole tpc_begin/vote/finish message sequence.

* While on the subject of such messages, why should the Transaction object
have to be the one to keep track of changed objects?  Why shouldn't data
managers do that themselves?  In the case of my "storage jars" model, I
have to track "currently dirty objects" separately from the transaction's
list of objects needing to be committed, because I may "pre-flush" certain
changes to, say, an RDBMS, in order to ensure that queries within the same
transaction will see the updated data.  Since the "jar" has to track this
anyway, why does the Transaction need to do the same?  Why not just send
the jars a set of begin/vote/finish messages?

In my current framework, my "jars" automatically detect when they're being
asked to commit something that's already flushed to the back-end, and
ignore it.  If the Transaction didn't bother tracking stuff and telling me
to commit it, I'd just have tpc_begin cause a flush of all dirty objects,
and I'd be ready for tpc_vote.  Not only that, but the Transaction object
itself would get lots simpler, and wouldn't need to have complex logic to
manage data managers' objects for them!  (Granted, data managers would need
to know which items they "committed" during tpc_begin->tpc_vote, in order
to roll them back, but I suspect that many data managers are already
tracking this in some form, if only to do invalidation messages.)


* "Ghosting" attributes.  Right now, persistent objects are either loaded,
or not.  There's no way to designate an object as "loaded except for
attributes X, Y, and Z".  Why do I need that?  Because I may have data
stored for that object in different back-ends (LDAP and SQL is a combo that
comes up often for me) and don't want to incur a possibly large load-time
penalty to get all the (non-object) attributes, that may not even get read
during a particular transaction.  So, if we're talking about redoing
Persistence.Persistent, I'd like to see attribute-specific read/write
monitoring, if it doesn't add so much performance overhead as to remove the
benefits of having it.

By the way, it would be an acceptable solution for this if we had extremely
lightweight proxies that could stand-in for an arbitrary Python object, and
call something to load the "real" object upon access.  Of course, if we had
such an animal, it could replace the need for subclassing
Persistence.Persistent in the first place!  It could also trap all the
"modifying" methods like __setitem__, __setslice__, etc.

(Interestingly, the Zope 3 security proxy objects written in C, look to me
to have sufficient generality to perform these functions, in that they
monitor all attribute and method accesses.  Although I am perhaps missing
whether they work in regard to operations that the object performs upon
*itself*.  It may be that such accesses are not checked, but would need to
be for a persistence proxy.)
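
A minimal sketch of the lightweight-proxy idea described above, assuming
nothing beyond plain Python.  The names LazyProxy, _loader, _target and
_changed are hypothetical and are not part of ZODB or Zope 3.  The proxy
loads the "real" object on first access and notes writes made through
__setattr__ and __setitem__.

```python
class LazyProxy:
    """Hypothetical stand-in that loads the real object on first use."""

    def __init__(self, loader):
        # Use object.__setattr__ so our own __setattr__ hook is bypassed
        # while the proxy's private state is being set up.
        object.__setattr__(self, "_loader", loader)
        object.__setattr__(self, "_target", None)
        object.__setattr__(self, "_changed", False)

    def _load(self):
        if self._target is None:
            object.__setattr__(self, "_target", self._loader())
        return self._target

    def __getattr__(self, name):
        # Only called for names not found on the proxy itself.
        return getattr(self._load(), name)

    def __setattr__(self, name, value):
        setattr(self._load(), name, value)
        object.__setattr__(self, "_changed", True)

    # Special methods must be defined on the class to be intercepted.
    def __getitem__(self, key):
        return self._load()[key]

    def __setitem__(self, key, value):
        self._load()[key] = value
        object.__setattr__(self, "_changed", True)


loads = []

def load_record():              # stand-in for a slow back-end fetch
    loads.append(1)
    return {"name": "spam"}

record = LazyProxy(load_record)
assert loads == []              # nothing has been loaded yet
assert record["name"] == "spam"
record["name"] = "eggs"
assert record._changed and len(loads) == 1
```

As the parenthetical above suspects, a proxy like this sees only accesses
made *through* it.  Changes the target makes to itself, or mutations via
ordinary methods such as record.update(...), pass unnoticed in this sketch.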

Anyway, the above pretty much sums up my principal annoyances/peeves with
Persistent and Transaction.  I can pretty much do everything I want with
the existing systems, but the above things would make them easier to do.
(Right now, to do state that's loaded from multiple back-ends, I have to
have some kind of support added into the object, or change its class on the
fly to add descriptors for lazily-loaded attributes.)
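
The "descriptors for lazily-loaded attributes" workaround could look
roughly like the sketch below.  LazyAttribute, Person and
load_phone_from_ldap are hypothetical names chosen for illustration; the
loader stands in for whatever back-end call would really fetch the value.

```python
class LazyAttribute:
    """Hypothetical non-data descriptor: loads one attribute on first read."""

    def __init__(self, name, loader):
        self.name = name
        self.loader = loader    # called as loader(obj) -> value

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        value = self.loader(obj)
        # Cache in the instance dict.  Because this descriptor defines only
        # __get__, the instance attribute shadows it on later reads.
        obj.__dict__[self.name] = value
        return value


class Person:
    def __init__(self, uid):
        self.uid = uid


calls = []

def load_phone_from_ldap(person):   # stand-in for a slow back-end lookup
    calls.append(person.uid)
    return "555-0100"

# Add the descriptor to the class after the fact.
Person.phone = LazyAttribute("phone", load_phone_from_ldap)

p = Person("pje")
assert calls == []                  # not loaded at construction time
assert p.phone == "555-0100"        # first read triggers the load
assert p.phone == "555-0100"        # second read hits the cached value
assert calls == ["pje"]

del p.phone                         # "ghost" just this attribute again
assert p.phone == "555-0100"
assert calls == ["pje", "pje"]
```

This covers lazy reads only.  Detecting writes per attribute would need a
data descriptor (one that also defines __set__), which this sketch leaves out.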



From pje@telecommunity.com  Thu Jul 11 01:38:49 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 10 Jul 2002 20:38:49 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <5.1.0.14.0.20020710144606.02aa5130@pop.videotron.ca>
References: <200207101831.g6AIVCw27387@pcp02138704pcs.reston01.va.comca
	st.net>
 <Your message of "Wed, 10 Jul 2002 12:31:31 EDT."
	<5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca>
 <Your message of "Wed, 10 Jul 2002 17:54:43 +0200."
	<20020710175443.A15365@sdrees2.de>
 <15658.57844.17239.668311@slothrop.zope.com>
 <20020710164054.B14438@sdrees2.de>
 <200207101456.g6AEufg26328@pcp02138704pcs.reston01.va.comcast.net>
 <20020710175443.A15365@sdrees2.de>
 <5.1.0.14.0.20020710122707.02a76d70@pop.videotron.ca>
Message-ID: <5.1.0.14.0.20020710203438.05ed8eb0@mail.telecommunity.com>

At 02:50 PM 7/10/02 -0400, Steve Menard wrote:

>Yep. I can't help but agree on this. I think it's possible to come up with 
>a common public interface for both mechanisms. However, I doubt an object 
>built for one model can be reused as-is in a different model.

If by "object", you mean "persistence mechanism/mapping", then I agree.

But, if by "object" you mean the object to be persisted, I disagree.  An 
explicit goal of my recent work was to support transparent switching 
between persistence *mechanisms* without any change to the objects which 
were to be stored.  The only place I've been less than 100% successful 
using the ZODB4 P&T packages, is with objects stored in more than one 
back-end.  And even there, I can do it successfully as long as I don't need 
lazy-loading of "non-object" attributes.  (By which I mean values I'd like 
to have as simple strings or numbers, without subclassing them to create, 
say, a PersistentInteger or PersistentString class.)



From pje@telecommunity.com  Thu Jul 11 01:45:25 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 10 Jul 2002 20:45:25 -0400
Subject: [Persistence-sig] getting started
In-Reply-To: <20020710235059.80707.qmail@web20709.mail.yahoo.com>
References: <3.0.5.32.20020710193800.00893350@telecommunity.com>
Message-ID: <5.1.0.14.0.20020710202925.05ed0470@mail.telecommunity.com>

At 04:50 PM 7/10/02 -0700, Ilia Iourovitski wrote:
>Based upon your comments you need two completely
>different things:
>1. Transaction monitor, to which you can register
>different providers. Probably heuristic commit too.
>In the Java world, the JTA spec, with Tyrex as an example.
>If you are mixing SQL and LDAP, you need XA transactions
>and a two-phase commit protocol.

Yes, and the existing ZODB4 Transaction package supports two-phase commit 
across multiple providers.  I just find its protocol for doing it annoying 
in some ways.
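
A rough sketch of the alternative protocol argued for earlier in this
thread, assuming a coordinator that only relays begin/vote/finish messages
to permanently subscribed participants, each of which tracks its own dirty
objects.  The Transaction and Jar classes here are hypothetical and are
not the ZODB4 API.  The tpc_begin, tpc_vote and tpc_finish names come from
the discussion above; tpc_abort, subscribe and register are assumed for
illustration.

```python
class Transaction:
    """Hypothetical coordinator that only relays messages to subscribers."""

    def __init__(self):
        self.participants = []

    def subscribe(self, participant):
        # The subscription persists across transactions.
        self.participants.append(participant)

    def commit(self):
        try:
            for p in self.participants:
                p.tpc_begin()
            for p in self.participants:
                p.tpc_vote()        # may raise to veto the commit
        except Exception:
            self.abort()
            raise
        for p in self.participants:
            p.tpc_finish()

    def abort(self):
        for p in self.participants:
            p.tpc_abort()


class Jar:
    """Hypothetical data manager that tracks its own dirty objects."""

    def __init__(self, name, log):
        self.name = name
        self.log = log              # records messages, for illustration only
        self.dirty = []

    def register(self, obj):
        self.dirty.append(obj)

    def tpc_begin(self):
        # Flush whatever is still dirty to the back-end.
        self.log.append((self.name, "flush", list(self.dirty)))

    def tpc_vote(self):
        self.log.append((self.name, "vote"))

    def tpc_finish(self):
        self.dirty = []
        self.log.append((self.name, "finish"))

    def tpc_abort(self):
        self.dirty = []
        self.log.append((self.name, "abort"))


log = []
txn = Transaction()
sql, ldap = Jar("sql", log), Jar("ldap", log)
txn.subscribe(sql)
txn.subscribe(ldap)

sql.register("row 1")
txn.commit()
assert log == [
    ("sql", "flush", ["row 1"]), ("ldap", "flush", []),
    ("sql", "vote"), ("ldap", "vote"),
    ("sql", "finish"), ("ldap", "finish"),
]

# A second transaction needs no re-registration of the jars.
del log[:]
ldap.register("entry 7")
txn.commit()
assert log[1] == ("ldap", "flush", ["entry 7"])
```

The sketch omits real rollback of flushed data and any ordering guarantees
between participants.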


>2. For slow persistence providers you need proxies,
>mapping meta info, lazy loading, lazy collections.
>Loading by id, by query.
>Usual OR mapper stuff.

Yes, and I've designed and/or written all that, except for the proxy or 
base class, for which I use ZODB4's Persistence package.


>And what about locks in the case of an RDBMS?

I don't think there's anything special needed at the persistence level to 
handle this.


>All of those things are out of scope of persistence-sig.

Item 2 things, yes, item 1 things no.  And even item 2's stuff has to be 
*capable* of being *done* with the SIG's output.  I don't need anybody to 
write those things; I just want to base my work for them on a solid 
mechanism for detecting access and changes to objects, and a solid API for 
interacting with a transaction object.



From pobrien@orbtech.com  Thu Jul 11 14:06:59 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Thu, 11 Jul 2002 08:06:59 -0500
Subject: [Persistence-sig] 
 FW: OSCON Birds of a Feather Session - confirmation
Message-ID: <NBBBIOJPGKJEKIECEMCBGEKHNHAA.pobrien@orbtech.com>

It looks like O'Reilly is unable to accommodate our request to hold a BOF
during one of the Thursday morning sessions. (See the reply below.) So it
appears we have few options, none of which look particularly good, unless
someone can think of another:

1. Go with the scheduled time of 8-10pm Thursday. Guido will miss it. Jim
*might* be able to make it.

2. Change the time to 6-8pm to make it easier for Jim to attend. Guido will
still miss it.

3. Change the time to 6-7pm Tuesday. Guido can make it (I believe) but Jim
will miss it (unless he got an earlier flight?).

4. Give up on an "official" BOF and have lunch together or meet on our own
somehow, somewhere.

Let me know what you think.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------


-----Original Message-----
From: Gretchen Bartholomew [mailto:gretchen@oreilly.com]
Sent: Wednesday, July 10, 2002 11:18 PM
To: pobrien@orbtech.com
Cc: vee@oreilly.com
Subject: RE: OSCON Birds of a Feather Session - confirmation


Dear Patrick,


I would very much like to reschedule your BOF for a more convenient time for
you and your peers.  Unfortunately, however, I cannot schedule BOFs in the
morning or during the day, for that matter, while sessions are in progress.
All conference rooms are being utilized for convention sessions.


BOFs are held in the evenings.  I have several BOF slots available during
the following regular BOF dates/times.  You are welcome to move your BOF to
any slot within these timeframes.


Monday:         6:00pm - 10:00pm
Tuesday:        6:00pm - 7:00pm
Wednesday:      8:00pm - 10:00pm
Thursday:       6:00pm - 10:00pm


Simply let me know which time out of those listed above would be the best
for you and I will reschedule.


Many thanks.
Gretchen



From jim@zope.com  Thu Jul 11 14:29:28 2002
From: jim@zope.com (Jim Fulton)
Date: Thu, 11 Jul 2002 09:29:28 -0400
Subject: [Persistence-sig]  FW: OSCON Birds of a Feather Session -
	confirmation
References: <NBBBIOJPGKJEKIECEMCBGEKHNHAA.pobrien@orbtech.com>
Message-ID: <3D2D8838.8040809@zope.com>

Patrick K. O'Brien wrote:
> It looks like O'Reilly is unable to accommodate our request to hold a BOF
> during one of the Thursday morning sessions. (See the reply below.) So it
> appears we have few options, none of which look particularly good, unless
> someone can think of another:
> 
> 1. Go with the scheduled time of 8-10pm Thursday. Guido will miss it. Jim
> *might* be able to make it.
> 
> 2. Change the time to 6-8pm to make it easier for Jim to attend. Guido will
> still miss it.

That wouldn't make it any easier for me.


> 3. Change the time to 6-7pm Tuesday. Guido can make it (I believe) but Jim
> will miss it (unless he got an earlier flight?).

Uh, how about 10pm on Tuesday. I can make that unless my plane is
late. :)

> 4. Give up on an "official" BOF and have lunch together or meet on our own
> somehow, somewhere.

How about getting together early on Wednesday, say 7 am? We could meet over breakfast.

Jim



-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From guido@python.org  Thu Jul 11 14:36:43 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 11 Jul 2002 09:36:43 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: Your message of "Thu, 11 Jul 2002 09:29:28 EDT."
             <3D2D8838.8040809@zope.com> 
References: <NBBBIOJPGKJEKIECEMCBGEKHNHAA.pobrien@orbtech.com>  
            <3D2D8838.8040809@zope.com> 
Message-ID: <200207111336.g6BDahf05430@odiug.zope.com>

> > 4. Give up on an "official" BOF and have lunch together or meet on our own
> > somehow, somewhere.
> 
> How about getting together early on Wednesday, say 7 am? We could meet over breakfast.

+1.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From pobrien@orbtech.com  Thu Jul 11 15:36:07 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Thu, 11 Jul 2002 09:36:07 -0500
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session -
	confirmation
In-Reply-To: <200207111336.g6BDahf05430@odiug.zope.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBOEKKNHAA.pobrien@orbtech.com>

[Guido van Rossum]
>
> > > 4. Give up on an "official" BOF and have lunch together or meet on our own
> > > somehow, somewhere.
> >
> > How about getting together early on Wednesday, say 7 am? We could meet over breakfast.
>
> +1.

Okay, breakfast it is. Now we need to decide where. O'Reilly provides
breakfast and I'm trying to find out when they start serving. But I think
7:00 is probably a safe time. So we could just plan to meet at the O'Reilly
Food & Beverage Banquet Tent.

The other option is the hotel restaurant. Harbor's Edge Restaurant, located
off the main lobby of the Sheraton East Tower, tantalizes your palate with
American Eclectic Cuisine and offers a panoramic view of the Marina. Better
still, they're open for breakfast starting at 6:30am.

Any preferences?

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From pobrien@orbtech.com  Thu Jul 11 16:23:12 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Thu, 11 Jul 2002 10:23:12 -0500
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session
	-confirmation
In-Reply-To: <NBBBIOJPGKJEKIECEMCBOEKKNHAA.pobrien@orbtech.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBMEKMNHAA.pobrien@orbtech.com>

FYI, O'Reilly serves breakfast from 7 to 8:30 am.



From jim@zope.com  Thu Jul 11 17:03:49 2002
From: jim@zope.com (Jim Fulton)
Date: Thu, 11 Jul 2002 12:03:49 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session
	-confirmation
References: <NBBBIOJPGKJEKIECEMCBMEKMNHAA.pobrien@orbtech.com>
Message-ID: <3D2DAC65.90305@zope.com>

Patrick K. O'Brien wrote:
> FYI, O'Reilly serves breakfast from 7 to 8:30 am.

Let's just meet there right at 7 or a few minutes
before. They have reasonably big tables that would be
good for such a get together.

Jim




-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From guido@python.org  Thu Jul 11 17:33:08 2002
From: guido@python.org (Guido van Rossum)
Date: Thu, 11 Jul 2002 12:33:08 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session
	-confirmation
In-Reply-To: Your message of "Thu, 11 Jul 2002 12:03:49 EDT."
             <3D2DAC65.90305@zope.com> 
References: <NBBBIOJPGKJEKIECEMCBMEKMNHAA.pobrien@orbtech.com>  
            <3D2DAC65.90305@zope.com> 
Message-ID: <200207111633.g6BGX8F13139@odiug.zope.com>

> Let's just meet there right at 7 or a few minutes
> before. They have reasonably big tables that would be
> good for such a get together.

OK.  Wednesday, 7am at the O'Reilly breakfast table.

So far those present will be Jim, Patrick and me.  Who else plans to
be there?  Where else could we announce this?

--Guido van Rossum (home page: http://www.python.org/~guido/)


From pobrien@orbtech.com  Thu Jul 11 17:46:54 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Thu, 11 Jul 2002 11:46:54 -0500
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session
	-confirmation
In-Reply-To: <200207111633.g6BGX8F13139@odiug.zope.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBMELANHAA.pobrien@orbtech.com>

[Guido van Rossum]
>
> OK.  Wednesday, 7am at the O'Reilly breakfast table.
>
> So far those present will be Jim, Patrick and me.  Who else plans to
> be there?  Where else could we announce this?

I'm going to go ahead and hold the BOF on Thursday night as well for those
of us who can't get enough of this Persistence topic. I'll report the
results of the breakfast meeting at the BOF, and I'll take notes at the BOF
and report them back here. I'll also see if I can get O'Reilly to change the
BOF description to mention that we'll be having an informal pre-BOF meeting
over breakfast Wednesday morning.

Should someone list this information on the SIG web page on python.org as
well?

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From pobrien@orbtech.com  Thu Jul 11 20:38:59 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Thu, 11 Jul 2002 14:38:59 -0500
Subject: [Persistence-sig] FW: OSCON Birds of a Feather
	Session-confirmation
In-Reply-To: <NBBBIOJPGKJEKIECEMCBMELANHAA.pobrien@orbtech.com>
Message-ID: <NBBBIOJPGKJEKIECEMCBAELGNHAA.pobrien@orbtech.com>

The BOF info at O'Reilly
(http://conferences.oreillynet.com/pub/w/15/bof.html) now looks like this:

Python Persistence
Date: 07/25/2002
Time: 8:00pm - 10:00pm
Location: Grande Ballroom C in the East Tower
Moderated by: Patrick O'Brien, Orbtech

A Python Persistence Special Interest Group was recently formed to explore
ways to add basic persistence and transaction mechanisms into the core of
Python to avoid duplication of effort by a variety of projects that have
similar issues. This BOF will permit participants to ponder Python
persistence in person. In addition, anyone interested in an informal Python
Persistence breakfast discussion with Jim Fulton and Guido van Rossum is
welcome to join us at the O'Reilly Food Tent Wednesday morning at 7am.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python software development."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------



From jim@zope.com  Fri Jul 12 17:44:45 2002
From: jim@zope.com (Jim Fulton)
Date: Fri, 12 Jul 2002 12:44:45 -0400
Subject: [Persistence-sig] FW: OSCON Birds of a Feather Session
	-confirmation
References: <NBBBIOJPGKJEKIECEMCBMELANHAA.pobrien@orbtech.com>
Message-ID: <3D2F077D.3000201@zope.com>

Patrick K. O'Brien wrote:
> [Guido van Rossum]
> 
>>OK.  Wednesday, 7am at the O'Reilly breakfast table.
>>
>>So far those present will be Jim, Patrick and me.  Who else plans to
>>be there?  Where else could we announce this?
>>
> 
> I'm going to go ahead and hold the BOF on Thursday night as well for those
> of us who can't get enough of this Persistence topic. I'll report the
> results of the breakfast meeting at the BOF, and I'll take notes at the BOF
> and report them back here. I'll also see if I can get O'Reilly to change the
> BOF description to mention that we'll be having an informal pre-BOF meeting
> over breakfast Wednesday morning.

I went ahead and extended my stay a day, so I'll be able to make the BoF.

Jim



-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From pje@telecommunity.com  Sun Jul 14 17:21:52 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Sun, 14 Jul 2002 12:21:52 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
Message-ID: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>

Since it's been pretty quiet here, apart from the BOF discussion, I thought 
I'd draft up a transaction/participant API to stir up some debate.  I did a 
little research on JTA and related protocols in Java, and found that JTA is 
actually pretty pitiful in comparison to the rich model already offered by 
ZODB.  Also, the DBAPI doesn't really offer a way to get at multi-phase 
commit protocols, but perhaps if we get a nice Python transaction API 
together, we can encourage such access be made available in DBAPI 3.0.

My goals for the straw man were to support the functionality of ZODB 
transactions, but without any ZODB-specific baggage in the API, to decouple 
the management of dirty objects, writes, etc. from the co-ordination of the 
transaction itself, and to support a richer model of what a "transaction 
participant" is, including the ability to nest or chain storage mechanisms 
together to an arbitrary depth.  Backward compatibility in the API or the 
transaction coordination messages was explicitly not a goal.

Anyway, here it is, for all of you to pick apart or set fire to, like the 
straw man it is.  I ask only that you read the whole thing before you light 
up your flamethrowers.  :)


"""'Straw Man' Transaction Interfaces"""

class Transaction:

     """Manages transaction lifecycle, participants, and metadata.

     There is no predefined number of transactions that may exist, or
     what they are associated with.  Depending on the application
     model, there may be one per application, one per transaction, one
     per incoming connection (in server applications), or some other
     number.  The transaction package should, however, offer an API for
     managing per-thread (or per-app, if threads aren't being used)
     transactions, since this will probably be the most common usage
     scenario."""

     # The basic transaction lifecycle

     def begin(self, **info):
         """Begin a transaction.  Raise TransactionInProgress if
         already begun.  Any keyword arguments are passed on to the
         setInfo() method.  (See below.)"""

     def commit(self):
         """Commit the transaction, or raise NoTransaction if not in
         progress."""

     def abort(self):
         """Abort the transaction, or raise NoTransaction if not in
         progress."""


     # Managing participants

     def subscribe(self, participant):
         """Add 'participant' to the set of objects that will receive
         transaction messages.  Note that no particular ordering of
         participants should be assumed.  If the transaction is already
         active, 'participant' will receive a 'begin_txn()' message. If
         a commit or savepoint is in progress, 'participant' may also
         receive other messages to "catch it up" to the other
         participants.  However, if the commit or savepoint has already
         progressed too far for the new participant to join in, a
         TransactionInProgress error will be raised.

         Note: this is not ZODB!  Participants remain subscribed until
         they unsubscribe, or until the transaction object is
         de-allocated!"""

     def unsubscribe(self, participant):
         """Remove 'participant' from the set of objects that will
         receive transaction messages.  It can only be called when a
         transaction is not in progress, or in response to
         begin/commit/abort_txn() messages received by the
         unsubscribing participant.  Otherwise, TransactionInProgress
         will be raised."""


     # Getting/setting information about a transaction

     def isActive(self):
         """Return True if transaction is in progress."""

     def getTimestamp(self):
         """Return the time that the transaction began, in time.time()
         format, or None if no transaction in progress."""

     def setInfo(self, **args):
         """Update the transaction's metadata dictionary with the
         supplied keyword arguments.  This can be used to record
         information such as a description of the transaction, the user
         who performed it, etc. Note that the transaction itself does
         nothing with this information. Transaction participants will
         need to retrieve the information with 'getInfo()' and record
         it at the appropriate point during the transaction."""

     def getInfo(self):
         """Return a copy of the transaction's metadata dictionary"""


     # "Sub-transaction" support

     def savepoint(self):
         """Request a write to stable storage, and mark a savepoint for
         possible partial rollback via 'revert()'.  This will most
         often be used simply to suggest a good time for in-memory data
         to be written out.  But it can also be used in conjunction
         with revert() to provide a single-level 'nested transaction',
         if all participants support reverting to a savepoint."""

     def revert(self):
         """Request a rollback to the last savepoint.  If no savepoint
         has occurred in this transaction, this is implemented via an
         abort(), followed by a begin(), keeping the same metadata.  If
         a savepoint has occurred, this will raise
         CannotRevertException unless all transaction participants
         support reverting to a savepoint."""



class Participant:
     """Participant in a transaction; may be a resource manager, a
     transactional cache, or just a logging/monitoring object.

     Event sequence is approximately as follows:

         begin_txn
         ( ( begin_savepoint end_savepoint ) | revert ) *
         ( begin_commit vote_commit commit_txn ) | abort_txn

     In other words, every transaction begins with begin_txn, and ends
     with either commit_txn or abort_txn.  A commit_txn will always be
     preceded by begin_commit and vote_commit.  An abort_txn may occur
     at *any* point following begin_txn, and aborts the transaction.
     begin/end_savepoint pairs and revert() messages may occur any time
     between begin_txn and begin_commit, as long as abort_txn hasn't
     happened.

     Generally speaking, participants fall into a few broad categories:

     * Database connections

     * Resource managers that write data to another participant, e.g. a
       storage manager writing to a database connection

     * Resource managers that manage their own storage transactions,
       e.g. ZODB Database/Storage objects, a filesystem-based queue, etc.

     * Objects which don't manage any transactional resources, but need to
       know what's happening with a transaction, in order to log it.

     Each kind of participant will typically use different messages to
     achieve their goals.  Resource managers that use other
     participants for storage, for example, won't care much about
     begin_txn() and vote_commit(), while a resource manager that
     manages direct storage will care about vote_commit() very deeply!

     Resource managers that use other participants for storage, but
     buffer writes to the other participant, will need to pay close
     attention to the begin_savepoint() and begin_commit() messages.
     Specifically, they must flush all pending writes to the
     participant that handles their storage, and enter a
     "write-through" mode, where any further writes are flushed
     immediately to the underlying participant.  This is to ensure that
     all writes are written to the "root participant" for those writes,
     by the time end_savepoint() or vote_commit() is issued.

     By following this algorithm, any number of participants may be
     chained together, such as a persistence manager that writes to an
     XML document, which is persisted in a database table, which is
     persisted in a disk file.  The persistence manager, the XML
     document, the database table, and the disk file would all be
     participants, but only the disk file would actually use
     vote_commit() and commit_txn() to handle a commit.  All of the
     other participants would flush pending updates and enter
     write-through mode at the begin_commit() message, guaranteeing that
     the disk file participant would know about all the updates by the
     time vote_commit() was issued, regardless of the order in which the
     participants received the messages."""

     def begin_txn(self, txn):
         """Transaction is beginning; nothing special to be done in
         most cases. A transactional cache might use this message to
         reset itself.  A database connection might issue BEGIN TRAN
         here."""

     def begin_savepoint(self, txn):
         """Savepoint is beginning; flush dirty objects and enter
         write-through mode, if applicable.  Note: this is not ZODB!
         You will not get savepoint messages before a regular commit,
         just because another savepoint has already occurred!"""

     def end_savepoint(self, txn):
         """Savepoint is finished, it's safe to return to buffering
         writes; a database connection would probably issue a
         savepoint/checkpoint command here."""

     def revert(self, txn):
         """Roll back to last savepoint, or raise
         CannotRevertException; Database connections whose underlying
         DB doesn't support savepoints should definitely raise
         CannotRevertException.  Resource managers that write data to other
         participants should simply roll back state for all objects
         changed since the last savepoint, whether written through to
         the underlying storage or not.  Transactional caches may want
         to reset on this message, also, depending on their precise
         semantics. Note: this is not ZODB!  You will not get a
         revert() before an abort_txn(), just because a savepoint has
         occurred during the transaction!"""

     def begin_commit(self, txn):
         """Transaction commit is beginning; flush dirty objects and
         enter write-through mode, if applicable.  DB connections will
         probably do nothing here.  Note: participants *must* continue
         to accept writes until vote_commit() occurs, and *must*
         accept repeated writes of the same objects!"""

     def vote_commit(self, txn):
         """Raise an exception if commit isn't possible.  This will
         mostly be used by resource managers that handle their own
         storage, or the few DB connections that are capable of
         multi-phase commit."""

     def commit_txn(self, txn):
         """This message follows vote_commit, if no participants vetoed
         the commit.  DB connections will probably issue COMMIT TRAN
         here. Transactional caches might use this message to reset
         themselves."""

     def abort_txn(self, txn):
         """This message can be received at any time, and means the
         entire transaction must be rolled back.  Transactional caches
         might use this message to reset themselves."""
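
For illustration, here is a minimal sketch of how a coordinator could drive
the commit/abort message sequence described in the Participant docstring.
It is only one possible reading of the straw man, not the proposal's
implementation: savepoints, metadata, and the "catch up" rules for late
subscribers are left out, and the Recorder class is a hypothetical
logging-style participant invented for this example.

```python
class TransactionInProgress(Exception):
    pass


class NoTransaction(Exception):
    pass


class Transaction:
    """Hypothetical coordinator that sends straw-man messages to subscribers."""

    def __init__(self):
        self._participants = []
        self._active = False

    def subscribe(self, participant):
        if participant not in self._participants:
            self._participants.append(participant)
            if self._active:
                participant.begin_txn(self)

    def begin(self):
        if self._active:
            raise TransactionInProgress
        self._active = True
        for p in list(self._participants):
            p.begin_txn(self)

    def commit(self):
        if not self._active:
            raise NoTransaction
        try:
            # Phase 1: everyone flushes, then everyone votes.
            for p in list(self._participants):
                p.begin_commit(self)
            for p in list(self._participants):
                p.vote_commit(self)
        except Exception:
            # Any exception before commit_txn is treated as a veto.
            self.abort()
            raise
        # Phase 2: nobody vetoed, so tell everyone to commit.
        for p in list(self._participants):
            p.commit_txn(self)
        self._active = False

    def abort(self):
        if not self._active:
            raise NoTransaction
        self._active = False
        for p in list(self._participants):
            p.abort_txn(self)


class Recorder:
    """Hypothetical participant that records every message it receives."""

    def __init__(self, veto=False):
        self.veto = veto
        self.log = []

    def begin_txn(self, txn):
        self.log.append("begin_txn")

    def begin_commit(self, txn):
        self.log.append("begin_commit")

    def vote_commit(self, txn):
        self.log.append("vote_commit")
        if self.veto:
            raise RuntimeError("cannot commit")

    def commit_txn(self, txn):
        self.log.append("commit_txn")

    def abort_txn(self, txn):
        self.log.append("abort_txn")
```

With a single non-vetoing Recorder subscribed, begin() followed by commit()
produces the sequence begin_txn, begin_commit, vote_commit, commit_txn; if
any participant raises in vote_commit(), every participant receives
abort_txn instead of commit_txn.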




From Sebastien.Bigaret@inqual.com  Mon Jul 15 14:06:16 2002
From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret)
Date: 15 Jul 2002 15:06:16 +0200
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: "Phillip J. Eby"'s message of "Sun, 14 Jul 2002 12:21:52 -0400"
References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
Message-ID: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>


Ok, now for some comments (with no flamethrowers lighted up, but maybe I'll
trigger some ;)


About the Transaction API:

  The API seems globally OK to me. I'd like to make the following
  remarks:

  - about info/setInfo(): maybe we need a setInfo() different from an
    updateInfo() or addToInfo(). I also suspect that a 'ResourceManager'
    writing info. to other participants might use such a metadictionary to
    pass additional information for use in the current transaction (warning:
    name collision); if this is *not* the place for that, it should perhaps be
    stated in doc.

  - registration of Participants:

      We might need a unique identifier for a given participant ; e.g., we
      might wish that only one participant for a given 'postgresql' DB
      connection is registered (in that case, the id. could be the DB backend
      name+the connectionDictionary).

      Obviously participants could still register without an id.

  - revert(): I expected an 'undo()' ; 'revert' sounds like 'abort' to me, but
    this can just be a language problem --the documentation made it clear.

  - about commit(): I see this basically like a vote_commit() on each
    participants, followed by a commit_txn()

    I have the feeling that what will be done during the commit() phase should
    be explicitly stated, along with the goals we are going after. Here is a
    little example: suppose a transaction has to commit changes against two
    different DB storages, DB1 which supports multi-phase commit, DB2 which
    does not.

    Then they get vote_commit(): DB1 will be able to answer OK or KO, but DB2
    will not, because it cannot tell in advance whether a transaction will
    succeed; hence it answers 'OK' to the 'vote_commit' message.

    Now the participants get the commit_txn(); since we do not assume any
    particular ordering for participants, suppose that DB1 gets it first. DB1
    commits the changes, then DB2 attempts to commit its changes but fails:
    what can we do? We can stop committing and start sending 'abort_txn' to
    all participants; however, DB1 is likely to be unable to revert the
    already committed changes (and this will definitely be the case if
    neither DB1 nor DB2 supports nested transactions).
    
    My opinion here is that we shouldn't try to handle multi-backend commits
    as a whole -- some backends simply make it almost impossible. But: this
    should be clearly stated.
    
  - last on this: it may be useful for observers to get events such as
    transaction_did_commit() (committing is a Transaction's message for which
    we cannot guarantee it will come to its normal end, for the reasons
    written above) ; I'm thinking here of some DB-caches that would be
    participants/observers for the Transaction machinery, that would take the
    opportunity to update their caches, etc.
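
The DB1/DB2 scenario above can be sketched in a few lines. FakeDB is a
hypothetical stand-in (not a DB-API connection), and commit_all() is an
assumed, simplified coordinator; the point is only to show that a backend
which cannot really vote can fail after another one has already committed.

```python
class FakeDB:
    """Hypothetical stand-in for a database connection."""

    def __init__(self, name, two_phase, will_fail):
        self.name = name
        self.two_phase = two_phase   # can it really answer vote_commit()?
        self.will_fail = will_fail   # will its commit fail?
        self.state = "open"

    def vote_commit(self):
        # Only a backend with multi-phase commit can detect failure here;
        # the others silently answer 'OK'.
        if self.two_phase and self.will_fail:
            raise RuntimeError(self.name + " vetoes the commit")

    def commit_txn(self):
        if self.will_fail:
            raise RuntimeError(self.name + " failed to commit")
        self.state = "committed"

    def abort_txn(self):
        # A backend that already committed cannot take it back.
        if self.state != "committed":
            self.state = "rolled back"


def commit_all(dbs):
    """Vote, then commit in the given order; abort everyone on failure."""
    for db in dbs:
        db.vote_commit()
    try:
        for db in dbs:
            db.commit_txn()
    except RuntimeError:
        for db in dbs:
            db.abort_txn()
        return False
    return True


db1 = FakeDB("DB1", two_phase=True, will_fail=False)
db2 = FakeDB("DB2", two_phase=False, will_fail=True)
succeeded = commit_all([db1, db2])
print(succeeded, db1.state, db2.state)
```

Here the overall commit reports failure, yet DB1 stays committed while DB2
is rolled back, which is exactly the inconsistency described.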

About the Participant API:

  - I have some problems about the begin/end_savepoint(): again this might be
    a language problem, but I would prefer something like
    'prepareToSavepoint()' and 'markSavepoint()'

  - same for begin_commit()

  - vote_commit(): as far as I understand, participants using other
    participants can simply ignore it, but should not raise (exception to be
    named, BTW). To my understanding, a raise here is understood as a veto.
    Is that it?

Last: do we need to specify a TransactionManager or TransactionFactory API?

Some ideas about what could be done there: (hmm, this could be made class
method as well)

  - registering participants' factories, so that Transactions can be
    initialized with a default set of participants, since applications often
    use the same configuration for their Transactions. Something like:

      def buildDefaultTransaction(self):

  - ???
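
A rough sketch of the factory idea, under assumptions: the names
TransactionFactory and registerParticipantFactory are invented for this
example, and Transaction is reduced to the single subscribe() method it
needs here.

```python
class Transaction:
    """Reduced stand-in: only tracks its subscribed participants."""

    def __init__(self):
        self.participants = []

    def subscribe(self, participant):
        self.participants.append(participant)


class TransactionFactory:
    """Hypothetical factory that pre-subscribes a default set of participants."""

    def __init__(self):
        self._factories = []

    def registerParticipantFactory(self, factory):
        # 'factory' is any zero-argument callable returning a participant.
        self._factories.append(factory)

    def buildDefaultTransaction(self):
        txn = Transaction()
        for factory in self._factories:
            txn.subscribe(factory())
        return txn


class LoggingParticipant:
    """Placeholder participant used only to show the wiring."""


factory = TransactionFactory()
factory.registerParticipantFactory(LoggingParticipant)
t1 = factory.buildDefaultTransaction()
t2 = factory.buildDefaultTransaction()
```

Registering factories rather than participant instances means each
Transaction gets its own fresh participants, so state is not shared between
transactions by accident.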



  It seems to me that the points stressed in the sig-charter are taken into
account here --except for the 'Effective Memory Usage' which, by the way,
cannot be addressed at the transaction level --and I do not really see how
this particular point can be made anything else but a ``compulsory
recommendation'' ?!

-- Sebastien.



From jim@zope.com  Mon Jul 15 14:59:50 2002
From: jim@zope.com (Jim Fulton)
Date: Mon, 15 Jul 2002 09:59:50 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
Message-ID: <3D32D556.6040801@zope.com>


This is an interesting proposal. I'll be interested to see
more discussion on it. It appears to shift responsibility for
management of individual object changes further into the resource
managers, which is fine.

I'm a little fuzzy on participants that write data to other participants.
The notion that they flush data on begin_savepoint feels a little
brittle to me. If the participant they flush to does any significant work
on begin_savepoint, then it appears that things could happen in an inconvenient
order and cause problems.

Is the transaction info cleared at transaction boundaries?

Jim




-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From Sebastien.Bigaret@inqual.com  Mon Jul 15 15:18:47 2002
From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret)
Date: 15 Jul 2002 16:18:47 +0200
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200"
References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
Message-ID: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>


>     My opinion here is that we shouldn't try to handle multi-backends commits
>     as a whole -- some backends simply makes it almost impossible. But: this
>     should be clearly stated.

More on this: what do you think of the following possible additions to the API?
(This is just a quick draft)

    NB: TP Monitoring stands for Transaction Processing Monitoring

- to class Participant:

    def canBeTPMonitored(self): # or supportNestedTransactions() ?
      """
      Tells whether the participant can be integrated into a TP monitoring
      process. Valid answers are:

        - YES ; e.g. RDBMS that support nested transactions will definitely
          answer yes.
  
        - NO
  
        - NOT_APPLICABLE ; e.g. for listeners

      """

- to class Transaction:

    def isTPMonitoringEnabled(self):
      """
      Answer is true if all participants answer 'YES' or 'NOT_APPLICABLE' to
      'canBeTPMonitored', false otherwise.

      """

--> With such an API, we could then make sure that, when
    isTPMonitoringEnabled() evaluates to true, the commit() phase in a
    Transaction ensures that it either applies all the changes or rolls back
    everything (e.g. by beginning a top-level transaction in each participant, via the
    appropriate API --to be defined--, and by committing this top-level
    transaction at the end of the commit phase, when everything has gone
    smoothly).
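
A small sketch of the two proposed methods, assuming the three answers are
plain module-level constants and that a Transaction simply holds a list of
participants (both are assumptions made for the example):

```python
YES = "YES"
NO = "NO"
NOT_APPLICABLE = "NOT_APPLICABLE"


class Participant:
    """Hypothetical participant that reports a fixed capability."""

    def __init__(self, answer):
        self._answer = answer

    def canBeTPMonitored(self):
        return self._answer


class Transaction:
    """Reduced stand-in holding only the list of participants."""

    def __init__(self, participants):
        self._participants = list(participants)

    def isTPMonitoringEnabled(self):
        # True only if no participant answers NO.
        return all(
            p.canBeTPMonitored() in (YES, NOT_APPLICABLE)
            for p in self._participants
        )


rdbms = Participant(YES)
listener = Participant(NOT_APPLICABLE)
flat_file = Participant(NO)

print(Transaction([rdbms, listener]).isTPMonitoringEnabled())
print(Transaction([rdbms, listener, flat_file]).isTPMonitoringEnabled())
```

One participant answering NO is enough to disable the all-or-nothing
guarantee for the whole transaction.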


Last note: I'm not positive at all about canBeTPMonitored being equivalent to
  the ability of using/simulating nested transactions ; I have the feeling
  that at least the latter implies the former. 
    For RDBMS it seems OK, for file-based storage this could be emulated
  through concurrent versioning, but the general case is quite a bit beyond
  my knowledge. And this is not something I already played with but just a
  dream of mine, so people having real experience with TP monitoring processes
  can go and grab their flamethrowers now!

  
-- Sebastien.



From pje@telecommunity.com  Mon Jul 15 16:36:26 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 15 Jul 2002 11:36:26 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <3D32D556.6040801@zope.com>
References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
Message-ID: <3.0.5.32.20020715113626.01abf990@telecommunity.com>

At 09:59 AM 7/15/02 -0400, Jim Fulton wrote:
>
>This is an interesting proposal. I'll be interested to see
>more discussion on it. It appears to shift responsibility for
>management of individual object changes further into the resource
>managers, which is fine.

My thought on this was that the resource managers know more about their
objects than the transaction does.  Also, this should greatly reduce the
complexity of the commit operation compared to ZODB's Transaction.  And
most important, it *decouples transactions from the persistence framework*.
 ZODB's Transaction has to know how to get at an object's storage manager,
while Strawman doesn't.


>I'm a little fuzzy on participants that write data to other participants.
>The notion that they flush data on begin_savepoint feels a little
>brittle to me. If the participant they flush to does any significant work
>on begin_savepoint, then it appears that things could happen in an
>inconvenient order and cause problems.

The assumption here is that things which do "real" work (as opposed to
writing to another participant) should trap the second message.  In other
words, there's a pretty solid distinction between "delegating" participants
and "direct" participants, in terms of their behavior.  The use cases I'm
looking at are one or more persistence mechanisms writing to a storage
mechanism.  At some point, you have to "bottom out" to "real" storage, and
that's where you handle the ending of a savepoint or commit.
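
To make the "delegating" versus "direct" distinction concrete, here is a
sketch under assumptions: DirectStorage, BufferingManager and the commit()
helper are hypothetical names, and savepoints are left out. The delegating
participant does its work on the first message (begin_commit), while the
direct one does its work on the later messages, so the result does not
depend on the order in which participants are notified.

```python
class DirectStorage:
    """'Direct' participant: owns real storage, acts at the end of a commit."""

    def __init__(self):
        self.pending = {}
        self.committed = {}

    def write(self, key, value):
        # Must keep accepting (possibly repeated) writes until vote_commit().
        self.pending[key] = value

    def begin_commit(self, txn):
        pass  # nothing to flush; writes arrive from delegating participants

    def vote_commit(self, txn):
        pass  # a real storage would raise here to veto the commit

    def commit_txn(self, txn):
        self.committed.update(self.pending)
        self.pending.clear()


class BufferingManager:
    """'Delegating' participant: buffers writes destined for another one."""

    def __init__(self, storage):
        self.storage = storage
        self.buffer = {}
        self.write_through = False

    def set(self, key, value):
        if self.write_through:
            self.storage.write(key, value)
        else:
            self.buffer[key] = value

    def begin_commit(self, txn):
        # Flush everything, then pass further writes straight through.
        for key, value in self.buffer.items():
            self.storage.write(key, value)
        self.buffer.clear()
        self.write_through = True

    def vote_commit(self, txn):
        pass  # delegating participants have nothing to vote on

    def commit_txn(self, txn):
        self.write_through = False  # safe to buffer again


def commit(participants, txn=None):
    """Send each commit message to every participant, in list order."""
    for message in ("begin_commit", "vote_commit", "commit_txn"):
        for p in participants:
            getattr(p, message)(txn)
```

Whether the storage or the manager is listed first, all buffered writes
have reached the storage by the time vote_commit() is sent.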


>Is the transaction info cleared at transaction boundaries?

If you mean the setInfo() stuff, yes.  I probably should've documented
that, but then if I documented *every* assumption I made, the doc would've
been twice the size.  :)


From pje@telecommunity.com  Mon Jul 15 16:57:56 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 15 Jul 2002 11:57:56 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
References: <"Phillip J. Eby"'s message of "Sun, 14 Jul 2002 12:21:52 -0400">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
Message-ID: <3.0.5.32.20020715115756.01ab65f0@telecommunity.com>

At 03:06 PM 7/15/02 +0200, Sebastien Bigaret wrote:
>
>  - about info/setInfo(): maybe we need a setInfo() different from an
>    updateInfo() or addToInfo(). I also suspect that a 'ResourceManager'
>    writing info. to other participants might use such a metadictionary to
>    pass additional information for use in the current transaction (warning:
>    name collision); if this is *not* the place for that, it should
>    perhaps be stated in doc.

I probably should've called it updateInfo(), since dict.update() semantics
were what I had in mind.  I was deliberately leaving it vague as to what
information might be passed in it, since it was primarily a mechanism to
allow for extensions, not to mention support of Zope's need to save "user"
and "note" metadata on transactions.
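
Roughly what I mean, as a sketch only.  The keyword-argument signature,
the info() accessor and the _finish() name are assumptions made for the
illustration, not part of the Straw Man document:

```python
class Transaction:
    """Sketch of the metadata part of a transaction only."""

    def __init__(self):
        self._info = {}

    def updateInfo(self, **kw):
        # dict.update() semantics: new keys are added, existing keys replaced.
        self._info.update(kw)

    def info(self):
        return dict(self._info)     # hand out a copy, not the live dict

    def _finish(self):
        # Metadata does not survive a transaction boundary.
        self._info.clear()
```

So Zope could call updateInfo(user=..., note=...) at different points in
a request, and an extension could add its own keys without disturbing
those.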


>  - registration of Participants:
>
>      We might need a unique identifier for a given participant ; e.g., we
>      might wish that only one participant for a given 'postgresql' DB
>      connection is registered (in that case, the id. could be the DB backend
>      name+the connectionDictionary).
>
>      Obviously participants could still register without an id.

I think an identifier is a YAGNI.  I'm almost positive that my application
model won't need it.  But if you just want register() to guarantee that the
participant is registered once and only once, that's fine by me and a
sensible thing, IMHO.  Although I might just as soon have it raise
ParticipantAlreadyRegistered if you register it again, as that might help
expose a bug in your code.  :)  Of course, if it does that, then I suppose
exposing an isRegistered(participant) method would allow you to work around
that.
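
A sketch of that behavior, assuming identity comparison is what "the same
participant" means (an assumption on my part; the names register(),
isRegistered() and ParticipantAlreadyRegistered are the ones used above):

```python
class ParticipantAlreadyRegistered(Exception):
    pass


class Transaction:
    """Sketch of the registration part of a transaction only."""

    def __init__(self):
        self._participants = []

    def isRegistered(self, participant):
        # Identity, not equality: two equal-looking connections are distinct.
        return any(p is participant for p in self._participants)

    def register(self, participant):
        if self.isRegistered(participant):
            raise ParticipantAlreadyRegistered(participant)
        self._participants.append(participant)
```

Code that can't know whether it already registered would then write
"if not txn.isRegistered(p): txn.register(p)".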



>  - revert(): I expected an 'undo()' ; 'revert' sounds like 'abort' to me,
>    but this can just be a language problem --the documentation made it clear.

I tried to use common RDBMS terminology; the few examples of "checkpoint"
or "savepoint" I found (e.g. Sybase) used "revert" as the terminology for
going back to the last checkpoint or savepoint.


>  - about commit(): I see this basically like a vote_commit() on each
>    participants, followed by a commit_txn()

Actually it's begin_commit() on each, vote_commit() on each, and then
commit_txn() on each.
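
In sketch form, with a hypothetical Veto exception (the real exception is
still to be named) and a toy Recorder participant added only so the
message order can be seen.  Sending abort_txn to everyone on failure is my
simplification here:

```python
class Veto(Exception):
    """Hypothetical name; the exception is not yet named in the proposal."""


def commit(participants):
    try:
        for p in participants:
            p.begin_commit()
        for p in participants:
            p.vote_commit()          # raising here counts as a veto
    except Exception:
        for p in participants:
            p.abort_txn()
        raise
    for p in participants:
        p.commit_txn()


class Recorder:
    """Toy participant that logs each message it receives."""

    def __init__(self, name, log, veto=False):
        self.name, self.log, self.veto = name, log, veto

    def begin_commit(self):
        self.log.append((self.name, "begin_commit"))

    def vote_commit(self):
        self.log.append((self.name, "vote_commit"))
        if self.veto:
            raise Veto(self.name)

    def commit_txn(self):
        self.log.append((self.name, "commit_txn"))

    def abort_txn(self):
        self.log.append((self.name, "abort_txn"))
```

Each phase goes to every participant before the next phase starts, so
nobody sees commit_txn() until everybody has voted.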


>    I have the feeling that what will be done during the commit() phase
>    should be explicitly stated, along with the goals we are going after.
>    Here is a little example: suppose a transaction has to commit changes
>    against two different DB storages, DB1 which supports multi-phase
>    commit, DB2 which does not.
>
>    Then they get vote_commit(): DB1 will be able to answer OK or KO, but DB2
>    will not because it is not capable of saying whether a transaction will
>    succeed, hence: it answers 'OK' to the 'vote_commit' message.
>
>    Now the participants get the commit_txn() ; since we do not assume any
>    particular ordering for participants, suppose that DB1 gets it first. DB1
>    commits the changes, then DB2 attempts to commit its changes but fails:
>    what can we do? We can stop committing and start sending 'abort_txn' to
>    all participants, however, DB1 is likely to be unable to revert the
>    already committed changes --and this will definitely be the case if both
>    DB1 and DB2 do not support nested transactions.
>    
>    My opinion here is that we shouldn't try to handle multi-backends commits
>    as a whole -- some backends simply make it almost impossible. But: this
>    should be clearly stated.

Actually, I think we should just document what will happen if you mix
voting and non-voting participants.  Also, we may wish to have some way to
declare a participant non-voting, so that such participants can receive
commit_txn() first.  ZODB Transactions can survive the failure of *one*
commit_txn() message, and StrawMan can too.  The most common use case for a
non-voting participant would be an RDBMS connection, and the most common use
case of such is to have only one, even if there will be other participants
writing to it.

ZODB declares itself "hosed" when a failure occurs past the first
tpc_finish() (its equivalent to commit_txn).  We will need to be similarly
cautious, if there is more than one non-voting participant.
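
Something along these lines, as a sketch.  The non_voting attribute, the
HosedError name and the toy Store class are all made up for the
illustration; this is not how ZODB spells any of it:

```python
class HosedError(Exception):
    """Hypothetical: a commit failed after another one had already succeeded."""


def commit_order(participants):
    # sorted() is stable and False sorts before True, so participants
    # flagged non_voting come first, otherwise keeping their given order.
    return sorted(participants, key=lambda p: not p.non_voting)


def finish(participants):
    ordered = commit_order(participants)
    done = 0
    for p in ordered:
        try:
            p.commit_txn()
        except Exception as exc:
            if done == 0:
                # Nothing has been committed yet, so aborting is still safe.
                for q in ordered:
                    q.abort_txn()
                raise
            raise HosedError("failed after %d commit(s)" % done) from exc
        done += 1


class Store:
    """Toy participant; `fails` makes commit_txn() raise."""

    def __init__(self, non_voting=False, fails=False):
        self.non_voting = non_voting
        self.fails = fails
        self.state = "pending"

    def commit_txn(self):
        if self.fails:
            raise IOError("commit failed")
        self.state = "committed"

    def abort_txn(self):
        self.state = "aborted"
```

With a single non-voting participant committed first, its failure can
still be turned into a clean abort.  With two, the second failure leaves
the first one's commit in place, which is the "hosed" case.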


>  - last on this: it may be useful for observers to get events such as
>    transaction_did_commit() (committing is a Transaction's message for which
>    we cannot guarantee it will come to its normal end, for the reasons
>    written above) ; I'm thinking here of some DB-caches that would be
>    participants/observers for the Transaction machinery, that would take the
>    opportunity to update their caches, etc.

That's a good point; perhaps adding a 'commit_finished()' message might do
the trick, although there are already quite a lot of messages.



>  - I have some problems about the begin/end_savepoint(): again this might be
>    a language problem, but I would prefer something like
>    'prepareToSavepoint()' and 'markSavepoint()'

Those aren't bad.


>  - same for begin_commit()

I could see prepareToCommit or prepare_for_commit, certainly.


>  - vote_for_commit: as far as I understand participants using other
>    participants can simply ignore it, but should not raise (exception to be
>    named, BTW). To my understanding, a raise here is understood as a veto.
>    Is that it?

'vote_on_commit' seems more natural to me, phrasing-wise.  Yes, a raise is
a veto; that's an assumption from ZODB transactions that I failed to document.


>Last: do we need to specify a TransactionManager or TransactionFactory API?

I don't think so, really, other than what I mentioned about providing some
simple thread-specific associations.


>Some ideas about what could be done there: (hmm, this could be made class
>method as well)
>
>  - registering participants' factories, so that Transactions can be
>    initialized with a default set of participants, since applications often
>    use the same configuration for their Transactions. Something like:
>
>      def buildDefaultTransaction(self)

YAGNI.  The code that sets up the participants should know their
transactional scope, and thus is capable of registering them with the
appropriate transaction.


>  It seems to me that the points stressed in the sig-charter are taken into
>account here --except for the 'Effective Memory Usage' which, by the way,
>cannot be addressed at the transaction level --and I do not really see how
>this particular point can be made anything else but a ``compulsory
>recommendation'' ?!

Actually, as was noted in the savepoint-related docstrings, one purpose of
the savepoint API is to indicate a "good time to write things out", which
can free up memory used by queued updates.  Also, in ZODB's persistence
model, dirty objects can't be dropped from the cache (since they contain
state that needs to be written).  So if their writes can be flushed, they
become eligible to be "ghosted" out of the cache and the memory made
available as well.  This can be an issue in large ZODB transactions,
especially those done by full-text indexing operations.

So actually the transaction API *does* have some contact points with memory
usage.  And the main reason I put savepoint() in was to accommodate this
requirement for ZODB.  I don't really expect to have much use for it in my
primary applications development.



From pje@telecommunity.com  Mon Jul 15 22:46:54 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 15 Jul 2002 17:46:54 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
References: <Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
Message-ID: <3.0.5.32.20020715174654.0085eac0@telecommunity.com>

At 04:18 PM 7/15/02 +0200, Sebastien Bigaret wrote:
>
>- to class Participant:
>
>    canBeTPMonitored(self): # or supportNestedTransactions() ?
>      """
>      Tells whether the participant can be integrated into a TP monitoring
>      process. Valid answers are:
>
>        - YES ; e.g. RDBMS that support nested transactions will definitely
>          answer yes.
>  
>        - NO
>  
>        - NOT_APPLICABLE ; e.g. for listeners

I'm pretty sure that this terminology is not accurate.  Nested transactions
and multi-phase commit aren't really related, AFAIK.  It's quite possible
to support either one without the other.

If we're going to have introspection for multi-phase commit, I'd rather
have something like 'canVote()', with the response being False if the
participant can raise errors during commit_txn(), or True if the
participant guarantees it will not fail on commit_txn() if it didn't veto
commit during the voting phase.


>- to class Transaction:
>
>    def isTPMonitoringEnabled(self):
>      """
>      Answer is true if all participants answer 'YES' or 'NOT_APPLICABLE' to
>      'canBeTPMonitored', false otherwise.
>
>      """

This is a YAGNI, I would say; the transaction is the only party that needs
to know about its participants' voting capabilities.  But it might be
useful to expose a 'canRevert()' introspection on the transaction, that
would tell us if all the participants support reverting to a savepoint.
That information would be useful outside the transaction.
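
A sketch of the two introspection calls.  canVote() on participants and
canRevert() on the transaction are the names suggested above; the
per-participant canRevert(), the _nonVoting() helper and the TwoPhaseStore
class are assumptions for the example:

```python
class Participant:
    def canVote(self):
        # Conservative default: commit_txn() might still fail after voting.
        return False

    def canRevert(self):
        # Hypothetical per-participant query backing Transaction.canRevert().
        return False


class TwoPhaseStore(Participant):
    """Toy participant that guarantees commit_txn() once it has voted yes."""

    def canVote(self):
        return True

    def canRevert(self):
        return True


class Transaction:
    def __init__(self, participants):
        self._participants = list(participants)

    def canRevert(self):
        # Public: callers can check before relying on revert-to-savepoint.
        return all(p.canRevert() for p in self._participants)

    def _nonVoting(self):
        # Private: only the transaction needs to know who can't vote.
        return [p for p in self._participants if not p.canVote()]
```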



From kennethroberts@eqcity.ktb.net  Fri Jul 19 07:51:01 2002
From: kennethroberts@eqcity.ktb.net (Kennethroberts)
Date: Fri, 19 Jul 2002 06:51:01 GMT
Subject: [Persistence-sig] Put me on the list
Message-ID: <02071900013235742@eqcity.ktb.net>

Please place me on your sigs Mail list.


From pje@telecommunity.com  Fri Jul 19 16:52:07 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Fri, 19 Jul 2002 11:52:07 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3.0.5.32.20020715174654.0085eac0@telecommunity.com>
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
 <Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
Message-ID: <3.0.5.32.20020719115207.0086e100@telecommunity.com>

Following on the unparalleled success of the "Straw Man" transaction API
(he said, with tongue in cheek), I thought it might be good to make a
proposal for persistence as well.  Since I won't be at the BOF, I figure I
should get my two cents in now, while the getting's good.

Here's my proposal, such as it is... Deliver a Persistence package based on
the one at http://cvs.zope.org/Zope3/lib/python/Persistence/ but with the
following changes:

* Remove the BTrees subpackage, and the Class, Cache, Function, and Module
modules, along with the ICache interface.  Rationale: The BTrees package is
only useful for a relatively small subset of possible persistence backends,
and is subject to periodic data structure changes which affect applications
using it.  It's probably best kept out of the Python core.  Similar
arguments apply to the Cache system, although not quite as strongly.
Class, Function, and Module are very recent developments which have not had
the extended usage that most of the rest of the code has.  (Note: I don't
mean to say that the persistence C code has been thoroughly exercised
either, in the sense that much of it is completely new for Python 2.2.  But
its *design* has a long history, and previous implementations have had much
testing of the kind of edge and corner issues that the Class, Function, and
Module modules haven't been exposed to yet.)

* I do think we should keep PersistentList and PersistentMapping in the
core; they're useful for almost any kind of application, and any kind of
back-end storage.  They don't introduce policy or data format dependencies
into users' code, either.

* Make _p_dm a synonym for _p_jar, and deprecate _p_jar.  This could be
done by making a _p_jar descriptor that read/wrote through to _p_dm, and
issued a deprecation warning.  I don't personally have a problem with
_p_jar, but I've heard rumblings from other people (ZC folks?) that it's
confusing or that they want to get rid of it.  So if we're doing it, now
seems like the time.

* Flag _p_changed *after* __setattr__, not before!  This will help
co-operative transaction participants play nicely together, since they
can't "write through" a change if they're getting notified *before* the
change takes place!  Docs should also clarify that when set in other code,
_p_changed should be set at the latest possible moment, *after* the object
is in its new, stable state.

* Keep the _p_atime slot, but don't fill it with anything by default.
Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot
at C level that's called after the getattribute completes.  A data manager
can then set the hook to point to a _p_atime update function, *or* it can
introduce postprocessing for "proxy" attributes.  That is, a data manager
could set the hook to handle "lazy" loading of certain attributes which
would otherwise be costly to retrieve, by placing a dummy value in the
object's dictionary, and then having the post-call hook return a
replacement value.

For speed, this will generally want to be a C function; let the base
package include a simple hook that updates _p_atime, and another which
checks whether the retrievedValue is an instance of a LazyValue base class,
and if so, calls the object.  This will probably cover the basics.  A data
manager that uses ZODB caching will use the atime function, and non-ZODB
data managers will probably want the other hook.  I also have an idea about
using the transaction's timestamp() plus a counter to supply a "time" value
that minimizes system calls, but I'm not sure it would actually improve
performance any, so I'm fine with not trying to push that into the initial
package.  As long as the hook slot is present in the base package, I or
anyone else are free to make up and try our own hooks to put in it.

* Get rid of the term "register", since objects won't "register" with the
transaction, and neither should they with their data manager.  They should
"inform their data manager" that they have changed.  Something like an
objectChanged() message is appropriate in place of register().  I believe
this would clarify the API.

* Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
way such that it works whether you have Interface or not", but the reality
is that a dependency in the standard library on something outside the
standard library is a big no-no, and just begging for breakage as soon as
there *is* an Interface package (with a new API) in the standard library.
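
To make the _p_jar, _p_changed and getattr-hook items above concrete, here
is a pure-Python sketch.  The real thing would be C, and the
WriteThroughManager class and lazy_hook function are hypothetical names
invented for the illustration:

```python
import warnings


class LazyValue:
    """Placeholder kept in an object's dict; called to get the real value."""

    def __init__(self, loader):
        self.loader = loader

    def __call__(self):
        return self.loader()


def lazy_hook(obj, name, value):
    """Post-getattr hook: replace LazyValue placeholders with their result."""
    return value() if isinstance(value, LazyValue) else value


class Persistent:
    _p_dm = None             # the data manager, under its proposed name
    _p_changed = False
    _p_getattr_hook = None   # set per instance by the data manager

    @property
    def _p_jar(self):
        # stacklevel=3 skips the __getattribute__ wrapper below.
        warnings.warn("_p_jar is deprecated; use _p_dm",
                      DeprecationWarning, stacklevel=3)
        return self._p_dm

    @_p_jar.setter
    def _p_jar(self, value):
        warnings.warn("_p_jar is deprecated; use _p_dm",
                      DeprecationWarning, stacklevel=3)
        self._p_dm = value

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)      # the change happens first
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)
            if self._p_dm is not None:
                self._p_dm.objectChanged(self)     # then the notification

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name.startswith("_"):
            return value                           # leave internals alone
        hook = object.__getattribute__(self, "_p_getattr_hook")
        return value if hook is None else hook(self, name, value)


class WriteThroughManager:
    """Hypothetical data manager that reads the object's state when notified."""

    def __init__(self):
        self.written = []

    def objectChanged(self, obj):
        self.written.append(dict(vars(obj)))
```

Since objectChanged() arrives after the attribute is set, a write-through
manager sees the new value.  With the flag set first, it would have
written out the old one.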


Whew!  I think that about covers it, as far as what I'd like to see, and
what I think would be needed to make it acceptable for the core.  Comments?

By the way, my rationale for not taking any radical new approaches to
persistence, observation, or notification in this proposal is that the
existing Persistence package is "transparent" enough, and has the benefit
of lots of field experience.  I spent a lot of time trying to come up with
"better" ways before writing this; mostly I found that trying to make it
more "transparent" to the object being persisted, just pushes the
complexity into either the app or the backend, without really helping
anything.  It's not a really big deal to:

1. Subclass Persistent

2. Use PersistentList and PersistentMapping or other Persistent objects for
your attributes, or set self._p_changed when you change a non-persistent
mutable.

3. Use transactions

Especially if that's all you need to do in order to have persistence to any
number of backends, including the current ZODB and all the wonderful SQL or
other mappings that will be creatable by everybody on this list using their
own techniques.  The key is not so much "transparency" per se, as
*uniformity* across backends.  I think the existing API is transparent
enough; let's work on having uniform and universal access to it, as a
Python core package.



From pje@telecommunity.com  Fri Jul 19 17:02:37 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Fri, 19 Jul 2002 12:02:37 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <3.0.5.32.20020715115756.01ab65f0@telecommunity.com>
References: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <"Phillip J. Eby"'s message of "Sun, 14 Jul 2002 12:21:52 -0400">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
Message-ID: <3.0.5.32.20020719120237.00898b60@telecommunity.com>

One further comment on the Straw Man transaction API...  I believe that the
Python transaction API should issue Python warnings for problematic
conditions, rather than write to a logger (such as zLOG in the current ZODB
transactions).

IMHO, even though 2.3 will include a logging module, I'm not comfortable
with the idea of a transaction co-ordinator itself issuing log messages,
especially given the complexity of the logging package that's the main
contender for implementing the logging PEP.  I'd rather have something
extremely simple, and warnings seem to me like the way to, well, issue
warnings.  :)

If there's conflict about this point, though, I'd be okay with isolating
either log calls or warnings into methods of the base transaction that
could be overridden in a subclass, and then folks can choose their own way
from there.
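
A sketch of what I mean by isolating it in a method.  The warn() method
name, the TransactionWarning category and the LoggingTransaction subclass
are assumptions for the example; warnings.warn() itself is the standard
call:

```python
import warnings


class TransactionWarning(UserWarning):
    """Hypothetical warning category for transaction problems."""


class Transaction:
    def warn(self, message):
        # Default policy: an ordinary Python warning, filterable as usual.
        warnings.warn(message, TransactionWarning, stacklevel=2)

    def check(self, non_voting_count):
        if non_voting_count > 1:
            self.warn("more than one non-voting participant")


class LoggingTransaction(Transaction):
    """Subclass for people who would rather send problems to a logger."""

    def __init__(self, logger):
        self.logger = logger

    def warn(self, message):
        self.logger.warning(message)
```

The base class never imports a logging package; anyone who wants one
overrides a single method.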



From guido@python.org  Fri Jul 19 17:09:01 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 19 Jul 2002 12:09:01 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: Your message of "Fri, 19 Jul 2002 12:02:37 EDT."
             <3.0.5.32.20020719120237.00898b60@telecommunity.com> 
References: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>  
	<3.0.5.32.20020719120237.00898b60@telecommunity.com> 
Message-ID: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net>

> One further comment on the Straw Man transaction API...  I believe that the
> Python transaction API should issue Python warnings for problematic
> conditions, rather than write to a logger (such as zLOG in the current ZODB
> transactions).
> 
> IMHO, even though 2.3 will include a logging module, I'm not comfortable
> with the idea of a transaction co-ordinator itself issuing log messages,
> especially given the complexity of the logging package that's the main
> contender for implementing the logging PEP.  I'd rather have something
> extremely simple, and warnings seem to me like the way to, well, issue
> warnings.  :)
> 
> If there's conflict about this point, though, I'd be okay with isolating
> either log calls or warnings into methods of the base transaction that
> could be overridden in a subclass, and then folks can choose their own way
> from there.

Warnings seem better to me because there are several ways to decide
how to deal with them (including turning them into errors and
suppressing them completely) under control of either the program or
command line options.

It's also possible to have warnings be sent to a logger, and
applications that use the logger should probably set this up.  (Hm,
maybe it would be cool if the logging module has a shortcut to
redirect all warnings to the log?)
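
For what it's worth, later versions of the standard logging module do
provide such a shortcut, logging.captureWarnings(), which routes warnings
to a logger named "py.warnings".  A small sketch of using it; the
ListHandler class and the warning text are made up for the example:

```python
import logging
import warnings


class ListHandler(logging.Handler):
    """Toy handler that keeps formatted messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


handler = ListHandler()
logging.getLogger("py.warnings").addHandler(handler)

logging.captureWarnings(True)       # warnings now go to the log
warnings.simplefilter("always")     # make sure the example warning is not filtered out
warnings.warn("participant failed after the first commit_txn()")
logging.captureWarnings(False)      # restore normal warning display
```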

--Guido van Rossum (home page: http://www.python.org/~guido/)


From guido@python.org  Fri Jul 19 21:03:54 2002
From: guido@python.org (Guido van Rossum)
Date: Fri, 19 Jul 2002 16:03:54 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: Your message of "Fri, 19 Jul 2002 11:52:07 EDT."
             <3.0.5.32.20020719115207.0086e100@telecommunity.com> 
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>  
	<3.0.5.32.20020719115207.0086e100@telecommunity.com> 
Message-ID: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>

> * I do think we should keep PersistentList and PersistentMapping in the
> core; they're useful for almost any kind of application, and any kind of
> back-end storage.  They don't introduce policy or data format dependencies
> into users' code, either.

But perhaps these should be rewritten to derive from dict and list
instead of UserDict and UserList?  Also, the module names are
inconsistent -- PersistentMapping is defined in _persistentMapping.py
but PersistentList is defined in PersistentList.py.  Both are then
"pulled up" one level by __init__.py and their __module__ attribute
modified.  I find all that hideous and tricky, and I propose to clean
this up before making it a standard Python package.

> * Make _p_dm a synonym for _p_jar, and deprecate _p_jar.  This could be
> done by making a _p_jar descriptor that read/wrote through to _p_dm, and
> issued a deprecation warning.  I don't personally have a problem with
> _p_jar, but I've heard rumblings from other people (ZC folks?) that it's
> confusing or that they want to get rid of it.  So if we're doing it, now
> seems like the time.

It's just that "jar" makes no sense (except in the "cutesy" sense of a
jar full of pickles).  But "dm" is a little obscure too.  Maybe write
it out in full as _p_datamanager?

> * Flag _p_changed *after* __setattr__, not before!  This will help
> co-operative transaction participants play nicely together, since they
> can't "write through" a change if they're getting notified *before* the
> change takes place!  Docs should also clarify that when set in other code,
> _p_changed should be set at the latest possible moment, *after* the object
> is in its new, stable state.

+1

> * Keep the _p_atime slot, but don't fill it with anything by default.
> Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot
> at C level that's called after the getattribute completes.  A data manager
> can then set the hook to point to a _p_atime update function, *or* it can
> introduce postprocessing for "proxy" attributes.  That is, a data manager
> could set the hook to handle "lazy" loading of certain attributes which
> would otherwise be costly to retrieve, by placing a dummy value in the
> object's dictionary, and then having the post-call hook return a
> replacement value.
> 
> For speed, this will generally want to be a C function; let the base
> package include a simple hook that updates _p_atime, and another which
> checks whether the retrievedValue is an instance of a LazyValue base class,
> and if so, calls the object.  This will probably cover the basics.  A data
> manager that uses ZODB caching will use the atime function, and non-ZODB
> data managers will probably want the other hook.  I also have an idea about
> using the transaction's timestamp() plus a counter to supply a "time" value
> that minimizes system calls, but I'm not sure it would actually improve
> performance any, so I'm fine with not trying to push that into the initial
> package.  As long as the hook slot is present in the base package, I or
> anyone else are free to make up and try our own hooks to put in it.

Shouldn't there be a setattr hook too?

> * Get rid of the term "register", since objects won't "register" with the
> transaction, and neither should they with their data manager.  They should
> "inform their data manager" that they have changed.  Something like an
> objectChanged() message is appropriate in place of register().  I believe
> this would clarify the API.
> 
> * Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
> way such that it works whether you have Interface or not", but the reality
> is that a dependency in the standard library on something outside the
> standard library is a big no-no, and just begging for breakage as soon as
> there *is* an Interface package (with a new API) in the standard library.

Of course.

> Whew!  I think that about covers it, as far as what I'd like to see, and
> what I think would be needed to make it acceptable for the core.  Comments?
> 
> By the way, my rationale for not taking any radical new approaches to
> persistence, observation, or notification in this proposal is that the
> existing Persistence package is "transparent" enough, and has the benefit
> of lots of field experience.  I spent a lot of time trying to come up with
> "better" ways before writing this; mostly I found that trying to make it
> more "transparent" to the object being persisted, just pushes the
> complexity into either the app or the backend, without really helping
> anything.  It's not a really big deal to:
> 
> 1. Subclass Persistent
> 
> 2. Use PersistentList and PersistentMapping or other Persistent objects for
> your attributes, or set self._p_changed when you change a non-persistent
> mutable.
> 
> 3. Use transactions
> 
> Especially if that's all you need to do in order to have persistence to any
> number of backends, including the current ZODB and all the wonderful SQL or
> other mappings that will be creatable by everybody on this list using their
> own techniques.  The key is not so much "transparency" per se, as
> *uniformity* across backends.  I think the existing API is transparent
> enough; let's work on having uniform and universal access to it, as a
> Python core package.

I've often thought that it's ugly that you have to set _p_state and
_p_changed, rather than do these things with method calls.  What do
you think about that?  Especially the conventions for _p_state look
confusing to me.

--Guido van Rossum (home page: http://www.python.org/~guido/)


From pje@telecommunity.com  Fri Jul 19 23:12:35 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Fri, 19 Jul 2002 18:12:35 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comca
 st.net>
References: <Your message of "Fri, 19 Jul 2002 11:52:07 EDT."
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
Message-ID: <3.0.5.32.20020719181235.00894ec0@telecommunity.com>

At 04:03 PM 7/19/02 -0400, Guido van Rossum wrote:
>> * I do think we should keep PersistentList and PersistentMapping in the
>> core; they're useful for almost any kind of application, and any kind of
>> back-end storage.  They don't introduce policy or data format dependencies
>> into users' code, either.
>
>But perhaps these should be rewritten to derive from dict and list
>instead of UserDict and UserList?  

Perhaps.  What are the implications for pickling?


>Also, the module names are
>inconsistent -- PersistentMapping is defined in _persistentMapping.py
>but PersistentList is defined in PersistentList.py.  Both are then
>"pulled up" one level by __init__.py and their __module__ attribute
>modified.  I find all that hideous and tricky, and I propose to clean
>this up before making it a standard Python package.

+1


>> * Make _p_dm a synonym for _p_jar, and deprecate _p_jar.  This could be
>> done by making a _p_jar descriptor that read/wrote through to _p_dm, and
>> issued a deprecation warning.  I don't personally have a problem with
>> _p_jar, but I've heard rumblings from other people (ZC folks?) that it's
>> confusing or that they want to get rid of it.  So if we're doing it, now
>> seems like the time.
>
>It's just that "jar" makes no sense (except in the "cutesy" sense of a
>jar full of pickles).  But "dm" is a little obscure too.  Maybe write
>it out in full as _p_datamanager?

Sure, whatever.  Maybe just _p_manager.


>> * Keep the _p_atime slot, but don't fill it with anything by default.
>> Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot
>> at C level that's called after the getattribute completes.  A data manager
>> can then set the hook to point to a _p_atime update function, *or* it can
>> introduce postprocessing for "proxy" attributes.  That is, a data manager
>> could set the hook to handle "lazy" loading of certain attributes which
>> would otherwise be costly to retrieve, by placing a dummy value in the
>> object's dictionary, and then having the post-call hook return a
>> replacement value.
>> 
>> For speed, this will generally want to be a C function; let the base
>> package include a simple hook that updates _p_atime, and another which
>> checks whether the retrievedValue is an instance of a LazyValue base class,
>> and if so, calls the object.  This will probably cover the basics.  A data
>> manager that uses ZODB caching will use the atime function, and non-ZODB
>> data managers will probably want the other hook.  I also have an idea about
>> using the transaction's timestamp() plus a counter to supply a "time" value
>> that minimizes system calls, but I'm not sure it would actually improve
>> performance any, so I'm fine with not trying to push that into the initial
>> package.  As long as the hook slot is present in the base package, I or
>> anyone else are free to make up and try our own hooks to put in it.
>
>Shouldn't there be a setattr hook too?

Hm.  Seems like a YAGNI to me, unless you're saying that it's so that
_p_atime can be updated on setattr, in which case, sure, add a
_p_setattr_hook(obj,attrname,setval) that's called after successful
setattr.  Otherwise, I can't think of a use case that isn't already covered
by the objectChanged() (formerly register()) message.


>I've often thought that it's ugly that you have to set _p_state and
>_p_changed, rather than do these things with method calls.  What do
>you think about that?  Especially the conventions for _p_state look
>confusing to me.

I've never used _p_state for anything; I thought that was something purely
private/internal to the implementation.  So I'm not sure what you're
talking about, there.

For _p_changed, I don't have any objections to a method or methods, but it
seems to me that it *was* a method at one time and Jim changed it to an
attribute, so it might be good to ask him why.  :)

Of course, I've also seen people using ZODB write code like this:

self.foo = self.foo

to flag things as changed, without using an explicit _p_changed call.  On a
mental level, it has a certain appeal, because it's like saying, hey, I'm
changing *this* attribute.  :)  

But I don't have a strong preference for or against any of these three
broad categories of change signalling.
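
Side by side, the three styles look something like this.  The Persistent
class here is a toy stand-in, and markChanged() is a hypothetical name for
the method form:

```python
class Persistent:
    """Toy stand-in: flags itself changed whenever an attribute is set."""

    _p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)

    def markChanged(self):           # hypothetical method form
        self._p_changed = True


class Basket(Persistent):
    def __init__(self):
        self.items = []              # plain list: mutating it goes unnoticed
        self._p_changed = False      # pretend the object was just saved

    def add_silently(self, item):
        self.items.append(item)      # no setattr, so no change flag

    def add_with_attribute(self, item):
        self.items.append(item)
        self._p_changed = True       # style 1: set the attribute

    def add_with_method(self, item):
        self.items.append(item)
        self.markChanged()           # style 2: call a method

    def add_with_reassignment(self, item):
        self.items.append(item)
        self.items = self.items      # style 3: re-assign to trigger setattr
```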


From smenard@bigfoot.com  Fri Jul 19 23:28:29 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Fri, 19 Jul 2002 18:28:29 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3.0.5.32.20020719181235.00894ec0@telecommunity.com>
References: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comca
	st.net><Your message of "Fri, 19 Jul 2002 11:52:07 EDT."
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
Message-ID: <5.1.0.14.0.20020719182600.02a1fc98@pop.videotron.ca>

At 06:12 PM 7/19/2002 -0400, Phillip J. Eby wrote:
>At 04:03 PM 7/19/02 -0400, Guido van Rossum wrote:
> >> * I do think we should keep PersistentList and PersistentMapping in the
> >> core; they're useful for almost any kind of application, and any kind of
> >> back-end storage.  They don't introduce policy or data format dependencies
> >> into users' code, either.
> >
> >But perhaps these should be rewritten to derive from dict and list
> >instead of UserDict and UserList?
>
>Perhaps.  What are the implications for pickling?

I have done exactly that for POD and it works great.



> >Also, the module names are
> >inconsistent -- PersistentMapping is defined in _persistentMapping.py
> >but PersistentList is defined in PersistentList.py.  Both are then
> >"pulled up" one level by __init__.py and their __module__ attribute
> >modified.  I find all that hideous and tricky, and I propose to clean
> >this up before making it a standard Python package.
>
>+1

I agree too. For consistency, could we make PersistentMapping a synonym 
for PersistentDict?



From jeremy@alum.mit.edu  Mon Jul 22 15:05:48 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jul 2002 10:05:48 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15676.4412.741538.532146@slothrop.zope.com>

>>>>> "GvR" == Guido van Rossum <guido@python.org> writes:

  >> * I do think we should keep PersistentList and PersistentMapping
  >>   in the
  >> core; they're useful for almost any kind of application, and any
  >> kind of back-end storage.  They don't introduce policy or data
  >> format dependencies into users' code, either.

  GvR> But perhaps these should be rewritten to derive from dict and
  GvR> list instead of UserDict and UserList? 

One small comment.  (I owe more substantial comment on Phillip's
earlier proposals.)  The persistent versions of dict and list can't
extend the builtin types, because they need to hook __getitem__() and
__setitem__().  The overridden methods may not be called if we extend
the builtin types.

Jeremy




From smenard@bigfoot.com  Mon Jul 22 15:26:18 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Mon, 22 Jul 2002 10:26:18 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.4412.741538.532146@slothrop.zope.com>
References: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
 <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <3.0.5.32.20020719115207.0086e100@telecommunity.com>
 <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>

At 10:05 AM 7/22/2002 -0400, Jeremy Hylton wrote:
> >>>>> "GvR" == Guido van Rossum <guido@python.org> writes:
>
>   >> * I do think we should keep PersistentList and PersistentMapping
>   >>   in the
>   >> core; they're useful for almost any kind of application, and any
>   >> kind of back-end storage.  They don't introduce policy or data
>   >> format dependencies into users' code, either.
>
>   GvR> But perhaps these should be rewritten to derive from dict and
>   GvR> list instead of UserDict and UserList?
>
>One small comment.  (I owe more substantial comment on Phillip's
>earlier proposals.)  The persistent versions of dict and list can't
>extend the builtin types, because they need to hook __getitem__() and
>__setitem__().  The overridden methods may not be called if we extend
>the builtin types.
>
>Jeremy

Hmm, if those methods are not guaranteed to be called when subclassing dict or 
list, then something is broken. Either that or there is a subtle 
thing I do not understand.

On a side note, as I have said in another post, I have done exactly that, 
subclassing dict and list. While my model didn't need to override 
__getitem__(), the __setitem__() at least seemed to act properly. In fact 
the only problem I have found is that it was not possible to mix __slots__ 
and dict/list.

         Steve



From jacobs@penguin.theopalgroup.com  Mon Jul 22 15:59:36 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Mon, 22 Jul 2002 10:59:36 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
Message-ID: <Pine.LNX.4.44.0207221055160.14923-100000@penguin.theopalgroup.com>

On Mon, 22 Jul 2002, Steve Menard wrote:
> At 10:05 AM 7/22/2002 -0400, Jeremy Hylton wrote:
> >One small comment.  (I owe more substantial comment on Phillip's
> >earlier proposals.)  The persistent versions of dict and list can't
> >extend the builtin types, because they need to hook __getitem__() and
> >__setitem__().  The overridden methods may not be called if we extend
> >the builtin types.
> 
> hum, if those method are not guaranteed to be called by subclassing dict or 
> list, then there is something broken. Either that or there is a subtle 
> thing I do not understand.

In fact, I am quite sure that one can inherit from list or dict and override
__getitem__ and __setitem__ in a cooperative fashion.  Can you provide a
little more information on why you think otherwise?

> On a side note, as I have said in another post, I have done exactly that, 
> subclassing dict and list. While my model didn't need to override 
> __getitem__(), the __setitem__() at least seemed to act properly. In fact 
> the only problem I have found is that it was not possible to mix __slots__ 
> and dict/list.

For all the strange and perverse things I've done, slots work just fine when
inheriting from list and dict.  Again, can you provide an example of where
you found otherwise?

Thanks,
-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From jeremy@alum.mit.edu  Mon Jul 22 16:02:11 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jul 2002 11:02:11 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
References: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
	<87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
Message-ID: <15676.7795.542136.74597@slothrop.zope.com>

>>>>> "SM" == Steve Menard <smenard@bigfoot.com> writes:

  GvR> But perhaps these should be rewritten to derive from dict and
  GvR> list instead of UserDict and UserList?
  >>
  >> One small comment.  (I owe more substantial comment on Phillip's
  >> earlier proposals.)  The persistent versions of dict and list
  >> can't extend the builtin types, because they need to hook
  >> __getitem__() and __setitem__().  The overridden methods may not
  >> be called if we extend the builtin types.
  >>

  SM> hum, if those method are not guaranteed to be called by
  SM> subclassing dict or list, then there is something broken. Either
  SM> that or there is a subtle thing I do not understand.

The latter.  For performance reasons, most C code uses calls like
PyDict_GetItem(), which operates directly on the C representation of a
dict.  If you inherit from dict, you'll get the same C representation
for your object.  That allows PyDict_GetItem() to be called, but
doesn't arrange to call your __getitem__() method.  The indirection
required to invoke a subclass's __getitem__() would cause serious
performance problems.

Normally Guido only recommends inheriting from dict to add new
behavior (as opposed to customizing existing behavior).
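
A minimal sketch of the effect, as seen from Python on CPython.  The class
name LoggingDict and the `calls` list are made up for the illustration;
the dict methods used are the ordinary built-in ones.

```python
calls = []


class LoggingDict(dict):
    """dict subclass that records every call to its overridden hooks."""

    def __getitem__(self, key):
        calls.append(('get', key))
        return dict.__getitem__(self, key)

    def __setitem__(self, key, value):
        calls.append(('set', key))
        dict.__setitem__(self, key, value)


d = LoggingDict()
d['a'] = 1            # Python-level subscript: the override is called
d['a']                # Python-level subscript: the override is called
seen_by_hooks = list(calls)

del calls[:]
d.get('a')            # implemented in C directly on the dict representation
d.setdefault('b', 2)  # stores 'b' without calling __setitem__
d.update({'c': 3})    # stores 'c' without calling __setitem__
bypassed = list(calls)    # stays empty on CPython
```

A persistent mapping built this way would miss the 'b' and 'c' writes
entirely, which is the kind of hole a change-tracking container cannot
afford.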

Jeremy





From smenard@bigfoot.com  Mon Jul 22 16:41:36 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Mon, 22 Jul 2002 11:41:36 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <Pine.LNX.4.44.0207221055160.14923-100000@penguin.theopalgr
 oup.com>
References: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
Message-ID: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca>

At 10:59 AM 7/22/2002 -0400, Kevin Jacobs wrote:
>On Mon, 22 Jul 2002, Steve Menard wrote:
> > At 10:05 AM 7/22/2002 -0400, Jeremy Hylton wrote:
> > >One small comment.  (I owe more substantial comment on Phillip's
> > >earlier proposals.)  The persistent versions of dict and list can't
> > >extend the builtin types, because they need to hook __getitem__() and
> > >__setitem__().  The overridden methods may not be called if we extend
> > >the builtin types.
> >
> > hum, if those method are not guaranteed to be called by subclassing dict or
> > list, then there is something broken. Either that or there is a subtle
> > thing I do not understand.
>
>In fact, I am quite sure that one can inherit from list or dict and override
>__getitem__ and __setitem__ in a cooperative fashion.  Can you provide a
>little more information on why you think otherwise?

Matter of fact, I do not think otherwise. Jeremy said:

"The overridden methods may not be called if we extend the builtin types."

Which I think is wrong.


> > On a side note, as I have said in another post, I have done exactly that,
> > subclassing dict and list. While my model didn't need to override
> > __getitem__(), the __setitem__() at least seemed to act properly. In fact
> > the only problem I have found is that it was not possible to mix __slots__
> > and dict/list.
>
>For all strange and perverse things I've done, slots work just fine when
>inheriting from list and dict.  Again, can you provide an example of where
>you found otherwise?

Ok, my problem was from inheriting both from dict and from my Persistent 
class. Persistent was using slots. I could dig out or reproduce error 
message if you're interested.


         Steve



From smenard@bigfoot.com  Mon Jul 22 16:44:10 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Mon, 22 Jul 2002 11:44:10 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.7795.542136.74597@slothrop.zope.com>
References: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
 <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
 <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <3.0.5.32.20020719115207.0086e100@telecommunity.com>
 <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
Message-ID: <5.1.0.14.0.20020722114214.07a9d2d0@pop.videotron.ca>

At 11:02 AM 7/22/2002 -0400, Jeremy Hylton wrote:
> >>>>> "SM" == Steve Menard <smenard@bigfoot.com> writes:
>
>   GvR> But perhaps these should be rewritten to derive from dict and
>   GvR> list instead of UserDict and UserList?
>   >>
>   >> One small comment.  (I owe more substantial comment on Phillip's
>   >> earlier proposals.)  The persistent versions of dict and list
>   >> can't extend the builtin types, because they need to hook
>   >> __getitem__() and __setitem__().  The overridden methods may not
>   >> be called if we extend the builtin types.
>   >>
>
>   SM> hum, if those method are not guaranteed to be called by
>   SM> subclassing dict or list, then there is something broken. Either
>   SM> that or there is a subtle thing I do not understand.
>
>The latter.  For performance reasons, most C code uses calls like
>PyDict_GetItem(), which operates directly on the C representation of a
>dict.  If you inherit from dict, you'll get the same C representation
>for your object.  That allows PyDict_GetItem() to be called, but
>doesn't arrange to call your __getitem__() method.  The indirection
>required to invoke a subclass's __getitem__() would cause serious
>performance problems.

Ok, makes sense. Since it is unsafe to override those methods, perhaps it 
should be disallowed then. Because we get different behavior when obj[x] is 
called from C and when called from Python.

>Normally Guido only recommends inheriting from dict to add new
>behavior (as opposed to customizing existing behavior).

         Steve




From jacobs@penguin.theopalgroup.com  Mon Jul 22 16:42:11 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Mon, 22 Jul 2002 11:42:11 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca>
Message-ID: <Pine.LNX.4.44.0207221136490.15652-100000@penguin.theopalgroup.com>

On Mon, 22 Jul 2002, Steve Menard wrote:
> Mater of fact, I do not think otherwise. Jeremy said :
> 
> "The overridden methods may not be called if we extend the builtin types."
> 
> Which I think is wrong.

I see what he is saying now -- most of the Python core does a
PyDict_Check(o), not a PyDict_CheckExact(o) to determine if an object is a
real (base) dictionary.  Such code then does PyDict_GetItem/PyDict_SetItem
rather than PyObject_GetItem/PyObject_SetItem, and thus bypass your derived
__getitem__ and __setitem__.

> > > On a side note, as I have said in another post, I have done exactly that,
> > > subclassing dict and list. While my model didn't need to override
> > > __getitem__(), the __setitem__() at least seemed to act properly. In fact
> > > the only problem I have found is that it was not possible to mix __slots__
> > > and dict/list.
> >
> >For all strange and perverse things I've done, slots work just fine when
> >inheriting from list and dict.  Again, can you provide an example of where
> >you found otherwise?
> 
> Ok, my problem was from inheriting both from dict and from my Persistent 
> class. Persistent was using slots. I could dig out or reproduce error 
> message if you're interested.

>From your description, I see what is happening now.  I have a meta-class
that lazily instantiates slots, which may help.  It totally avoids the
problem of layout conflicts, so long as all base classes can have slots
added to them (i.e., not anything that inherits from tuple).

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From jacobs@penguin.theopalgroup.com  Mon Jul 22 16:45:14 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Mon, 22 Jul 2002 11:45:14 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020722114214.07a9d2d0@pop.videotron.ca>
Message-ID: <Pine.LNX.4.44.0207221142260.15652-100000@penguin.theopalgroup.com>

On Mon, 22 Jul 2002, Steve Menard wrote:
> Ok, makes sense. Since it is unsafe to override those methods, perhaps it 
> should be disallowed then. Because we get different behavior when obj[x] is 
> called from C and when called from Python.

I would be happier if we had a PyDict_{G,S}etItemExact for when we know we
have a base dict, and modify PyDict_{G,S}etItem to use PyObject_{G,S}etItem
when not PyDict_CheckExact.

It will be a pain, but being correct is almost always better than being
fast.
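
The difference between the two dispatch strategies can be modelled in
Python.  This is only a sketch of the idea: getitem_current() and
getitem_proposed() are hypothetical names, and the real functions are C
code with different error handling.

```python
def getitem_current(obj, key):
    """Models PyDict_Check followed by PyDict_GetItem: subclasses take
    the fast path too, so an overridden __getitem__ is skipped."""
    if isinstance(obj, dict):
        return dict.__getitem__(obj, key)
    return obj[key]


def getitem_proposed(obj, key):
    """Models the proposal: fast path only for exact dicts, generic
    item access (which honours overrides) for everything else."""
    if type(obj) is dict:
        return dict.__getitem__(obj, key)
    return obj[key]


class Upper(dict):
    def __getitem__(self, key):
        return dict.__getitem__(self, key).upper()


u = Upper(name='zope')
current = getitem_current(u, 'name')     # override skipped
proposed = getitem_proposed(u, 'name')   # override honoured
```

The cost of the proposal is one extra exact-type test on the fast path,
plus a full method lookup for every subclass instance.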

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From jeremy@alum.mit.edu  Mon Jul 22 16:50:41 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jul 2002 11:50:41 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca>
References: <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
	<5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca>
Message-ID: <15676.10705.19551.392096@slothrop.zope.com>

>>>>> "SM" == Steve Menard <smenard@bigfoot.com> writes:

  >> > On a side note, as I have said in another post, I have done
  >> > exactly that, subclassing dict and list. While my model didn't
  >> > need to override __getitem__(), the __setitem__() at least
  >> > seemed to act properly. In fact the only problem I have found
  >> > is that it was not possible to mix __slots__ and dict/list.
  >>
  >> For all strange and perverse things I've done, slots work just
  >> fine when inheriting from list and dict.  Again, can you provide
  >> an example of where you found otherwise?

  SM> Ok, my problem was from inheriting both from dict and from my
  SM> Persistent class. Persistent was using slots. I could dig out or
  SM> reproduce error message if you're interested.

dict and Persistent are not compatible at the C level.  That's a
second problem, and one that I hadn't thought of.  (It doesn't have
anything to do with slots.)

>>> class PD(Persistent, dict):
...     pass
...
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: multiple bases have instance lay-out conflict

There's no way to make this problem go away if we continue to
implement persistence in C.

Jeremy



From smenard@bigfoot.com  Mon Jul 22 17:08:37 2002
From: smenard@bigfoot.com (Steve Menard)
Date: Mon, 22 Jul 2002 12:08:37 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.10705.19551.392096@slothrop.zope.com>
References: <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca>
 <5.1.0.14.0.20020722102253.02abb498@pop.videotron.ca>
 <5.1.0.14.0.20020722113746.07a8bd20@pop.videotron.ca>
Message-ID: <5.1.0.14.0.20020722120712.02aec010@pop.videotron.ca>

At 11:50 AM 7/22/2002 -0400, Jeremy Hylton wrote:
> >>>>> "SM" == Steve Menard <smenard@bigfoot.com> writes:
>
>   >> > On a side note, as I have said in another post, I have done
>   >> > exactly that, subclassing dict and list. While my model didn't
>   >> > need to override __getitem__(), the __setitem__() at least
>   >> > seemed to act properly. In fact the only problem I have found
>   >> > is that it was not possible to mix __slots__ and dict/list.
>   >>
>   >> For all strange and perverse things I've done, slots work just
>   >> fine when inheriting from list and dict.  Again, can you provide
>   >> an example of where you found otherwise?
>
>   SM> Ok, my problem was from inheriting both from dict and from my
>   SM> Persistent class. Persistent was using slots. I could dig out or
>   SM> reproduce error message if you're interested.
>
>dict and Persistent are not compatible at the C level.  That's a
>second problem, and one that I hadn't thought of.  (It doesn't have
>anything to do with slots.)
>
> >>> class PD(Persistent, dict):
>...     pass
>...
>Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
>TypeError: multiple bases have instance lay-out conflict
>
>There's no way to make this problem go away if we continue to
>implement persistence in C.

Right. That's the same problem I had, even though my Persistent was not 
implemented in C. It simply used __slots__. I guess since __slots__ changes 
the layout of the object, the same problem is caused.
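
A small sketch that reproduces the conflict in pure Python, assuming
CPython.  SlotBase, EmptySlotBase and try_class() are made-up names; the
point is only which combinations of bases can share one instance layout.

```python
def try_class(name, bases, namespace=None):
    """Try to build a class; return it, or the TypeError message."""
    try:
        return type(name, bases, dict(namespace or {}))
    except TypeError as exc:
        return str(exc)


class SlotBase(object):
    __slots__ = ('_p_changed',)      # non-empty slots change the layout


class EmptySlotBase(object):
    __slots__ = ()                   # no extra storage, layout unchanged


# A dict subclass with its own slots is fine, as Kevin reports.
slotted_dict = try_class('SlottedDict', (dict,), {'__slots__': ('x',)})

# Slotted base plus dict: two incompatible instance layouts.
conflict = try_class('PD', (SlotBase, dict))

# A mixin with empty __slots__ combines with dict without trouble.
mixin_ok = try_class('PD2', (EmptySlotBase, dict))
```

Here `conflict` holds the TypeError text rather than a class, while the
other two are ordinary classes.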

         Steve





From jim@zope.com  Mon Jul 22 18:16:46 2002
From: jim@zope.com (Jim Fulton)
Date: Mon, 22 Jul 2002 13:16:46 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <Sebastien Bigaret's
	message of "15 Jul 2002 15:06:16 +0200">
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
Message-ID: <3D3C3DFE.6070203@zope.com>

Phillip J. Eby wrote:
> Following on the unparalleled success of the "Straw Man" transaction API
> (he said, with tongue in cheek),

It seemed pretty successful to me.

 > I thought it might be good to make a
> proposal for persistence as well. 

Thanks. This is very helpful.

 > Since I won't be at the BOF,

We'll miss you.

 > I figure I
> should get my two cents in now, while the getting's good.
> 
> Here's my proposal, such as it is... Deliver a Persistence package based on
> the one at http://cvs.zope.org/Zope3/lib/python/Persistence/ but with the
> following changes:
> 
> * Remove the BTrees subpackage, and the Class, Cache, Function, and Module
> modules, along with the ICache interface.  Rationale: The BTrees package is
> only useful for a relatively small subset of possible persistence backends,
> and is subject to periodic data structure changes which affect applications
> using it. 

I'm OK with taking out BTrees, however, BTrees were included in ZODB by
very popular demand.

You haven't given a rationale for not including the caching framework.
The caching framework is closely tied to persistence and, I think,
largely independent of data managers.

 > It's probably best kept out of the Python core.  Similar
> arguments apply to the Cache system, although not quite as strongly.
> Class, Function, and Module are very recent developments which have not had
> the extended usage that most of the rest of the code has. 

Fair enough.

 > (Note: I don't
> mean to say that the persistence C code has been thoroughly exercised
> either, in the sense that much of it is completely new for Python 2.2.  But
> its *design* has a long history, and previous implementations have had much
> testing of the kind of edge and corner issues that the Class, Function, and
> Module modules haven't been exposed to yet.)
> 
> * I do think we should keep PersistentList and PersistentMapping in the
> core; they're useful for almost any kind of application, and any kind of
> back-end storage.  They don't introduce policy or data format dependencies
> into users' code, either.

I *never* use persistent list and almost never use persistent mapping.
I find BTrees far more useful. :)


> * Make _p_dm a synonym for _p_jar, and deprecate _p_jar.  This could be
> done by making a _p_jar descriptor that read/wrote through to _p_dm, and
> issued a deprecation warning.  I don't personally have a problem with
> _p_jar, but I've heard rumblings from other people (ZC folks?) that it's
> confusing or that they want to get rid of it.  So if we're doing it, now
> seems like the time.
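
The descriptor idea could look roughly like this.  It is a sketch on a toy
class, not the C implementation; ToyPersistent is a made-up name, and the
property is one possible way to spell "a _p_jar descriptor".

```python
import warnings


class ToyPersistent(object):
    """Toy class; only the _p_dm/_p_jar aliasing is of interest here."""

    _p_dm = None

    def _get_jar(self):
        warnings.warn("_p_jar is deprecated; use _p_dm",
                      DeprecationWarning, stacklevel=2)
        return self._p_dm

    def _set_jar(self, manager):
        warnings.warn("_p_jar is deprecated; use _p_dm",
                      DeprecationWarning, stacklevel=2)
        self._p_dm = manager

    # Reads and writes of _p_jar pass straight through to _p_dm.
    _p_jar = property(_get_jar, _set_jar)


obj = ToyPersistent()
manager = object()

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    obj._p_jar = manager           # old spelling still works, but warns
    same = obj._p_dm is manager and obj._p_jar is manager
```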

I wouldn't worry about backward compatibility. Ditch '_p_jar' and pick
a better name, like '_p_manager' as you suggested.

> * Flag _p_changed *after* __setattr__, not before!  This will help
> co-operative transaction participants play nicely together, since they
> can't "write through" a change if they're getting notified *before* the
> change takes place! 

It would be helpful if you could provide an illustrative example in a separate
dedicated message.
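
In the meantime, a toy sketch of why the ordering matters for a manager
that writes through as soon as it is notified.  Every name here
(WriteThroughManager, object_changed, NotifyBefore, NotifyAfter) is
hypothetical and stands in for the real machinery.

```python
class WriteThroughManager(object):
    """Toy data manager that copies state out as soon as it is notified."""

    def __init__(self):
        self.stored = {}

    def object_changed(self, obj):
        self.stored = dict(obj.__dict__)   # "write through" immediately


class NotifyBefore(object):
    def __init__(self, dm):
        object.__setattr__(self, 'dm', dm)

    def __setattr__(self, name, value):
        self.dm.object_changed(self)        # manager sees the OLD state
        object.__setattr__(self, name, value)


class NotifyAfter(NotifyBefore):
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        self.dm.object_changed(self)        # manager sees the NEW state


before_dm, after_dm = WriteThroughManager(), WriteThroughManager()
NotifyBefore(before_dm).balance = 100
NotifyAfter(after_dm).balance = 100

stale = before_dm.stored.get('balance')    # None: change not yet applied
fresh = after_dm.stored.get('balance')     # 100
```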

 > Docs should also clarify that when set in other code,
> _p_changed should be set at the latest possible moment, *after* the object
> is in its new, stable state.

I'm with Guido in wanting a set of api calls to replace the baroque
'_p_changed' semantics.

Note to both you and Guido, you (Phillip) are right, _p_state is an internal
implementation detail.

> * Keep the _p_atime slot, but don't fill it with anything by default.
> Instead, have a _p_getattr_hook(persistentObj,attrName,retrievedValue) slot
> at C level that's called after the getattribute completes.  A data manager
> can then set the hook to point to a _p_atime update function, *or* it can
> introduce postprocessing for "proxy" attributes.  That is, a data manager
> could set the hook to handle "lazy" loading of certain attributes which
> would otherwise be costly to retrieve, by placing a dummy value in the
> object's dictionary, and then having the post-call hook return a
> replacement value.

I suggest we step back a bit and think of the API in terms of events.
I suggest we think about what events are generated and who they are
sent to. Your API change is consistent with that.


> For speed, this will generally want to be a C function; let the base
> package include a simple hook that updates _p_atime, and another which
> checks whether the retrievedValue is an instance of a LazyValue base class,
> and if so, calls the object.  This will probably cover the basics.  A data
> manager that uses ZODB caching will use the atime function, and non-ZODB
> data managers will probably want the other hook.  I also have an idea about
> using the transaction's timestamp() plus a counter to supply a "time" value
> that minimizes system calls, but I'm not sure it would actually improve
> performance any, so I'm fine with not trying to push that into the initial
> package.  As long as the hook slot is present in the base package, I or
> anyone else are free to make up and try our own hooks to put in it.
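
A pure-Python sketch of the post-getattr hook with a LazyValue-style
placeholder.  The proposal is for a C-level slot; the __getattribute__
override, the Hooked class and the lazy_hook() function below are only
stand-ins chosen to show the control flow.

```python
class LazyValue(object):
    """Placeholder stored in an object's __dict__ until first use."""

    def __init__(self, loader):
        self.loader = loader

    def __call__(self):
        return self.loader()


def lazy_hook(obj, name, value):
    """Post-getattr hook: replace LazyValue placeholders on access."""
    if isinstance(value, LazyValue):
        value = value()
        obj.__dict__[name] = value    # cache so the loader runs only once
    return value


class Hooked(object):
    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name.startswith('_'):
            return value              # leave internal attributes alone
        # The hook lives in the instance dict, installed by the manager.
        hook = object.__getattribute__(self, '__dict__').get('_p_getattr_hook')
        if hook is None:
            return value
        return hook(self, name, value)


loads = []

def load_report():
    loads.append(1)                   # count how often the loader runs
    return 'big report'


doc = Hooked()
doc._p_getattr_hook = lazy_hook       # what a data manager would install
doc.report = LazyValue(load_report)
first = doc.report                    # triggers the loader
second = doc.report                   # served from the cached value
```

An atime-updating hook would have the same signature and simply return
the value unchanged after recording the access.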

I'd like to get rid of _p_atime, as it is totally dependent on a particular
cache implementation, which we happen to be phasing out.

Persistent objects should have *no*


> * Get rid of the term "register", since objects won't "register" with the
> transaction, and neither should they with their data manager.  They should
> "inform their data manager" that they have changed.  Something like an
> objectChanged() message is appropriate in place of register().  I believe
> this would clarify the API.

That's fine.


> * Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
> way such that it works whether you have Interface or not", but the reality
> is that a dependency in the standard library on something outside the
> standard library is a big no-no, and just begging for breakage as soon as
> there *is* an Interface package (with a new API) in the standard library.

I think that this is a very bad idea. I think the interfaces clarify things
quite a bit.

> Whew!  I think that about covers it, as far as what I'd like to see, and
> what I think would be needed to make it acceptable for the core.  Comments?
> 
> By the way, my rationale for not taking any radical new approaches to
> persistence, observation, or notification in this proposal is that the
> existing Persistence package is "transparent" enough, and has the benefit
> of lots of field experience.  I spent a lot of time trying to come up with
> "better" ways before writing this; mostly I found that trying to make it
> more "transparent" to the object being persisted, just pushes the
> complexity into either the app or the backend, without really helping
> anything.  It's not a really big deal to:
> 
> 1. Subclass Persistent
> 
> 2. Use PersistentList and PersistentMapping or other Persistent objects for
> your attributes, or set self._p_changed when you change a non-persistent
> mutable.

These are not a big deal to you, because you have a deep understanding and
interest in the machinery. They are a big deal to most people. It would
be *wonderful* if we could avoid this. Maybe if we had a standard persistence
framework, we could motivate language changes that made this cleaner. :)


> 3. Use transactions
> 
> Especially if that's all you need to do in order to have persistence to any
> number of backends, including the current ZODB and all the wonderful SQL or
> other mappings that will be creatable by everybody on this list using their
> own techniques.  The key is not so much "transparency" per se, as
> *uniformity* across backends.  I think the existing API is transparent
> enough; let's work on having uniform and universal access to it, as a
> Python core package.

Transactions are a huge benefit, as opposed to something that is "not
really a big deal". :)

Here are some additional points:

- While we should provide a standard implementation of a persistence
   *interface*, we should allow other implementations. For example, the
   data manager or cache should not depend on internal details of the
   persistence implementation. We should not require a specific C layout
   for persistent objects, for example.

- The persistence interface and implementations should be independent of
   the cache implementations (e.g. no _p_atime). We *do* need to provide
   a better API for handling objects that are unwilling to be deactivated.
   Perhaps _p_deactivate should return a value indicating whether the object
   was deactivated, and, if not, perhaps why.

- We need to define the state model for persistent objects. I'd like to include
   the notion of a persistent refcount. Possible states are:

   o Unsaved

   o Up to date

   o changed

   o ghost

   In addition, there is a persistent reference count. This is used by C code
   to indicate that the object is being used outside of Python. An object
   can't be turned into a ghost if its persistent reference count is > 0.
   We'll model the reference count as a "sticky" state. We transition to the sticky
   state when the reference count becomes non-zero and from the sticky state
   when the reference count drops to zero. This state is largely independent of the other
   states.
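
   A rough sketch of this state model, including a deactivate call that
   reports whether it succeeded.  The names (ToyState, pin, unpin) are
   invented, and refusing to ghost a changed or unsaved object is an
   assumption of the sketch, not something stated above.

```python
UNSAVED, UP_TO_DATE, CHANGED, GHOST = (
    'unsaved', 'up-to-date', 'changed', 'ghost')


class ToyState(object):
    def __init__(self):
        self.state = UP_TO_DATE
        self.pins = 0                 # the "persistent reference count"

    @property
    def sticky(self):
        # Sticky is orthogonal to the four main states.
        return self.pins > 0

    def pin(self):
        self.pins += 1

    def unpin(self):
        if self.pins == 0:
            raise ValueError("unpin without matching pin")
        self.pins -= 1

    def deactivate(self):
        """Try to become a ghost; return (ok, reason)."""
        if self.sticky:
            return False, 'sticky'
        if self.state != UP_TO_DATE:
            # Assumption: only clean, loaded objects may be ghosted.
            return False, self.state
        self.state = GHOST
        return True, None


obj = ToyState()
obj.pin()                      # C code starts using the object
refused = obj.deactivate()     # (False, 'sticky')
obj.unpin()                    # C code is done
accepted = obj.deactivate()    # (True, None)
```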

- I'd like to spend some time thinking through persistence related events.
   Here's a start:

     o When a persistent object is modified while in the up-to-date state,
       it should notify its data manager and transition to the changed state.

     o When the object is accessed, it should notify its data manager. Perhaps it
       should pass its current state.

     o The persistent object calls a method on the data manager when its state
       needs to be loaded.

     o The persistent object should probably notify the data manager of any state
       changes.
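
   To make the event list a bit more tangible, here is a toy sketch.  The
   method names on the data manager (load_state, object_accessed,
   object_changed) and the explicit get/set methods are placeholders, not
   a proposed API.

```python
class RecordingManager(object):
    """Toy data manager that just logs the events it receives."""

    def __init__(self):
        self.events = []

    def load_state(self, obj):
        self.events.append('load')
        obj.data['value'] = 42              # pretend this came from storage

    def object_accessed(self, obj, state):
        self.events.append(('accessed', state))

    def object_changed(self, obj):
        self.events.append('changed')


class ToyObject(object):
    def __init__(self, dm):
        self.dm, self.state, self.data = dm, 'ghost', {}

    def _activate(self):
        if self.state == 'ghost':
            self.dm.load_state(self)        # state needs to be loaded
            self.state = 'up-to-date'

    def get(self, name):
        self._activate()
        self.dm.object_accessed(self, self.state)
        return self.data[name]

    def set(self, name, value):
        self._activate()
        self.data[name] = value
        if self.state == 'up-to-date':
            self.state = 'changed'
            self.dm.object_changed(self)    # only on the first change


dm = RecordingManager()
obj = ToyObject(dm)
value = obj.get('value')      # ghost -> load -> accessed
obj.set('value', 43)          # up-to-date -> changed, manager notified
obj.set('value', 44)          # already changed: no second notification
```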

Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From pje@telecommunity.com  Mon Jul 22 19:32:17 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 22 Jul 2002 14:32:17 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3D3C3DFE.6070203@zope.com>
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
 <Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <3.0.5.32.20020719115207.0086e100@telecommunity.com>
Message-ID: <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com>

At 01:16 PM 7/22/02 -0400, Jim Fulton wrote:
>Phillip J. Eby wrote:
>>* Remove the BTrees subpackage, and the Class, Cache, Function, and Module
>>modules, along with the ICache interface.  Rationale: The BTrees package is
>>only useful for a relatively small subset of possible persistence backends,
>>and is subject to periodic data structure changes which affect applications
>>using it.
>
>I'm OK with taking out BTrees, however, BTrees were included in ZODB by
>very popular demand.

And they should continue to be included with ZODB.  But IMHO their use is 
specific to persistence mechanisms which use "pickle jar"-style or 
"shelve"-like primitive databases.  (Primitive in the sense of not 
providing any concepts such as indexes or built-in search 
capabilities.)  If you have a higher-level mechanism, even one as simple as 
SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off using 
those features of the backend.

If this were not true, there'd be no need for any persistence mechanisms 
besides ZODB, and we wouldn't be having this conversation.  :)

(Note that I'm assuming that ZODB itself will continue to exist as an 
independent package, providing a persistence mechanism through its 
Connection, Database, and Storage objects.  It just shouldn't need to 
include Persistence or Transaction any more; BTrees would become 
ZODB.BTrees, or something similar.)


>You haven't given a rational for not including the caching framework.
>The caching framework is closely ties to persistence and, I think,
>largely independent of data managers.

IMHO the existing caches are tied to a specific caching policy, which 
embeds many ZODB-ish assumptions.  For RDBMS work, I primarily need 
transactional caching, where caches are cleared between transactions.  For 
that, I can use a simple WeakValueDictionary, with some code that 
deactivates objects between transactions.
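
Roughly, I mean something like the following sketch.  Record,
TransactionalCache and end_transaction() are made-up names; only
weakref.WeakValueDictionary is the real standard library class.

```python
import weakref


class Record(object):
    """Toy persistent record; state is None while it is a ghost."""

    def __init__(self, oid):
        self.oid = oid
        self.state = {'loaded_for': oid}

    def deactivate(self):
        self.state = None       # drop state; reload on next use


class TransactionalCache(object):
    def __init__(self):
        # Entries vanish once nothing else references the object.
        self._objects = weakref.WeakValueDictionary()

    def get(self, oid):
        obj = self._objects.get(oid)
        if obj is None:
            obj = self._objects[oid] = Record(oid)
        return obj

    def end_transaction(self):
        # Objects still referenced by the app survive, but as ghosts.
        for obj in list(self._objects.values()):
            obj.deactivate()


cache = TransactionalCache()
held = cache.get(1)
same_object = cache.get(1) is held    # identity preserved while held
cache.end_transaction()
ghosted = held.state is None          # state cleared between transactions
```

No atime bookkeeping is needed for this policy, which is why I'd rather
not bake one particular cache into the base package.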

But if you think we should throw in some basic cache implementations for 
the most common caching policies, I've no objection.  I just thought it 
better to save argument at the present time over *which* caching policies 
would be most common.  :)


>>* I do think we should keep PersistentList and PersistentMapping in the
>>core; they're useful for almost any kind of application, and any kind of
>>back-end storage.  They don't introduce policy or data format dependencies
>>into users' code, either.
>
>I *never* use persistent list and almost never use persistent mapping.
>I find BTrees far more useful. :)

Then I suppose we could drop them, too.  :)  But I suspect that examining 
third-party usage of ZODB (including Zope 2 products) would show them to be 
moderately popular, even for use with ZODB.

There's another reason for including them, though...  they serve as very 
simple examples of implementing persistent objects whose attributes are 
mutable objects.
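
As an illustration of that second point, here is a minimal sketch in
modern Python.  TrackedList and Owner are hypothetical names, and the
sketch is not the actual PersistentList code; it only shows the idea of a
mutable attribute that records its own changes.

```python
from collections import UserList


class TrackedList(UserList):
    """List wrapper that flags itself as changed when mutated in place."""

    _p_changed = False  # class-level default: a fresh list is clean

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._p_changed = True  # flag only after the mutation succeeded

    def append(self, item):
        super().append(item)
        self._p_changed = True


class Owner:
    """Hypothetical persistent object holding mutable attributes."""

    def __init__(self):
        self.plain = [1, 2]                  # mutation goes unnoticed
        self.tracked = TrackedList([1, 2])   # mutation is recorded


owner = Owner()
owner.plain[0] = 5      # nothing records this change
owner.tracked[0] = 5    # sets owner.tracked._p_changed
```

Only __setitem__ and append are covered here; a complete wrapper would
have to flag every mutating method.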


>>* Flag _p_changed *after* __setattr__, not before!  This will help
>>co-operative transaction participants play nicely together, since they
>>can't "write through" a change if they're getting notified *before* the
>>change takes place!
>
>It would be helpful if you could provide an illustrative example in a separate
>dedicated message.

Okay.  I'm persisting some objects in an SQL database.  I have two txn 
participants: the SQL persistence manager, and the SQL database 
connection.  The former writes to the latter.  But actually, I have a third 
participant, another persistence manager which manages a "facade" object 
whose state is stored in two of the objects managed by the SQL persistence 
manager.  We reach transaction commit, and the "facade" object has 
uncommitted state...  The "begin_commit" message (formerly tpc_begin) 
reaches the SQL persistence manager first, so it does nothing because no 
state has been written by the third manager.  It then goes into 
"write-through" state.  The message reaches the SQL DB connection next, and 
it ignores it, because it's always in "write-through" mode, 
effectively.  Finally it reaches the third manager, which writes the dirty 
state from the facade to the underlying SQL-persisted objects.  If they 
notify their manager of the change, before they're actually changed (as 
setattr does now), the manager will try to "write-through" a change that 
hasn't occurred yet, causing a lost write.

Conversely, if change notification always takes place *after* a change, the 
write-through will succeed, and by extension, one can have as many levels 
of "write-through" transaction participants as one desires, without the 
transaction itself needing to be aware of dependencies between 
participants, and without requiring more commit phases or other 
complications, such as are needed by Steve Alexander's TransactionAgents 
for Zope 2.  (In other words, I'm not the only person who likes being able 
to stack persistence mechanisms and "triggers".  Although I suspect it was 
my work with ZPatterns that initially led Steve down that dark path of 
corruption.  ;) )
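
The lost write can be reproduced with a small sketch in modern Python.
Manager, NotifyBefore, and NotifyAfter are hypothetical names; the manager
stands in for a data manager in "write-through" mode that copies an
object's public attributes as soon as it hears about a change.

```python
class Manager:
    """Hypothetical write-through manager: snapshots state when notified."""

    def __init__(self):
        self.written = {}

    def object_changed(self, obj):
        public = {k: v for k, v in vars(obj).items() if not k.startswith("_")}
        self.written.update(public)


class NotifyBefore:
    """Notifies the manager *before* the attribute is actually set."""

    def __init__(self, manager):
        object.__setattr__(self, "_manager", manager)

    def __setattr__(self, name, value):
        self._manager.object_changed(self)     # too early: old state is seen
        object.__setattr__(self, name, value)


class NotifyAfter(NotifyBefore):
    """Notifies the manager *after* the attribute is set."""

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        self._manager.object_changed(self)     # new state is visible


early, late = Manager(), Manager()
NotifyBefore(early).balance = 10   # early.written stays empty: lost write
NotifyAfter(late).balance = 10     # late.written receives the new value
```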



>>* Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
>>way such that it works whether you have Interface or not", but the reality
>>is that a dependency in the standard library on something outside the
>>standard library is a big no-no, and just begging for breakage as soon as
>>there *is* an Interface package (with a new API) in the standard library.
>
>I think that this is a very bad idea. I think the interfaces clarify things
>quite a bit.

I think maybe I was unclear.  I certainly don't think that the interfaces 
should cease to exist, or that they should not exist as documentation.  I'm 
referring to their inclusion as operating code, only.


>>Whew!  I think that about covers it, as far as what I'd like to see, and
>>what I think would be needed to make it acceptable for the core.  Comments?
>>By the way, my rationale for not taking any radical new approaches to
>>persistence, observation, or notification in this proposal is that the
>>existing Persistence package is "transparent" enough, and has the benefit
>>of lots of field experience.  I spent a lot of time trying to come up with
>>"better" ways before writing this; mostly I found that trying to make it
>>more "transparent" to the object being persisted, just pushes the
>>complexity into either the app or the backend, without really helping
>>anything.  It's not a really big deal to:
>>1. Subclass Persistent
>>2. Use PersistentList and PersistentMapping or other Persistent objects for
>>your attributes, or set self._p_changed when you change a non-persistent
>>mutable.
>
>These are not a big deal to you, because you have a deep understanding and
>interest in the machinery. They are a big deal to most people. It would
>be *wonderful* if we could avoid this. Maybe if we had a standard persistence
>framework, we could motivate language changes that made this cleaner. :)

Interesting that you say this, considering how much adoption ZODB has had 
in the larger Python community.  Perhaps you could be more specific as to 
the audience you're talking about?

To get rid of these things is possible, but complex.  Getting rid of 
Persistent while minimizing loss of generality would mean either 
introducing proxies, or dynamically altering object types in order to get 
the observation capability.  I'm seriously unconvinced that adding a line 
to import Persistent, and adding a word to the definition of a few 
application base classes, is so burdensome as to be worth the complexity 
and fragility of either of the basic approaches to avoiding it!  (The 
second issue could probably be addressed with an extension of the solution 
to the first...  by adding further complexity.)

If our goal is to provide a Python core package for this in a speedy 
timeframe -- say this summer -- I think that developing and debugging a 
whole new way of doing things like this is probably out of the question.

Thing is, *we don't have to actually solve this problem*.  If we create a 
decent base API/implementation, there's no reason people can't create the 
proxies or class-substitution mechanisms on their own, using the base 
implementation to do the actual persistence part.  In principle, it should 
be possible to create such a mechanism for arbitrary data managers.

I should also mention, by the way, that PEAK (formerly TransWarp) has 
mechanisms that allow generation of class families with re-parented base 
classes, and re-written methods.  So that's just one of many possible means 
by which one could create a transparency mechanism, independent of the 
persistence framework or persistence mechanisms.  I don't think we should 
tie the persistence framework, therefore, to one specific transparency 
mechanism.  Especially since we don't know what transparency mechanisms 
will be "best" for a given situation.


>Transactions are a huge benefit, as opposed to something that is "not
>really a big deal". :)

I'm really surprised you get objections to adding a base class, but not to 
rewriting applications to use transactions.  Adding the latter actually 
seems more invasive a change, to me, even if the benefit is certainly 
noticed and appreciated.


>Here are some additional points:
>
>- While we should provide a standard implementation of a persistence
>   *interface*, we should allow other implementations. For example, the
>   data manager or cache should not depend on internal details of the
>   persistence implementation. We should not require a specific C layout
>   for persistent objects, for example.

Okay.


>- The persistence interface and implementations should be independent of
>   the cache implementations (e.g. no _p_atime). We *do* need to provide
>   a better API for handling objects that are unwilling to be deactivated.
>   Perhaps _p_deactivate should return a value indicating whether the object
>   was deactivated, and, if not, perhaps why.

Okay.


>- We need to define the state model for persistent objects. I'd like to
>   include the notion of a persistent refcount. Possible states are:
>
>   o Unsaved
>
>   o Up to date
>
>   o changed
>
>   o ghost
>
>   In addition, there is a persistent reference count. This is used by C code
>   to indicate that the object is being used outside of Python. An object
>   can't be turned into a ghost if its persistent reference count is > 0.
>   We'll model the reference count as a "sticky" state. We transition to
>   the sticky state when the reference count becomes non-zero and from the
>   sticky state when the reference count drops to zero. This state is
>   largely independent of the other states.

Sounds good.
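
A minimal sketch of that state model in modern Python, assuming
hypothetical names throughout (PersistentState, acquire, release,
mark_saved, mark_changed).  It also has deactivate() return whether the
object was actually ghostified, as suggested in the earlier point about
_p_deactivate.

```python
UNSAVED, UP_TO_DATE, CHANGED, GHOST = "unsaved", "up-to-date", "changed", "ghost"


class PersistentState:
    """Hypothetical model: four states plus an independent sticky count."""

    def __init__(self):
        self.state = UNSAVED
        self.sticky_count = 0   # the "persistent reference count"

    @property
    def sticky(self):
        return self.sticky_count > 0

    def acquire(self):
        self.sticky_count += 1

    def release(self):
        if self.sticky_count == 0:
            raise ValueError("release() without matching acquire()")
        self.sticky_count -= 1

    def mark_saved(self):
        self.state = UP_TO_DATE

    def mark_changed(self):
        # Unsaved objects stay unsaved; ghosts must be loaded first.
        if self.state == UP_TO_DATE:
            self.state = CHANGED

    def deactivate(self):
        """Try to become a ghost; return True only if it happened."""
        if self.state != UP_TO_DATE or self.sticky:
            return False
        self.state = GHOST
        return True


obj = PersistentState()
refused_unsaved = obj.deactivate()   # unsaved objects cannot become ghosts
obj.mark_saved()
obj.acquire()
refused_sticky = obj.deactivate()    # sticky objects cannot either
obj.release()
accepted = obj.deactivate()          # now the object becomes a ghost
```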


>- I'd like to spend some time thinking through persistence related events.
>   Here's a start:
>
>     o When a persistent object is modified while in the up-to-date state,
>       it should notify its data manager and transition to the changed
>       state.

Sure.



>     o When the object is accessed, it should notify its data manager.
>       Perhaps it should pass its current state.

I'd like to rephrase that as: it notifies, *if* it has been requested 
to do so by the data manager.  The data manager may decide to turn on or 
off such notifications at will.  (In other words, I want my post-getattr 
hook function that can modify the result of the getattr, and I want it 
removable so I don't continue to pay in performance once all my state is 
loaded.)
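
A minimal sketch of such a removable hook in modern Python.  Hookable,
_p_getattr_hook, LAZY, and load_on_demand are hypothetical names, and the
"loaded:" value stands in for whatever the backend would return.

```python
LAZY = object()   # hypothetical marker for "not loaded yet"


class Hookable:
    """Hypothetical base class with an optional post-getattr hook."""

    _p_getattr_hook = None   # class-level default: no hook installed

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name.startswith("_"):
            return value     # never post-process private or _p_ names
        hook = object.__getattribute__(self, "_p_getattr_hook")
        if hook is not None:
            value = hook(self, name, value)
        return value


def load_on_demand(obj, name, value):
    """Example hook: replace LAZY markers, then uninstall itself."""
    if value is LAZY:
        value = "loaded:" + name          # stand-in for a backend fetch
        setattr(obj, name, value)
        if LAZY not in vars(obj).values():
            del obj._p_getattr_hook       # all state loaded: remove the hook
    return value


item = Hookable()
item.title = LAZY
item._p_getattr_hook = load_on_demand     # the data manager turns the hook on
first_read = item.title                   # triggers the load and the removal
```

In this pure-Python form every attribute access still pays for the
__getattribute__ override; a C implementation could presumably make the
no-hook case close to free.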


>     o The persistent object calls a method on the data manager when its
>       state needs to be loaded.

As long as I still have the ability to set or remove a getattr-hook that 
works independently of this, I'm fine.


>     o The persistent object should probably notify the data manager of
>       any state changes.

*Shrug*.  IAGNI.  (I ain't gonna need it. :)  I don't have a use case for 
any messages but "I'm changed", "load me", and "postprocess a getattr".

For what it's worth, I'd *really* like to keep this *simple*.  Simple to me 
means released sooner, more explicit, more reliable.  So I'd be happiest if 
we can stick to specific use cases.

I've spent a lot of time hacking around the existing packages to do 
SQL/LDAP stuff, and others here should have strong experience using ZODB 
for its "natural" backends and application structures.  That means we 
should be able to get pretty concrete about what is and isn't needed.  In 
the absence of more use cases, I'm not sure what else is really needed 
besides what we've already discussed.  Indeed, most of what I've outlined 
has been stuff I think should be taken *out*.

To put it another way, I think we should have to justify everything we want 
to put *in*, not what we take out.  Python standard library modules are 
widely distributed, and have a long life.  Whatever we put in needs to have 
a healthy life expectancy!



From jim@zope.com  Mon Jul 22 20:47:42 2002
From: jim@zope.com (Jim Fulton)
Date: Mon, 22 Jul 2002 15:47:42 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <Sebastien Bigaret's
	message of "15 Jul 2002 15:06:16 +0200">
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com>
Message-ID: <3D3C615E.6030201@zope.com>

Phillip J. Eby wrote:
> At 01:16 PM 7/22/02 -0400, Jim Fulton wrote:
> 
>> Phillip J. Eby wrote:
>>
>>> * Remove the BTrees subpackage, and the Class, Cache, Function, and 
>>> Module
>>> modules, along with the ICache interface.  Rationale: The BTrees 
>>> package is
>>> only useful for a relatively small subset of possible persistence 
>>> backends,
>>> and is subject to periodic data structure changes which affect 
>>> applications
>>> using it.
>>
>>
>> I'm OK with taking out BTrees, however, BTrees were included in ZODB by
>> very popular demand.
> 
> 
> And they should continue to be included with ZODB. 

They don't depend on ZODB in any way.

> But IMHO their use
> is specific to persistence mechanisms which use "pickle jar"-style or 
> "shelve"-like primitive databases.  (Primitive in the sense of not 
> providing any concepts such as indexes or built-in search 
> capabilities.)  If you have a higher-level mechanism, even one as simple 
> as SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off 
> using those features of the backend.

I don't agree.


> If this were not true, there'd be no need for any persistence mechanisms 
> besides ZODB, and we wouldn't be having this conversation.  :)

There are lots of other reasons for a non-ZODB persistent storage
including:

1) Need to store data in relational databases

    - Because they are trusted

    - because data needs to be accessed from other apps

    - because they may scale better for some apps

2) Competition is good. :)

> (Note that I'm assuming that ZODB itself will continue to exist as an 
> independent package, providing a persistence mechanism through its 
> Connection, Database, and Storage objects.  It just shouldn't need to 
> include Persistence or Transaction any more;

Of course.

> BTrees would become
> ZODB.BTrees, or something similar.)

No, they would be separate. They don't depend on ZODB.


> 
>> You haven't given a rationale for not including the caching framework.
>> The caching framework is closely tied to persistence and, I think,
>> largely independent of data managers.
> 
> 
> IMHO the existing caches are tied to a specific caching policy, which 
> embeds many ZODB-ish assumptions.  For RDBMS work, I primarily need 
> transactional caching, where caches are cleared between transactions.  
> For that, I can use a simple WeakValueDictionary, with some code that 
> deactivates objects between transactions.
> 
> But if you think we should throw in some basic cache implementations for 
> the most common caching policies, I've no objection.  I just thought it 
> better to save argument at the present time over *which* caching 
> policies would be most common.  :)

I think that there should, at least, be a standard cache interface.
It should be possible to develop data managers and caches independently.
Maybe we could include one or two standard implementations. These could
provide useful examples for other implementations and, of course, be
useful in themselves.


...

>>> * Take out the interfaces.  :(  I'd rather this were, "leave this in, 
>>> in a
>>> way such that it works whether you have Interface or not", but the 
>>> reality
>>> is that a dependency in the standard library on something outside the
>>> standard library is a big no-no, and just begging for breakage as 
>>> soon as
>>> there *is* an Interface package (with a new API) in the standard 
>>> library.
>>
>>
>> I think that this is a very bad idea. I think the interfaces clarify 
>> things
>> quite a bit.
> 
> 
> I think maybe I was unclear.  I certainly don't think that the 
> interfaces should cease to exist, or that they should not exist as 
> documentation.  I'm referring to their inclusion as operating code, only.

So you don't want them to get imported?


...

>> These are not a big deal to you, because you have a deep understanding 
>> and
>> interest in the machinery. They are a big deal to most people. It would
>> be *wonderful* if we could avoid this. Maybe if we had a standard 
>> persistence
>> framework, we could motivate language changes that made this cleaner. :)
> 
> 
> Interesting that you say this, considering how much adoption ZODB has 
> had in the larger Python community.  Perhaps you could be more specific 
> as to the audience you're talking about?

I was mainly referring to the handling of non-persistent mutables.
This is a major stumbling block and source of errors for most ZODB
users.

Having to mix in persistence is an annoyance. It would be really
cool (but hard, very hard) to get rid of them.


> To get rid of these things is possible, but complex.  Getting rid of 
> Persistent while minimizing loss of generality would mean either 
> introducing proxies, or dynamically altering object types in order to 
> get the observation capability.  I'm seriously unconvinced that adding a 
> line to import Persistent, and adding a word to the definition of a few 
> application base classes, is so burdensome as to be worth the complexity 
> and fragility of either of the basic approaches to avoiding it!  (The 
> second issue could probably be addressed with an extension of the 
> solution to the first...  by adding further complexity.)

I agree that this is hard. It's really hard. I wasn't even suggesting
that we needed to solve this problem. I was merely pointing out that this
*is* a big deal for a lot of people.


> If our goal is to provide a Python core package for this in a speedy 
> timeframe -- say this summer -- I think that developing and debugging a 
> whole new way of doing things like this is probably out of the question.

Agreed. OTOH, it wouldn't hurt to ponder other alternatives, if not now,
then maybe later.

> Thing is, *we don't have to actually solve this problem*.  If we create 
> a decent base API/implementation, there's no reason people can't create 
> the proxies or class-substitution mechanisms on their own, using the 
> base implementation to do the actual persistence part.  In principle, it 
> should be possible to create such a mechanism for arbitrary data managers.

True. But maybe someone will think of a way to solve this without proxies
or alchemy?


...


>>     o When the object is accessed, it should notify its data manager.
>>       Perhaps it should pass its current state.
> 
> 
> I'd like to rephrase that as being it notifies, *if* it has been 
> requested to do so by the data manager.  The data manager may decide to 
> turn on or off such notifications at will.  (In other words, I want my 
> post-getattr hook function that can modify the result of the getattr, 
> and I want it removable so I don't continue to pay in performance once 
> all my state is loaded.)

We need to think some more about this. I'd rather err on the side of
simple persistent objects and complex data managers.

I'd also like persistent objects to be as lightweight as possible.
Carrying a bunch of attributes for hooks is worrisome.


> 
>>     o The persistent object calls a method on the data manager when
>>       its state needs to be loaded.
> 
> 
> As long as I still have the ability to set or remove a getattr-hook that 
> works independently of this, I'm fine.

Would different objects in the same DM have different values of the same hook?
If so, why?


>>     o The persistent object should probably notify the data manager of 
>> any state
>>       changes.
> 
> 
> *Shrug*.  IAGNI.  (I ain't gonna need it. :)  I don't have a use case 
> for any messages but "I'm changed", "load me", and "postprocess a getattr".
> 
> For what it's worth, I'd *really* like to keep this *simple*.  Simple to 
> me means released sooner, more explicit, more reliable.  So I'd be 
> happiest if we can stick to specific use cases.

A decent cache is going to handle objects differently based on their states.
For example, a cache that deactivates objects when they haven't been used in a
while needs to know which objects are ghostifyable and needs to know when
ghostifyable objects have changed.
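
A minimal sketch of such a cache in modern Python, assuming hypothetical
names (Entry, SweepingCache, max_idle) and a simple access counter in
place of real timestamps.  It shows why the cache needs to hear about
state changes: only up-to-date objects may be turned into ghosts.

```python
class Entry:
    """Hypothetical cached object with a state and a last-use stamp."""

    def __init__(self, name):
        self.name = name
        self.state = "up-to-date"   # or "changed" / "ghost"
        self.last_used = 0


class SweepingCache:
    """Hypothetical cache that ghostifies idle, unmodified objects."""

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self.clock = 0              # counts accesses, not seconds
        self.entries = {}

    def access(self, entry):
        self.clock += 1
        entry.last_used = self.clock
        self.entries[entry.name] = entry

    def object_changed(self, entry):
        # Without this notification the sweep could discard unsaved changes.
        entry.state = "changed"

    def sweep(self):
        ghosted = []
        for entry in self.entries.values():
            idle = self.clock - entry.last_used
            if entry.state == "up-to-date" and idle >= self.max_idle:
                entry.state = "ghost"
                ghosted.append(entry.name)
        return ghosted


cache = SweepingCache(max_idle=2)
a, b, c = Entry("a"), Entry("b"), Entry("c")
cache.access(a)
cache.access(b)
cache.object_changed(b)   # b is dirty and must survive the sweep
cache.access(c)
cache.access(c)
ghosted = cache.sweep()   # only a is both idle and unmodified
```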

> I've spent a lot of time hacking around the existing packages to do 
> SQL/LDAP stuff, and others here should have strong experience using ZODB 
> for its "natural" backends and application structures.  That means we 
> should be able to get pretty concrete about what is and isn't needed.  
> In the absence of more use cases, I'm not sure what else is really 
> needed besides what we've already discussed.  Indeed, most of what I've 
> outlined has been stuff I think should be taken *out*.
> 
> To put it another way, I think we should have to justify everything we 
> want to put *in*, not what we take out.  Python standard library modules 
> are widely distributed, and have a long life.  Whatever we put in needs 
> to have a healthy life expectancy!

I don't think we should approach this effort with the assumption that the first
version is going into the standard library. I'm pretty happy with the persistence
mechanism I came up with for ZODB, but there are a lot of things I'd like to fix.

I agree that we should be rather conservative, but this is a good time to fix things.
Having done so, we should get some experience with what we've come up with before
we worry about adding it to the standard library.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From jeremy@zope.com  Mon Jul 22 21:03:11 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Mon, 22 Jul 2002 16:03:11 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<200207192003.g6JK3sw14911@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15676.25855.718959.288651@localhost.localdomain>

>>>>> "GvR" == Guido van Rossum <guido@python.org> writes:

  GvR> I've often thought that it's ugly that you have to set _p_state
  GvR> and _p_changed, rather than do these things with method calls.
  GvR> What do you think about that?  Especially the conventions for
  GvR> _p_state look confusing to me.

I would like to keep a simplified version of _p_changed, but not
_p_state.  The purpose of assignment to _p_changed is to mark an
object as changed.  Assignment seems clear here.  _p_changed is a
flag, normally false; when an object is changed, it is set to true.
Why would a method call be any clearer?

In general, it seems Python programs often use instance variables in
this way, and the property mechanism only makes it easier for
something that looks like assignment to behave in special ways.

I don't think there's any need to make _p_state part of the documented
API, although it may be useful to keep for debugging.
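
A minimal sketch of that idea in modern Python: _p_changed stays a plain
boolean to the caller, while a property makes the assignment notify the
data manager.  DataManager and its dirty list are hypothetical; the
register() and _p_jar names are borrowed from the discussion and from
ZODB usage, but nothing here is the actual Persistence code.

```python
class DataManager:
    """Hypothetical data manager that collects changed objects."""

    def __init__(self):
        self.dirty = []

    def register(self, obj):
        self.dirty.append(obj)


class Persistent:
    """Hypothetical base class with _p_changed implemented as a property."""

    def __init__(self, manager):
        self._p_jar = manager
        self._changed = False

    @property
    def _p_changed(self):
        return self._changed

    @_p_changed.setter
    def _p_changed(self, value):
        was_changed = self._changed
        self._changed = bool(value)
        # Notify once, on the transition from clean to changed, and only
        # after the flag itself reflects the new state.
        if self._changed and not was_changed:
            self._p_jar.register(self)


manager = DataManager()
obj = Persistent(manager)
obj._p_changed = True   # looks like assignment, registers with the manager
obj._p_changed = True   # already changed: no second registration
```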

Jeremy



From jeremy@zope.com  Mon Jul 22 21:12:50 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Mon, 22 Jul 2002 16:12:50 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3.0.5.32.20020719115207.0086e100@telecommunity.com>
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
	<Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200">
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
Message-ID: <15676.26434.240121.243006@localhost.localdomain>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  PJE> * Flag _p_changed *after* __setattr__, not before!  This will
  PJE> help co-operative transaction participants play nicely
  PJE> together, since they can't "write through" a change if they're
  PJE> getting notified *before* the change takes place!  Docs should
  PJE> also clarify that when set in other code, _p_changed should be
  PJE> set at the latest possible moment, *after* the object is in its
  PJE> new, stable state.

Can you flesh out this request?  The second sentence there suggests
interesting issues, but doesn't spell them out.  

As for when _p_changed should be set: Why does it matter?

  PJE> * Keep the _p_atime slot, but don't fill it with anything by
  PJE> default.

I'd just as soon drop it completely.  If a particular application
wants to extend the base persistence interface, it can.

  PJE> * Get rid of the term "register", since objects won't
  PJE> "register" with the transaction, and neither should they with
  PJE> their data manager.  They should "inform their data manager"
  PJE> that they have changed.  Something like an objectChanged()
  PJE> message is appropriate in place of register().  I believe this
  PJE> would clarify the API.

I don't have a problem with register().  In what way is the current
interface unclear?

  PJE> By the way, my rationale for not taking any radical new
  PJE> approaches to persistence, observation, or notification in this
  PJE> proposal is that the existing Persistence package is
  PJE> "transparent" enough, and has the benefit of lots of field
  PJE> experience. 

I'd like to see some comments from people who haven't already used
ZODB.  I worry that all the comments are coming from a small number of
people who wrote or use ZODB's persistence mechanism, and that we'll
make decisions that will be limiting for other persistent applications.
(But maybe there aren't any other such applications/users.)

Jeremy



From tim@zope.com  Mon Jul 22 21:43:25 2002
From: tim@zope.com (Tim Peters)
Date: Mon, 22 Jul 2002 16:43:25 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.25855.718959.288651@localhost.localdomain>
Message-ID: <LNBBLJKPBEHFEDALKOLCEEKLAGAB.tim@zope.com>

[Guido]
> I've often thought that it's ugly that you have to set _p_state
> and _p_changed, rather than do these things with method calls.
> What do you think about that?  Especially the conventions for
> _p_state look confusing to me.

[Jeremy Hylton]
> I would like to keep a simplified version of _p_changed,

If _p_changed is a 1-bit flag now, how much simpler can it get <wink>?

> but not _p_state.  The purpose of assignment to _p_changed is to mark an
> object as changed.  Assignment seems clear here.  _p_changed is a
> flag, normally false; when an object is changed, it is set to true.
> Why would a method call be any clearer?

Presumably so that interested parties could influence what happens when an
object becomes "dirty"?  Maybe update a distributed cache, who knows.  I
suspect Phillip Eby was getting at something related with his plea to set
_p_changed only after an object is in a sane state again after a change is
complete.

OTOH, method calls are a large overhead when the mutation is simple; e.g.,
if a persistent list has to call a changed() method every time someone does

    a[i] = 6

that's a real drag on potential performance.



From jeremy@alum.mit.edu  Mon Jul 22 21:49:51 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jul 2002 16:49:51 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEKLAGAB.tim@zope.com>
References: <15676.25855.718959.288651@localhost.localdomain>
	<LNBBLJKPBEHFEDALKOLCEEKLAGAB.tim@zope.com>
Message-ID: <15676.28655.414460.130631@slothrop.zope.com>

>>>>> "TP" == Tim Peters <tim@zope.com> writes:

  TP> [Guido]
  >> I've often thought that it's ugly that you have to set _p_state
  >> and _p_changed, rather than do these things with method calls.
  >> What do you think about that?  Especially the conventions for
  >> _p_state look confusing to me.

  TP> [Jeremy Hylton]
  >> I would like to keep a simplified version of _p_changed,

  TP> If _p_changed is a 1-bit flag now, how much simpler can it get
  TP> <wink>?

It's not a one-bit flag, and that's the part I want to simplify.  You
can also:

 - set _p_changed to None, which requests that the object become a
   ghost.

 - delete the _p_changed attribute (del obj._p_changed) which also
   asks the object to become a ghost, but in subtly different ways
   than just setting the attribute to None.

 - revive a ghost, although I'm not entirely clear how this works.

The Zope3 persistence mechanism supports all the _p_changed magic, but
also exports _p_activate() and _p_deactivate().  The first makes a
ghost a real object, the second makes a real object a ghost.
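
A minimal sketch of the simplified shape in modern Python: _p_changed is
only a boolean, and ghost handling goes through explicit methods.  The
Ghostable class, its loader argument, and the _p_ghost attribute are
hypothetical; this is not the Zope3 implementation, and the choice to
refuse deactivation of a changed object is an assumption of the sketch.

```python
class Ghostable:
    """Hypothetical object with explicit activate/deactivate methods."""

    def __init__(self, loader):
        self._loader = loader      # hypothetical callable returning a state dict
        self._p_changed = False    # plain one-bit flag, nothing else
        self._p_ghost = True       # objects start out as ghosts

    def _p_activate(self):
        """Make a ghost a real object by loading its state."""
        if self._p_ghost:
            self.__dict__.update(self._loader())
            self._p_ghost = False

    def _p_deactivate(self):
        """Make a real object a ghost; return True only if it happened."""
        if self._p_changed or self._p_ghost:
            return False           # unsaved changes are never thrown away
        for name in [n for n in self.__dict__ if not n.startswith("_")]:
            del self.__dict__[name]
        self._p_ghost = True
        return True


obj = Ghostable(lambda: {"x": 1})
obj._p_activate()                  # obj.x is now available
obj._p_changed = True
refused = obj._p_deactivate()      # changed object stays active
obj._p_changed = False
accepted = obj._p_deactivate()     # clean object becomes a ghost again
```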

Jeremy



From jeremy@zope.com  Mon Jul 22 22:24:08 2002
From: jeremy@zope.com (Jeremy Hylton)
Date: Mon, 22 Jul 2002 17:24:08 -0400 (EDT)
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
Message-ID: <15676.30712.483484.650388@localhost.localdomain>

I would like to see arbitrarily nested transactions supported in the
next generation transaction API.

Jeremy



From pje@telecommunity.com  Tue Jul 23 02:12:20 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 22 Jul 2002 21:12:20 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3D3C615E.6030201@zope.com>
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
 <Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <3.0.5.32.20020719115207.0086e100@telecommunity.com>
 <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com>

At 03:47 PM 7/22/02 -0400, Jim Fulton wrote:
>Phillip J. Eby wrote:
>
> > But IMHO their use
>>is specific to persistence mechanisms which use "pickle jar"-style or 
>>"shelve"-like primitive databases.  (Primitive in the sense of not 
>>providing any concepts such as indexes or built-in search 
>>capabilities.)  If you have a higher-level mechanism, even one as simple 
>>as SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off 
>>using those features of the backend.
>
>I don't agree.

I didn't qualify my statement sufficiently, then.  :)  See below.


>>If this were not true, there'd be no need for any persistence mechanisms 
>>besides ZODB, and we wouldn't be having this conversation.  :)
>
>There are lots of other reasons for a non-ZODB persistent storage
>including:
>
>1) Need to store data in relational databases
>
>    - Because they are trusted
>
>    - because data needs to be accessed from other apps
>
>    - because they may scale better for some apps

Right, and if you're doing it because of the second or third sub-item 
above, you will have little use for BTrees.  AFAICT, the only reason one 
would store a BTree in another BTree would be if you're doing ZODB-type 
things in an SQL db "because they are trusted".

This is part of what I meant by "most often better off using those 
[higher-level] features of the back-end."  Applications which have 
different read/write characteristics and structural/performance 
requirements than content management applications, will generally be *much* 
better off leaving these things to a good back-end, than managing BTrees 
themselves.


>I think that there should, at least, be a standard cache interface.
>It should be possible to develop data managers and caches independently.
>Maybe we could include one or two standard implementations. These could
>provide useful examples for other implementations and, of course, be
>useful in themselves.

Sure.  I personally don't think there's much that you can standardize on in 
a caching API besides which mapping methods one is required to support, 
without getting into policy and use cases.  But I'm probably biased by the 
relative simplicity of my own use cases re: caching, and by my intense 
desire to get an "official" persistence base into the standard library, at 
the expense of any actual persistence *mechanisms* if need be.  I'm going 
to have to write my own mechanism anyway, so again I'm biased.  :)


>>>>* Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
>>>>way such that it works whether you have Interface or not", but the reality
>>>>is that a dependency in the standard library on something outside the
>>>>standard library is a big no-no, and just begging for breakage as soon as
>>>>there *is* an Interface package (with a new API) in the standard library.
>>>
>>>I think that this is a very bad idea. I think the interfaces clarify things
>>>quite a bit.
>>
>>I think maybe I was unclear.  I certainly don't think that the interfaces 
>>should cease to exist, or that they should not exist as 
>>documentation.  I'm referring to their inclusion as operating code, only.
>
>So you don't want them to get imported?

It's not that I care one way or the other.  Honestly, I'd rather see 
Interface end up in the standard library too - at least once the metaclass 
bug is fixed.  :)  But my overriding priority here is a standard for 
Persistence and Transaction bases for eventual inclusion in the standard 
library.

I have many projects which desperately need good persistence and 
transaction frameworks, but I'm between a rock (ZODB 3) and a hard place 
(ZODB 4) right now.  Both have transaction APIs that are somewhat 
difficult to work with, and I need some of the things that are in ZODB 4, 
but if ZODB 4 is about to be re-factored...  I'm stuck in the middle with 
code that could end up orphaned.  Even if I go off and write everything I 
need "from scratch" in order to dodge this dependency, it doesn't help 
me if the eventual standard doesn't match up closely enough with my 
work.  I'm still left with "orphaned" code - sort of like a DB connection 
object created prior to adoption of a DBAPI standard.

Thus, my objective is to keep the shortest possible distance between me and 
a Python community consensus on a base-level transaction and persistence 
API.  I have a fairly limited time window, however, before I will have to 
pick something and do something, regardless of the long-term cost.  :(


>I was mainly referring to the handling of non-persistent mutable
>data. This is a major stumbling block and source of errors
>for most ZODB users.

Yeah, that one really requires metadata, or collaborative properties.  But 
those are things that are also already in PEAK, so again I'm probably 
biased as to how difficult/available they are.

Also, in the SQL world, the solution to non-persistent mutable data is 
actually quite trivial: don't have non-persistent mutable 
data.  :)  Seriously, since a data manager loads an object's state, it can 
*guarantee* that there will be no non-persistent mutable attributes.  (Note 
that if the object replaces a persistent mutable with a non-persistent one, 
that will trigger a change, and the data manager can force it back to a 
persistent mutable when the state goes back to "up to date".)

In the SQL world, a data manager *must* have this sort of schema knowledge 
in order to do its job.  Pickle-driven data managers may have a harder time 
of this, of course, if they lack sufficient schema knowledge to manage 
object state in this fashion.

Then again, perhaps we could solve the problem for pickle-driven databases 
as well, if there were a Python protocol for declaring immutability!  Heck, 
in theory, one could use interface adaptation to transform objects like 
lists into persistent equivalents.  It would only be necessary to do this, 
however, if the object whose state was being loaded didn't declare that it 
handled its own persistence properly.

The performance/space issue of saving extra persistent objects could 
actually be dealt with by having the substituted objects implement only 
observation on behalf of their holder(s), rather than being actual 
persistent objects.


>I agree that this is hard. It's really hard. I wasn't even suggesting
>that we needed to solve this problem. I was merely pointing out that this
>*is* a big deal for a lot of people.

Understood.

Ironically enough, I think I have stumbled onto another mechanism for 
doing so, above.

Newly created objects and their subobjects won't be observed, of course, 
but that's moot since they have to be referenced from another persistent 
object to get saved at all.  In "rootless" persistence mechanisms (such as 
most SQL databases), the data manager has to explicitly add the object anyhow.

So it seems that all that's needed is sufficient introspection capability 
to distinguish between:

* A persistent object
* An immutable
* An "observed" mutable
* An "unobserved" mutable

With the ability to substitute a suitable observed mutable for an 
unobserved one, when state is loaded or saved.
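
The four-way distinction and the substitution step could be sketched roughly as follows.  Every name here (Persistent, ObservedList, classify, substitute, IMMUTABLE_TYPES) is hypothetical, the immutability test is a crude type check rather than a declared protocol, and only two list mutators are wrapped, so treat it as an illustration of the idea and not a design.

```python
class Persistent:
    """Stand-in for a persistent base class (hypothetical)."""


class ObservedList(list):
    """A list that calls back to its holder after an in-place change.

    Only append() and item assignment are wrapped in this sketch.
    """

    def __init__(self, items, on_change):
        super().__init__(items)
        self._on_change = on_change

    def append(self, item):
        super().append(item)
        self._on_change()

    def __setitem__(self, index, value):
        super().__setitem__(index, value)
        self._on_change()


# Crude assumption: these types count as immutable (a tuple holding
# mutable items would be misclassified).
IMMUTABLE_TYPES = (int, float, str, bytes, tuple, frozenset, type(None))


def classify(value):
    if isinstance(value, Persistent):
        return "persistent"
    if isinstance(value, ObservedList):
        return "observed"
    if isinstance(value, IMMUTABLE_TYPES):
        return "immutable"
    return "unobserved"


def substitute(value, on_change):
    """Swap an unobserved list for an observed one; pass others through."""
    if classify(value) == "unobserved" and isinstance(value, list):
        return ObservedList(value, on_change)
    return value


changes = []
items = substitute([1, 2], lambda: changes.append("changed"))
items.append(3)
```

The observed list here is not itself persistent; it only reports changes on behalf of whatever object holds it.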

I'm going to think about this some more...  It seems altogether too easy, 
so I'm sure there's something I'm missing.  Most likely, it's just that the 
devil is in the details...  the specific issues of introspection, 
selection, and substitution are likely to have lots of little gotchas.


>>If our goal is to provide a Python core package for this in a speedy 
>>timeframe -- say this summer -- I think that developing and debugging a 
>>whole new way of doing things like this is probably out of the question.
>
>Agreed. OTOH, it wouldn't hurt to ponder other alternatives, if not now,
>then maybe later.

I admit I do enjoy trying to solve the problem.  I'm just not optimistic 
about finding a simple solution.  :)


>>Thing is, *we don't have to actually solve this problem*.  If we create a 
>>decent base API/implementation, there's no reason people can't create the 
>>proxies or class-substitution mechanisms on their own, using the base 
>>implementation to do the actual persistence part.  In principle, it 
>>should be possible to create such a mechanism for arbitrary data managers.
>
>True. But maybe someone will think of a way to solve this without proxies
>or alchemy?

Unless you're going to fundamentally alter the Python object model, it's 
not doable.  Python objects by definition get their behavior from their 
type.  To change the behavior, you must either change the type, the type 
pointer in the object, or replace the object with another one.
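
As a small illustration of the "change the type pointer" option, the following sketch reassigns an instance's __class__.  The class names are made up, and this only works between classes with compatible instance layouts.

```python
class Plain:
    def describe(self):
        return "plain"


class Observed(Plain):
    def describe(self):
        return "observed"


obj = Plain()
before = obj.describe()

# Behavior comes from the type, so swapping the type pointer changes
# the behavior of the very same object.
obj.__class__ = Observed
after = obj.describe()
```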



>>I'd like to rephrase that as being it notifies, *if* it has been 
>>requested to do so by the data manager.  The data manager may decide to 
>>turn on or off such notifications at will.  (In other words, I want my 
>>post-getattr hook function that can modify the result of the getattr, and 
>>I want it removable so I don't continue to pay in performance once all my 
>>state is loaded.)
>
>We need to think some more about this. I'd rather err on the side of
>simple persistent objects and complex data managers.

So would I, which is why I want the hook, so the data manager can provide 
the behavior, rather than building it into the object.  :)


>I'd also like persistent objects to be as lightweight as possible.
>Carrying a bunch of attributes for hooks is worrisome.

Hm.  Well, we're talking C-level slots here, and I only asked for one hook, 
myself.  Guido suggested the setattr hook.  :)  I like lightweight in 
*performance*, and having a callable C function seems lighter in that sense 
than having the object look up an attribute on the data manager every time 
an attribute lookup is performed on it.  Plus, the hook can be stateful, 
while a method on the data manager has to check state - which could require 
a re-entrant attribute lookup back to the object.
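
A pure-Python approximation of the removable post-getattr hook might look like this.  The real thing under discussion would be a C-level slot; the attribute name _p_getattr_hook, the LAZY marker, and the load_lazy function are all assumptions made up for the sketch.

```python
class Hookable:
    """Hypothetical base class with one optional post-getattr hook."""

    _p_getattr_hook = None  # a data manager may set this per instance

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name.startswith("_p_"):
            return value  # never run the hook on bookkeeping attributes
        hook = object.__getattribute__(self, "_p_getattr_hook")
        if hook is not None:
            value = hook(self, name, value)
        return value


LAZY = object()  # placeholder meaning "not loaded yet"
calls = []


def load_lazy(obj, name, value):
    """Hook supplied by a data manager: fill in lazy attributes on demand."""
    calls.append(name)
    if value is LAZY:
        value = "loaded:" + name
        setattr(obj, name, value)
    return value


doc = Hookable()
doc.title = LAZY
doc._p_getattr_hook = load_lazy
first = doc.title             # hook runs and loads the value

doc._p_getattr_hook = None    # data manager removes the hook
second = doc.title            # plain lookup, hook no longer consulted
```

Once the hook is removed, the only remaining per-access cost in this sketch is the check for None.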


>>>     o The persistent object calls a method on the data manager when
>>>       its state needs to be loaded.
>>
>>As long as I still have the ability to set or remove a getattr-hook that 
>>works independently of this, I'm fine.
>
>Would different objects in the same DM have different values of the same hook?

Different values, yes.  Different non-empty values, probably not.  In other 
words, I'm mainly interested in having the hook be "on" or "off" for a 
given data manager.


>If so, why?

I have only one use case for having different non-empty hook values for the 
same DM: polymorphism.    But there are other ways to achieve it, so I 
don't think different non-empty values per DM is a requirement.  I suppose 
you could then implement the hook as a bit flag rather than a hook pointer, 
but it seems to me the performance might be worth using a pointer instead 
of a bit flag.


>A decent cache is going to handle objects differently based on their states.
>For example, a cache that deactivates objects when they haven't been used in a
>while needs to know which objects are ghostifyable and needs to know when
>ghostifyable objects have changed.

So add "sticky"/"unsticky" messages, and we'd be done.  Or, if "stickiness" 
represents a minority state among ghostable objects, don't even add this, 
because it'd be more efficient for the cache to just ask the object to 
deactivate itself and see what happens, than to send lots of "I'm 
sticky...  whoops, now I'm not" messages to data managers.
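
The "just ask and see what happens" approach could be sketched like this.  The Ghostable class, its _sticky flag, the sweep function, and the idea that _p_deactivate() returns a boolean are assumptions for illustration only.

```python
class Ghostable:
    """Hypothetical object that decides for itself whether it can ghost."""

    def __init__(self, state):
        self._state = state
        self._p_changed = False
        self._sticky = False

    def _p_deactivate(self):
        """Drop state if safe; return True if the object became a ghost."""
        if self._p_changed or self._sticky:
            return False
        self._state = None
        return True


def sweep(cache):
    """Ask every cached object to deactivate; count how many complied."""
    return sum(1 for obj in cache.values() if obj._p_deactivate())


cache = {"a": Ghostable({"x": 1}), "b": Ghostable({"y": 2})}
cache["b"]._p_changed = True   # modified, so it must refuse
freed = sweep(cache)
```

The cache never needs to be told about stickiness; it only learns the outcome of each request.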

With the messages I listed previously, a data manager should have enough 
information.  I'd rather we try to implement some data managers or caches 
and find we need to add something, than add a YAGNI on this one, because 
the performance penalty for unnecessary notifications seems potentially 
high, not to mention the added complexity for data managers to handle a 
bunch of extra messages.


>>I've spent a lot of time hacking around the existing packages to do 
>>SQL/LDAP stuff, and others here should have strong experience using ZODB 
>>for its "natural" backends and application structures.  That means we 
>>should be able to get pretty concrete about what is and isn't needed.
>>In the absence of more use cases, I'm not sure what else is really needed 
>>besides what we've already discussed.  Indeed, most of what I've outlined 
>>has been stuff I think should be taken *out*.
>>To put it another way, I think we should have to justify everything we 
>>want to put *in*, not what we take out.  Python standard library modules 
>>are widely distributed, and have a long life.  Whatever we put in needs 
>>to have a healthy life expectancy!
>
>I don't think we should approach this effort with the assumption that the
>first version is going into the standard library. I'm pretty happy with the
>persistence mechanism I came up with for ZODB, but there are a lot of things
>I'd like to fix.

As I mentioned above, my primary goal is just to get a consensus for the 
basic interfaces.  I'd be happy if we end up with something like a DBAPI 
PEP that everybody agreed on.  The standard library is gravy, but I *do* 
want to see it there before too terribly long.  IOW, I'd like this to be 
like the XML processing and distutils, which were separately distributed 
for a (Python) release or two as candidates for the standard library, and 
became standard later.



From pje@telecommunity.com  Tue Jul 23 02:29:23 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 22 Jul 2002 21:29:23 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.26434.240121.243006@localhost.localdomain>
References: <3.0.5.32.20020719115207.0086e100@telecommunity.com>
 <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
 <Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <3.0.5.32.20020719115207.0086e100@telecommunity.com>
Message-ID: <5.1.0.14.0.20020722211245.0520c020@mail.telecommunity.com>

At 04:12 PM 7/22/02 -0400, Jeremy Hylton wrote:
> >>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
>
>   PJE> * Flag _p_changed *after* __setattr__, not before!  This will
>   PJE> help co-operative transaction participants play nicely
>   PJE> together, since they can't "write through" a change if they're
>   PJE> getting notified *before* the change takes place!  Docs should
>   PJE> also clarify that when set in other code, _p_changed should be
>   PJE> set at the latest possible moment, *after* the object is in its
>   PJE> new, stable state.
>
>Can you flesh out this request?  The second sentence there suggests
>interesting issues, but doesn't spell them out.
>
>As for when _p_changed should be set: Why does it matter?

Because setting _p_changed triggers a notification to the DM, which may 
need to perform an immediate save of the object's state, if a transaction 
commit is already in progress.
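
A small sketch of why the ordering matters for a write-through data manager.  The WriteThroughDM and Account classes are hypothetical; objectChanged() is the message name proposed in this thread, and _p_jar is used here simply as the attribute holding the data manager.

```python
class WriteThroughDM:
    """Hypothetical data manager that saves as soon as it is notified."""

    def __init__(self):
        self.saved = []

    def objectChanged(self, obj):
        # Whatever state the object has *right now* is what gets saved.
        self.saved.append(obj.balance)


class Account:
    def __init__(self, dm):
        # Bypass __setattr__ during setup so no notification is sent.
        object.__setattr__(self, "_p_jar", dm)
        object.__setattr__(self, "balance", 0)

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)   # make the change first...
        self._p_jar.objectChanged(self)         # ...then notify the DM


dm = WriteThroughDM()
acct = Account(dm)
acct.balance = 50
```

If the two lines in __setattr__ were swapped, the data manager would have saved the stale balance of 0 instead of 50.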


>   PJE> * Get rid of the term "register", since objects won't
>   PJE> "register" with the transaction, and neither should they with
>   PJE> their data manager.  They should "inform their data manager"
>   PJE> that they have changed.  Something like an objectChanged()
>   PJE> message is appropriate in place of register().  I believe this
>   PJE> would clarify the API.
>
>I don't have a problem with register().  In what way is the current
>interface unclear?

"register" doesn't mean anything in the context of a data manager.  It made 
some sense in reference to a transaction - presumably something registering 
with a transaction is some sort of transacted thing.  Registering with a 
data manager, however, doesn't say anything about what it's being 
registered for or what this will do.  "objectChanged()", however, would 
clearly state that this is a notice that an object has been changed.

Also, "register" implies implementation that may not exist!  Some data 
managers may save changes immediately, and not "register" anything about 
the object or the change.  (Think, for example, of a chat room object 
implemented via a persistence mechanism.)


>I'd like to see some comments from people who haven't already used
>ZODB.

I'd like to see some, too!  If it was just going to be Jim and me we 
could've taken it to private e-mail and avoided having a SIG.  :)


>I worry that all the comments are coming from a small number of
>people who wrote or use ZODB's persistent mechanism, and that we'll
>make decisions that will be limiting for other persistent applications.
>(But maybe there aren't any other such applications/users.)

Personally, I'm trying to speak as someone who has *wrestled* with ZODB, 
trying to make it do things it's not entirely suited for.  As pro-ZODB as I 
may sound in some ways, my needs are pretty diametrically opposite a *lot* 
of ZODB's design parameters.  I want:

* Transparent use of legacy databases w/fixed schemas (vs. new DB format, 
fluid schema)

* Strongly transactional caching (vs. out-of-date reads of objects not 
written in that txn)

* High write-to-read ratio (vs. high read-to-write ratio)

* Use DBMS indexing and query capabilities (vs. creating them "from scratch")

* Undo and versions optionally handled at the application domain level (vs. 
building them into the infrastructure)

If you can think of a ZODB design parameter that I *don't* want the 
opposite of (besides things like lightweight, high performance, low memory, 
easy to use, etc., that nobody in their right mind would disagree with), 
please let me know.  :)

So, given that I'm so "opposite" in my needs, I think it's really quite 
impressive that it can accommodate me with so little stretching.  There 
isn't anything I've proposed in Straw Man and Straw Baby that I can't do 
with the existing ZODB 4 code, if I'm willing to hack at it a bit.  Okay, 
maybe more than a bit.  But it's at least *possible*.

Honestly, I was and am much more disturbed by the possibility of ZODB 4 
undergoing an API upheaval, than I am about not getting every little thing 
I want.  A de-facto "standard" framework that I can work around, is better 
for me than a fast-moving target that might someday meet my needs 
better.  Practicality beats purity, and all that.  :)



From pje@telecommunity.com  Tue Jul 23 02:32:50 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 22 Jul 2002 21:32:50 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <LNBBLJKPBEHFEDALKOLCEEKLAGAB.tim@zope.com>
References: <15676.25855.718959.288651@localhost.localdomain>
Message-ID: <5.1.0.14.0.20020722212950.0520d600@mail.telecommunity.com>

At 04:43 PM 7/22/02 -0400, Tim Peters wrote:
> > but not _p_state.  The purpose of assignment to _p_changed is to mark an
> > object as changed.  Assignment seems clear here.  _p_changed is a
> > flag, normally false; when an object is changed, it is set to true.
> > Why would a method call be any clearer?
>
>Presumably so that interested parties could influence what happens when an
>object becomes "dirty"?  Maybe update a distributed cache, who knows.  I
>suspect Phillip Eby was getting at something related with his plea to set
>_p_changed only after an object is in a sane state again after a change is
>complete.

Yes, that's precisely it.  Updating a distributed cache would be another 
example of a "write-through changes" situation.


>OTOH, method calls are a large overhead when the mutation is simple; e.g.,
>if a persistent list has to call a changed() method every time someone does
>
>     a[i] = 6
>
>that's a real drag on potential performance.

Right, which is why the existing self._p_changed = 1 thing isn't too bad as 
long as the descriptor is in C, and only calls through to the DM when the 
object transitions from up-to-date to "dirty".  Of course, write-through 
DMs will immediately reset the state to "up-to-date", but if they want to 
get called on every change, that's the price they'll pay.
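
In Python terms (the real descriptor would be in C), the "only call through on the transition" behavior might be sketched with a property.  CountingDM and PersistentList are hypothetical names, and the list wrapper supports only item assignment.

```python
class CountingDM:
    """Hypothetical data manager that counts change notifications."""

    def __init__(self):
        self.notifications = 0

    def objectChanged(self, obj):
        self.notifications += 1


class PersistentList:
    def __init__(self, dm, items):
        self._dm = dm
        self._items = list(items)
        self._dirty = False

    @property
    def _p_changed(self):
        return self._dirty

    @_p_changed.setter
    def _p_changed(self, value):
        was_dirty = self._dirty
        self._dirty = bool(value)
        # Notify only on the up-to-date -> dirty transition.
        if self._dirty and not was_dirty:
            self._dm.objectChanged(self)

    def __setitem__(self, index, value):
        self._items[index] = value
        self._p_changed = True   # cheap flag assignment after the change


dm = CountingDM()
a = PersistentList(dm, [0] * 10)
for i in range(10):
    a[i] = 6
after_loop = dm.notifications    # one call despite ten assignments

# A write-through DM would reset the flag, and so be called again.
a._p_changed = False
a[0] = 7
after_reset = dm.notifications
```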



From pje@telecommunity.com  Tue Jul 23 02:34:53 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 22 Jul 2002 21:34:53 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.28655.414460.130631@slothrop.zope.com>
References: <LNBBLJKPBEHFEDALKOLCEEKLAGAB.tim@zope.com>
 <15676.25855.718959.288651@localhost.localdomain>
 <LNBBLJKPBEHFEDALKOLCEEKLAGAB.tim@zope.com>
Message-ID: <5.1.0.14.0.20020722213305.0520e0f0@mail.telecommunity.com>

At 04:49 PM 7/22/02 -0400, Jeremy Hylton wrote:
> >>>>> "TP" == Tim Peters <tim@zope.com> writes:
>
>It's not a one-bit flag, and that's the part I want to simplify.  You
>can also:
>
>  - set _p_changed to None, which requests that the object become a
>    ghost.
>
>  - delete the _p_changed attribute (del obj._p_changed) which also
>    asks the object to become a ghost, but in subtly different ways
>    than just setting the attribute to None.
>
>  - revive a ghost, although I'm not entirely clear how this works.
>
>The Zope3 persistence mechanism supports all the _p_changed magic, but
>also exports _p_activate() and _p_deactivate().  The first makes a
>ghost a real object, the second makes a real object a ghost.

I'd be happy sticking with the methods for activate/deactivate, and 
simplifying _p_changed to a one-bit flag.  The change flag wants to be as 
lightweight as possible.  And we can even make it a boolean, to make Guido 
happy.  ;)



From pje@telecommunity.com  Tue Jul 23 02:37:27 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 22 Jul 2002 21:37:27 -0400
Subject: [Persistence-sig] Nested Transactions
In-Reply-To: <15676.30712.483484.650388@localhost.localdomain>
References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020722213513.05099560@mail.telecommunity.com>

At 05:24 PM 7/22/02 -0400, Jeremy Hylton wrote:
>I would like to see arbitrarily nested transactions supported in the
>next generation transaction API.

Could you add some more specifics?  For example, what happens if a 
transaction participant can't support nested transactions?  I gather that 
this capability is not exactly common, even among SQL databases.



From jeremy@alum.mit.edu  Tue Jul 23 04:17:41 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 22 Jul 2002 23:17:41 -0400
Subject: [Persistence-sig] Nested Transactions
In-Reply-To: <5.1.0.14.0.20020722213513.05099560@mail.telecommunity.com>
References: <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<5.1.0.14.0.20020722213513.05099560@mail.telecommunity.com>
Message-ID: <15676.51925.551981.560829@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  PJE> At 05:24 PM 7/22/02 -0400, Jeremy Hylton wrote:
  >> I would like to see arbitrarily nested transactions supported in
  >> the next generation transaction API.

  PJE> Could you add some more specifics?  For example, what happens
  PJE> if a transaction participant can't support nested transactions?
  PJE> I gather that this capability is not exactly common, even among
  PJE> SQL databases.

I gather you want more specifics about the API, which I'll post as
soon as I work them out :-).  The general idea is clear enough, I
think.  A larger transaction can be composed of several component
transactions, each of which is an atomic action.  This approach can be
applied recursively.

If the backend doesn't support it, then the application gets an
exception when it tries to use the feature.  The APIs should support
it, though, so that applications can be written against it.

I didn't think nested transactions were that uncommon, BTW.  At least
some of the EJB servers support them.

The two interface ideas I have are a savepoint() method or a way to
create a new transaction and specify a parent.  A savepoint() method
would return an object with a rollback() method.  When you call
savepoint(), you commit a subtransaction.  If you call rollback(), you
roll back changes to the savepoint.  I think this API can support
arbitrarily nested transactions, although the application will have to
work to manage the savepoint objects.
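
A rough in-memory sketch of the savepoint() idea, to show how nesting falls out of holding several savepoint objects.  The Transaction and Savepoint classes and the use of a plain dict as "state" are assumptions; a real participant would snapshot far less naively.

```python
import copy


class Savepoint:
    """Hypothetical object returned by savepoint(); knows how to roll back."""

    def __init__(self, txn, snapshot):
        self._txn = txn
        self._snapshot = snapshot

    def rollback(self):
        self._txn.state = copy.deepcopy(self._snapshot)


class Transaction:
    def __init__(self):
        self.state = {}

    def savepoint(self):
        # Record the state as of now; work done so far counts as
        # a committed subtransaction.
        return Savepoint(self, copy.deepcopy(self.state))


txn = Transaction()
txn.state["a"] = 1
outer = txn.savepoint()
txn.state["b"] = 2
inner = txn.savepoint()
txn.state["c"] = 3

inner.rollback()
after_inner = dict(txn.state)    # "c" is gone, "a" and "b" remain

outer.rollback()
after_outer = dict(txn.state)    # only "a" remains
```

The application has to keep track of outer and inner itself, which is the management burden mentioned above.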

The other option seems simpler for nesting, because you explicitly
begin a new transaction for each atomic subaction.  The problem with
it is that the current transaction API doesn't have an explicit begin
phase.  (Is there a common pattern for RDBMS?  I've used some that do
an implicit BEGIN WORK and some that require an explicit one.)
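
As one concrete data point on explicit begin, here is a sketch using Python's sqlite3 module, chosen only because it is easy to run; it says nothing about what other databases do.  Passing isolation_level=None asks the module not to issue BEGIN implicitly, so the transaction and savepoint statements are all spelled out.

```python
import sqlite3

# isolation_level=None: the module issues no implicit BEGIN for us.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (x INTEGER)")

conn.execute("BEGIN")                      # explicit start
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT sp1")              # start of a nested unit of work
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO sp1")            # undo only the nested part
conn.execute("COMMIT")

rows = [row[0] for row in conn.execute("SELECT x FROM t ORDER BY x")]
```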

Jeremy




From iiourov@yahoo.com  Tue Jul 23 08:13:11 2002
From: iiourov@yahoo.com (Ilia Iourovitski)
Date: Tue, 23 Jul 2002 00:13:11 -0700 (PDT)
Subject: [Persistence-sig] Nested Transactions
In-Reply-To: <15676.51925.551981.560829@slothrop.zope.com>
Message-ID: <20020723071311.39096.qmail@web20707.mail.yahoo.com>

In an ODMG-style API, transactions should be started
explicitly.

In the RDBMS world, the user can explicitly enable or disable
transaction control, typically through
connection->setAutoCommit(false)

Ilia



From iiourov@yahoo.com  Tue Jul 23 08:22:44 2002
From: iiourov@yahoo.com (Ilia Iourovitski)
Date: Tue, 23 Jul 2002 00:22:44 -0700 (PDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.26434.240121.243006@localhost.localdomain>
Message-ID: <20020723072244.83368.qmail@web20704.mail.yahoo.com>

For RDBMS-based storages, the API should
provide the following methods:

create(object): the storage shall populate the id from the RDBMS,
which is usually the primary key.
delete(object)
load(object type, object id) -> object
query(string, parameters) -> list of objects or smart
collection

Those methods can be placed in
Persistence/IPersistentDataManager.py
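
A minimal sketch of what these four methods might look like, using sqlite3 and a single made-up table.  The SqlDataManager and Person classes, the table layout, and the form of the query() argument are all assumptions for illustration, not a proposal for IPersistentDataManager.

```python
import sqlite3


class Person:
    def __init__(self, name, id=None):
        self.id = id
        self.name = name


class SqlDataManager:
    """Hypothetical data manager for a single 'person' table."""

    def __init__(self, conn):
        self.conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS person "
            "(id INTEGER PRIMARY KEY, name TEXT)"
        )

    def create(self, obj):
        cur = self.conn.execute(
            "INSERT INTO person (name) VALUES (?)", (obj.name,)
        )
        obj.id = cur.lastrowid   # id populated from the primary key

    def delete(self, obj):
        self.conn.execute("DELETE FROM person WHERE id = ?", (obj.id,))

    def load(self, cls, oid):
        row = self.conn.execute(
            "SELECT id, name FROM person WHERE id = ?", (oid,)
        ).fetchone()
        return None if row is None else cls(row[1], id=row[0])

    def query(self, where, params=()):
        # 'where' is a trusted SQL fragment; values go through params.
        sql = "SELECT id, name FROM person WHERE " + where
        return [Person(name, id=pid)
                for pid, name in self.conn.execute(sql, params)]


dm = SqlDataManager(sqlite3.connect(":memory:"))
alice = Person("Alice")
dm.create(alice)
loaded = dm.load(Person, alice.id)
matches = dm.query("name = ?", ("Alice",))
dm.delete(alice)
gone = dm.load(Person, alice.id)
```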

Thanks

Ilia



--- Jeremy Hylton <jeremy@zope.com> wrote:
> >>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
>
>   PJE> * Flag _p_changed *after* __setattr__, not before!  This will
>   PJE> help co-operative transaction participants play nicely
>   PJE> together, since they can't "write through" a change if they're
>   PJE> getting notified *before* the change takes place!  Docs should
>   PJE> also clarify that when set in other code, _p_changed should be
>   PJE> set at the latest possible moment, *after* the object is in its
>   PJE> new, stable state.
>
> Can you flesh out this request?  The second sentence there suggests
> interesting issues, but doesn't spell them out.
>
> As for when _p_changed should be set: Why does it matter?
>
>   PJE> * Keep the _p_atime slot, but don't fill it with anything by
>   PJE> default.
>
> I'd just as soon drop it completely.  If a particular application
> wants to extend the base persistence interface, it can.
>
>   PJE> * Get rid of the term "register", since objects won't
>   PJE> "register" with the transaction, and neither should they with
>   PJE> their data manager.  They should "inform their data manager"
>   PJE> that they have changed.  Something like an objectChanged()
>   PJE> message is appropriate in place of register().  I believe this
>   PJE> would clarify the API.
>
> I don't have a problem with register().  In what way is the current
> interface unclear?
>
>   PJE> By the way, my rationale for not taking any radical new
>   PJE> approaches to persistence, observation, or notification in this
>   PJE> proposal is that the existing Persistence package is
>   PJE> "transparent" enough, and has the benefit of lots of field
>   PJE> experience.
>
> I'd like to see some comments from people who haven't already used
> ZODB.  I worry that all the comments are coming from a small number of
> people who wrote or use ZODB's persistent mechanism, and that we'll
> make decisions that will be limiting for other persistent applications.
> (But maybe there aren't any other such applications/users.)
>
> Jeremy
> 
> 
> 


From pyth@devel.trillke.net  Tue Jul 23 08:32:42 2002
From: pyth@devel.trillke.net (holger krekel)
Date: Tue, 23 Jul 2002 09:32:42 +0200
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <15676.26434.240121.243006@localhost.localdomain>;
	from jeremy@zope.com on Mon, Jul 22, 2002 at 04:12:50PM -0400
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <Sebastien
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<15676.26434.240121.243006@localhost.localdomain>
Message-ID: <20020723093242.E10625@prim.han.de>

Jeremy Hylton wrote:
> I'd like to see some comments from people who haven't already used
> ZODB.  I worry that all the comments are coming from a small number of
> people who wrote or use ZODB's persistent mechanism, and that we'll
> make decisions that will be limiting for other persistent applications.
> (But maybe there aren't any other such applications/users.)

I am following the threads but haven't found time to contribute,
although i really, really want to.  Next week should be much better.  

Actually i have been quite involved with developing a CORBA 
Object transaction service in C++ for the realtime TAO-Object broker. 
One spin-off currently lives at xots.sourceforge.net but personally
i am mainly concentrating on integrating TAO with python first :-)

Interestingly, several people have asked me for transactions *without*
persistence.  They just wanted a lightweight in-memory protocol
for handling atomicity and consistency and didn't give a damn
about durability and transaction monitors.  Overall, i'd like to 
have the basic APIs (as much) orthogonal to each other (as possible).  
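
For what such a lightweight, persistence-free transaction might look like, here is a minimal sketch.  MemoryTransaction is a made-up name, it guards a single dict, and it provides atomicity only in the sense of "all changes or none" within one thread.

```python
import copy


class MemoryTransaction:
    """Hypothetical context manager: undo changes if the block raises."""

    def __init__(self, data):
        self._data = data

    def __enter__(self):
        self._snapshot = copy.deepcopy(self._data)
        return self._data

    def __exit__(self, exc_type, exc, tb):
        if exc_type is not None:
            # Abort: restore the contents in place.
            self._data.clear()
            self._data.update(self._snapshot)
        return False   # let the exception propagate


accounts = {"a": 100, "b": 0}
try:
    with MemoryTransaction(accounts) as accts:
        accts["a"] -= 150
        accts["b"] += 150
        if accts["a"] < 0:
            raise ValueError("insufficient funds")
except ValueError:
    pass
```

Nothing here touches storage, which is the sense in which the transaction API can stay orthogonal to persistence.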

Btw, i surely qualify as not knowing ZODB very much :-)

Please have some patience while i am trying to put my thoughts
into order next week. My starting point will probably be 
Phillip's API.

regards,

    holger


From jim@zope.com  Tue Jul 23 15:15:03 2002
From: jim@zope.com (Jim Fulton)
Date: Tue, 23 Jul 2002 10:15:03 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <Sebastien Bigaret's
	message of "15 Jul 2002 15:06:16 +0200">
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com>
	<5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com>
Message-ID: <3D3D64E7.2010508@zope.com>

Phillip J. Eby wrote:
> At 03:47 PM 7/22/02 -0400, Jim Fulton wrote:
> 
>> Phillip J. Eby wrote:
>>
...


>> I think that there should, at least, be a standard cache interface.
>> It should be possible to develop data managers and caches independently.
>> Maybe we could include one or two standard implementations. These could
>> provide useful examples for other implementations and, of course, be
>> useful in themselves.
> 
> 
> Sure.  I personally don't think there's much that you can standardize on 
> in a caching API besides which mapping methods one is required to 
> support, without getting into policy and use cases. 

I expect that you can hide behind the interface.

...

>>> I think maybe I was unclear.  I certainly don't think that the 
>>> interfaces should cease to exist, or that they should not exist as 
>>> documentation.  I'm referring to their inclusion as operating code, 
>>> only.
>>
>>
>> So you don't want them to get imported?
> 
> 
> It's not that I care one way or the other.  Honestly, I'd rather see 
> Interface end up in the standard library too - at least once the 
> metaclass bug is fixed.  :) 

What metaclass bug?

> But my overriding priority here is a
> standard for Persistence and Transaction bases for eventual inclusion in 
> the standard library.

I'd like to keep the interfaces but make them resilient to the absence
of the interface package. I'll deal with those details.
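
One common way to get that kind of resilience is a guarded import with a fallback base class, sketched below.  The import line assumes the Interface package exposes a base named Interface; the fallback class and the example interface body are made up for illustration and are not the details referred to above.

```python
try:
    # Assumption: the Interface package, if installed, provides this name.
    from Interface import Interface
except ImportError:
    class Interface:
        """Fallback base: interfaces then serve as documentation only."""


class IPersistentDataManager(Interface):
    """Documentation-only description of what a data manager provides."""
```

With the fallback in place, modules that declare interfaces still import cleanly when the package is missing.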


....

>> True. But maybe someone will think of a way to solve this without proxies
>> or alchemy?
> 
> 
> Unless you're going to fundamentally alter the Python object model, it's 
> not doable.  Python objects by definition get their behavior from their 
> type.  To change the behavior, you must either change the type, the type 
> pointer in the object, or replace the object with another one.

There's been a proposal for adding an observer framework to Python.
A suitable general observer framework just might allow the problem to be
solved.


Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From pje@telecommunity.com  Tue Jul 23 15:29:34 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 10:29:34 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3D3D64E7.2010508@zope.com>
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh>
 <Sebastien Bigaret's message of "15 Jul 2002 15:06:16 +0200">
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <3.0.5.32.20020719115207.0086e100@telecommunity.com>
 <5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com>
 <5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020723102440.0502cec0@mail.telecommunity.com>

At 10:15 AM 7/23/02 -0400, Jim Fulton wrote:
>Phillip J. Eby wrote:
>>At 03:47 PM 7/22/02 -0400, Jim Fulton wrote:
>>>I think that there should, at least, be a standard cache interface.
>>>It should be possible to develop data managers and caches independently.
>>>Maybe we could include one or two standard implementations. These could
>>>provide useful examples for other implementations and, of course, be
>>>useful in themselves.
>>
>>Sure.  I personally don't think there's much that you can standardize on 
>>in a caching API besides which mapping methods one is required to 
>>support, without getting into policy and use cases.
>
>I expect that you can hide behind the interface.

Huh?


>>It's not that I care one way or the other.  Honestly, I'd rather see 
>>Interface end up in the standard library too - at least once the 
>>metaclass bug is fixed.  :)
>
>What metaclass bug?

You know, the one in Interfaces.Implements, where it doesn't treat 
metaclass instances as classes.  The one you said you were okay with 
fixing, that I provided a patch for, and which Steve Alexander was checking 
to verify that it didn't break anything else in Zope 3...  The one that's 
completely off-topic for this list.  :)



>>>True. But maybe someone will think of a way to solve this without proxies
>>>or alchemy?
>>
>>Unless you're going to fundamentally alter the Python object model, it's 
>>not doable.  Python objects by definition get their behavior from their 
>>type.  To change the behavior, you must either change the type, the type 
>>pointer in the object, or replace the object with another one.
>
>There's been a proposal for adding an observer framework to Python.
>A suitable general observer framework just might allow the problem to be
>solved.

Yes, and any such observer framework is going to have to work via changing 
the type, type pointer, or replacing the object, unless it's going to be by 
altering the fundamental Python object model.  :)



From jim@zope.com  Tue Jul 23 15:32:45 2002
From: jim@zope.com (Jim Fulton)
Date: Tue, 23 Jul 2002 10:32:45 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <20020723072244.83368.qmail@web20704.mail.yahoo.com>
Message-ID: <3D3D690D.8040905@zope.com>

Ilia Iourovitski wrote:
> For RDBMS based storages api should
> provides the following method:

I'll first note that, if these methods are needed
at all, they should be methods on a specific data manager.
They do not affect the transaction or the persistence
frameworks.


> create(object) storage shall populated id from rdbms
> which is usually primary key.

This should not be necessary. One should be able to
design a data manager that detects new objects and
assigns them ids when referencing objects are created.

> delete(object) 

I can imagine a datamanager that lacked garbage collection could
need this.

> load(object type, object id)->object

An object type should be unnecessary. If a data manager
needs to track this sort of information, it should
embed it in the object id.

Note also that persistence applications load most objects
automatically through object traversal. It is often
necessary to explicitly load one or more root objects to
provide a starting place for traversal.


> query(string, parameters)->list of objects or smart
> collection
> 
> Those methods can be placed in
> Persistence/IPersistentDataManager.py

No, these methods are specific to particular data manager
APIs, although I can imagine a number of data managers sharing an
API like the one above. Note that IPersistentDataManager.py is
an interface for use by persistent objects. It does not include
all data-manager methods. Similarly,
Transaction.IDataManager.IDataManager is the data-manager API
used by the transaction framework.

Data managers will implement
Persistence.IPersistentDataManager.IPersistentDataManager and
Transaction.IDataManager.IDataManager as well as application APIs
like the one you propose above. Perhaps there should be some
common data-manager application API somewhat like the one you propose
above.

Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From jacobs@penguin.theopalgroup.com  Tue Jul 23 15:38:05 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Tue, 23 Jul 2002 10:38:05 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020723102440.0502cec0@mail.telecommunity.com>
Message-ID: <Pine.LNX.4.44.0207231032540.21469-100000@penguin.theopalgroup.com>

On Tue, 23 Jul 2002, Phillip J. Eby wrote:
> >There's been a proposal for adding an observer framework to Python.
> >A suitable general observer framework just might allow the problem to be
> >solved.
> 
> Yes, and any such observer framework is going to have to work via changing 
> the type, type pointer, or replacing the object, unless it's going to be by 
> altering the fundamental Python object model.  :)

I think that this can be done with light-weight C proxy methods --
hopefully glued in by an intelligent meta-class.  That way, we do not lose
much performance, and don't have to butcher Python's object model too much.

My goal is to migrate towards a persistence system that uses metaclasses and
proxies to do most of the heavy lifting involved in transforming
general user-specified objects into persistent objects.

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From jim@zope.com  Tue Jul 23 15:44:34 2002
From: jim@zope.com (Jim Fulton)
Date: Tue, 23 Jul 2002 10:44:34 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <87sn2lvzug.fsf@bidibule.brest.inqual.bzh> <Sebastien Bigaret's
	message of "15 Jul 2002 15:06:16 +0200">
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<3.0.5.32.20020719115207.0086e100@telecommunity.com>
	<5.1.0.14.0.20020722132838.05986020@mail.telecommunity.com>
	<5.1.0.14.0.20020722200752.05208970@mail.telecommunity.com>
	<5.1.0.14.0.20020723102440.0502cec0@mail.telecommunity.com>
Message-ID: <3D3D6BD2.9070807@zope.com>

Phillip J. Eby wrote:
> At 10:15 AM 7/23/02 -0400, Jim Fulton wrote:
> 
>> Phillip J. Eby wrote:
>>
>>> At 03:47 PM 7/22/02 -0400, Jim Fulton wrote:
>>>
>>>> I think that there should, at least, be a standard cache interface.
>>>> It should be possible to develop data managers and caches 
>>>> independently.
>>>> Maybe we could include one or two standard implementations. These could
>>>> provide useful examples for other implementations and, of course, be
>>>> useful in themselves.
>>>
>>>
>>> Sure.  I personally don't think there's much that you can standardize 
>>> on in a caching API besides which mapping methods one is required to 
>>> support, without getting into policy and use cases.
>>
>>
>> I expect that you can hide behind the interface.
> 
> 
> Huh?

Hee hee. Sorry, I expect that you can hide the policies behind the
interface.

...

>>>> True. But maybe someone will think of a way to solve this without 
>>>> proxies
>>>> or alchemy?
>>>
>>>
>>> Unless you're going to fundamentally alter the Python object model, 
>>> it's not doable.  Python objects by definition get their behavior 
>>> from their type.  To change the behavior, you must either change the 
>>> type, the type pointer in the object, or replace the object with 
>>> another one.
>>
>>
>> There's been a proposal for adding an observer framework to Python.
>> A suitable general observer framework just might allow the problem to be
>> solved.
> 
> 
> Yes, and any such observer framework is going to have to work via 
> changing the type, type pointer, or replacing the object, unless it's 
> going to be by altering the fundamental Python object model.  :)

I don't think it would have to do much with the type at all. If objects
generate the right events, I think that properly designed observers can
do the necessary work.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From jim@zope.com  Tue Jul 23 15:52:57 2002
From: jim@zope.com (Jim Fulton)
Date: Tue, 23 Jul 2002 10:52:57 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <Pine.LNX.4.44.0207231032540.21469-100000@penguin.theopalgroup.com>
Message-ID: <3D3D6DC9.1060000@zope.com>

Kevin Jacobs wrote:
> On Tue, 23 Jul 2002, Phillip J. Eby wrote:
> 
>>>There's been a proposal for adding an observer framework to Python.
>>>A suitable general observer framework just might allow the problem to be
>>>solved.
>>>
>>Yes, and any such observer framework is going to have to work via changing 
>>the type, type pointer, or replacing the object, unless it's going to be by 
>>altering the fundamental Python object model.  :)
>>
> 
> I think that this can be done with a light-weight C proxy methods --
> hopefully glued in by an intelligent meta-class.  That way, we do not lose
> much performance, and don't have to butcher Python's object model too much.
> 
> My goal is to migrate towards a persistence system that uses metaclasses and
> proxies to do most of the heavy lifting involved in transforming
> general user-specified objects into persistent objects.

Proxies can be a useful tool. We certainly use them a lot, although
I sometimes feel dirty afterwards. ;)  There are a *lot* of gotchas.
I would definitely *not* recommend using them for persistence.  I
would find a persistent mix-in to be far less intrusive than proxies.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org



From jacobs@penguin.theopalgroup.com  Tue Jul 23 16:08:36 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Tue, 23 Jul 2002 11:08:36 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3D3D6DC9.1060000@zope.com>
Message-ID: <Pine.LNX.4.44.0207231102560.21768-100000@penguin.theopalgroup.com>

On Tue, 23 Jul 2002, Jim Fulton wrote:
> Proxies can be a useful tool. We certainly use them a lot, although
> I sometimes feel dirty afterwards. ;)  There are a *lot* of gotchas.
> I would definitely *not* recommend using them for persistence.  I
> would find a persistent mix-in to be far less intrusive than proxies.

Believe it or not, we're on the same wavelength:

I'm thinking about proxy-methods a la aspect oriented programming, more than
whole proxy objects.  e.g. cooperative __{g,s}et{attr,item}__ methods that
implement observer semantics and can forward to base-class methods.  Whole
object proxies have the problem that object identity and type information are
obscured in ways that are contrary to standard Python idioms.
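
A minimal sketch of such a cooperative method in modern Python, under
stated assumptions: the mix-in name `ObservedAttributes`, the `_observers`
list, and `add_observer` are hypothetical, and only `__setattr__` is shown.
The method forwards to the next class in the MRO and then notifies, so the
object keeps its own identity and type.

```python
class ObservedAttributes:
    """Mix-in whose __setattr__ forwards up the MRO, then notifies observers."""

    def __setattr__(self, name, value):
        # Forward cooperatively so other base classes still see the write.
        super().__setattr__(name, value)
        if not name.startswith("_"):
            for callback in self.__dict__.get("_observers", ()):
                callback(self, name, value)

    def add_observer(self, callback):
        # Write through __dict__ so registering an observer is not itself observed.
        self.__dict__.setdefault("_observers", []).append(callback)


class Account(ObservedAttributes):
    def __init__(self, balance):
        self.balance = balance


changes = []
acct = Account(10)
acct.add_observer(lambda obj, name, value: changes.append((name, value)))
acct.balance = 25   # the observer records ("balance", 25)
```

Unlike a whole-object proxy, `type(acct)` is still `Account` here.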

-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From pje@telecommunity.com  Tue Jul 23 16:31:50 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 11:31:50 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <Pine.LNX.4.44.0207231102560.21768-100000@penguin.theopalgroup.com>
References: <3D3D6DC9.1060000@zope.com>
Message-ID: <5.1.0.14.0.20020723112620.04fd6b50@mail.telecommunity.com>

At 11:08 AM 7/23/02 -0400, Kevin Jacobs wrote:
>On Tue, 23 Jul 2002, Jim Fulton wrote:
> > Proxies can be a useful tool. We certainly use them a lot, although
> > I sometimes feel dirty afterwards. ;)  There are a *lot* of gotchas.
> > I would definitely *not* recommend using them for persistence.  I
> > would find a persistent mix-in to be far less intrusive than proxies.
>
>Believe it or not, but we're on the same wavelength:
>
>I'm thinking about proxy-methods a la aspect oriented programming, more than
>whole proxy objects.  e.g. cooperative __{g,s}et{attr,item}__ methods that
>implement observer semantics and can forward to base-class methods.  Whole
>object proxies have the problem that object identity and type information is
>obscured in ways that are contrary to standard Python idioms.

So, you're saying you want to alter the types, then?  The interesting part 
of that is how to alter them in such a way that your observing code doesn't 
get re-entered when you're modifying both subclasses and base classes of 
the objects.  You'd need some kind of thread-specific collaboration stack, 
I think.



From jacobs@penguin.theopalgroup.com  Tue Jul 23 16:37:27 2002
From: jacobs@penguin.theopalgroup.com (Kevin Jacobs)
Date: Tue, 23 Jul 2002 11:37:27 -0400 (EDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020723112620.04fd6b50@mail.telecommunity.com>
Message-ID: <Pine.LNX.4.44.0207231133530.22464-100000@penguin.theopalgroup.com>

On Tue, 23 Jul 2002, Phillip J. Eby wrote:
> At 11:08 AM 7/23/02 -0400, Kevin Jacobs wrote:
> >On Tue, 23 Jul 2002, Jim Fulton wrote:
> > > Proxies can be a useful tool. We certainly use them a lot, although
> > > I sometimes feel dirty afterwards. ;)  There are a *lot* of gotchas.
> > > > I would definitely *not* recommend using them for persistence.  I
> > > would find a persistent mix-in to be far less intrusive than proxies.
> >
> >Believe it or not, but we're on the same wavelength:
> >
> >I'm thinking about proxy-methods a la aspect oriented programming, more than
> >whole proxy objects.  e.g. cooperative __{g,s}et{attr,item}__ methods that
> >implement observer semantics and can forward to base-class methods.  Whole
> >object proxies have the problem that object identity and type information is
> >obscured in ways that are contrary to standard Python idioms.
> 
> So, you're saying you want to alter the types, then?  The interesting part 
> of that is how to alter them in such a way that your observing code doesn't 
> get re-entered when you're modifying both subclasses and base classes of 
> the objects.  You'd need some kind of thread-specific collaboration stack, 
> I think.

I suppose, though saying 'alter the types' implies slightly different things
to me.  I don't see great difficulty in isolating subclass and superclass
modifications, although performance is clearly an important issue.  As for
the thread-specific business, you've totally lost me.  Can you provide a
use-case so that I can better understand where you are coming from?

Thanks,
-Kevin

--
Kevin Jacobs
The OPAL Group - Enterprise Systems Architect
Voice: (216) 986-0710 x 19         E-mail: jacobs@theopalgroup.com
Fax:   (216) 986-0714              WWW:    http://www.theopalgroup.com



From pje@telecommunity.com  Tue Jul 23 17:00:11 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 12:00:11 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <Pine.LNX.4.44.0207231133530.22464-100000@penguin.theopalgroup.com>
References: <5.1.0.14.0.20020723112620.04fd6b50@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020723114104.0541e4f0@mail.telecommunity.com>

At 11:37 AM 7/23/02 -0400, Kevin Jacobs wrote:
>On Tue, 23 Jul 2002, Phillip J. Eby wrote:
>
> > So, you're saying you want to alter the types, then?  The interesting part
> > of that is how to alter them in such a way that your observing code doesn't
> > get re-entered when you're modifying both subclasses and base classes of
> > the objects.  You'd need some kind of thread-specific collaboration stack,
> > I think.
>
>I suppose, though saying 'alter the types' implies slightly different things

I mean, if you're proxying methods, presumably you're doing so by altering 
the methods provided by the type, unless you mean to change the type's type 
so that the methods are altered on the fly.  Either way, a change to the 
type instance.  :)


>to me.  I don't see great difficulty in isolating subclass and superclass
>modifications, although performance is clearly an important issue.  As for
>the thread-specific business, you've totally lost me.  Can you provide a
>use-case so that I can better understand where you are coming from?

Consider a co-operative method that performs a super() call.  If one 
surrounds both the super and subclass with observer calls, they will take 
place more than once.  Perhaps that's what you mean by performance; I 
suppose if you are strictly observing things, it may not be a big deal to 
have the methods called more than once.

My comment about thread-specificness was about a way to ensure that the 
wrapper method only gets called once.  It's not relevant if you don't plan 
to ensure that wrappers on co-operative methods are called only once.

I should note, however, that there is one possibly rather important use 
case for not calling a wrapper more than once: object changes.  Let's say 
that class B is a subclass of class A.  B has an invariant that attribute 
"q" is always 3 times attribute "r", and has a setR() method that sets 
"r".  It uses a super() call to class A to do the actual setting of R, and 
then sets the "q" attribute.  Now, if there is a post-return observer 
associated with the setR() method in both A and B, it will be called at a 
point where it will announce a state that is valid for objects of type A, 
but violates an invariant for the specific instance being 
announced.  (Also, even if you didn't care about publishing an invalid 
state, it should be noted that use cases like Tim Peters' example of a 
distributed cache would really multiply the performance issue, especially 
if you're talking about a deep hierarchy of super() calls.)
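
The invariant problem can be reproduced in a few lines. This is a sketch
under assumptions: the `observed` decorator and the `events` list are
hypothetical stand-ins for whatever wrapping mechanism is used, while the
classes A and B, `setR`, `r`, and `q` follow the description above.

```python
events = []


def observed(method):
    """Naive wrapper: report the object's state after every wrapped call."""
    def wrapper(self, *args):
        result = method(self, *args)
        events.append((method.__qualname__, self.r, self.q))
        return result
    return wrapper


class A:
    def __init__(self):
        self.r = 0
        self.q = 0

    @observed
    def setR(self, value):
        self.r = value


class B(A):
    """Invariant: q is always 3 * r."""

    @observed
    def setR(self, value):
        super().setR(value)   # the observer on A.setR fires here, before q is updated
        self.q = 3 * value


b = B()
b.setR(2)
# events == [("A.setR", 2, 0), ("B.setR", 2, 6)]
```

The first event announces r=2 with q=0, a state that is fine for an A but
breaks B's invariant; only the second event describes a valid B.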

Anyway, if we're strictly talking about observers, the simplest way to 
address this might be to carry a per-instance "nesting count" that you 
increment on entry to every proxy and decrement on exit.   When the count 
reaches zero on exit, fire any pending observation events.  Downside to 
this approach: if multiple threads enter overlapping calls on the object, a 
sort of "livelock" can occur where the object never issues any events.  To 
address that, you would have to have at least per-thread counters for each 
instance, which adds some more performance overhead for access.  This is 
where my comment about thread-specific collaboration stacks came from.
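
A minimal sketch of the per-thread nesting count, assuming a hypothetical
`DeferredEvents` mix-in, `observed` decorator, and `fired` hook (none of
these names come from the proposals in this thread). Each instance keeps a
`threading.local`, so every thread has its own depth counter and pending
list, and events fire only when that thread's outermost call exits.

```python
import threading


class DeferredEvents:
    """Mix-in that queues events and fires them when the outermost call exits."""

    def _state(self):
        # One threading.local per instance gives each thread its own counter.
        local = self.__dict__.get("_local")
        if local is None:
            local = self.__dict__.setdefault("_local", threading.local())
        if not hasattr(local, "depth"):
            local.depth, local.pending = 0, []
        return local

    def notify(self, event):
        self._state().pending.append(event)

    def fired(self, events):
        """Hook for subclasses; called once per outermost call."""


def observed(method):
    def wrapper(self, *args, **kwargs):
        state = self._state()
        state.depth += 1                      # increment on entry
        try:
            result = method(self, *args, **kwargs)
            self.notify(method.__name__)
            return result
        finally:
            state.depth -= 1                  # decrement on exit
            if state.depth == 0 and state.pending:
                events, state.pending = state.pending, []
                self.fired(events)
    return wrapper


class A(DeferredEvents):
    def __init__(self):
        self.r = 0
        self.q = 0
        self.log = []

    def fired(self, events):
        self.log.append((tuple(events), self.r, self.q))

    @observed
    def setR(self, value):
        self.r = value


class B(A):
    @observed
    def setR(self, value):
        super().setR(value)
        self.q = 3 * value


b = B()
b.setR(2)
# b.log == [(("setR", "setR"), 2, 6)]
```

Both wrapped calls are recorded, but observers see the object only once,
after B's invariant has been restored.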



From iiourov@yahoo.com  Tue Jul 23 18:08:36 2002
From: iiourov@yahoo.com (Ilia Iourovitski)
Date: Tue, 23 Jul 2002 10:08:36 -0700 (PDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <3D3D690D.8040905@zope.com>
Message-ID: <20020723170836.41755.qmail@web20702.mail.yahoo.com>

--- Jim Fulton <jim@zope.com> wrote:
> Ilia Iourovitski wrote:
> > For RDBMS based storages api should
> > provides the following method:
> 
> I'll first note that, if these methods are needed
> at all, they should be methods on a specific data
> manager.
> They do not affect the transaction or the
> persistence
> frameworks.
> 
> 
> > create(object) storage shall populated id from
> rdbms
> > which is usually primary key.
> 
> This should not be necessary. One should be able to
> design a data manager that detected new objects and
> assigned them ids when referencing objects are
> created.

Typical storages (RDBMS, ODBMS, XML stores like Xindice)
do not provide a root object. So after a transaction has
started, an object must be loaded from storage or created.

> 
> > delete(object) 
> 
> I can imagine a datamanager that lacked garbage
> collection could
> need this.
>
In the case of an RDBMS, there are objects which are not
referenced.
 
> > load(object type, object id)->object
> 
> An object type should be unnecessary. If a data
> manager
> needs to track this sort of information, it should
> embed it in the object id.

In the RDBMS case the id is usually an integer. Adding the
whole package/class name can be expensive.
> 
> Note also, that persistence applications load most
> objects
> automatically through object traversal. It is often
> necessary to explicitly load one or more root
> objects to
> provide a starting place for traversal.
> 
> 
> > query(string, parameters)->list of objects or
> smart
> > collection
> > 
> > Those methods can be placed in
> > Persistence/IPersistentDataManager.py
> 
> No, these methods are specific to particular data
> manager
> APIs, although I can imagine a number of data
> managers sharing an
> API like the one above. Note that
> IPersistentDataManager.py is
> an interface for use by persistent objects. It does
> not include
> all data-manager methods. Similarly,
> Transaction.IDataManager.IDataManager is the
> data-manager API
> used by the transaction framework.
> 
And most storages, like RDBMS, LDAP, and XML, have it.

> Data managers will implement
>
Persistence.IPersistentDataManager.IPersistentDataManager
> and
> Transaction.IDataManager.IDataManager as well as
> application APIs
> like the one you propose above. Perhaps there should
> be some
> common data-manager application API somewhat like
> the one you propose
> above.
> 
> Jim
> 
> 
> -- 
> Jim Fulton           mailto:jim@zope.com      
> Python Powered!
> CTO                  (888) 344-4332           
> http://www.python.org
> Zope Corporation     http://www.zope.com      
> http://www.zope.org
> 


__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com


From pje@telecommunity.com  Tue Jul 23 19:05:35 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 14:05:35 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <20020723170836.41755.qmail@web20702.mail.yahoo.com>
References: <3D3D690D.8040905@zope.com>
Message-ID: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com>

At 10:08 AM 7/23/02 -0700, Ilia Iourovitski wrote:
>
> > > load(object type, object id)->object
> >
> > An object type should be unnecessary. If a data
> > manager
> > needs to track this sort of information, it should
> > embed it in the object id.
>
>In rdbms case id usually integer. adding the whole
>package/class name can be expensive.

This is easily addressed by using separate data managers for each table or 
other base class type.  No need to carry the type in the object ID.


> > > query(string, parameters)->list of objects or
> > smart
> > > collection
> > >
> > > Those methods can be placed in
> > > Persistence/IPersistentDataManager.py
> >
> > No, these methods are specific to particular data
> > manager
> > APIs, although I can imagine a number of data
> > managers sharing an
> > API like the one above. Note that
> > IPersistentDataManager.py is
> > an interface for use by persistent objects. It does
> > not include
> > all data-manager methods. Similarly,
> > Transaction.IDataManager.IDataManager is the
> > data-manager API
> > used by the transaction framework.
> >
>And most storages like rdbms, ldap, xml has it.

The most straightforward way to handle queries is by creating query data 
managers, which take OIDs that represent the parameters of the query.

Note, by the way, that IPersistentDataManager is an interface exposed to 
persistent objects by their data manager.  It is *not* the interface a data 
manager exposes to application code, which can and should be quite different.


> > Data managers will implement
> >
>Persistence.IPersistentDataManager.IPersistentDataManager
> > and
> > Transaction.IDataManager.IDataManager as well as
> > application APIs
> > like the one you propose above. Perhaps there should
> > be some
> > common data-manager application API somewhat like
> > the one you propose
> > above.

I agree with Jim that none of this stuff is needed in the interface that a 
data manager exposes to persistent objects.  This stuff would be in a data 
manager's application-level interface, and I don't see any need for 
standardization there; that's an area for value-add by competing 
persistence mechanisms and data manager implementations.  Any 
standardization of them now would be counter-productive, I think.


From iiourov@yahoo.com  Tue Jul 23 19:35:48 2002
From: iiourov@yahoo.com (Ilia Iourovitski)
Date: Tue, 23 Jul 2002 11:35:48 -0700 (PDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com>
Message-ID: <20020723183548.42816.qmail@web20705.mail.yahoo.com>


--- "Phillip J. Eby" <pje@telecommunity.com> wrote:
> At 10:08 AM 7/23/02 -0700, Ilia Iourovitski wrote:
> >
> > > > load(object type, object id)->object
> > >
> > > An object type should be unnecessary. If a data
> > > manager
> > > needs to track this sort of information, it
> should
> > > embed it in the object id.
> >
> >In rdbms case id usually integer. adding the whole
> >package/class name can be expensive.
> 
> This is easily addressed by using separate data
> managers for each table or 
> other base class type.  No need to carry the type in
> the object ID.
> 
You mean one data manager per table. Too much.
> 
> > > > query(string, parameters)->list of objects or
> > > smart
> > > > collection
> > > >
> > > > Those methods can be placed in
> > > > Persistence/IPersistentDataManager.py
> > >
> > > No, these methods are specific to particular
> data
> > > manager
> > > APIs, although I can imagine a number of data
> > > managers sharing an
> > > API like the one above. Note that
> > > IPersistentDataManager.py is
> > > an interface for use by persistent objects. It
> does
> > > not include
> > > all data-manager methods. Similarly,
> > > Transaction.IDataManager.IDataManager is the
> > > data-manager API
> > > used by the transaction framework.
> > >
> >And most storages like rdbms, ldap, xml has it.
> 
> The most straightforward way to handle queries is by
> creating query data 
> managers, which take OIDs that represent the
> parameters of the query.
> 
Most of the time people retrieve objects by attributes,
not by OID.

> Note, by the way, that IPersistentDataManager is an
> interface exposed to 
> persistent objects by their data manager.  It is
> *not* the interface a data 
> manager exposes to application code, which can and
> should be quite different.
> 
> 
> > > Data managers will implement
> > >
>
>Persistence.IPersistentDataManager.IPersistentDataManager
> > > and
> > > Transaction.IDataManager.IDataManager as well as
> > > application APIs
> > > like the one you propose above. Perhaps there
> should
> > > be some
> > > common data-manager application API somewhat
> like
> > > the one you propose
> > > above.
> 
> I agree with Jim that none of this stuff is needed
> in the interface that a 
> data manager exposes to persistent objects.  This
> stuff would be in a data 
> manager's application-level interface, and I don't
> see any need for 
> standardization there; that's an area for value-add
> by competing 
> persistence mechanisms and data manager
> implementations.  Any 
> standardization of them now would be
> counter-productive, I think.
> 
It already exists. Look at JDO. In Java land, O/R
mappers are popular because, instead of learning a different
API/query language every time, you can use ODMG/OQL
against RDBMS, ODBMS, LDAP, XML, you name it.
It is a major selling point for a "generic persistence"
toolkit.


From pje@telecommunity.com  Tue Jul 23 20:19:32 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 15:19:32 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <20020723183548.42816.qmail@web20705.mail.yahoo.com>
References: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020723150312.04f70390@mail.telecommunity.com>

At 11:35 AM 7/23/02 -0700, Ilia Iourovitski wrote:

>--- "Phillip J. Eby" <pje@telecommunity.com> wrote:
> > At 10:08 AM 7/23/02 -0700, Ilia Iourovitski wrote:
> > >
> > > > > load(object type, object id)->object
> > > >
> > > > An object type should be unnecessary. If a data
> > > > manager
> > > > needs to track this sort of information, it
> > should
> > > > embed it in the object id.
> > >
> > >In rdbms case id usually integer. adding the whole
> > >package/class name can be expensive.
> >
> > This is easily addressed by using separate data
> > managers for each table or
> > other base class type.  No need to carry the type in
> > the object ID.
> >
>You mean one data manager per table. Too much.

Why?  I could simply do something like this:

class MyDBManager:

     def __init__(self, sqlconn):
         self.table1 = TableDBManager("table1", sqlconn, ...)
         self.table2 = TableDBManager("table2", sqlconn, ...)
         ...

myDB = MyDBManager(someSQLconnection)

And then refer to myDB.table1['someKey'] to load an object.  This doesn't 
seem like "too much", especially since you could generate the individual 
DM's based on metadata.

At any rate, this is pretty much the approach I intend to use myself, 
except that using PEAK eliminates the need for the __init__ method.
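
To make the idea concrete, here is a runnable sketch. It is only an
illustration under assumptions: this `TableDBManager` is a hypothetical
implementation (backed by the standard `sqlite3` module, returning plain
row dicts rather than persistent objects), and the `key` parameter and the
`tables` metadata list are invented for the example.

```python
import sqlite3


class TableDBManager:
    """Hypothetical per-table manager: maps a primary key to a row dict."""

    def __init__(self, table, conn, key="id"):
        self.table, self.conn, self.key = table, conn, key
        self.cache = {}

    def __getitem__(self, oid):
        if oid not in self.cache:
            # Table and column names come from trusted metadata, not user input.
            cur = self.conn.execute(
                f"SELECT * FROM {self.table} WHERE {self.key} = ?", (oid,))
            row = cur.fetchone()
            if row is None:
                raise KeyError(oid)
            columns = [d[0] for d in cur.description]
            self.cache[oid] = dict(zip(columns, row))
        return self.cache[oid]


class MyDBManager:
    def __init__(self, conn, tables):
        # Generate one manager per table from metadata.
        for table in tables:
            setattr(self, table, TableDBManager(table, conn))


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id TEXT PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO table1 VALUES ('someKey', 'example')")
myDB = MyDBManager(conn, ["table1"])
record = myDB.table1['someKey']   # {'id': 'someKey', 'name': 'example'}
```

Because each table has its own manager, the object id stays a bare key and
never needs to carry a package or class name.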



> > > > > query(string, parameters)->list of objects or
> > > > smart
> > > > > collection
> > > > >
> > > > > Those methods can be placed in
> > > > > Persistence/IPersistentDataManager.py
> > > >
> > > > No, these methods are specific to particular
> > data
> > > > manager
> > > > APIs, although I can imagine a number of data
> > > > managers sharing an
> > > > API like the one above. Note that
> > > > IPersistentDataManager.py is
> > > > an interface for use by persistent objects. It
> > does
> > > > not include
> > > > all data-manager methods. Similarly,
> > > > Transaction.IDataManager.IDataManager is the
> > > > data-manager API
> > > > used by the transaction framework.
> > > >
> > >And most storages like rdbms, ldap, xml has it.
> >
> > The most straightforward way to handle queries is by
> > creating query data
> > managers, which take OIDs that represent the
> > parameters of the query.
> >
>Most of the time people retrive object by attributes.
>not by OID.

Right.  So define a query manager that takes the attributes as fields in an 
OID, and returns a persistent object that represents a sequence of 
records.  e.g.

for object in someQueryMgr[ ('param1value','param2value') ]:
     ...

All you need is a separate query manager for each (parameterized) query 
your app needs -- and again, there's nothing stopping you from generating 
your own via metadata or even from OQL if that's your heart's desire.
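
A minimal sketch of such a query manager, with loud assumptions: the
`QueryManager` class, its `rows` and `fields` arguments, and the sample
data are all hypothetical, and an in-memory list stands in for a real
backend. The "OID" is simply the tuple of parameter values.

```python
class QueryManager:
    """Hypothetical query manager: the 'OID' is a tuple of query parameters."""

    def __init__(self, rows, fields):
        self.rows = rows        # stand-in for a real backend
        self.fields = fields    # attribute names, in the order the OID lists them
        self.cache = {}

    def __getitem__(self, params):
        if params not in self.cache:
            wanted = dict(zip(self.fields, params))
            self.cache[params] = [
                row for row in self.rows
                if all(row[f] == v for f, v in wanted.items())
            ]
        return self.cache[params]


people = [
    {"name": "Ann", "city": "Oslo", "role": "dev"},
    {"name": "Bob", "city": "Oslo", "role": "ops"},
    {"name": "Cy", "city": "Rome", "role": "dev"},
]
by_city_and_role = QueryManager(people, ("city", "role"))

names = []
for person in by_city_and_role[("Oslo", "dev")]:
    names.append(person["name"])
# names == ["Ann"]
```

The application still retrieves objects by attribute values; the tuple of
those values just plays the role of the key.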


> > I agree with Jim that none of this stuff is needed
> > in the interface that a
> > data manager exposes to persistent objects.  This
> > stuff would be in a data
> > manager's application-level interface, and I don't
> > see any need for
> > standardization there; that's an area for value-add
> > by competing
> > persistence mechanisms and data manager
> > implementations.  Any
> > standardization of them now would be
> > counter-productive, I think.
> >
>It already exists. Look at JDO. In Java land, O/R
>mappers are popular because, instead of learning a
>different API/query language every time, you can use ODMG/OQL
>against RDBMS, ODBMS, LDAP, XML, you name it.
>It is a major selling point for a "generic persistence"
>toolkit.

I'd have to disagree with you there.  There is very little commonality 
between Java data mappers; many offer some sort of OQL dialect, but they 
vary in so many other aspects of their implementations and usage that 
calling them standardized would be a joke.

Please note that specific persistence mechanisms and query languages -- 
especially any kind of "generic persistence toolkit" -- are completely out 
of scope for this SIG's goals.  We want to standardize the *basis* for you 
to create your *own* persistence mechanisms, query languages, and so 
on.  The SIG will not be creating any code that actually talks to any kind 
of database, nor supplies any kind of data management API.  To the best of 
my understanding, the SIG's charter is focused on these interfaces:

* The interface which objects to be persisted must supply to their data manager

* The interface which data managers must supply to their persistent objects

* The interface which transaction participants must supply to a transaction

* The interface which transaction objects supply to their participants

* The interface which transaction objects supply to an application

The items you are talking about are not a part of any of these interfaces.


From iiourov@yahoo.com  Tue Jul 23 21:16:32 2002
From: iiourov@yahoo.com (Ilia Iourovitski)
Date: Tue, 23 Jul 2002 13:16:32 -0700 (PDT)
Subject: [Persistence-sig] "Straw Baby" Persistence API
In-Reply-To: <5.1.0.14.0.20020723150312.04f70390@mail.telecommunity.com>
Message-ID: <20020723201632.89700.qmail@web20709.mail.yahoo.com>


--- "Phillip J. Eby" <pje@telecommunity.com> wrote:
> At 11:35 AM 7/23/02 -0700, Ilia Iourovitski wrote:
> 
> >--- "Phillip J. Eby" <pje@telecommunity.com> wrote:
> > > At 10:08 AM 7/23/02 -0700, Ilia Iourovitski
> wrote:
> > > >
> > > > > > load(object type, object id)->object
> > > > >
> > > > > An object type should be unnecessary. If a
> data
> > > > > manager
> > > > > needs to track this sort of information, it
> > > should
> > > > > embed it in the object id.
> > > >
> > > >In rdbms case id usually integer. adding the
> whole
> > > >package/class name can be expensive.
> > >
> > > This is easily addressed by using separate data
> > > managers for each table or
> > > other base class type.  No need to carry the
> type in
> > > the object ID.
> > >
> >You mean one data manager per table. Too much.
> 
> Why?  I could simply do something like this:
> 
> class MyDBManager:
> 
>      def __init__(self, sqlconn):
>          self.table1 = TableDBManager("table1", sqlconn, ...)
>          self.table2 = TableDBManager("table2", sqlconn, ...)
>          ...
> 
> myDB = MyDBManager(someSQLconnection)
> 
> And then refer to myDB.table1['someKey'] to load an
> object.  This doesn't 
> seem like "too much", especially since you could
> generate the individual 
> DM's based on metadata.
> 
> At any rate, this is pretty much the approach I
> intend to use myself, 
> except that using PEAK eliminates the need for the
> __init__ method.
> 
> 
> 
> > > > > > query(string, parameters)->list of objects
> or
> > > > > smart
> > > > > > collection
> > > > > >
> > > > > > Those methods can be placed in
> > > > > > Persistence/IPersistentDataManager.py
> > > > >
> > > > > No, these methods are specific to particular
> > > data
> > > > > manager
> > > > > APIs, although I can imagine a number of
> data
> > > > > managers sharing an
> > > > > API like the one above. Note that
> > > > > IPersistentDataManager.py is
> > > > > an interface for use by persistent objects.
> It
> > > does
> > > > > not include
> > > > > all data-manager methods. Similarly,
> > > > > Transaction.IDataManager.IDataManager is the
> > > > > data-manager API
> > > > > used by the transaction framework.
> > > > >
> > > >And most storages like rdbms, ldap, xml has it.
> > >
> > > The most straightforward way to handle queries
> is by
> > > creating query data
> > > managers, which take OIDs that represent the
> > > parameters of the query.
> > >
> >Most of the time people retrieve objects by attributes,
> >not by OID.
> 
> Right.  So define a query manager that takes the
> attributes as fields in an 
> OID, and returns a persistent object that represents
> a sequence of 
> records.  e.g.
> 
> for object in someQueryMgr[ ('param1value','param2value') ]:
>      ...
> 
> All you need is a separate query manager for each
> (parameterized) query 
> your app needs -- and again, there's nothing
> stopping you from generating 
> your own via metadata or even from OQL if that's
> your heart's desire.
> 
> 
> > > I agree with Jim that none of this stuff is
> needed
> > > in the interface that a
> > > data manager exposes to persistent objects. 
> This
> > > stuff would be in a data
> > > manager's application-level interface, and I
> don't
> > > see any need for
> > > standardization there; that's an area for
> value-add
> > > by competing
> > > persistence mechanisms and data manager
> > > implementations.  Any
> > > standardization of them now would be
> > > counter-productive, I think.
> > >
> >It already exists. Look at JDO. In Java land, O/R
> >mappers are popular because, instead of learning a
> >different API/query language every time, you can use
> >ODMG/OQL against RDBMS, ODBMS, LDAP, XML, you name it.
> >It is a major selling point for a "generic persistence"
> >toolkit.
> 
> I'd have to disagree with you there.  There is very
> little commonality 
> between Java data mappers; many offer some sort of
> OQL dialect, but they 
> vary in so many other aspects of their
> implementations and usage that 
> calling them standardized would be a joke.
> 
Both Castor and Object Bridge support odmg and oql.
Object Bridge is going to be "db.apache.org standard".
JDO specification is close to odmg.

> Please note that specific persistence mechanisms and
> query languages -- 
> especially any kind of "generic persistence toolkit"
> -- are completely out 
> of scope for this SIG's goals.  We want to
> standardize the *basis* for you 
> to create your *own* persistence mechanisms, query
> languages, and so 
> on.  The SIG will not be creating any code that
> actually talks to any kind 
> of database, nor supplies any kind of data
> management API.  To the best of 
> my understanding, the SIG's charter is focused on
> these interfaces:
> 
> * The interface which objects to be persisted must
> supply to their data manager
> 
> * The interface which data managers must supply to
> their persistent objects
> 
> * The interface which transaction participants must
> supply to a transaction
> 
> * The interface which transaction objects supply to
> their participants
> 
> * The interface which transaction objects supply to
> an application
> 
> The items you are talking about are not a part of
> any of these interfaces.
> 





__________________________________________________
Do You Yahoo!?
Yahoo! Health - Feel better, live better
http://health.yahoo.com

From pje@telecommunity.com  Wed Jul 24 00:33:46 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 19:33:46 -0400
Subject: [Persistence-sig] Is threaded access to persistent objects in scope?
Message-ID: <5.1.0.14.0.20020723192604.050ad680@mail.telecommunity.com>

Does anybody have any use cases for multi-thread access to the same 
persistent object?

ZODB explicitly denies such thread-safety, making each thread responsible 
for maintaining a separate object cache, or otherwise synchronizing access, 
and thus avoiding locking issues and all the associated complexity.

I don't have any need to change this, personally; I'm happy staying as far 
away from threading issues as possible.  But does anybody have any 
*concrete* use cases where threaded access to the *same* object is a 
necessity?  By same, I mean the identical object pointer, rather than a 
copy of the object loaded specifically for that thread?  I haven't managed 
to come up with any use cases that wouldn't be better handled using message 
or event queues, or something like the Linda "tuplespace".
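
For comparison, a minimal sketch of the message-queue alternative in present-day Python 3. The Counter class and the owner/producer functions are hypothetical; the point is only that a single thread ever touches the object, so it needs no locking.

```python
import queue
import threading


class Counter:
    """Stand-in for a persistent object; it has no locking of its own."""

    def __init__(self):
        self.total = 0


def owner(inbox, counter):
    """Only this thread touches `counter`; None is the stop signal."""
    while True:
        amount = inbox.get()
        if amount is None:
            break
        counter.total += amount


def producer(inbox):
    """Other threads request changes by sending messages."""
    for _ in range(100):
        inbox.put(1)


inbox = queue.Queue()
counter = Counter()
owner_thread = threading.Thread(target=owner, args=(inbox, counter))
owner_thread.start()

producers = [threading.Thread(target=producer, args=(inbox,))
             for _ in range(4)]
for t in producers:
    t.start()
for t in producers:
    t.join()

inbox.put(None)          # all requests are queued; tell the owner to stop
owner_thread.join()
```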

By the way, when I say "concrete", I mean that saying "oh, that sounds 
terrible for performance and language X doesn't do it that way" is not a 
"concrete" use case.  :)

Thanks!


From pje@telecommunity.com  Wed Jul 24 02:01:24 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 21:01:24 -0400
Subject: [Persistence-sig] A simple Observation API
Message-ID: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>

I've taken the time this evening to draft a simple Observation API, and an 
implementation of it.  It's not well-documented, but the API should be 
fairly clear from the example below.  Comments and questions encouraged.

Note that this draft doesn't deal with any threading issues whatsoever.  It 
also doesn't try to address the possibility that an observer might throw an 
exception when it's given a notification during a 'finally' clause that 
closes a beginWrite/endWrite pair.  If anybody has suggestions for how to 
handle these situations, please let me know.

By the way, my informal tests show that subclassing Observable makes an 
object's attribute read access approximately 11 times slower than normal, 
even if no actual observation is taking place (i.e., an _o_readHook is not 
set).  I have not yet done a timing comparison for write operations and 
method calls, but I expect the slowdown to be as bad, or worse.  Rewriting 
Observation.py in C, using structure slots for many of the attributes, 
would probably eliminate most of these slowdowns, at least for unobserved 
instances.  Of course, any operations actually performed by a change 
observer or read hook, would add their own overhead, in addition to the raw 
observation overhead.

This is a fairly "transparent" API, although it still requires the user to 
subclass a specific base, and declare which mutable attributes are touched 
by what methods.  But it is less invasive, in that observation-specific 
code does not need to be incorporated into the methods themselves.

One possible enhancement to this framework: use separate observer lists for 
the beforeChange() and afterChange() events, and make them simple callables 
instead of objects with observation methods.  While this would require an 
additional attribute, it would simplify the process of creating dynamic 
activation methods, and reduce calls in situations where only one event 
needed to be captured.  This could be useful for setting up observation on 
a mutable attribute so as to "wire" it to trigger change events on the 
object(s) that contained it.
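
A minimal sketch of that enhancement in present-day Python 3, assuming two hypothetical attribute names (_o_beforeChange and _o_afterChange) and leaving out the nesting and read-hook machinery of the full module:

```python
class Observable:
    """Sketch only: separate lists of plain callables for the two events."""

    _o_beforeChange = ()    # callables taking (ob)
    _o_afterChange = ()     # callables taking (ob, attrs)

    def __setattr__(self, attr, val):
        if attr.startswith('_o_'):
            # Bookkeeping attributes are never reported.
            object.__setattr__(self, attr, val)
            return
        for callback in self._o_beforeChange:
            callback(self)
        object.__setattr__(self, attr, val)
        for callback in self._o_afterChange:
            callback(self, (attr,))


log = []
subj = Observable()
# Only the "after" event is captured, so no "before" call is ever made.
subj._o_afterChange = (lambda ob, attrs: log.append(attrs),)
subj.foo = 1
subj.bar = 2
```

An observer that cares about one event registers a single callable and pays nothing for the other event.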

Anyway, here's the demo, followed by the module itself.


#### Demo of observation API ####

from Observation import Observable, WritingMethod

class aSubject(Observable):

     def __init__(self):
         self.spam = []

     # __init__ touches spam, but shouldn't notify anyone about it
     __init__ = WritingMethod(__init__, ignore=['spam'])


     def addSpam(self,spam):
         self.spam.append(spam)

     # addSpam touches spam, even though it doesn't set the attribute
     addSpam = WritingMethod(addSpam, attrs=['spam'])


     def setFoo(self, foo):
         self.foo = foo
         self.bar = 3*foo

     # setFoo modifies multiple attributes, and should send at most
     # one notice of modification, upon exiting.
     setFoo = WritingMethod(setFoo)


class anObserver(object):

     def beforeChange(self, ob):
         print ob,"is about to change"

     def afterChange(self, ob, attrs):
         print ob,"changed",attrs

     def getAttr(self, ob, attr):
         print "reading",attr,"of",ob
         return object.__getattribute__(ob,attr)


subj = aSubject()
obs  = anObserver()

subj._o_changeObservers = (obs,)
subj._o_readHook = obs.getAttr

subj.setFoo(9)
print subj.bar

subj.addSpam('1 can')

#####  End sample code #####



#### Observation.py ####

__all__ = ['Observable', 'WritingMethod', 'getAttr', 'setAttr', 'delAttr']

getAttr = object.__getattribute__
setAttr = object.__setattr__
delAttr = object.__delattr__

class Observable(object):

     """Object that can send read/write notifications"""

     _o_readHook = staticmethod(getAttr)
     _o_nestCount = 0
     _o_changedAttrs = ()
     _o_changeObservers = ()

     def _o_beginWrite(self):

         """Start a (possibly nested) write operation"""

         ct = self._o_nestCount
         self._o_nestCount = ct + 1

         if ct:
             return

         for ob in self._o_changeObservers:
             ob.beforeChange(self)


     def _o_endWrite(self):

         """Finish a (possibly nested) write operation"""

         ct = self._o_nestCount = self._o_nestCount - 1

         if ct:
             return

         ca = self._o_changedAttrs

         if ca:
             del self._o_changedAttrs
             for ob in self._o_changeObservers:
                 ob.afterChange(self,ca)


     def __getattribute__(self,attr):

         """Return an attribute of the object, using a read hook if available"""

         if attr.startswith('_o_') or attr=='__dict__':
             return getAttr(self,attr)

         return getAttr(self,'_o_readHook')(self, attr)


     def __setattr__(self,attr,val):

         if attr.startswith('_o_') or attr=='__dict__':
             setAttr(self,attr,val)

         else:
             self._o_beginWrite()

             try:
                 ca = self._o_changedAttrs

                 if attr not in ca:
                     self._o_changedAttrs = ca + (attr,)

                 setAttr(self,attr,val)

             finally:
                 self._o_endWrite()


     def __delattr__(self,attr):

         if attr.startswith('_o_') or attr=='__dict__':
             delAttr(self,attr)

         else:
             self._o_beginWrite()

             try:
                 ca = self._o_changedAttrs
                 if attr not in ca:
                     self._o_changedAttrs = ca + (attr,)

                 delAttr(self,attr)
             finally:
                 self._o_endWrite()


from new import instancemethod

class WritingMethod(object):

     """Wrap this around a function to handle write observation automagically"""

     def __init__(self, func, attrs=(), ignore=()):
         self.func = func
         self.attrs = tuple(attrs)
         self.ignore = tuple(ignore)

     def __get__(self, ob, typ=None):

         if typ is None:
            typ = type(ob)

         return instancemethod(self, ob, typ)

     def __call__(self, inst, *args, **kwargs):

         attrs, remove = self.attrs, self.ignore

         inst._o_beginWrite()

         try:
             if attrs or remove:
                 ca = inst._o_changedAttrs
                 # a dict, so the 'a not in remove' test below matches names
                 remove = dict([(r,1) for r in remove if r not in ca])
                 inst._o_changedAttrs = ca + attrs

             return self.func(inst, *args, **kwargs)

         finally:
             if remove:
                 inst._o_changedAttrs = tuple(
                     [a for a in inst._o_changedAttrs if a not in remove]
                 )
             inst._o_endWrite()


From pje@telecommunity.com  Wed Jul 24 02:09:35 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 23 Jul 2002 21:09:35 -0400
Subject: [Persistence-sig] Clarification: A simple Observation API
In-Reply-To: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <5.1.0.14.0.20020723210437.061d4030@mail.telecommunity.com>

At 09:01 PM 7/23/02 -0400, Phillip J. Eby wrote:

>class aSubject(Observable):
>
>     ....
>
>     def setFoo(self, foo):
>         self.foo = foo
>         self.bar = 3*foo
>
>     # setFoo modifies multiple attributes, and should send at most
>     # one notice of modification, upon exiting.
>     setFoo = WritingMethod(setFoo)


Just a quick clarification on the demo code...  if a method only sets one 
attribute, or if it sets multiple attributes, but you don't care about 
consolidating the change events, it's not necessary to declare the method a 
WritingMethod.  In that case, the __setattr__ hook will issue change events 
for each attribute set, unless the method is being called by a 
WritingMethod, either directly or indirectly.  Use of a WritingMethod 
wrapper is only required for methods that set attributes and need the 
changes to be ignored, or which manipulate mutable attributes without 
actually setting attributes on the instance.  Any other use is optional at 
the implementor's discretion.
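
The difference can be seen in a stripped-down sketch, written in present-day Python 3. It keeps the _o_ names from the posted module but simplifies them: observers are plain callables, and a hypothetical decorator named writing stands in for the WritingMethod descriptor.

```python
class Observable:
    _o_nestCount = 0
    _o_changedAttrs = ()
    _o_changeObservers = ()     # here: callables taking (ob, attrs)

    def _o_beginWrite(self):
        self._o_nestCount += 1

    def _o_endWrite(self):
        self._o_nestCount -= 1
        if self._o_nestCount == 0 and self._o_changedAttrs:
            changed = self._o_changedAttrs
            del self._o_changedAttrs    # fall back to the class default ()
            for observer in self._o_changeObservers:
                observer(self, changed)

    def __setattr__(self, attr, val):
        if attr.startswith('_o_'):
            object.__setattr__(self, attr, val)
            return
        self._o_beginWrite()
        try:
            if attr not in self._o_changedAttrs:
                self._o_changedAttrs += (attr,)
            object.__setattr__(self, attr, val)
        finally:
            self._o_endWrite()


def writing(func):
    """Hypothetical stand-in for WritingMethod: one event per outermost call."""
    def wrapper(self, *args, **kwargs):
        self._o_beginWrite()
        try:
            return func(self, *args, **kwargs)
        finally:
            self._o_endWrite()
    return wrapper


class Subject(Observable):
    def setSeparately(self, foo):       # not wrapped: one event per setattr
        self.foo = foo
        self.bar = 3 * foo

    @writing
    def setTogether(self, foo):         # wrapped: events are consolidated
        self.foo = foo
        self.bar = 3 * foo


events = []
subj = Subject()
subj._o_changeObservers = (lambda ob, attrs: events.append(attrs),)
subj.setSeparately(1)
subj.setTogether(2)
```

The unwrapped method produces two notifications, the wrapped one a single notification naming both attributes.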


From donnalcwalter@yahoo.com  Wed Jul 24 09:12:35 2002
From: donnalcwalter@yahoo.com (Donnal Walter)
Date: Wed, 24 Jul 2002 01:12:35 -0700 (PDT)
Subject: [Persistence-sig] Naive questions about getting and setting
In-Reply-To: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <20020724081235.31632.qmail@web13906.mail.yahoo.com>

1. Would naive (and rather application specific) questions such as
these be better posed to comp.lang.python? If so, I would happily
comply in the future.

2. In regard to the persistence API and especially in regard to
observation, would someone please point out the pitfalls of using
so-called "setter" and "getter" methods in attribute classes
themselves, as opposed to __setattribute__ and __getattribute__
methods in the container classes?

I am in no way proposing this as a general solution, but if in a
given situation one wanted to set up a scheme similar to that coded
below, what would be the major liabilities? Is it simply a matter
of lack of transparency? Or is there also a serious problem with
decreased efficiency?

=======
class Cell(object):
    def __init__(self, *args):
        """Arguments, if any, must be references to other cells."""
        self.__value = None         # the scalar value of the Cell
        self.__observers = []       # list of observers 

        if len(args) > 0:           # if this Cell is dependent
            self.ref = args           # save the list of references
            for i in self.ref:        # for every external ref
                i.AddObserver(self)     # register as an observer
            self.Update()             # set initial from refs
        else:                       # if this Cell is independent
            self.Reset()              # simply reset its value

    def AddObserver(self, observer):
        if observer not in self.__observers:
            self.__observers.append(observer)

    def _setValue(self, value):
        if value != self.__value:
            self.__value = value
            for o in self.__observers:
                o.Update()

    def _getValue(self):
        return self.__value

    def Set(self, input):
        try:
            # make sure input can be converted
            self._setValue(self.Encode(input))
        except ValueError:
            # if incompatible input value, reinitialize
            self.Reset()
        
    def Get(self):
        return self.Decode(self._getValue())

    def Encode(self, value):
        """ override to change Cell type"""
        return value

    def Decode(self, value):
        """ override to change Cell type"""
        return value
    
    def Update(self):
        """ Override in observer Cells.
        (Observers have access to the self.ref list.)
        """
        pass

    def Reset(self):
        """ may be overridden to change default value"""
        self._setValue('')
=======


=====
Donnal Walter
Arkansas Children's Hospital


From Sebastien.Bigaret@inqual.com  Fri Jul 26 14:54:11 2002
From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret)
Date: 26 Jul 2002 15:54:11 +0200
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: "Phillip J. Eby"'s message of "Tue, 23 Jul 2002 21:01:24 -0400"
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <873cu6a930.fsf@bidibule.brest.inqual.bzh>


  Now that the code has shown up, I have some comments :) --just kidding, I
was just too busy to read the list since the beginning of the week, and now I
have read the whole stuff; I must admit I did not understand in detail every
single argument and point you discussed, so I might have questions about
things that were answered but that I didn't understand.

Here are some notes & questions I had while reading:

About caching and caching policies: 

  Phillip did talk about 'transactional caching' and I'm not sure what it
  really is; however, there is a need for an 'application-wide' caching
  mechanism to avoid unnecessary round-trips to the DB. Of course, this should
  not defeat the 'smallest-possible-memory-footprint' requirement pointed out
  in the SIG charter; but if an object has already been fetched somewhere
  (and is still active in another thread, or the cache/snapshots would have
  been deleted), then it is usually unnecessary to re-fetch the object: simply
  use the cached snapshot instead. But this sounds a bit off-topic to me for
  this list.


+1 on defining a state model for persistent objects; however, I'm a little
fuzzy about the difference between 'unsaved' and 'changed'. To my
understanding 'unsaved' is for new objects, while 'changed' is for existing
(previously made persistent) objects; is this right?

About RDBMS: I'm OK with what has been said; I agree that most of the work
has to be done at DM level; observability, as shown in the demo code, seems
sufficient for most purposes. The only thing I cannot see how to do
is:

Ilia> create(object): storage shall populate the id from the rdbms,
Ilia> which is usually the primary key.

Jim> This should not be necessary. One should be able to
Jim> design a data manager that detected new objects and
Jim> assigned them ids when referencing objects are created.

Can you elaborate on that?


More on the Observation API:
> Note that this draft doesn't deal with any threading issues whatsoever.  

You asked earlier "does anybody have any *concrete* use cases where threaded
access to the *same* object is a necessity?". The only use case I can think
of is when you have a pool of objects shared by all threads (e.g. to avoid
unnecessary round-trips to a DB for accessing mostly-read-only objects), where
it is possible that other objects, loaded/copied specifically for each thread,
can have references (relationships) to shared objects. I'm not saying,
however, that threading issues **should** be addressed because of this; I have
the feeling that, if you want such a feature, you can afford the extra effort
to make these shared objects thread-safe (e.g. reentrant locks are OK as long
as you access the objects' attributes using getters/setters and not directly).
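
A minimal sketch of that getter/setter discipline in present-day Python 3. SharedRecord and its method names are hypothetical; threading.RLock is the standard-library reentrant lock.

```python
import threading


class SharedRecord:
    """Hypothetical shared object: all access goes through locked accessors."""

    def __init__(self, name):
        self._lock = threading.RLock()
        self._name = name

    def getName(self):
        with self._lock:
            return self._name

    def setName(self, name):
        with self._lock:
            self._name = name

    def rename(self, suffix):
        # Holds the lock across the read-modify-write, and re-enters it
        # through getName/setName; a plain Lock would deadlock here.
        with self._lock:
            self.setName(self.getName() + suffix)


rec = SharedRecord("")


def worker():
    for _ in range(50):
        rec.rename("x")


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Code that bypasses the accessors and touches _name directly gets no protection, which is the caveat stated above.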


> This is a fairly "transparent" API, although it still requires the user to 
> subclass a specific base, and declare which mutable attributes are touched 
> by what methods.  But it is less invasive, in that observation-specific 
> code does not need to be incorporated into the methods themselves.

It looks pretty to my eyes, indeed.

Last question to make sure I did not miss an important point: after having
read all your messages and after I had a look at the Persistence package in
Zope3, this is how I understand the ``unghostification'' of an object: it
holds a flag saying whether it is a ghost, and has a special attribute,
_p_datamanager.  The Persistent object has a _p_activate() method which in
turn calls the data manager's setstate() method
(IPersistentDataManager.setstate()); this is triggered automatically. Is that it?

  Can someone be more explicit about when this is triggered? I tried to look
at the C code but I'm not familiar at all with C code for Python and couldn't
get a clear answer.
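
One plausible way such a trigger could work, sketched in present-day pure Python 3. This is an assumption for illustration only and does not claim to reproduce the Zope3 C code: _p_oid, _p_ghost and DictDataManager are hypothetical names, while _p_datamanager, _p_activate() and setstate() are the names mentioned above.

```python
class DictDataManager:
    """Hypothetical data manager keeping object states in a dict by OID."""

    def __init__(self, states):
        self.states = states
        self.loads = 0

    def setstate(self, ob):
        self.loads += 1
        ob.__dict__.update(self.states[ob._p_oid])


class Persistent:
    _p_ghost = False            # class default; instances start as ghosts

    def __init__(self, oid, datamanager):
        self._p_oid = oid
        self._p_datamanager = datamanager
        self._p_ghost = True

    def _p_activate(self):
        if self._p_ghost:
            self._p_ghost = False       # clear first to avoid re-entry
            self._p_datamanager.setstate(self)

    def __getattribute__(self, name):
        # Reading any ordinary attribute of a ghost loads its state first;
        # _p_ names and __dict__ never trigger a load.
        if not name.startswith('_p_') and name != '__dict__':
            if object.__getattribute__(self, '_p_ghost'):
                object.__getattribute__(self, '_p_activate')()
        return object.__getattribute__(self, name)


dm = DictDataManager({42: {"title": "spam"}})
ob = Persistent(42, dm)
loads_before = dm.loads         # still a ghost, nothing loaded yet
title = ob.title                # first ordinary read activates the object
title_again = ob.title          # already active, no second load
```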


-- Sebastien.


From guido@python.org  Mon Jul 29 22:06:43 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 29 Jul 2002 17:06:43 -0400
Subject: [Persistence-sig] Naive questions about getting and setting
In-Reply-To: Your message of "Wed, 24 Jul 2002 01:12:35 PDT."
             <20020724081235.31632.qmail@web13906.mail.yahoo.com> 
References: <20020724081235.31632.qmail@web13906.mail.yahoo.com> 
Message-ID: <200207292106.g6TL6ij06410@pcp02138704pcs.reston01.va.comcast.net>

> 1. Would naive (and rather application specific) questions such as
> these be better posed to comp.lang.python? If so, I would happily
> comply in the future.

I don't know, but given the resounding silence in response to your
email you may have drawn this conclusion yourself... :-)

You may also want to read up on descriptors and other aspects of new
types; I wrote a tutorial:

  http://www.python.org/2.2.1/descrintro.html

> 2. In regard to the persistence API and especially in regard to
> observation, would someone please point out the pitfalls of using
> so-called "setter" and "getter" methods in attribute classes
> themselves, as opposed to __setattribute__ and __getattribute__
> methods in the container classes?

There is no __setattribute__; for historical reasons, there's
__getattr__, __setattr__, and __getattribute__.

If you have a few attributes that need special handling, and the rest
don't, implementing them using descriptors is much preferred, because
it doesn't slow down access to the other attributes.  OTOH, if you
need to trap *all* attributes (like Phillip's Observable class),
__getattribute__ and __setattr__ are the only way.
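
A minimal sketch of the descriptor approach, in present-day Python 3. The Observed and Account classes are hypothetical, and __set_name__ requires Python 3.6 or later, so it was not available when this message was written.

```python
class Observed:
    """Data descriptor that records writes to one attribute of its owner."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, ob, typ=None):
        if ob is None:
            return self                 # accessed on the class itself
        try:
            return ob.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name) from None

    def __set__(self, ob, value):
        ob.__dict__[self.name] = value
        ob.changed.append(self.name)


class Account:
    balance = Observed()        # only this attribute is trapped

    def __init__(self):
        self.changed = []       # ordinary attributes: normal lookup path
        self.note = ""
        self.balance = 0


acct = Account()
acct.note = "hello"             # not recorded
acct.balance = 10               # recorded
```

Reads and writes of note never pass through any hook, which is why this does not slow down access to the other attributes.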

> I am in no way proposing this as a general solution, but if in a
> given situation one wanted to set up a scheme similar to that coded
> below, what would be the major liabilities? Is it simply a matter
> of lack of transparency? Or is there also a serious problem with
> decreased efficiency?

I'm afraid I don't understand what your example code is trying to do.
It seems out of scope for this SIG.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org  Mon Jul 29 22:14:15 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 29 Jul 2002 17:14:15 -0400
Subject: [Persistence-sig] Was there a Persistence-BOF at OSCON?
Message-ID: <200207292114.g6TLEF406429@pcp02138704pcs.reston01.va.comcast.net>

So now that we're all safe back home, I'd like to hear what happened
at the Persistence-BOF at OSCON, if it was actually held.  (I was in a
different meeting that night, and very tired, so I didn't even attempt
to peek in.)

I should probably report on the persistence breakfast meeting: it
didn't happen, because Jim was delayed in Phoenix and the only two
people showing up for breakfast were Patrick O'Brien and me.  We
discussed mostly other things on our minds, like PythonCard.

I also note that, conforming to the predictions for any SIG, as soon as any
code was posted, the discussion stopped. :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From guido@python.org  Mon Jul 29 22:18:43 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 29 Jul 2002 17:18:43 -0400
Subject: [Persistence-sig] A simple Observation API
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <200207292118.g6TLIhm06453@pcp02138704pcs.reston01.va.comcast.net>

Some questions about Phillip's Observable protocol.  Why does it have
to be so complicated?  E.g. if you have to do something special for a
method that touches an attribute without doing a setattr operation on
it, why not have the magic be inside that method rather than declare a
wrapper?  (The wrapper looks like it is much more expensive than
another way of flagging a change would be.)

What exactly is the point of collapsing multiple setattr() ops
together?  Just performance?  Or is there a semantic reason?  If just
performance, where is the time going that you're trying to save?

What's the use case for declaring a method as "touches an attribute
but that change should be ignored"?  (If it's only __init__, a
lighter-weight mechanism might be sufficient.)

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje@telecommunity.com  Mon Jul 29 22:16:40 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 29 Jul 2002 17:16:40 -0400
Subject: [Persistence-sig] Was there a Persistence-BOF at OSCON?
In-Reply-To: <200207292114.g6TLEF406429@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <3.0.5.32.20020729171640.0089b100@telecommunity.com>

At 05:14 PM 7/29/02 -0400, Guido van Rossum wrote:
>
>I also note that, conforming to the predictions for any SIG, as soon as any
>code was posted, the discussion stopped. :-)

I actually thought it was because everybody but me went to OSCON.  :)

As for me, I was sick for a good part of last week, which is why I haven't
yet replied to Donnal Walter's post about property-like objects.  (I've
actually got something similar in PEAK, although it's actually a way of
generating getters and setters...  a specialized kind of descriptor that
can add generated methods to the class it's contained in.  A bit esoteric
for this list's purposes though.)


From pje@telecommunity.com  Mon Jul 29 22:37:45 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 29 Jul 2002 17:37:45 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <200207292118.g6TLIhm06453@pcp02138704pcs.reston01.va.comcast.net>
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
Message-ID: <3.0.5.32.20020729173745.008a0240@telecommunity.com>

At 05:18 PM 7/29/02 -0400, Guido van Rossum wrote:
>Some questions about Phillip's Observable protocol.  Why does it have
>to be so complicated?  E.g. if you have to do something special for a
>method that touches an attribute without doing a setattr operation on
>it, why not have the magic be inside that method rather than declare a
>wrapper?  (The wrapper looks like it is much more expensive than
>another way of flagging a change would be.)

It's only for event compression, otherwise putting a simple flag operation
in the method would indeed be more lightweight.  Of course, I'm pretty sure
I could write a bytecode-hacking version that would recode the underlying
method to include the necessary wrapping code around its body, making it
just as fast as putting the code inline.  But I didn't want to put that
much effort into an example.  :)


>What exactly is the point of collapsing multiple setattr() ops
>together?  Just performance?  Or is there a semantic reason?  If just
>performance, where is the time going that you're trying to save?

Semantics plus performance.  The semantic part is that some "database"
systems (e.g. LDAP) inherently don't support transactions, AND must receive
a semantically valid set of attributes in a single update operation.  I may
be overgeneralizing this aspect, however.

The performance save is for situations like Tim Peters' distributed cache
example.  If a change notification is going to cause network traffic, it
would be a good idea to minimize the number of such notifications.  It's a
common situation (IMHO) to change multiple attributes in a set of related
methods, so this supports that scenario while ensuring a minimal set of
update events are issued.


>What's the use case for declaring a method as "touches an attribute
>but that change should be ignored"?  (If it's only __init__, a
>lighter-weight mechanism might be sufficient.)

I discovered the __init__ issue when I went to write the example code, and
adding an ignore list seemed like the simplest way to solve it quickly
without adding a metaclass or something else special to handle __init__.
Also, I know I've frequently written classes which do the bulk of their
attribute setup in methods other than __init__, and imagine others do as
well.  These days I use PEAK attribute binding descriptors that
automatically initialize attributes on first-use, instead, but I wrote the
Observable example assuming "plain-jane", "mainstream" Python with no
special metaclasses or the like.

In general, as to the features of the API, I wrote this mostly based on the
use cases that other folks had, although I'm certainly not against having
event compression.  :)  My own requirements in the API are only the
changeable "get" hook, and that notification of writes takes place after
the modifications.

The idea of using method wrappers to incorporate the metadata about what
attributes are modified, was an attempt to help mask implementation details
from the "naive" user.  It seemed to me a less invasive form of "dead
chicken waving", and also allowed for alternative implementation strategies
for the observable's internal mechanism.


From pobrien@orbtech.com  Mon Jul 29 22:45:42 2002
From: pobrien@orbtech.com (Patrick K. O'Brien)
Date: Mon, 29 Jul 2002 16:45:42 -0500
Subject: [Persistence-sig] Was there a Persistence-BOF at OSCON?
In-Reply-To: <200207292114.g6TLEF406429@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <NBBBIOJPGKJEKIECEMCBCEKHNJAA.pobrien@orbtech.com>

[Guido van Rossum]
>
> So now that we're all safe back home, I'd like to hear what happened
> at the Persistence-BOF at OSCON, if it was actually held.  (I was in a
> different meeting that night, and very tired, so I didn't even attempt
> to peek in.)
>
> I should probably report on the persistence breakfast meeting: it
> didn't happen, because Jim was delayed in Phoenix and the only two
> people showing up for breakfast were Patrick O'Brien and me.  We
> discussed mostly other things on our minds, like PythonCard.

I was afraid the same thing would happen again on Thursday, with Jim and me
as the only participants. (Of course, I would have enjoyed a one-on-one
conversation with the ZopePope to balance out the very nice one-on-one I got
to have with the Python BDFL.)

Thankfully, the Persistence BOF went much better than I feared. There were
about 8 of us in attendance, the discussion was productive, and we filled
the entire time from 8pm to 10pm. I took notes and intend to submit a mini
report to the list as soon as I get caught up with other items. Expect to
hear from me no later than the end of this week. I'm still optimistic that
this SIG can reach consensus and produce a useful Persistence foundation.

--
Patrick K. O'Brien
Orbtech
-----------------------------------------------
"Your source for Python programming expertise."
-----------------------------------------------
Web:  http://www.orbtech.com/web/pobrien/
Blog: http://www.orbtech.com/blog/pobrien/
Wiki: http://www.orbtech.com/wiki/PatrickOBrien
-----------------------------------------------


From guido@python.org  Mon Jul 29 22:56:53 2002
From: guido@python.org (Guido van Rossum)
Date: Mon, 29 Jul 2002 17:56:53 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: Your message of "Mon, 29 Jul 2002 17:37:45 EDT."
             <3.0.5.32.20020729173745.008a0240@telecommunity.com> 
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>  
            <3.0.5.32.20020729173745.008a0240@telecommunity.com> 
Message-ID: <200207292156.g6TLusI06618@pcp02138704pcs.reston01.va.comcast.net>

> At 05:18 PM 7/29/02 -0400, Guido van Rossum wrote:
> >Some questions about Phillip's Observable protocol.  Why does it
> >have to be so complicated?  E.g. if you have to do something
> >special for method that touches an attribute without doing a
> >setattr operation on it, why not have the magic be inside that
> >method rather than declare a wrapper?  (The wrapper looks like it
> >is much more expensive than another way of flagging a change would
> >be.)
> 
> It's only for event compression, otherwise putting a simple flag
> operation in the method would indeed be more lightweight.  Of
> course, I'm pretty sure I could write a bytecode-hacking version
> that would recode the underlying method to include the necessary
> wrapping code around its body, making it just as fast as putting the
> code inline.  But I didn't want to put that much effort into an
> example.  :)

I hope you were really only joking.  Hacking bytecode is inexcusable
mixing of abstraction levels.

> >What exactly is the point of collapsing multiple setattr() ops
> >together?  Just performance?  Or is there a semantic reason?  If
> >just performance, where is the time going that you're trying to
> >save?
> 
> Semantics plus performance.  The semantic part is that some
> "database" systems (e.g. LDAP) inherently don't support
> transactions, AND must receive a semantically valid set of
> attributes in a single update operation.  I may be overgeneralizing
> this aspect, however.

I'm guessing that you'll have to do this differently anyway,
e.g. cache all changes and then force them out all at once with a
commit() operation.

> The performance save is for situations like Tim Peters' distributed
> cache example.  If a change notification is going to cause network
> traffic, it would be a good idea to minimize the number of such
> notifications.  It's a common situation (IMHO) to change multiple
> attributes in a set of related methods, so this supports that
> scenario while ensuring a minimal set of update events are issued.

Isn't there a way to do this in a less obtrusive way, e.g. by
buffering?  I don't know much of this application area, but the
mechanism you are proposing looks very heavy-handed.  I would expect
that in a realistic system, most methods would grow wrappers.  And
this *still* doesn't prevent bugs like updating a list attribute by
calling its append() method without somehow flagging this operation.

(Flagging changes at the method call level seems too coarse-grained.
What about a method that only occasionally makes a change to a given
attribute?)

> >What's the use case for declaring a method as "touches an attribute
> >but that change should be ignored"?  (If it's only __init__, a
> >lighter-weight mechanism might be sufficient.)
> 
> I discovered the __init__ issue when I went to write the example
> code, and adding an ignore list seemed like the simplest way to
> solve it quickly without adding a metaclass or something else
> special to handle __init__.  Also, I know I've frequently written
> classes which do the bulk of their attribute setup in methods other
> than __init__, and imagine others do as well.  These days I use PEAK
> attribute binding descriptors that automatically initialize
> attributes on first-use, instead, but I wrote the Observable example
> assuming "plain-jane", "mainstream" Python with no special
> metaclasses or the like.

That's good, because I have no idea what PEAK is. :-)

Anyway, if the special handling is mostly for __init__ (or things it
calls), then a metaclass could make the notation a bit prettier.

> In general, as to the features of the API, I wrote this mostly based
> on the use cases that other folks had, although I'm certainly not
> against having event compression.  :) My own requirements in the API
> are only the changeable "get" hook, and that notification of writes
> takes place after the modifications.
> 
> The idea of using method wrappers to incorporate the metadata about
> what attributes are modified, was an attempt to help mask
> implementation details from the "naive" user.  It seemed to me a
> less invasive form of "dead chicken waving", and also allowed for
> alternative implementation strategies for the observable's internal
> mechanism.

Maybe it's just too early to start proposing code?  Or has the
discussion already moved to IRC?  I'm curious about why the message
flow just stopped.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From pje@telecommunity.com  Mon Jul 29 23:20:03 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Mon, 29 Jul 2002 18:20:03 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <200207292156.g6TLusI06618@pcp02138704pcs.reston01.va.comcast.net>
References: <Your message of "Mon, 29 Jul 2002 17:37:45 EDT."
	<3.0.5.32.20020729173745.008a0240@telecommunity.com>
	<5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
	<3.0.5.32.20020729173745.008a0240@telecommunity.com>
Message-ID: <3.0.5.32.20020729182003.007d2d30@telecommunity.com>

At 05:56 PM 7/29/02 -0400, Guido van Rossum wrote:
>
>Hacking bytecode is inexcusable mixing of abstraction levels.

Huh?


>Isn't there a way to do this in a less obtrusive way, e.g. by
>buffering?  I don't know much of this application area, but the
>mechanism you are proposing looks very heavy-handed.  I would expect
>that in a realistic system, most methods would grow wrappers.  

Yep.  The problem with buffering is, if you're trying to allow for cascaded
storage, e.g. persisting to an XML document which is persisted in a
database...  You then end up having to have some kind of explicit ordering
that occurs between transaction participants.

Unfortunately, this application area in general is one where the total
amount of complexity can only be moved from one place to another, and not
actually reduced by much, at least if you're trying to maintain generality.
 :(  I was trying to keep a more or less even balance of complexity between
the persistent objects, the data managers, and the transaction object.


>And
>this *still* doesn't prevent bugs like updating a list attribute by
>calling its append() method without somehow flagging this operation.

Right.  If we could catch *that*, then there wouldn't be any need for a
special API in the first place!  :)

Of course, it could be done by having the setattr trap assignments of
mutables to attributes in the first place, and having the observer
subscribe to notifications from the mutable, with an annotation that it's
actually a modification to the "owner".  But this leads to a new set of
questions like "what's mutable?", and what kind of performance degradation
ensues if your normal practice is to keep re-assigning new values to an
attribute of type "list".  :)

Oh, and let's not forget the overhead of un-tracking observer subscriptions
when the attribute is overwritten or deleted...  Ugh.

It seems that the only really *simple* way to address this issue "once and
for all" would be to disallow assignment of non-persistent objects to the
attributes of persistent objects, except for a small set of known immutable
types, such as numbers, strings, and tuples.  This could be trivially
trapped with an isinstance() check in setattr against say,
(int,str,unicode,float,complex,tuple,Persistent).  It would then be
impossible to make this kind of mistake...  unless of course you use a
single-element tuple containing a mutable...  

Argh!!!!
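
A minimal sketch of the isinstance() check described above, and of the loophole it leaves open. It assumes a hypothetical Persistent base class defined here purely for illustration (not the ZODB one), and it uses Python 3 types, so `unicode` is covered by `str`.

```python
class Persistent:
    """Hypothetical stand-in for a persistent base class."""

    # Types assumed to be safe to store directly, following the list above.
    _IMMUTABLE = (int, str, float, complex, tuple)

    def __setattr__(self, name, value):
        if not isinstance(value, self._IMMUTABLE + (Persistent,)):
            raise TypeError(
                "cannot assign %s to attribute %r of a persistent object"
                % (type(value).__name__, name)
            )
        object.__setattr__(self, name, value)


obj = Persistent()
obj.count = 3              # allowed: int

try:
    obj.items = []         # rejected: list is mutable
    rejected = False
except TypeError:
    rejected = True

obj.pair = ([],)           # passes the check: it is a tuple...
obj.pair[0].append("x")    # ...but its element can still change silently
```

The last two lines show why the check alone is not enough: the tuple is accepted, and the list inside it is then modified without any setattr taking place.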



>Maybe it's just too early to start proposing code?  

I was actually trying to propose an API, not an implementation.  I
originally started trying to write text to explain the API, as I did with
my Transaction API proposal, but found it more difficult in this instance
than writing code. 

The ideas being proposed in the API were really just the event mechanism,
the getattr hook, metadata declaration, and event compression.


>Or has the discussion already moved to IRC?  

Eh?


>I'm curious about why the message flow just stopped.

I really did think it was OSCON.  A lot of the other posters (e.g. you,
Jim, and Jeremy) were gone this last week, yes?


From jeremy@alum.mit.edu  Tue Jul 30 01:23:29 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 29 Jul 2002 20:23:29 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020729173745.008a0240@telecommunity.com>
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
	<3.0.5.32.20020729173745.008a0240@telecommunity.com>
Message-ID: <15685.56449.750240.468121@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  [GvR asks the question that puzzles me too:]
  >> What exactly is the point of collapsing multiple setattr() ops
  >> together?  Just performance?  Or is there a semantic reason?  If
  >> just performance, where is the time going that you're trying to
  >> save?

  PJE> Semantics plus performance.  The semantic part is that some
  PJE> "database" systems (e.g. LDAP) inherently don't support
  PJE> transactions, AND must receive a semantically valid set of
  PJE> attributes in a single update operation.  I may be
  PJE> overgeneralizing this aspect, however.

  PJE> The performance save is for situations like Tim Peters'
  PJE> distributed cache example.  If a change notification is going
  PJE> to cause network traffic, it would be a good idea to minimize
  PJE> the number of such notifications.  It's a common situation
  PJE> (IMHO) to change multiple attributes in a set of related
  PJE> methods, so this supports that scenario while ensuring a
  PJE> minimal set of update events are issued.

I remain convinced that the current mechanism ought to work.  Perhaps
I just needed to be convinced otherwise, but I don't think these cases
are worked out in enough detail to be convincing.

I also think the semantics of the proposed alternative makes it harder
on the users, presumably in order to make the infrastructure's job
easier.  I'm thinking about a complex data structure implemented using
many helper methods.  If the data structure is modified inside a
helper method, it can't mark the object changed; it needs to wait for
the top-level operation to finish.  As a result, the data structure
would need to keep a separate flag to indicate whether it should be
marked as changed later.  Then the methods that are "top-level" need
to be edited to check that flag and set _p_changed.  It's worse,
though, because you might want to implement one "top-level" operation
by calling another top-level operation.  That would require the
introduction of extra wrappers around the public versions of methods
that just do bookkeeping, so that the internal routines could call
other internal routines.

The complexity aside, I don't understand why the transaction framework
isn't sufficient to handle the two examples you mention above.  LDAP
does not support transactions, but does expect to get consistent
updates.  A transaction provides, among other things, the
consistency.  It should be possible to delay updates to the LDAP
database until the transaction commits.  The fact that LDAP does not
participate in two-phase commit limits its robustness, but should not
affect consistency.  (Specifically, I mean that a transaction may fail
in the final stage of the two-phase commit with this sort of data
manager.)

The distributed cache example seems to be the same.  If there are
multiple updates, delay sending any of the updates until the
transaction commits.  It might abort, after all, and then no updates
need to be sent; this is just the atomic property of transactions.

The two examples seem to need the A and C of ACID transactions, so why
not use them?

Proper nested transactions should make the current mechanism even
cleaner.  Some methods of an object may want to have ACID semantics.
They can operate as a subtransaction, with all-or-nothing updates to
the object state provided that the top-level transaction commits.

I think a simple boolean flag, _p_changed, is all the change
notification we need when combined with transactions.
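
As a rough illustration of that argument, here is a sketch in which a boolean flag plus a commit step is enough to send one consistent update per object. The classes Record and BufferingDataManager are hypothetical and are not ZODB code; "sent" stands in for whatever the backend (LDAP, a distributed cache) would receive.

```python
class BufferingDataManager:
    """Hypothetical data manager that delays all writes until commit()."""

    def __init__(self):
        self.dirty = []   # objects changed in the current transaction
        self.sent = []    # updates actually sent to the backend

    def register(self, obj):
        # Called once per object, on its first change in a transaction.
        self.dirty.append(obj)

    def commit(self):
        for obj in self.dirty:
            # Send the complete public state as a single update.
            state = {k: v for k, v in vars(obj).items()
                     if not k.startswith("_")}
            self.sent.append(state)
            obj.__dict__["_p_changed"] = False
        self.dirty.clear()

    def abort(self):
        # Nothing was sent; a real abort would also restore object state.
        for obj in self.dirty:
            obj.__dict__["_p_changed"] = False
        self.dirty.clear()


class Record:
    """Hypothetical persistent object with a simple _p_changed flag."""

    def __init__(self, dm, **attrs):
        self.__dict__["_dm"] = dm
        self.__dict__["_p_changed"] = False
        self.__dict__.update(attrs)

    def __setattr__(self, name, value):
        self.__dict__[name] = value
        if not self._p_changed:
            self.__dict__["_p_changed"] = True
            self._dm.register(self)


dm = BufferingDataManager()
entry = Record(dm, cn="old", mail="old@example.org")
entry.cn = "new"
entry.mail = "new@example.org"
sent_before_commit = list(dm.sent)   # still empty: nothing leaves early
dm.commit()
```

Two attribute changes produce one registration and, at commit, one update carrying both new values.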

Jeremy




From jeremy@alum.mit.edu  Tue Jul 30 01:36:51 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Mon, 29 Jul 2002 20:36:51 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net>
References: <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<3.0.5.32.20020719120237.00898b60@telecommunity.com>
	<200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <15685.57251.14632.949497@slothrop.zope.com>

Last week, I worked out a revised transaction API for user code and
for data managers.  It's implemented in ZODB4, but is fairly
preliminary code.  I imagine we'll revise it further, but I'd like to
describe the changes briefly.

Here's a short summary from the ZODB4/Doc/changes.txt document:

    The Transaction implementation has been completely overhauled.
    There are four significant changes that users may need to cope
    with.  The first is that a transaction that fails because of an
    uncaught exception is not aborted.  The user code should
    explicitly call get_transaction().abort().  The second is that
    commit() does not take an optional argument to flag subtransaction
    commits.  Instead, call the savepoint() method.  ZODB will return
    a rollback object from savepoint().  If the rollback object's
    rollback() method is called, it will abort the savepoint() --
    rolling back changes to the previous savepoint or the start of the
    transaction.
    
    The other changes to the Transaction implementation affect
    implementors of resource / data managers.  The ZODB Connection
    object is an example of a data manager.  When the persistence
    machinery detects that an object has been modified, the register()
    method is called on its data manager.  It's up to the data manager
    to register with the transaction.  The manager is registered with
    the transaction, not the individual objects.  The interface the
    data manager implements (IDataManager) has changed.  It should
    implement four methods: prepare(), abort(), commit(), and
    savepoint().  Here is how they correspond to the old API:
    prepare() is roughly equivalent to tpc_begin() through tpc_vote().
    abort() and commit() are roughly equivalent to tpc_abort() and
    tpc_finish().  savepoint() is used for subtransactions.
    
The APIs look like this:

class ITransaction(Interface):
    """Transaction objects."""

    def abort():
        """Abort the current transaction."""

    def begin():
        """Begin a transaction."""

    def commit():
        """Commit a transaction."""

    def join(resource):
        """Join a resource manager to the current transaction."""

    def status():
        """Return status of the current transaction."""

class IDataManager(Interface):
    """Data management interface for storing objects transactionally."""

    def prepare(transaction):
        """Begin two-phase commit of a transaction.

        DataManager should return True or False.
        """
        
    def abort(transaction):
        """Abort changes made by transaction."""
    
    def commit(transaction):
        """Commit changes made by transaction."""

    def savepoint(transaction):
        """Do tentative commit of changes to this point.

        Should return an object implementing IRollback
        """
        
class IRollback(Interface):
    
    def rollback():
        """Rollback changes since savepoint."""

I think the rollback mechanism will work well enough.  Gray and Reuter
explain that it can be used to simulate a nested transaction
architecture.  Thus, I think it's a reasonable building block for the
nested transaction API.
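
To make the savepoint/rollback contract concrete, here is a toy in-memory data manager that follows the four-method shape listed above. It is only a sketch under stated assumptions: DictDataManager and Rollback are hypothetical names, the transaction argument is an opaque placeholder, and no real two-phase commit coordination is shown.

```python
class Rollback:
    """Hypothetical rollback object returned by savepoint()."""

    def __init__(self, manager, snapshot):
        self._manager = manager
        self._snapshot = snapshot

    def rollback(self):
        # Discard everything written since the savepoint was taken.
        self._manager.working = dict(self._snapshot)


class DictDataManager:
    """Hypothetical data manager keeping its data in plain dicts."""

    def __init__(self):
        self.committed = {}   # durable state
        self.working = {}     # state as modified by the transaction

    def prepare(self, transaction):
        # A real manager would check here whether it is able to commit.
        return True

    def abort(self, transaction):
        self.working = dict(self.committed)

    def commit(self, transaction):
        self.committed = dict(self.working)

    def savepoint(self, transaction):
        return Rollback(self, dict(self.working))


txn = object()            # placeholder for a transaction object
dm = DictDataManager()

dm.working["a"] = 1
sp = dm.savepoint(txn)    # tentative commit of the work so far
dm.working["b"] = 2
sp.rollback()             # undo only the work done after the savepoint

if dm.prepare(txn):
    dm.commit(txn)
```

After the run, only the change made before the savepoint has been committed.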

I think I'm also in favor of the new abort semantics.  ZODB3 would
abort the transactions -- call abort() on all the data managers -- if
an error occurred during a commit.  The new code requires that the
user do this instead.  I think that's better, because it leaves the
state of the objects intact if the code wants to analyze what went
wrong before retrying the transaction.

Note that a Transaction doesn't have a register method.  Instead, a
modified object calls register() on its data manager.  The data
manager can join() that transaction if that's the right thing to do.
The ZODB Connection joins on the first register call of the
transaction.  However, I currently have join() on the transaction, not
the Transaction.Manager (aka TP monitor).

I'm in favor of sticking with register() as the persistent method,
although notify() would be okay, too.  I imagine that some data
managers would want to be notified when an object is read or written.
In that case, I'm not sure if notify() is enough; we might want a
notify method for each kind of event or a notify() method with the
event as an argument.

(The need for notify-on-read, BTW, is to support higher isolation
levels than ZODB currently supports.)

Jeremy



From shane@zope.com  Tue Jul 30 03:46:27 2002
From: shane@zope.com (Shane Hathaway)
Date: Mon, 29 Jul 2002 22:46:27 -0400 (EDT)
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <15685.57251.14632.949497@slothrop.zope.com>
Message-ID: <Pine.LNX.4.33L2.0207292226540.10948-100000@shane.zope.com>

On Mon, 29 Jul 2002, Jeremy Hylton wrote:

> Last week, I worked out a revised transaction API for user code and
> for data managers.  It's implemented in ZODB4, but is fairly
> preliminary code.  I imagine we'll revise it further, but I'd like to
> describe the changes briefly.

This is great work.

> (snip)
>
> The APIs look like this:
>
> class ITransaction(Interface):
>     """Transaction objects."""
>
>     def abort():
>         """Abort the current transaction."""
>
>     def begin():
>         """Begin a transaction."""
>
>     def commit():
>         """Commit a transaction."""
>
>     def join(resource):
>         """Join a resource manager to the current transaction."""

By "resource manager" do you mean "IDataManager"?

>
>     def status():
>         """Return status of the current transaction."""

What kind of object would status() return?  Who might make use of it?

Also, I'd like to see some way to set transaction metadata.

> class IDataManager(Interface):
>     """Data management interface for storing objects transactionally."""
>
>     def prepare(transaction):
>         """Begin two-phase commit of a transaction.
>
>         DataManager should return True or False.
>         """
>
>     def abort(transaction):
>         """Abort changes made by transaction."""
>
>     def commit(transaction):
>         """Commit changes made by transaction."""
>
>     def savepoint(transaction):
>         """Do tentative commit of changes to this point.
>
>         Should return an object implementing IRollback
>         """

I would like this interface to be called ITransactionParticipant.  There
are many interesting kinds of objects that would be interested in
participating in a transaction, and not all of them have the immediate
responsibility of storing data.  But the names you chose for the methods
are very clear and concise, I think.

> class IRollback(Interface):
>
>     def rollback():
>         """Rollback changes since savepoint."""
>
> I think the rollback mechanism will work well enough.  Gray and Reuter
> explain that it can be used to simulate a nested transaction
> architecture.  Thus, I think it's a reasonable building block for the
> nested transaction API.

Making rollback operations into objects is a little surprising, but as I
don't fully understand the ideas behind nested transactions, I'm sure
there's a reason for rollback objects to exist. :-)

> I think I'm also in favor of the new abort semantics.  ZODB3 would
> abort the transactions -- call abort() on all the data managers -- if
> an error occurred during a commit.  The new code requires that the
> user do this instead.  I think that's better, because it leaves the
> state of the objects intact if the code wants to analyze what went
> wrong before retrying the transaction.
>
> Note that a Transaction doesn't have a register method.  Instead, a
> modified object calls register() on its data manager.  The data
> manager can join() that transaction if that's the right thing to do.
> The ZODB Connection joins on the first register call of the
> transaction.  However, I currently have join() on the transaction, not
> the Transaction.Manager (aka TP monitor).
>
> I'm in favor of sticking with register() as the persistent method,
> although notify() would be okay, too.  I imagine that some data
> managers would want to be notified when an object is read or written.
> In that case, I'm not sure if notify() is enough; we might want a
> notify method for each kind of event or a notify() method with the
> event as an argument.

It seems to me that the data manager should register to receive specific
notifications.  Some data managers are only interested in knowing when an
object is moving from "ghost" to "saved" and from "saved" to "changed"
state (such as ZODB); others might want more events, like being notified
the first time an object is read in a transaction or receiving
notification of *every* attribute change.  Supporting the extra events in
C only incurs a speed penalty if the data manager requests those events.
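
A small sketch of what such selective registration might look like. The event names "changed" and "setattr", and the ObservedObject class, are hypothetical; the point is only that an event nobody subscribed to costs a single dictionary lookup.

```python
class ObservedObject:
    """Hypothetical object that emits only the events someone asked for."""

    def __init__(self):
        self.__dict__["_subscribers"] = {}
        self.__dict__["_changed"] = False

    def subscribe(self, event, callback):
        self._subscribers.setdefault(event, []).append(callback)

    def _emit(self, event, *details):
        for callback in self._subscribers.get(event, ()):
            callback(self, *details)

    def __setattr__(self, name, value):
        first_change = not self._changed
        self.__dict__[name] = value
        self.__dict__["_changed"] = True
        if first_change:
            self._emit("changed")     # "saved" -> "changed" transition
        self._emit("setattr", name)   # every attribute change


coarse, fine = [], []
obj = ObservedObject()
obj.subscribe("changed", lambda o: coarse.append("changed"))
obj.subscribe("setattr", lambda o, name: fine.append(name))

obj.x = 1
obj.y = 2
```

Here the coarse subscriber hears about the state transition once, while the fine-grained subscriber hears about each attribute.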

Shane


From pje@telecommunity.com  Tue Jul 30 13:27:56 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 08:27:56 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <15685.56449.750240.468121@slothrop.zope.com>
References: <3.0.5.32.20020729173745.008a0240@telecommunity.com>
 <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
 <3.0.5.32.20020729173745.008a0240@telecommunity.com>
Message-ID: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>

At 08:23 PM 7/29/02 -0400, Jeremy Hylton wrote:
> >>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
>
>I remain convinced that the current mechanism ought to work.  Perhaps
>I just needed to be convinced otherwise, but I don't think these cases
>are worked out in enough detail to be convincing.

[shrug].  As I said, I was attempting to propose something that fit things 
brought up by others, and supply a generally-useful Observation framework, 
usable for things besides the context of persistence and 
transactions.  (Per Jim's suggestion that such an Observation framework 
would have greater motivation for a user to take advantage of it.)

For me, in the persistence/transaction context, the post-change flag is 
sufficient.


>I also think the semantics of the proposed alternative makes it harder
>on the users, presumably in order to make the infrastructure's job
>easier.  I'm thinking about a complex data structure implemented using
>many helper methods.  If the data structure is modified inside a
>helper method, it can't mark the object changed; it needs to wait for
>the top-level operation to finish.  As a result, the data structure
>would need to keep a separate flag to indicate whether it should be
>marked as changed later.  Then the methods that are "top-level" need
>to be edited to check that flag and set _p_changed.  It's worse,
>though, because you might want to implement one "top-level" operation
>by calling another top-level operation.  That would require the
>introduction of extra wrappers around the public versions of methods
>that just do bookkeeping, so that the internal routines could call
>other internal routines.

Well, the example implementation I wrote took care of all of that, quite 
elegantly I thought.  But for my purposes, it's sufficient as long as 
_p_changed is set after the last modification that occurs.  It's okay if 
it's also set after previous modifications.  It just must be set after the 
last modification, regardless of how many other times it's set.

This requirement on my part has strictly to do with data managers that 
write to other data managers, in the context of the transaction API I proposed.


From jeremy@alum.mit.edu  Tue Jul 30 13:40:39 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Tue, 30 Jul 2002 08:40:39 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
References: <3.0.5.32.20020729173745.008a0240@telecommunity.com>
	<5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
	<5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
Message-ID: <15686.35143.15684.228405@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  PJE> Well, the example implementation I wrote took care of all of
  PJE> that, quite elegantly I thought.  But for my purposes, it's
  PJE> sufficient as long as _p_changed is set after the last
  PJE> modification that occurs.  It's okay if it's also set after
  PJE> previous modifications.  It just must be set after the last
  PJE> modification, regardless of how many other times it's set.

  PJE> This requirement on my part has strictly to do with data
  PJE> managers that write to other data managers, in the context of
  PJE> the transaction API I proposed.

Can you explain how _p_changed is used outside of transaction control?
I still don't understand how the timing of _p_changed affects things.

Jeremy



From pje@telecommunity.com  Tue Jul 30 13:40:01 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 08:40:01 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <15685.57251.14632.949497@slothrop.zope.com>
References: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <3.0.5.32.20020719120237.00898b60@telecommunity.com>
 <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net>
Message-ID: <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com>

At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote:
>Last week, I worked out a revised transaction API for user code and
>for data managers.  It's implemented in ZODB4, but is fairly
>preliminary code.  I imagine we'll revise it further, but I'd like to
>describe the changes briefly.

I'm not sure if this new API is in relation to the proposals on this list 
or not, but I'm curious how this affects a few things:

* The need for participants to join every transaction.  This is one of my 
top complaints about the existing API.  I have *never* had a single 
application where I couldn't simply register all participants to the 
transactions at or near startup, and never need to do so again -- if it 
weren't for the fact that the transaction API doesn't work that way.  I 
have to write code that tracks whether an object has been registered with 
*this* transaction, and knows when the transaction is over so that it knows 
it needs to register again.  Could we at least have a "permanent join" 
operation?

* Arbitrarily nested, cascading participants.  Does this support 
them?  How?  I don't see any mention of the issues in the interfaces.

* If a data manager can't support rollback to a savepoint, what does it return?


 >(The need for notify-on-read, BTW, is to support higher isolation
 >levels than ZODB currently supports.)

And to support delayed loading of attributes by multi-backend data 
managers.  Although to support that, there'd need to be the opportunity to 
override the attribute value that was read.


From pje@telecommunity.com  Tue Jul 30 18:58:32 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 13:58:32 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <15686.35143.15684.228405@slothrop.zope.com>
References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
 <3.0.5.32.20020729173745.008a0240@telecommunity.com>
 <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
 <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
Message-ID: <3.0.5.32.20020730135832.008fa690@telecommunity.com>

At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote:
>>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
>
>  PJE> Well, the example implementation I wrote took care of all of
>  PJE> that, quite elegantly I thought.  But for my purposes, it's
>  PJE> sufficient as long as _p_changed is set after the last
>  PJE> modification that occurs.  It's okay if it's also set after
>  PJE> previous modifications.  It just must be set after the last
>  PJE> modification, regardless of how many other times it's set.
>
>  PJE> This requirement on my part has strictly to do with data
>  PJE> managers that write to other data managers, in the context of
>  PJE> the transaction API I proposed.
>
>Can you explain how _p_changed is used outside of transaction control?
>I still don't understand how the timing of _p_changed affects things.
>

This has to do with the "write-through mode" phase between
"prepareToCommit()" and "voteOnCommit()" messages (whatever you call them).
 During this phase, to support cascaded storage (one data manager writes to
another), all data managers must "write through" any changes that occur
*immediately*.  They can't wait for "prepareToCommit()", because they've
already received it.  Basically, when the object says, "I've changed"
(i.e. via "register" or "notify" or whatever you call it), the data manager
must write it out right then.

But, if the _p_changed flag is set *before* the change, the data manager
has no way to know what the change was and write it.  It can't wait for
"voteOnCommit()", because then the DM it writes to might have already
voted, for example.  It *must* know about the change as soon as the change
has occurred.  Thus, the change message must *follow* a change.  It's okay
if there are multiple change messages, as long as there's at least one
*after* a set of changes.
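
A minimal sketch of the timing problem, in Python.  All names here
(WriteThroughDM, Item, notify, set_after, set_before) are invented for
illustration and are not part of ZODB or of any proposed API; a plain dict
stands in for the downstream store.

```python
class WriteThroughDM:
    """Hypothetical data manager; a dict stands in for the downstream store."""

    def __init__(self, backend):
        self.backend = backend
        self.dirty = []          # objects changed before prepare()
        self.prepared = False

    def notify(self, obj):
        """Change message sent by a persistent object."""
        if self.prepared:
            # Write-through phase: prepare() has already run, so the
            # object's current state must go out immediately.
            self.write(obj)
        elif obj not in self.dirty:
            self.dirty.append(obj)

    def prepare(self):
        self.prepared = True
        for obj in self.dirty:
            self.write(obj)
        self.dirty.clear()

    def write(self, obj):
        self.backend[obj.key] = obj.value


class Item:
    def __init__(self, key, value, dm):
        self.key, self.value, self.dm = key, value, dm

    def set_after(self, value):
        self.value = value
        self.dm.notify(self)     # message follows the change

    def set_before(self, value):
        self.dm.notify(self)     # message precedes the change
        self.value = value


backend = {}
dm = WriteThroughDM(backend)
item = Item("a", 1, dm)
dm.prepare()                     # enter the write-through phase

item.set_after(2)
print(backend["a"] == item.value)   # True: the store saw the new value

item.set_before(3)
print(backend["a"] == item.value)   # False: the store still holds 2
```

With the message sent first, the data manager writes the state as it was
before the change, and nothing tells it to write again.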

Now, you may say that there are other ways to address dependencies between
participants than having "write-through mode" during the prepare->vote
phase.  And you're right.  ZPatterns certainly manages to work around this,
as does Steve Alexander's TransactionAgents.  TransactionAgents, however,
is actually a partial rewrite of the Zope transaction machinery, and there
are some holes in how ZPatterns addresses the issue as well.  (ZPatterns
addresses it by adding more objects to the transaction during the
"commit()" calls to the data managers, that are roughly equivalent to the
current "prepare()" message concept.)

We could address this by having transaction participants declare their
dependencies to other participants, and have the transaction do a
topological sort, and send all messages in dependency order.  It could then
be an error to have a circular dependency, and data managers could raise an
error if they received an object change message once they were done with
the prepare() call.  It would make the Transaction API and implementation a
bit more complex, leave data managers about the same in complexity as they
would have been before, and it would mean that persistent objects wouldn't
need to worry about whether _p_changed was flagged before or after a change.

I proposed the direction I proposed, however, because it seemed to me
easier to require _p_changed to be after, than to make the transaction
manage a dependency graph.  Data managers will still have to keep track of
whether they've received a prepare() message, and do something special with
a change notification during that time, regardless of whether you manage
dependencies or have a "write-through" mode.

But, with explicit dependency management, DM's also have the extra overhead
of declaring their dependencies at registration, and they lose the ability
to "not know" who they depend on.  In other words, some
modularity/information hiding is lost if you can't have the data manager
delegate to functions or objects that know "how" to write the data, without
it having to know as well in order to do the registration.

Plus, had I proposed dependency management, I would be now defending
*that*, and I figured "_p_changed after" would be easier to justify.  :)
Perhaps I should have proposed dependency management instead, so that then
you could have said, "oh but we could solve that more easily if we just
made _p_changed be after instead of before", and then I would have said,
"Oh, of course, that's brilliant".  :)

All joking aside, I'm not married to either approach.  If you have
something that'll do it better than either way, or if I've somehow
overlooked a way in which this is already solved by the new ZODB4 API,
please let me know.


From shane@zope.com  Tue Jul 30 19:40:33 2002
From: shane@zope.com (Shane Hathaway)
Date: Tue, 30 Jul 2002 14:40:33 -0400 (EDT)
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020730135832.008fa690@telecommunity.com>
Message-ID: <Pine.LNX.4.33L2.0207301415380.12642-100000@shane.zope.com>

On Tue, 30 Jul 2002, Phillip J. Eby wrote:

> At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote:
> >>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
> >
> >  PJE> Well, the example implementation I wrote took care of all of
> >  PJE> that, quite elegantly I thought.  But for my purposes, it's
> >  PJE> sufficient as long as _p_changed is set after the last
> >  PJE> modification that occurs.  It's okay if it's also set after
> >  PJE> previous modifications.  It just must be set after the last
> >  PJE> modification, regardless of how many other times it's set.
> >
> >  PJE> This requirement on my part has strictly to do with data
> >  PJE> managers that write to other data managers, in the context of
> >  PJE> the transaction API I proposed.
> >
> >Can you explain how _p_changed is used outside of transaction control?
> >I still don't understand how the timing of _p_changed affects things.
> >
>
> This has to do with the "write-through mode" phase between
> "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them).
>  During this phase, to support cascaded storage (one data manager writes to
> another), all data managers must "write through" any changes that occur
> *immediately*.  They can't wait for "prepareToCommit()", because they've
> already received it.  Basically, when the object says, "I've changed"
> (i.e. via "register" or "notify" or whatever you call it), the data manager
> must write it out right then.

I'm having trouble understanding this.  Is prepareToCommit() the first
phase, and voteOnCommit() the second phase?  Can't the data manager commit
the data on the second phase?

> But, if the _p_changed flag is set *before* the change, the data manager
> has no way to know what the change was and write it.  It can't wait for
> "voteOnCommit()", because then the DM it writes to might have already
> voted, for example.  It *must* know about the change as soon as the change
> has occurred.  Thus, the change message must *follow* a change.  It's okay
> if there are multiple change messages, as long as there's at least one
> *after* a set of changes.

For ZODB 3 I've realized that sometimes application code needs to set
_p_changed *before* making a change.  Here is an example of potentially
broken code:

def addDate(self, date):
    self.dates.append(date)  # self.dates is a simple list
    self.dates.sort()
    self._p_changed = 1

Let's say self.dates.sort() raises some exception that leads to an aborted
transaction.  Objects are supposed to be reverted on transaction abort,
but that won't happen here!  The connection was never notified that there
were changes, so self.dates is now out of sync.  But if the application
sets _p_changed just *before* the change, aborting will work.
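
A toy illustration of the difference, assuming a connection that reverts
only the objects registered with it.  ToyConnection, Schedule and the
method names are hypothetical stand-ins, not the ZODB API; changed() plays
the role of "self._p_changed = 1".

```python
class ToyConnection:
    """Hypothetical stand-in for a connection; reverts registered objects."""

    def __init__(self):
        self.committed = {}      # id(obj) -> copy of last committed dates
        self.registered = []     # objects that announced a change

    def add(self, obj):
        self.committed[id(obj)] = list(obj.dates)

    def register(self, obj):
        if obj not in self.registered:
            self.registered.append(obj)

    def abort(self):
        # Only objects that announced a change are reverted.
        for obj in self.registered:
            obj.dates = list(self.committed[id(obj)])
        self.registered.clear()


class Schedule:
    def __init__(self, conn, dates):
        self.conn = conn
        self.dates = list(dates)
        conn.add(self)

    def changed(self):
        self.conn.register(self)

    def add_date_flag_last(self, date):
        self.dates.append(date)
        self.dates.sort()        # may raise before the flag is set
        self.changed()

    def add_date_flag_first(self, date):
        self.changed()
        self.dates.append(date)
        self.dates.sort()


def try_add(method, date):
    try:
        method(date)
    except TypeError:
        pass                     # the transaction would now be aborted


conn = ToyConnection()
late = Schedule(conn, [1])
early = Schedule(conn, [1])

# Sorting an int together with a str raises TypeError in Python 3.
try_add(late.add_date_flag_last, "x")
try_add(early.add_date_flag_first, "x")
conn.abort()

print(len(late.dates))   # 2: the failed change was never reverted
print(early.dates)       # [1]
```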

> Now, you may say that there are other ways to address dependencies between
> participants than having "write-through mode" during the prepare->vote
> phase.  And you're right.  ZPatterns certainly manages to work around this,
> as does Steve Alexander's TransactionAgents.  TransactionAgents, however,
> is actually a partial rewrite of the Zope transaction machinery, and there
> are some holes in how ZPatterns addresses the issue as well.  (ZPatterns
> addresses it by adding more objects to the transaction during the
> "commit()" calls to the data managers, that are roughly equivalent to the
> current "prepare()" message concept.)
>
> We could address this by having transaction participants declare their
> dependencies to other participants, and have the transaction do a
> topological sort, and send all messages in dependency order.  It could then
> be an error to have a circular dependency, and data managers could raise an
> error if they received an object change message once they were done with
> the prepare() call.  It would make the Transaction API and implementation a
> bit more complex, leave data managers about the same in complexity as they
> would have been before, and it would mean that persistent objects wouldn't
> need to worry about whether _p_changed was flagged before or after a change.

Are you alluding to "indexing agents" and "rule agents" like we talked
about before?  I think we do need some kind of transaction participant
ordering to support those concepts.  I had in mind a simple numerical
prioritization scheme.  Is the need complex enough to require topological
sorting?

Shane


From pje@telecommunity.com  Tue Jul 30 20:05:39 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 15:05:39 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <Pine.LNX.4.33L2.0207301415380.12642-100000@shane.zope.com>
References: <3.0.5.32.20020730135832.008fa690@telecommunity.com>
Message-ID: <3.0.5.32.20020730150539.0089c240@telecommunity.com>

At 02:40 PM 7/30/02 -0400, Shane Hathaway wrote:
>On Tue, 30 Jul 2002, Phillip J. Eby wrote:
>>
>> This has to do with the "write-through mode" phase between
>> "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them).
>>  During this phase, to support cascaded storage (one data manager writes to
>> another), all data managers must "write through" any changes that occur
>> *immediately*.  They can't wait for "prepareToCommit()", because they've
>> already received it.  Basically, when the object says, "I've changed"
>> (i.e. via "register" or "notify" or whatever you call it), the data manager
>> must write it out right then.
>
>I'm having trouble understanding this.  Is prepareToCommit() the first
>phase, and voteOnCommit() the second phase?  Can't the data manager commit
>the data on the second phase?

They're messages, not phases.  The phase is the period between messages.

Let's say we have DM1, DM2, and DM3, and the transaction calls:

DM2.prepare()
DM3.prepare()
DM1.prepare()

DM2.vote()
DM3.vote()
DM1.vote()

If DM1 writes to DM3, and DM3 writes to DM2, then this ordering doesn't
work, unless you have a "write-through" phase between prepare() and vote().
 That is, if DM3 goes into "write-through" mode when it receives prepare(),
then it will write through to DM2 when DM1 writes to it during the
DM1.prepare() method.
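
The ordering problem can be simulated in a few lines.  This is only a
sketch under assumptions: CascadingDM and its write_through switch are
invented here, and vote() simply reports whether anything is still
unwritten.

```python
class CascadingDM:
    """Hypothetical data manager that may forward writes to another one."""

    def __init__(self, name, downstream=None, write_through=True):
        self.name = name
        self.downstream = downstream
        self.write_through = write_through
        self.pending = []        # data buffered until prepare()
        self.stored = []         # data that reached the end of the chain
        self.prepared = False

    def write(self, data):
        if self.prepared and self.write_through:
            self._send(data)     # already prepared: pass it on right away
        else:
            self.pending.append(data)

    def _send(self, data):
        if self.downstream is not None:
            self.downstream.write(data)
        else:
            self.stored.append(data)

    def prepare(self):
        self.prepared = True
        pending, self.pending = self.pending, []
        for data in pending:
            self._send(data)

    def vote(self):
        # A "no" vote here means data is stranded in this participant.
        return not self.pending


def run(write_through):
    dm2 = CascadingDM("DM2", write_through=write_through)
    dm3 = CascadingDM("DM3", dm2, write_through)
    dm1 = CascadingDM("DM1", dm3, write_through)
    dm1.write("row")
    order = (dm2, dm3, dm1)      # the unlucky order from the example
    for dm in order:
        dm.prepare()
    votes = [dm.vote() for dm in order]
    return dm2.stored, votes


print(run(write_through=True))    # (['row'], [True, True, True])
print(run(write_through=False))   # ([], [True, False, True])
```

Without write-through, the row written during DM1.prepare() is left
sitting in DM3, which has already been prepared.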


>> But, if the _p_changed flag is set *before* the change, the data manager
>> has no way to know what the change was and write it.  It can't wait for
>> "voteOnCommit()", because then the DM it writes to might have already
>> voted, for example.  It *must* know about the change as soon as the change
>> has occurred.  Thus, the change message must *follow* a change.  It's okay
>> if there are multiple change messages, as long as there's at least one
>> *after* a set of changes.
>
>For ZODB 3 I've realized that sometimes application code needs to set
>_p_changed *before* making a change.  Here is an example of potentially
>broken code:
>
>def addDate(self, date):
>    self.dates.append(date)  # self.dates is a simple list
>    self.dates.sort()
>    self._p_changed = 1
>
>Let's say self.dates.sort() raises some exception that leads to an aborted
>transaction.  Objects are supposed to be reverted on transaction abort,
>but that won't happen here!  The connection was never notified that there
>were changes, so self.dates is now out of sync.  But if the application
>sets _p_changed just *before* the change, aborting will work.

Good point.  I hadn't really thought about that use case.  But the
Observation API I proposed does support it, via separate
beforeChange()/afterChange() notifications.  A DM could track
beforeChange() to know that an object needs rolling back, and
afterChange(), to actually send a change through to an underlying DB, if
it's in write-through mode at the time.
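
Roughly, a DM could use the two notifications like this.  The sketch
assumes a wrapper around write methods; observed, ObservingDM and DateBook
are hypothetical names, and only beforeChange()/afterChange() come from the
proposal.  Skipping afterChange() when the method raises is a design choice
made for this example.

```python
import functools


def observed(method):
    """Wrap a write method with before/after change notifications."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        self.dm.beforeChange(self)
        result = method(self, *args, **kwargs)
        self.dm.afterChange(self)    # skipped if the method raised
        return result
    return wrapper


class ObservingDM:
    def __init__(self):
        self.snapshots = {}          # id(obj) -> (obj, state before change)
        self.written = []            # states forwarded downstream
        self.write_through = False

    def beforeChange(self, obj):
        # Remember the state once, so abort() can roll the object back.
        self.snapshots.setdefault(id(obj), (obj, list(obj.dates)))

    def afterChange(self, obj):
        if self.write_through:
            self.written.append(list(obj.dates))

    def abort(self):
        for obj, dates in self.snapshots.values():
            obj.dates = dates
        self.snapshots.clear()


class DateBook:
    def __init__(self, dm):
        self.dm = dm
        self.dates = []

    @observed
    def addDate(self, date):
        self.dates.append(date)
        self.dates.sort()


dm = ObservingDM()
book = DateBook(dm)
book.addDate(2)
book.addDate(1)
dm.write_through = True
book.addDate(3)
print(dm.written)        # [[1, 2, 3]]

try:
    book.addDate("x")    # sort() raises TypeError on mixed types
except TypeError:
    dm.abort()
print(book.dates)        # []
```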


>> Now, you may say that there are other ways to address dependencies between
>> participants than having "write-through mode" during the prepare->vote
>> phase.  And you're right.  ZPatterns certainly manages to work around this,
>> as does Steve Alexander's TransactionAgents.  TransactionAgents, however,
>> is actually a partial rewrite of the Zope transaction machinery, and there
>> are some holes in how ZPatterns addresses the issue as well.  (ZPatterns
>> addresses it by adding more objects to the transaction during the
>> "commit()" calls to the data managers, that are roughly equivalent to the
>> current "prepare()" message concept.)
>>
>> We could address this by having transaction participants declare their
>> dependencies to other participants, and have the transaction do a
>> topological sort, and send all messages in dependency order.  It could then
>> be an error to have a circular dependency, and data managers could raise an
>> error if they received an object change message once they were done with
>> the prepare() call.  It would make the Transaction API and implementation a
>> bit more complex, leave data managers about the same in complexity as they
>> would have been before, and it would mean that persistent objects wouldn't
>> need to worry about whether _p_changed was flagged before or after a
>> change.
>
>Are you alluding to "indexing agents" and "rule agents" like we talked
>about before?  

That's what TransactionAgents does, but that's not what I'm looking for per
se.  I'm looking at simple data managers.  For example, if I make a data
manager that persists a set of objects to an XML DOM, I might want to use
it with a DOM persistence manager that stores XML documents in an SQL
database.  All three "data managers" (persist->XML, XML->Database, SQL
database) are transaction participants, with implied or actual ordering.


>I think we do need some kind of transaction participant
>ordering to support those concepts.  I had in mind a simple numerical
>prioritization scheme.  Is the need complex enough to require topological
>sorting?

Numerical prioritization requires that you have global knowledge of the
participants, and therefore seems to go against modular usage of
components, such as in my example above.

Certainly, any non-circular topological relationship can be reduced to a
numerical ordering.  After all, Python new-style classes do it in __mro__.
A topological sort using the kjbuckets module is maybe 30-40 lines of
Python code, however; not much to pay, IMHO, for the amount of debugging
saved by those people who would otherwise be tearing their hair out trying
to figure out why something is intermittently failing because they gave two
items the same numerical priority, but sometimes one of them is going first
and sometimes the other one is.
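
For illustration, here is what dependency ordering could look like.  This
sketch uses the standard library's graphlib module (Python 3.9 and later)
rather than kjbuckets; message_order and the writes_to mapping are
hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter


def message_order(writes_to):
    """Return participants ordered so every writer precedes its targets.

    `writes_to` maps a participant name to the names it writes to.
    """
    sorter = TopologicalSorter()
    for writer, targets in writes_to.items():
        sorter.add(writer)
        for target in targets:
            sorter.add(target, writer)   # target must come after writer
    return list(sorter.static_order())


order = message_order({"DM1": ["DM3"], "DM3": ["DM2"], "DM2": []})
print(order)   # ['DM1', 'DM3', 'DM2']

try:
    message_order({"A": ["B"], "B": ["A"]})
except CycleError as exc:
    # The second element of args holds the nodes forming the cycle.
    print("circular dependency:", exc.args[1])
```

Unlike equal numerical priorities, a cycle is reported as an error instead
of producing an order that varies from run to run.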

The post-change flag approach I proposed has the advantage of determining
dependencies dynamically; that is, only dependencies that actually exist
will have an effect, and explicit management through priorities or
dependencies isn't required.  In terms of API, I'd much rather deal with
the overhead of before/after change notifications (as in my suggested
Observation API) than have to explicitly declare priorities or
dependencies.  I can much more easily verify (by testing or local code
inspection) that my object obeys the observation API, than I can debug
*global* and *dynamic* interaction dependencies.

So in my opinion, I'd *much* rather put up with the wrapper overhead on
write methods, than deal with the global debug nightmares that declaring
dependencies or priorities between data managers is (again, in my opinion)
likely to bring.  Such issues are harder for novice developers to
understand.  If my class works correctly, they reason, so too should my
application.  All the components worked individually, why won't they work
together?  IMO, the principle of least surprise says they should just work,
without needing to wave any additional dead chickens over the code.


From guido@python.org  Tue Jul 30 20:11:21 2002
From: guido@python.org (Guido van Rossum)
Date: Tue, 30 Jul 2002 15:11:21 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: Your message of "Mon, 29 Jul 2002 18:20:03 EDT."
             <3.0.5.32.20020729182003.007d2d30@telecommunity.com> 
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
	<3.0.5.32.20020729173745.008a0240@telecommunity.com>  
	<3.0.5.32.20020729182003.007d2d30@telecommunity.com> 
Message-ID: <200207301911.g6UJBLe19703@odiug.zope.com>

> >Hacking bytecode is inexcusable mixing of abstraction levels.
> 
> Huh?

The bytecode spec is not part of the Python spec, it's an
implementation detail.  Jython, e.g., doesn't use bytecode.  Neither
do various systems that translate Python source code to C or machine
code.

Unfortunately I'm going to have to pull out of this thread -- I'm so
far behind on my email that I can't afford trying to understand this
discussion.

--Guido van Rossum (home page: http://www.python.org/~guido/)

From Sebastien.Bigaret@inqual.com  Tue Jul 30 20:34:13 2002
From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret)
Date: 30 Jul 2002 21:34:13 +0200
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: "Phillip J. Eby"'s message of "Tue, 30 Jul 2002 13:58:32 -0400"
References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
	<3.0.5.32.20020729173745.008a0240@telecommunity.com>
	<5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
	<5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
	<3.0.5.32.20020730135832.008fa690@telecommunity.com>
Message-ID: <878z3t57t6.fsf@bidibule.brest.inqual.bzh>


> At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote:
> >>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
> >
> >  PJE> Well, the example implementation I wrote took care of all of
> >  PJE> that, quite elegantly I thought.  But for my purposes, it's
> >  PJE> sufficient as long as _p_changed is set after the last
> >  PJE> modification that occurs.  It's okay if it's also set after
> >  PJE> previous modifications.  It just must be set after the last
> >  PJE> modification, regardless of how many other times it's set.
> >
> >  PJE> This requirement on my part has strictly to do with data
> >  PJE> managers that write to other data managers, in the context of
> >  PJE> the transaction API I proposed.
> >
> >Can you explain how _p_changed is used outside of transaction control?
> >I still don't understand how the timing of _p_changed affects things.
> >
> 
> This has to do with the "write-through mode" phase between
> "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them).
>  During this phase, to support cascaded storage (one data manager writes to
> another), all data managers must "write through" any changes that occur
> *immediately*.  They can't wait for "prepareToCommit()", because they've
> already received it.  Basically, when the object says, "I've changed"
> (i.e. via "register" or "notify" or whatever you call it), the data manager
> must write it out right then.

I'd like to add a few words here: cascaded storage is not the only case where
"write-through" mode is involved. The so-called 'cascade', i.e. one DM writing
to a lower-level one, can be ``transverse'' as well, i.e. one DM writing to
another one at the same 'level'. Just an example here: say you have DM1 and
DM2 being responsible for RDBMS DB1 and DB2. If obj1 and obj2 are to be stored
within, resp., DB1 and DB2, then you can have that sort of ``write-through''
mechanism being triggered as well. The reason for this is that, if obj1 and
obj2 are related to each other, and since the information needed for
relationships is mostly stored in an RDBMS in an asymmetrical manner (put
simply: this info == a foreign key, stored in only one of the two tables), a
change in one of the objects needs to be forwarded to the other DM.

  Humm...

  Having written this, I'm not sure this is related to what you're saying
  here, mainly because the forwarded information I'm talking about is *not*
  in the object's properties... or is it? Well, changes are *not* in the
  original obj1's properties (although changes might be propagated bottom-up,
  but that's another story), but changes are made in the corresponding
  'row1''s properties (at DM1 level). So, if DM1 is already in write-through
  mode, it will in turn immediately notify/write to its
  SQL-database-connection-DM. We will potentially have more than one SQL
  statement issued for a single row/whatever, but the necessary information
  about the whole architecture and dependencies (which the DMs do know) does
  not have to be put into the Transaction framework.

  If this is it, it makes me think that it is like having the DMs calling a
  (reentrant) version of 'prepareCommit()' on their level-1 DMs --but the
  actual forwarding of the message is not explicit, rather made implicit
  through the 'write-through' mode.

Is this what you mean?

-- Sebastien.


From pje@telecommunity.com  Tue Jul 30 20:55:15 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 15:55:15 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <878z3t57t6.fsf@bidibule.brest.inqual.bzh>
References: <"Phillip J. Eby"'s message of "Tue, 30 Jul 2002 13:58:32 -0400">
 <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
 <3.0.5.32.20020729173745.008a0240@telecommunity.com>
 <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
 <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
 <3.0.5.32.20020730135832.008fa690@telecommunity.com>
Message-ID: <3.0.5.32.20020730155515.0089ea00@telecommunity.com>

At 09:34 PM 7/30/02 +0200, Sebastien Bigaret wrote:
>
>I'd like to add a few words here: cascaded storage is not the only case
>where "write-through" mode is involved. The so-called 'cascade', i.e. one
>DM writing to a lower-level one, can be ``transverse'' as well, i.e. one DM
>writing to another one at the same 'level'. Just an example here: say you
>have DM1 and DM2 being responsible for RDBMS DB1 and DB2. If obj1 and obj2
>are to be stored within, resp., DB1 and DB2, then you can have that sort of
>``write-through'' mechanism being triggered as well. The reason for this is
>that, if obj1 and obj2 are related to each other, and since the information
>needed for relationships is mostly stored in an RDBMS in an asymmetrical
>manner (put simply: this info == a foreign key, stored in only one of the
>two tables), a change in one of the objects needs to be forwarded to the
>other DM.
>
>  Humm...
>
>  Having written this, I'm not sure this is related to what you're saying
>  here, mainly because the forwarded information I'm talking about is *not*
>  in the object's properties... or is it? Well, changes are *not* in the
>  original obj1's properties (although changes might be propagated bottom-up,
>  but that's another story), but changes are made in the corresponding
>  'row1''s properties (at DM1 level). So, if DM1 is already in write-through
>  mode, it will in turn immediately notify/write to its
>  SQL-database-connection-DM. We will potentially have more than one SQL
>  statement issued for a single row/whatever, but the necessary information
>  about the whole architecture and dependencies (which the DMs do know) does
>  not have to be put into the Transaction framework.
>
>  If this is it, it makes me think that it is like having the DMs calling a
>  (reentrant) version of 'prepareCommit()' on their level-1 DMs --but the
>  actual forwarding of the message is not explicit, rather made implicit
>  through the 'write-through' mode.
>
>Is this what you mean?
>

Yes, if I've understood you correctly.  My point was that it's easier to
implement scenarios such as you described, with the "write-throughs during
commit" algorithm, as it doesn't need to explicitly track all those
dependencies.  Yes, it may cause occasional inefficient write operations
when there is a complex cascade taking place, and the participants are
registered in a less-than-optimal order, but the idea is for "complex
things to be possible", while keeping simple things simple, and ideally to
guarantee correctness.

That's why, in the absence of other information to the contrary, I favor
the "write-throughs during commit" algorithm for handling dependencies.  It
scales the best for complex scenarios, guarantees correctness for any
non-circular dependency graph, and involves the least code to be written
for even the simplest cases, with the possible exception of how persistent
objects issue change notifications.


From shane@zope.com  Tue Jul 30 21:02:28 2002
From: shane@zope.com (Shane Hathaway)
Date: Tue, 30 Jul 2002 16:02:28 -0400 (EDT)
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <Pine.LNX.4.33L2.0207301532320.12642-100000@shane.zope.com>

On Tue, 30 Jul 2002, Phillip J. Eby wrote:

> They're messages, not phases.  The phase is the period between messages.

Yep. :-)

> Let's say we have DM1, DM2, and DM3, and the transaction calls:
>
> DM2.prepare()
> DM3.prepare()
> DM1.prepare()
>
> DM2.vote()
> DM3.vote()
> DM1.vote()
>
> If DM1 writes to DM3, and DM3 writes to DM2, then this ordering doesn't
> work, unless you have a "write-through" phase between prepare() and vote().
>  That is, if DM3 goes into "write-through" mode when it receives prepare(),
> then it will write through to DM2 when DM1 writes to it during the
> DM1.prepare() method.

I see now.  From one perspective, this problem is a side effect of keeping
transaction participants registered between transactions, as you've been
suggesting.  ZODB 3's transaction manager would normally have no problem
with this, since DM3 and DM2 would only get added to the transaction once
DM1 started writing.  The implicit order would solve the problem.

Unfortunately, this solution has a weakness--if some other data manager
wrote unrelated data to DM3 or DM2 before DM1 wrote its data, the implicit
order would be incorrect.  Thus the need for transaction agents, which
guarantee a specific order (if I recall correctly).

Write-through mode seems like a performance killer for many applications.
What about this: transaction participants could tell the transaction that
even though their prepare() method has been called already, they need it
called again.

Shane


From pje@telecommunity.com  Tue Jul 30 21:17:54 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 16:17:54 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <Pine.LNX.4.33L2.0207301532320.12642-100000@shane.zope.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <3.0.5.32.20020730161754.008701c0@telecommunity.com>

At 04:02 PM 7/30/02 -0400, Shane Hathaway wrote:
>
>I see now.  From one perspective, this problem is a side effect of keeping
>transaction participants registered between transactions, as you've been
>suggesting.  ZODB 3's transaction manager would normally have no problem
>with this, since DM3 and DM2 would only get added to the transaction once
>DM1 started writing.  The implicit order would solve the problem.
>
>Unfortunately, this solution has a weakness--if some other data manager
>wrote unrelated data to DM3 or DM2 before DM1 wrote its data, the implicit
>order would be incorrect.  Thus the need for transaction agents, which
>guarantee a specific order (if I recall correctly).

Right; this is why I described both ZPatterns and TransactionAgents as
being hacks.  We rely on the use of registration order to force things to
work in most cases, or re-registration.  It's a pretty ugly hack.


>Write-through mode seems like a performance killer for many applications.
>What about this: transaction participants could tell the transaction that
>even though their prepare() method has been called already, they need it
>called again.

The only drawback I'm aware of for that approach, is that it leads to an
infinite loop instead of a stack overflow, in the event you accidentally
create a circular dependency graph.  The infinite loop doesn't produce a
traceback, and thus doesn't show you *how* you created the circularity.

I suppose you could require that the number of times you loop through a
list sending prepare() calls is no greater than some multiplier of the
total number of participants, and then at least you could detect what seems
like a runaway dependency.  Printing out what the loop *was* could be hard,
though, and the information would not show you as directly how the loop
occurred.  But I'm willing to bend on this point, since I think even
accidental circularity is likely to be rare, that when it does occur you're
likely to have known there was a risk of it, and that you'll be likely to
know where to look for where it occurred.  It's a lot different than the
risk of out-of-order commits, which could occur with explicit dependency
management for even very simple scenarios.

Also, I think a different method should be used for the second prepare()
call - perhaps a flush() method.  That way, prepare() won't need to be able
to be called twice during the same commit, which I can see some problems
with.  prepare() could simply call flush(), or perhaps the transaction
could do it.  flush() should be written so as to be usable at any point in
the transaction, since it'll presumably be used to implement savepoints as
well, and in some cases to ensure an underlying DB is up-to-date before
performing a query.

I do like the simplification of not needing a "write-through" mode,
although in reality all we are doing is replacing it with a "re-flush"
mode.  That is, once a participant receives prepare(), it must respond to
any future change notifications by requesting a re-call of flush() by the
transaction.
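
A rough sketch of how the "re-flush" loop and the runaway check might fit
together.  Everything here is an assumption made for illustration: the
Participant and Transaction classes, request_flush(), prepare_all() and
the multiplier of 3 are not taken from any existing transaction API.

```python
class Participant:
    def __init__(self, name, txn, target=None):
        self.name = name
        self.txn = txn
        self.target = target     # participant this one writes to, if any
        self.pending = []
        self.stored = []
        self.prepared = False
        txn.join(self)

    def write(self, data):
        self.pending.append(data)
        if self.prepared:
            # Changed after prepare(): ask the transaction to call flush().
            self.txn.request_flush(self)

    def prepare(self):
        self.prepared = True
        self.flush()

    def flush(self):
        pending, self.pending = self.pending, []
        for data in pending:
            if self.target is not None:
                self.target.write(data)
            else:
                self.stored.append(data)


class Transaction:
    def __init__(self, multiplier=3):
        self.participants = []
        self.reflush = []        # participants waiting for another flush()
        self.multiplier = multiplier

    def join(self, participant):
        self.participants.append(participant)

    def request_flush(self, participant):
        if participant not in self.reflush:
            self.reflush.append(participant)

    def prepare_all(self):
        for participant in self.participants:
            participant.prepare()
        # Cap the number of re-flushes to catch circular dependencies.
        budget = self.multiplier * len(self.participants)
        while self.reflush:
            if budget == 0:
                names = [p.name for p in self.reflush]
                raise RuntimeError("runaway re-flush, still waiting: %s" % names)
            budget -= 1
            self.reflush.pop(0).flush()


txn = Transaction()
dm2 = Participant("DM2", txn)
dm3 = Participant("DM3", txn, dm2)
dm1 = Participant("DM1", txn, dm3)   # joined last, prepared last
dm1.write("row")
txn.prepare_all()
print(dm2.stored)                    # ['row']
```

If two participants write to each other, the loop above stops with a
RuntimeError once the budget is spent, instead of spinning forever.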

By the way, I'd still like to have the option of having participants join a
transaction "permanently", in order to avoid all of the state management
code that such things currently require.

With the exception of the above issues, I'm good with this approach.
Brilliant idea, Shane.  :)


From shane@zope.com  Tue Jul 30 21:55:47 2002
From: shane@zope.com (Shane Hathaway)
Date: Tue, 30 Jul 2002 16:55:47 -0400 (EDT)
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020730161754.008701c0@telecommunity.com>
Message-ID: <Pine.LNX.4.33L2.0207301634450.12642-100000@shane.zope.com>

On Tue, 30 Jul 2002, Phillip J. Eby wrote:

> At 04:02 PM 7/30/02 -0400, Shane Hathaway wrote:
>
> >Write-through mode seems like a performance killer for many applications.
> >What about this: transaction participants could tell the transaction that
> >even though their prepare() method has been called already, they need it
> >called again.
>
> The only drawback I'm aware of for that approach, is that it leads to an
> infinite loop instead of a stack overflow, in the event you accidentally
> create a circular dependency graph.  The infinite loop doesn't produce a
> traceback, and thus doesn't show you *how* you created the circularity.

Good point.  OTOH, from my own experience, stack overflows in Python
sometimes lead to segfaults, and I'd prefer an infinite loop over a
segfault. :-)

> (snip)
>
> Also, I think a different method should be used for the second prepare()
> call - perhaps a flush() method.  That way, prepare() won't need to be able
> to be called twice during the same commit, which I can see some problems
> with.  prepare() could simply call flush(), or perhaps the transaction
> could do it.  flush() should be written so as to be usable at any point in
> the transaction, since it'll presumably be used to implement savepoints as
> well, and in some cases to ensure an underlying DB is up-to-date before
> performing a query.

Yes, flush() is a good idea.  It keeps the phase change distinct from the
repeatable messages, and its purpose would be well understood.
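
A minimal sketch of the prepare()/flush() split discussed above, under stated assumptions: the class names, the `pending` list, and the `MAX_PASSES` guard are illustrative and not part of any proposed API. In the thread a participant asks the transaction for another flush(); here the coordinator simply checks for pending changes, which is a simplification.

```python
class Participant:
    """Hypothetical participant; names are illustrative only."""

    def __init__(self, name):
        self.name = name
        self.pending = []     # changes not yet written out
        self.flushed = []     # changes already written out
        self.prepared = False

    def flush(self):
        # Repeatable: safe to call at any point in the transaction.
        self.flushed.extend(self.pending)
        self.pending = []

    def prepare(self):
        # Phase change: called once per commit, delegates to flush().
        self.flush()
        self.prepared = True


class Coordinator:
    # Assumed guard: a circular dependency raises instead of looping forever.
    MAX_PASSES = 100

    def __init__(self):
        self.participants = []

    def join(self, participant):
        self.participants.append(participant)

    def commit(self):
        for p in self.participants:
            p.prepare()
        # Re-flush any participant that picked up changes after prepare().
        for _ in range(self.MAX_PASSES):
            dirty = [p for p in self.participants if p.pending]
            if not dirty:
                return
            for p in dirty:
                p.flush()
        raise RuntimeError("participants still dirty after %d passes"
                           % self.MAX_PASSES)
```

The pass limit is one way to turn the silent infinite loop mentioned earlier into an error with a traceback, although it still does not show how the circularity arose.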

> I do like the simplification of not needing a "write-through" mode,
> although in reality all we are doing is replacing it with a "re-flush"
> mode.  That is, once a participant receives prepare(), it must respond to
> any future change notifications by requesting a re-call of flush() by the
> transaction.
>
> By the way, I'd still like to have the option of having participants join a
> transaction "permanently", in order to avoid all of the state management
> code that such things currently require.

Yes, that sounds useful for logging, periodic backups (to ensure the
backup is based on fully committed data), and other utilities.  Joining
permanently should stay optional, though, since objects like
CommitVersions don't need to stick around.

Now, I wonder about multithreaded apps.  If you join a transaction
permanently, do you join all threads?  At first I wasn't thinking you
would, but on further reflection, it seems like that's what you'd want.
And how would this affect CORBA (since, from what I hear, its transactions
are not bound to threads)?

> With the exception of the above issues, I'm good with this approach.
> Brilliant idea, Shane.  :)

Thanks.  You too.

Shane


From Sebastien.Bigaret@inqual.com  Wed Jul 31 00:35:09 2002
From: Sebastien.Bigaret@inqual.com (Sebastien Bigaret)
Date: 31 Jul 2002 01:35:09 +0200
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: Shane Hathaway's message of "Tue, 30 Jul 2002 16:55:47 -0400
	(EDT)"
References: <Pine.LNX.4.33L2.0207301634450.12642-100000@shane.zope.com>
Message-ID: <87d6t4zt5e.fsf@inqual.com>


> Phillip> (snip)
> Phillip> Also, I think a different method should be used for the
> Phillip> second prepare() call - perhaps a flush() method.  That way,
> Phillip> prepare() won't need to be able to be called twice during the
> Phillip> same commit, which I can see some problems with.  prepare()
> Phillip> could simply call flush(), or perhaps the transaction could
> Phillip> do it.  flush() should be written so as to be usable at any
> Phillip> point in the transaction, since it'll presumably be used to
> Phillip> implement savepoints as well, and in some cases to ensure an
> Phillip> underlying DB is up-to-date before performing a query.

Shane> Yes, flush() is a good idea.  It keeps the phase change
Shane> distinct from the repeatable messages, and its purpose would be
Shane> well understood.

+1, this sounds good.

Shane> Now, I wonder about multithreaded apps.  If you join a
Shane> transaction permanently, do you join all threads?  At first I
Shane> wasn't thinking you would, but on further reflection, it seems
Shane> like that's what you'd want.  And how would this affect CORBA
Shane> (since, from what I hear, its transactions are not bound to
Shane> threads)?

Could you be more explicit? It seems strange to me. For me and the
applications I usually address, DataManagers are most of the time
bound to a ``session'' idea (just like Sessions in Zope or in any
http-based app., i.e. a set of objects being modified by subsequent
requests from users until it reaches the point where it needs to be
made persistent, independently from each other).

  --> What do you think of making it possible for DM-factories to
permanently join transactions, so that each application can do whatever
is appropriate for its situation? (e.g. w/ factories returning a
singleton if you want to join all threads, or a session-specific DM, or
a thread-specific DM if you need to, etc.)

  NB: just before posting, I began to doubt what you're actually
  talking about. I know I'm mixing MT and sessions in an unreasonable
  manner above, but the point is that I'm basically thinking of
  'joining' in terms of 'initialization of a transaction's
  participants'. Maybe I misunderstood the whole thing. Phillip
  wrote about this:

Phillip> * The need for participants to join every transaction.  This
Phillip> is one of my top complaints about the existing API.  I have
Phillip> *never* had a single application where I couldn't simply
Phillip> register all participants to the transactions at or near
Phillip> startup, and never need to do so again -- if it weren't for
Phillip> the fact that the transaction API doesn't work that way.  I
Phillip> have to write code that tracks whether an object has been
Phillip> registered with *this* transaction, and knows when the
Phillip> transaction is over so that it knows it needs to register
Phillip> again

  ...I can't decide whether you are talking about initialization of a
  transaction _instance_. The last sentence suggests that participants
  are unregistered when the transaction closes: do you mean when it is
  destroyed, or at commit/rollback time? If it is the latter, then I
  guess I have missed something, since I cannot find any references in
  the previous threads about participants being unregistered at that
  point. If it is the former (hence, making it possible to generate a
  given set of DataManagers for each new transaction), then my proposal
  for DM-factories might be meaningful.

-- Sebastien.


From pje@telecommunity.com  Wed Jul 31 00:53:20 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 19:53:20 -0400
Subject: [Persistence-sig] Threads and transactions (was
  Observation API)
In-Reply-To: <Pine.LNX.4.33L2.0207301634450.12642-100000@shane.zope.com>
References: <3.0.5.32.20020730161754.008701c0@telecommunity.com>
Message-ID: <5.1.0.14.0.20020730194212.05d95460@mail.telecommunity.com>

At 04:55 PM 7/30/02 -0400, Shane Hathaway wrote:
>On Tue, 30 Jul 2002, Phillip J. Eby wrote:
> >
> > By the way, I'd still like to have the option of having participants join a
> > transaction "permanently", in order to avoid all of the state management
> > code that such things currently require.
>
>Yes, that sounds useful for logging, periodic backups (to ensure the
>backup is based on fully committed data), and other utilities.  As long as
>joining permanently is optional, since objects like CommitVersions don't
>need to stick around.

Also, in my use cases, certain caches want to clear themselves on 
transactional boundaries.


>Now, I wonder about multithreaded apps.  If you join a transaction
>permanently, do you join all threads?  At first I wasn't thinking you
>would, but on further reflection, it seems like that's what you'd want.
>And how would this affect CORBA (since, from what I hear, its transactions
>are not bound to threads)?

A permanent join should be to *that* transaction object only.  Anything 
else implies too much policy, IMHO.  For my use cases, I will normally have 
at most one transaction per thread.  This is the normal use case for Zope 
also, I believe.  In the event that you have an object which can safely 
participate in multiple transactions simultaneously, then by all means you 
should be able to register it with them.

I think that the transaction API *should* provide at least some nominal 
support for associating a transaction with a thread, or automatically 
creating per-thread transactions, if only because ZODB has supported that 
in the past.  Java's JTA also assumes that the default use case is 
transaction-per-thread.

But, the minimum I would like to see is that a transaction object should be 
reusable over and over.  As long as that's the case, a permanent join is 
useful, since I can declare a transaction object, associate things with it, 
and proceed about my business.

I think most people, however, will want to be able to do something similar 
to ZODB's existing "get_transaction()" function to get a singleton 
Transaction object, to do basic single-threaded, single-transaction 
applications.  And simple things should be simple.
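
A minimal sketch of a per-thread accessor in the spirit of ZODB's get_transaction(), assuming the standard library's threading.local for storage. The Transaction class here is a placeholder, not the API under discussion.

```python
import threading


class Transaction:
    """Placeholder transaction object; a real one would manage participants."""

    def __init__(self):
        self.participants = []


_local = threading.local()


def get_transaction():
    """Return the calling thread's transaction, creating it on first use."""
    txn = getattr(_local, "txn", None)
    if txn is None:
        txn = _local.txn = Transaction()
    return txn
```

A single-threaded application sees what looks like a singleton, while each additional thread transparently gets its own transaction object.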


From pje@telecommunity.com  Wed Jul 31 00:57:36 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Tue, 30 Jul 2002 19:57:36 -0400
Subject: [Persistence-sig] Clearing participants (was Observation
  API)
In-Reply-To: <87d6t4zt5e.fsf@inqual.com>
References: <Shane Hathaway's message of "Tue, 30 Jul 2002 16:55:47 -0400
	(EDT)"><Pine.LNX.4.33L2.0207301634450.12642-100000@shane.zope.com>
Message-ID: <5.1.0.14.0.20020730195406.0267fdd0@mail.telecommunity.com>

At 01:35 AM 7/31/02 +0200, Sebastien Bigaret wrote:

>Phillip> * The need for participants to join every transaction.  This
>Phillip> is one of my top complaints about the existing API.  I have
>Phillip> *never* had a single application where I couldn't simply
>Phillip> register all participants to the transactions at or near
>Phillip> startup, and never need to do so again -- if it weren't for
>Phillip> the fact that the transaction API doesn't work that way.  I
>Phillip> have to write code that tracks whether an object has been
>Phillip> registered with *this* transaction, and knows when the
>Phillip> transaction is over so that it knows it needs to register
>Phillip> again
>
>   ...I can't decide whether you are talking about initialization of a
>   transaction _instance_. The last sentence suggests that participants
>   are unregistered when the transaction closes: do you mean destroyed,
>   or commit/rollback time? If this is the latter case, then I guess I
>   have missed something, since I cannot find any references in the
>   previous threads about participants being unregistered at that
>   point. If this is the first case (hence, making it possible to
>   generate a given set of DataManagers for each new transaction), then
>   my proposal for DM-factories might be meaningful.

Sorry, the "transaction API" and "existing API" I referred to is the 
currently available transaction API in Zope/ZODB, not the API I proposed on 
this list.  The old Zope/ZODB transaction API requires registration for 
each transaction lifecycle; the registration list is cleared upon every 
commit or rollback.

My motivation for making registration permanent in my "Straw Man" 
transaction API proposal was to counteract this.  In the API Shane and I 
are discussing, there would be an option to register with a transaction 
instance such that the registration would remain across commit/rollback 
boundaries.
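
A minimal sketch of optional permanent registration, assuming a reusable coordinator object. The `permanent` flag and both class names are hypothetical, chosen only to illustrate the difference between the two registration styles.

```python
class Coordinator:
    """Hypothetical reusable coordinator; `permanent` is an assumed flag."""

    def __init__(self):
        self._permanent = []
        self._transient = []

    def join(self, participant, permanent=False):
        target = self._permanent if permanent else self._transient
        if participant not in target:
            target.append(participant)

    def participants(self):
        return self._permanent + self._transient

    def _finish(self, method_name):
        for p in self.participants():
            getattr(p, method_name)()
        # Only per-transaction registrations are dropped at the boundary.
        self._transient = []

    def commit(self):
        self._finish("commit")

    def abort(self):
        self._finish("abort")


class CountingParticipant:
    """Records how many commit/abort notifications it received."""

    def __init__(self):
        self.commits = 0
        self.aborts = 0

    def commit(self):
        self.commits += 1

    def abort(self):
        self.aborts += 1
```

A participant joined with `permanent=True` is notified at every boundary without tracking its own registration state, while a default join behaves like the old per-transaction registration.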


From shane@zope.com  Wed Jul 31 03:26:33 2002
From: shane@zope.com (Shane Hathaway)
Date: Tue, 30 Jul 2002 22:26:33 -0400 (EDT)
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <87d6t4zt5e.fsf@inqual.com>
Message-ID: <Pine.LNX.4.33L2.0207302138340.13371-100000@shane.zope.com>

On 31 Jul 2002, Sebastien Bigaret wrote:

>   ...I can't decide whether you are talking about initialization of a
>   transaction _instance_. The last sentence suggests that participants
>   are unregistered when the transaction closes: do you mean destroyed,
>   or commit/rollback time? If this is the latter case, then I guess I
>   have missed something, since I cannot find any references in the
>   previous threads about participants being unregistered at that
>   point. If this is the first case (hence, making it possible to
>   generate a given set of DataManagers for each new transaction), then
>   my proposal for DM-factories might be meaningful.

The terminology we're using is a little confusing, since an object that is
truly a transaction should probably begin its life at the beginning of a
transaction and, at commit or rollback time, should become permanently
immutable.  It might even be stored in the database.

But the things we've been calling transactions play a role more like
transaction "coordinators".  As coordinators, they might be reused for
numerous non-overlapping transactions.  If they are reused, it makes
sense to be able to register a permanent transaction participant with a
specific coordinator.

I think there might be a problem, though.  ZODB customarily uses one
transaction coordinator per thread.  But ZODB connections are not really
thread-specific; they may be reused in a different thread when they are
opened or closed.  So if, for example, you registered a permanent
transaction participant that cleared the cache of a specific ZODB
connection, you wouldn't get the effect you wanted! :-)

That's why I suggested that if you want permanent participants, that
perhaps you'd really want to register the transaction participant for all
threads.  It requires you to consider thread safety, but I think you'd
frequently have to consider that anyway.

Shane


From jeremy@alum.mit.edu  Wed Jul 31 03:51:13 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Tue, 30 Jul 2002 22:51:13 -0400
Subject: [Persistence-sig] Clearing participants (was Observation
  API)
In-Reply-To: <5.1.0.14.0.20020730195406.0267fdd0@mail.telecommunity.com>
References: <Shane Hathaway's message of "Tue, 30 Jul 2002 16:55:47 -0400
	(EDT)">
	<Pine.LNX.4.33L2.0207301634450.12642-100000@shane.zope.com>
	<5.1.0.14.0.20020730195406.0267fdd0@mail.telecommunity.com>
Message-ID: <15687.20641.456885.879684@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  PJE> My motivation for making registration permanent in my "Straw
  PJE> Man" transaction API proposal was to counteract this.  In the
  PJE> API Shane and I are discussing, there would be an option to
  PJE> register with a transaction instance such that the registration
  PJE> would remain across commit/rollback boundaries.

I don't think it makes sense to talk about a single transaction that
spans multiple commits.  A transaction ends with a commit or an abort.
If you do something after that, it's a different transaction.

I agree, however, that it is worth discussing 1) what mechanisms are
needed for associating threads with transactions in order to support a
range of policies and 2) how a resource manager can express its
interest in all (some?) transactions.  The second issue probably
depends on the first, but not vice versa.

Jeremy


From iiourov@yahoo.com  Wed Jul 31 07:32:40 2002
From: iiourov@yahoo.com (Ilia Iourovitski)
Date: Tue, 30 Jul 2002 23:32:40 -0700 (PDT)
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <Pine.LNX.4.33L2.0207302138340.13371-100000@shane.zope.com>
Message-ID: <20020731063240.55579.qmail@web20705.mail.yahoo.com>

If an object participates in more than one transaction concurrently,
the transaction API should provide locks.  It should be possible to
acquire a read/write lock in the same fashion as an RDBMS lets a client
lock a row using SELECT ... FOR UPDATE.  In a number of cases, pure
transactions without locks don't guarantee problem-free concurrency
control.  A typical example is a user account balance, which can be
updated by the user through the web and, at the same time, by a monthly
batch process.
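
A minimal sketch of the account-balance case, assuming an in-process threading.Lock as a stand-in for a row-level write lock. The Account class and its adjust() method are illustrative names, not part of any proposed API.

```python
import threading


class Account:
    def __init__(self, balance=0):
        self.balance = balance
        self._lock = threading.Lock()   # stands in for a row-level write lock

    def adjust(self, amount):
        # Hold the lock across the whole read-modify-write, much as
        # SELECT ... FOR UPDATE holds the row until the transaction ends.
        with self._lock:
            current = self.balance
            self.balance = current + amount
```

Without the lock, the web update and the batch update could both read the same starting balance, and one of the two writes would be lost.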

Ilia

--- Shane Hathaway <shane@zope.com> wrote:
> On 31 Jul 2002, Sebastien Bigaret wrote:
> 
> >   ...I can't decide whether you are talking about initialization of a
> >   transaction _instance_. The last sentence suggests that participants
> >   are unregistered when the transaction closes: do you mean destroyed,
> >   or commit/rollback time? If this is the latter case, then I guess I
> >   have missed something, since I cannot find any references in the
> >   previous threads about participants being unregistered at that
> >   point. If this is the first case (hence, making it possible to
> >   generate a given set of DataManagers for each new transaction), then
> >   my proposal for DM-factories might be meaningful.
>
> The terminology we're using is a little confusing, since an object that is
> truly a transaction should probably begin its life at the beginning of a
> transaction and, at commit or rollback time, should become permanently
> immutable.  It might even be stored in the database.
>
> But the things we've been calling transactions play a role more like
> transaction "coordinators".  As coordinators, they might be reused for
> numerous non-overlapping transactions.  If they are reused, it makes
> sense to be able to register a permanent transaction participant with a
> specific coordinator.
>
> I think there might a problem, though.  ZODB customarily uses one
> transaction coordinator per thread.  But ZODB connections are not really
> thread-specific; they may be reused in a different thread when they are
> opened or closed.  So if, for example, you registered a permanent
> transaction participant that cleared the cache of a specific ZODB
> connection, you wouldn't get the effect you wanted! :-)
>
> That's why I suggested that if you want permanent participants, that
> perhaps you'd really want to register the transaction participant for all
> threads.  It requires you to consider thread safety, but I think you'd
> frequently have to consider that anyway.
>
> Shane
> 
> 



From niki@vintech.bg  Wed Jul 31 09:29:41 2002
From: niki@vintech.bg (Niki Spahiev)
Date: Wed, 31 Jul 2002 11:29:41 +0300
Subject: [Persistence-sig] A simple Observation API
References: <3.0.5.32.20020730135832.008fa690@telecommunity.com>
	<3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <3D479FF5.5010808@vintech.bg>

Phillip J. Eby wrote:
>>def addDate(self, date):
>>   self.dates.append(date)  # self.dates is a simple list
>>   self.dates.sort()
>>   self._p_changed = 1
>>
>>Let's say self.dates.sort() raises some exception that leads to an aborted
>>transaction.  Objects are supposed to be reverted on transaction abort,
>>but that won't happen here!  The connection was never notified that there
>>were changes, so self.dates is now out of sync.  But if the application
>>sets _p_changed just *before* the change, aborting will work.
> 
> 
> Good point.  I hadn't really thought about that use case.  But the
> Observation API I proposed does support it, via separate
> beforeChange()/afterChange() notifications.  A DM could track
> beforeChange() to know that an object needs rolling back, and
> afterChange(), to actually send a change through to an underlying DB, if
> it's in write-through mode at the time.

Maybe this will solve it?

def addDate(self, date):
    self._p_changed = 1
    self.dates.append(date)  # self.dates is a simple list
    self.dates.sort()
    self._p_changed = 1

_p_changed before *and* after?

regards,
Niki Spahiev


From pje@telecommunity.com  Wed Jul 31 13:06:10 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 08:06:10 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <Pine.LNX.4.33L2.0207302138340.13371-100000@shane.zope.com>
References: <87d6t4zt5e.fsf@inqual.com>
Message-ID: <5.1.0.14.0.20020731075426.050ed220@mail.telecommunity.com>

At 10:26 PM 7/30/02 -0400, Shane Hathaway wrote:

>But the things we've been calling transactions play a role more like
>transaction "coordinators".  As coordinators, they might be reused for
>numerous non-overlapping transactions.  If they are reused, it makes
>sense to be able to register a permanent transaction participant with a
>specific coordinator.
>
>I think there might a problem, though.  ZODB customarily uses one
>transaction coordinator per thread.  But ZODB connections are not really
>thread-specific; they may be reused in a different thread when they are
>opened or closed.  So if, for example, you registered a permanent
>transaction participant that cleared the cache of a specific ZODB
>connection, you wouldn't get the effect you wanted! :-)

Well, ZODB could always:

1. do as it does now, and register non-permanently, or
2. pool the transaction with the connection. (See below.)


>That's why I suggested that if you want permanent participants, that
>perhaps you'd really want to register the transaction participant for all
>threads.  It requires you to consider thread safety, but I think you'd
>frequently have to consider that anyway.

In my use case, the transaction will live as an attribute of a root 
"application" object, and application objects will be pooled for use by 
different threads.  Application objects also contain as attributes all 
their connections, data managers, etc.  So everything's pooled together, 
and there's no question of which transaction goes with what.

This approach is virtually identical to what Zope does now, except that 
Zope keeps the transaction with the thread, instead of with the resource pool.


From barry@python.org  Wed Jul 31 13:21:48 2002
From: barry@python.org (Barry A. Warsaw)
Date: Wed, 31 Jul 2002 08:21:48 -0400
Subject: [Persistence-sig] A simple Observation API
References: <3.0.5.32.20020730135832.008fa690@telecommunity.com>
	<3.0.5.32.20020730150539.0089c240@telecommunity.com>
	<3D479FF5.5010808@vintech.bg>
Message-ID: <15687.54876.929262.735189@anthem.wooz.org>


>>>>> "NS" == Niki Spahiev <niki@vintech.bg> writes:

    | def addDate(self, date):
    |     self._p_changed = 1
    |     self.dates.append(date)  # self.dates is a simple list
    |     self.dates.sort()
    |     self._p_changed = 1

    NS> _p_changed before *and* after?

Seems unnecessarily redundant.  IIUC, setting _p_changed to 1 will
register the object so setting it twice simply registers it twice,
which doesn't seem very useful.  I guess when to set _p_changed will
be a decision that the object designer will have to make based on the
semantics of the object, and the operation.

-Barry

From pje@telecommunity.com  Wed Jul 31 14:20:52 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 09:20:52 -0400
Subject: [Persistence-sig] When to set _p_changed (was A simple
  Observation API)
In-Reply-To: <15687.54876.929262.735189@anthem.wooz.org>
References: <3.0.5.32.20020730135832.008fa690@telecommunity.com>
 <3.0.5.32.20020730150539.0089c240@telecommunity.com>
 <3D479FF5.5010808@vintech.bg>
Message-ID: <3.0.5.32.20020731092052.019c1210@telecommunity.com>

At 08:21 AM 7/31/02 -0400, Barry A. Warsaw wrote:
>
>I guess when to set _p_changed will
>be a decision that the object designer will have to make based on the
>semantics of the object, and the operation.
>

Actually, if Shane's proposal for how to handle cascaded data managers is
used, then it will be unequivocal that _p_changed *must* be set *before*
the change, in order to ensure proper rollback behavior.

A principal drawback to the approach that I had been proposing, was that it
required _p_changed to be set *after* a change, which wasn't good for being
able to ensure that rollbacks would always be handled correctly.
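
A minimal sketch of why the ordering matters, assuming a toy base class that snapshots its state the first time _p_changed is set. This Persistent class is illustrative only and is not ZODB's implementation; the Schedule class and its method names are likewise hypothetical.

```python
import copy


class Persistent:
    """Toy base class for illustration; not ZODB's Persistent."""

    def __init__(self):
        self.__dict__["_saved_state"] = None

    def __setattr__(self, name, value):
        if name == "_p_changed":
            if value and self._saved_state is None:
                # First change notice in this transaction: snapshot the
                # current attributes so abort() can restore them.
                state = {k: v for k, v in self.__dict__.items()
                         if k != "_saved_state"}
                self.__dict__["_saved_state"] = copy.deepcopy(state)
            return
        object.__setattr__(self, name, value)

    def abort(self):
        saved = self._saved_state
        if saved is not None:
            self.__dict__.clear()
            self.__dict__.update(saved)
        self.__dict__["_saved_state"] = None


class Schedule(Persistent):
    def __init__(self):
        super().__init__()
        self.dates = []

    def add_date_flag_first(self, date):
        self._p_changed = 1        # notify before mutating
        self.dates.append(date)
        self.dates.sort()

    def add_date_flag_last(self, date):
        self.dates.append(date)
        self.dates.sort()          # if this raises, no notice was ever sent
        self._p_changed = 1
```

If sort() raises, only the flag-first variant has a snapshot to restore on abort; the flag-last variant leaves the appended value in place.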


From shane@zope.com  Wed Jul 31 14:50:27 2002
From: shane@zope.com (Shane Hathaway)
Date: Wed, 31 Jul 2002 09:50:27 -0400 (EDT)
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <5.1.0.14.0.20020731075426.050ed220@mail.telecommunity.com>
Message-ID: <Pine.LNX.4.33L2.0207310940030.14742-100000@shane.zope.com>

On Wed, 31 Jul 2002, Phillip J. Eby wrote:

> In my use case, the transaction will live as an attribute of a root
> "application" object, and application objects will be pooled for use by
> different threads.  Application objects also contain as attributes all
> their connections, data managers, etc.  So everything's pooled together,
> and there's no question of which transaction goes with what.
>
> This approach is virtually identical to what Zope does now, except that
> Zope keeps the transaction with the thread, instead of with the resource pool.

That's an interesting idea.  It should work well (though you'll have a
bootstrapping issue ;-) ).  It may not be necessary, though, for all
transaction coordinators to provide a method for registering a permanent
participant.  It could be a method of a different interface.  Not all
coordinators (e.g. non-pooled ones) will be able to fulfill the contract
as expected.

Shane


From jim@zope.com  Wed Jul 31 19:23:16 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 31 Jul 2002 14:23:16 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <5.1.0.14.0.20020723140040.0519cc90@mail.telecommunity.com>
	<5.1.0.14.0.20020723150312.04f70390@mail.telecommunity.com>
Message-ID: <3D482B14.4020506@zope.com>

Phillip J. Eby wrote:
> At 11:35 AM 7/23/02 -0700, Ilia Iourovitski wrote:
> 
...


>> > The most straightforward way to handle queries is by
>> > creating query data
>> > managers, which take OIDs that represent the
>> > parameters of the query.
>> >
>> Most of the time people retrive object by attributes.
>> not by OID.
> 
> 
> Right.  So define a query manager that takes the attributes as fields in 
> an OID, and returns a persistent object that represents a sequence of 
> records.  e.g.
> 
> for object in someQueryMgr[ ('param1value','param2value') ]:
>     ...
> 
> All you need is a separate query manager for each (parameterized) query 
> your app needs -- and again, there's nothting stopping you from 
> generating your own via metadata or even from OQL if that's your heart's 
> desire.

I think queries are entirely different beasts from oids. I would
be inclined to see a data-manager-specific interface for queries.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


From jim@zope.com  Wed Jul 31 19:35:57 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 31 Jul 2002 14:35:57 -0400
Subject: [Persistence-sig] "Straw Baby" Persistence API
References: <20020723170836.41755.qmail@web20702.mail.yahoo.com>
Message-ID: <3D482E0D.6070303@zope.com>

Ilia Iourovitski wrote:
> --- Jim Fulton <jim@zope.com> wrote:
> 

...

>>>create(object) storage shall populated id from
>>>
>>rdbms
>>
>>>which is usually primary key.
>>>
>>This should not be necessary. One should be able to
>>design a data manager that detected new objects and
>>assigned them ids when referencing objects are
>>created.
>>
> 
> Typical storage (rdbms, odbms, xml like xindicea)
> do not provide root object.
 > So after transaction
 > started
 > object must be loaded from storage or created.

This is a good point. There often isn't a single root object
that all objects are reachable from. On the other hand, most
non-trivial relational systems have related objects. Most objects
are reachable from other objects. It should be possible to load
objects automatically when traversing to them from other objects.

In addition, if a new object is added to another object, it
should be possible to add the new object to the database
automatically.

>>>delete(object) 
>>>
>>I can imagine a datamanager that lacked garbage
>>collection could
>>need this.
>>
>>
> in case of rdbms there are objects which are not
> referenced. 

Right.

>>>load(object type, object id)->object
>>>
>>An object type should be unnecessary. If a data
>>manager
>>needs to track this sort of information, it should
>>embed it in the object id.
>>
> 
> In rdbms case id usually integer. adding the whole
> package/class name can be expensive.

That depends on how you do it; the object id need not
be the same as the primary key and could encode the
class in a more efficient manner than storing the
package and class names.
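
A minimal sketch of one such encoding, assuming a small integer is registered for each class and packed together with an integer primary key. The registry contents, the class names in it, and the 2-byte/8-byte layout are all hypothetical.

```python
import struct

# Hypothetical registry mapping a small integer to each persistent class.
CLASS_IDS = {1: "myapp.models.Car", 2: "myapp.models.Engine"}
CLASS_NUMBERS = {name: num for num, name in CLASS_IDS.items()}

# Assumed layout: 2-byte class number followed by an 8-byte primary key.
_OID = struct.Struct(">HQ")


def make_oid(class_name, primary_key):
    """Pack a class number and primary key into a fixed-size oid."""
    return _OID.pack(CLASS_NUMBERS[class_name], primary_key)


def split_oid(oid):
    """Recover the class name and primary key from an oid."""
    class_number, primary_key = _OID.unpack(oid)
    return CLASS_IDS[class_number], primary_key
```

With this layout every oid is ten bytes regardless of how long the package and class names are.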


Jim



-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


From jim@zope.com  Wed Jul 31 20:38:13 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 31 Jul 2002 15:38:13 -0400
Subject: [Persistence-sig] A simple Observation API
References: <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
	<873cu6a930.fsf@bidibule.brest.inqual.bzh>
Message-ID: <3D483CA5.1020704@zope.com>

Sebastien Bigaret wrote:

...

> About caching and caching policies: 
> 
>   Phillip did talk about 'transactional caching' and I'm not sure what it
>   really is, however, there is some needs to have 'application-wide' caching
>   mechanism to avoid unnecessary round-trips to the DB.

Right

 >   Of course, this should
>   not defeat the 'smallest-possible-memory-footprint-requirement' pointed out
>   in the sig charter ;

Where the goal is not "smallest possible" but small enough and smaller
than the entire database.

 >   but if an object has already been fetched somewhere
>   (and is still active in an other thread, or the cache/snapshots would have
>   been deleted), then it is usually unnecessary to re-fetch the object, simply
>   use the cached snapshot instead. But this sounds to me a bit off-topic for
>   this list.

One of the goals of ZODB's caching strategy is also to provide isolation between
separate threads. Different threads can have separate caches and so don't need
locks to mediate access to objects in the caches.

Conceivably, an RDBMS-based data manager could employ a lower-level cache
to avoid duplicate RDBMS accesses among threads in much the way ZEO uses
a client cache to avoid extra trips to the storage server.


> 
> +1 on defining a state model for persistent objects ; however I'm a little
> fuzzy about the difference between 'unsaved' and 'changed'. To my
> understanding 'unsaved' is for new objects,

Right.

 > while 'changed' is for existing
> (previously made persistent objects, is this right?

Right.


...

> Ilia> create(object) storage shall populated id from rdbms
> Ilia> which is usually primary key.
> 
> Jim> This should not be necessary. One should be able to
> Jim> design a data manager that detected new objects and
> Jim> assigned them ids when referencing objects are created.
> 
> Can you elaborate on that?

Suppose I have a car object that has already been stored in
the database, but it doesn't have an engine.  I should be
able to say:

   # get the car ...

   car.engine = Engine()
   commit()

Now when I commit, the car's data manager should be able to notice that
it now has an engine and that the engine doesn't have an oid.
It will then know that a new engine object needs to be created and
that its primary key (which is not necessarily the same as the oid)
needs to be stored in the car's engine column.
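
A minimal sketch of the car/engine case, assuming a toy data manager that walks an object's attributes at commit time. The `_p_oid` attribute, the dict standing in for database tables, and the tuple-shaped ids are illustrative; for brevity the sketch also uses the oid as the stored key, although the two need not be the same.

```python
import itertools


class Engine:
    _p_oid = None            # no oid until the data manager assigns one


class Car:
    def __init__(self, oid):
        self._p_oid = oid
        self.engine = None


class DataManager:
    """Toy data manager; the `rows` dict stands in for RDBMS tables."""

    def __init__(self):
        self._next_key = itertools.count(1)
        self.rows = {}

    def commit(self, obj):
        record = {}
        for name, value in vars(obj).items():
            if name.startswith("_p_"):
                continue
            if hasattr(value, "_p_oid"):
                if value._p_oid is None:
                    # Newly referenced object: assign an id, then store it.
                    kind = type(value).__name__.lower()
                    value._p_oid = (kind, next(self._next_key))
                    self.commit(value)
                record[name] = value._p_oid   # store the reference only
            else:
                record[name] = value
        self.rows[obj._p_oid] = record
```

The application never calls an explicit create(); the new engine is discovered because it is reachable from an object that is already stored.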


Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


From jeremy@alum.mit.edu  Wed Jul 31 20:37:56 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 15:37:56 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <Pine.LNX.4.33L2.0207292226540.10948-100000@shane.zope.com>
References: <15685.57251.14632.949497@slothrop.zope.com>
	<Pine.LNX.4.33L2.0207292226540.10948-100000@shane.zope.com>
Message-ID: <15688.15508.288534.906790@slothrop.zope.com>

>>>>> "SH" == Shane Hathaway <shane@zope.com> writes:

  >> The APIs look like this:
  >>
  >> class ITransaction(Interface):
  >>     """Transaction objects."""
  >>
  >>     def abort():
  >>         """Abort the current transaction."""
  >>
  >>     def begin():
  >>         """Begin a transaction."""
  >>
  >>     def commit():
  >>         """Commit a transaction."""
  >>
  >>     def join(resource):
  >>         """Join a resource manager to the current transaction."""

  SH> By "resource manager" do you mean "IDataManager"?

I have used these terms somewhat interchangeably, yes.  I think
"resource manager" is the more widely used terminology.

  >>
  >>     def status():
  >>         """Return status of the current transaction."""

  SH> What kind of object would status() return?  Who might make use
  SH> of it?

I expect status returns values from an "enum" with values like
in-progress, committed, aborted.

  SH> Also, I'd like to see some way to set transaction metadata.

I didn't include any transaction metadata in the generic Transaction
interface.  I wasn't sure how generally applicable that was.  Instead,
I created a subclass of Transaction in ZODB that has the old ZODB
interface. 

  SH> I would like this interface to be called
  SH> ITransactionParticipant.  There are many interesting kinds of
  SH> objects that would be interested in participating in a
  SH> transaction, and not all of them have the immediate
  SH> responsibility of storing data.  But the names you chose for the
  SH> methods are very clear and concise, I think.

I think IResourceManager is probably better (see above).  I wish I
could take credit for the names, but I just grabbed them from the Gray
& Reuter book :-).

  >> class IRollback(Interface):
  >>
  >> def rollback(): """Rollback changes since savepoint."""
  >>
  >> I think the rollback mechanism will work well enough.  Gray and
  >> Reuter explain that it can be used to simulate a nested
  >> transaction architecture.  Thus, I think it's a reasonable
  >> building block for the nested transaction API.

  SH> Making rollback operations into objects is a little surprising,
  SH> but as I don't fully understand the ideas behind nested
  SH> transactions, I'm sure there's a reason for rollback objects to
  SH> exist. :-)

The database needs some object to represent the particular savepoint.
A transaction could call savepoint() three times and have three
different states it could roll back to.  I decided a rollback object
was clearer than a rollback() method on the transaction that took a
savepoint_id argument.
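
To illustrate the rollback-object idea, here is a rough sketch in
which each savepoint() call returns its own object.  DictStore and
its snapshot-by-copy strategy are assumptions made for the example,
not the ZODB4 implementation.

```python
class Rollback:
    """Represents one savepoint; rollback() restores the state captured then."""

    def __init__(self, store, snapshot):
        self._store = store
        self._snapshot = snapshot

    def rollback(self):
        """Rollback changes since savepoint."""
        self._store.data.clear()
        self._store.data.update(self._snapshot)


class DictStore:
    """Hypothetical resource manager that keeps its state in a dict."""

    def __init__(self):
        self.data = {}

    def savepoint(self):
        # Copy the current state so later changes do not affect the snapshot.
        return Rollback(self, dict(self.data))


store = DictStore()
store.data["a"] = 1
sp1 = store.savepoint()
store.data["a"] = 2
sp2 = store.savepoint()
store.data["a"] = 3
sp3 = store.savepoint()
store.data["a"] = 4

# Three savepoints, three states to return to; pick the second one.
sp2.rollback()
```

No savepoint_id has to be passed around: the caller simply holds on
to whichever rollback objects it may need later.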

  SH> It seems to me that the data manager should register to receive
  SH> specific notifications.  Some data managers are only interested
  SH> in knowing when an object is moving from "ghost" to "saved" and
  SH> from "saved" to "changed" state (such as ZODB); others might
  SH> want more events, like being notified the first time an object
  SH> is read in a transaction or receiving notification of *every*
  SH> attribute change.  Supporting the extra events in C only incurs
  SH> a speed penalty if the data manager requests those events.

That's a good idea.  We need to flesh out all the events that might be
part of the persistence framework, then we can see how that percolates
up into the transaction API.

Jeremy




From jim@zope.com  Wed Jul 31 20:46:08 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 31 Jul 2002 15:46:08 -0400
Subject: [Persistence-sig] A simple Observation API
References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
	<3.0.5.32.20020729173745.008a0240@telecommunity.com>
	<5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
	<5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
	<3.0.5.32.20020730135832.008fa690@telecommunity.com>
Message-ID: <3D483E80.6070906@zope.com>

Phillip J. Eby wrote:
> At 08:40 AM 7/30/02 -0400, Jeremy Hylton wrote:
> 
...


> This has to do with the "write-through mode" phase between
> "prepareToCommit()" and "voteOnCommit()" messages (whatever you call them).
>  During this phase, to support cascaded storage (one data manager writes to
> another), all data managers must "write through" any changes that occur
> *immediately*.  They can't wait for "prepareToCommit()", because they've
> already received it.  Basically, when the object says, "I've changed"
> (i.e. via "register" or "notify" or whatever you call it), the data manager
> must write it out right then.
> 
> But, if the _p_changed flag is set *before* the change, the data manager
> has no way to know what the change was and write it.  It can't wait for
> "voteOnCommit()", because then the DM it writes to might have already
> voted, for example.  It *must* know about the change as soon as the change
> has occurred.  Thus, the change message must *follow* a change.  It's okay
> if there are multiple change messages, as long as there's at least one
> *after* a set of changes.

I realize that this issue seems to be resolved by eliminating write throughs,
but I want to make sure I understand something here. I would have assumed that
all changes to persistent objects would occur before commit, and thus before
any prepares are done. You seem to be assuming that persistent objects could
change after the application has issued a commit. Is that right? Is the reason
that the prepare logic of some data managers does its work by manipulating
persistent objects in other data managers?

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


From jim@zope.com  Wed Jul 31 20:54:46 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 31 Jul 2002 15:54:46 -0400
Subject: [Persistence-sig] A simple Observation API
References: <Pine.LNX.4.33L2.0207302138340.13371-100000@shane.zope.com>
Message-ID: <3D484086.2020408@zope.com>

Shane Hathaway wrote:
> On 31 Jul 2002, Sebastien Bigaret wrote:
> 
> 
>>  ...I can't decide whether you are talking about initialization of a
>>  transaction _instance_. The last sentence suggests that participants
>>  are unregistered when the transaction closes: do you mean destroyed,
>>  or commit/rollback time? If this is the latter case, then I guess I
>>  have missed something, since I cannot find any references in the
>>  previous threads about participants being unregistered at that
>>  point. If this is the first case (hence, making it possible to
>>  generate a given set of DataManagers for each new transaction), then
>>  my proposal for DM-factories might be meaningful.
>>
> 
> The terminology we're using is a little confusing, since an object that is
> truly a transaction should probably begin its life at the beginning of a
> transaction and,

This doesn't help. ;)

> at commit or rollback time, should become permanently
> immutable.  It might even be stored in the database.
>
> But the things we've been calling transactions play a role more like
> transaction "coordinators".  As coordinators, they might be reused for
> numerous non-overlapping transactions.  If they are reused, it makes
> sense to be able to register a permanent transaction participant with a
> specific coordinator.

Right. We really need to clean up the terminology. We should distinguish
between "transaction coordinators" (or "transaction managers" or whatever)
and transactions.

> I think there might be a problem, though.  ZODB customarily uses one
> transaction coordinator per thread. 

This is changing. In the future, you'll be able to associate a connection
and a transaction coordinator independent of thread.

> But ZODB connections are not really
> thread-specific; they may be reused in a different thread when they are
> opened or closed.  So if, for example, you registered a permanent
> transaction participant that cleared the cache of a specific ZODB
> connection, you wouldn't get the effect you wanted! :-)

But ZODB connections are rarely used by multiple threads at the same time.
In the model where you do associate transaction coordinators with threads,
what you'd want to do is register a connection with a thread-global
transaction coordinator when the connection is opened and unregister it
when it is closed.
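
A rough sketch of that register-on-open, unregister-on-close pattern,
reading "thread-global" as one coordinator per thread.  Coordinator,
get_coordinator and Connection are hypothetical stand-ins, not the
ZODB classes.

```python
import threading


class Coordinator:
    """Hypothetical transaction coordinator that tracks its participants."""

    def __init__(self):
        self.participants = []

    def register(self, participant):
        self.participants.append(participant)

    def unregister(self, participant):
        self.participants.remove(participant)


_local = threading.local()


def get_coordinator():
    """Return the calling thread's coordinator, creating it on first use."""
    coordinator = getattr(_local, "coordinator", None)
    if coordinator is None:
        coordinator = _local.coordinator = Coordinator()
    return coordinator


class Connection:
    """Hypothetical connection that joins a coordinator only while open."""

    def __init__(self):
        self._coordinator = None

    def open(self):
        # Bind to the coordinator of whichever thread opens the connection.
        self._coordinator = get_coordinator()
        self._coordinator.register(self)

    def close(self):
        # Unregister from the coordinator we joined, then forget it.
        self._coordinator.unregister(self)
        self._coordinator = None
```

Because the connection remembers the coordinator it joined, a later
open() in a different thread simply registers with that thread's
coordinator instead.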


> That's why I suggested that if you want permanent participants, that
> perhaps you'd really want to register the transaction participant for all
> threads. 

No, you don't want that.

> It requires you to consider thread safety, but I think you'd
> frequently have to consider that anyway.

No, with ZODB you effectively never have to worry about threads.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


From jim@zope.com  Wed Jul 31 20:56:27 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 31 Jul 2002 15:56:27 -0400
Subject: [Persistence-sig] A simple Observation API
References: <20020731063240.55579.qmail@web20705.mail.yahoo.com>
Message-ID: <3D4840EB.20501@zope.com>

Ilia Iourovitski wrote:
> If object participate in more than one transaction
> concurrently, transaction API shall provide locks.

Right. That's why you really don't want to share a single
(copy of an) object among multiple concurrent threads or transactions.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


From pje@telecommunity.com  Wed Jul 31 20:55:40 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 15:55:40 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3D483E80.6070906@zope.com>
References: <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
 <3.0.5.32.20020729173745.008a0240@telecommunity.com>
 <5.1.0.14.0.20020723184912.050abec0@mail.telecommunity.com>
 <5.1.0.14.0.20020730082232.04cd62b0@mail.telecommunity.com>
 <3.0.5.32.20020730135832.008fa690@telecommunity.com>
Message-ID: <3.0.5.32.20020731155540.00910250@telecommunity.com>

At 03:46 PM 7/31/02 -0400, Jim Fulton wrote:
>
>I realize that this issue seems to be resolved by eliminating write throughs,
>but I want to make sure I understand something here. I would have assumed that
>all changes to persistent objects would occur before commit, and thus before
>any prepares are done. You seem to be assuming that persistent objects could
>change after the application has issued a commit. Is that right? Is the reason
>that the prepare logic of some data managers does its work by manipulating
>persistent objects in other data managers?

Right.  One of my examples was data manager "A" saving its objects by
writing them to an XML DOM.  That DOM in turn might be a set of persistent
objects, managed by data manager "B", which saves them by writing the
entire XML document into a field of a relational database.


From shane@zope.com  Wed Jul 31 21:01:43 2002
From: shane@zope.com (Shane Hathaway)
Date: Wed, 31 Jul 2002 16:01:43 -0400 (EDT)
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3D484086.2020408@zope.com>
Message-ID: <Pine.LNX.4.33L2.0207311558230.15641-100000@shane.zope.com>

On Wed, 31 Jul 2002, Jim Fulton wrote:

> Shane Hathaway wrote:
>  > But ZODB connections are not really
> > thread-specific; they may be reused in a different thread when they are
> > opened or closed.  So if, for example, you registered a permanent
> > transaction participant that cleared the cache of a specific ZODB
> > connection, you wouldn't get the effect you wanted! :-)
>
> But ZODB connections are rarely used by multiple threads at the same time.
> In the model where you do associate transaction coordinators with threads,
> what you'd want to do is register a connection with a thread-global
> transaction coordinator when the connection is opened and unregister it
> when it is closed.

Ah-ha, that would work.  Thanks.

Shane


From jeremy@alum.mit.edu  Wed Jul 31 21:09:10 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 16:09:10 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <Pine.LNX.4.33L2.0207311546190.15641-100000@shane.zope.com>
References: <15688.15508.288534.906790@slothrop.zope.com>
	<Pine.LNX.4.33L2.0207311546190.15641-100000@shane.zope.com>
Message-ID: <15688.17382.566113.907598@slothrop.zope.com>

>>>>> "SH" == Shane Hathaway <shane@zope.com> writes:

  SH> On Wed, 31 Jul 2002, Jeremy Hylton wrote: I would like this
  SH> interface to be called ITransactionParticipant.  There are many
  SH> interesting kinds of objects that would be interested in
  SH> participating in a transaction, and not all of them have the
  SH> immediate responsibility of storing data.  But the names you
  SH> chose for the methods are very clear and concise, I think.
  >>
  >> I think IResourceManager is probably better (see above).  I wish
  >> I could take credit for the names, but I just grabbed them from
  >> the Gray & Reuter book :-).

  SH> Ok, but some of the things we'd like to tie into transactions
  SH> don't really manage data/resources.  For example,
  SH> "CommitVersion", "AbortVersion", and "TransactionalUndo" objects
  SH> (from ZODB 3) just listen for the "commit" message.  Then they
  SH> ask an object that really is responsible for data/resources to
  SH> do something.

  SH> I don't have the book, but my uneducated guess is that we're
  SH> working with something a little more general than what Gray and
  SH> Reuter proposed.

I think that "resource manager" is a suitably generic term.  Do we
really care whether the thing-with-a-commit-method manages an object
or not?  I don't think it makes things clearer to distinguish between
the overall class of resource managers and the subset that manage
their own objects.

There are a bunch of ways to split this hair:

The XXXVersion and TransactionalUndo objects really do have resources
-- the names of the version or the transaction id.

The Connection doesn't manage objects either, the storage does.  So
the storage is a resource manager (except that it doesn't support the
resource manager API) and all these things layered on top constitute
nested resource managers.

All of the above are resource managers.  It's not appropriate to ask
how these managers work, because that's not part of the transaction
API.  A resource manager is just a black box with prepare(), commit(),
etc. methods.
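
As an illustration of the black-box view, here is a minimal sketch of
a coordinator that knows nothing about its resource managers beyond
prepare(), commit() and abort().  The class names and the shared log
are assumptions made for the example.

```python
class Coordinator:
    """Hypothetical coordinator: it only ever calls the black-box methods."""

    def __init__(self):
        self._managers = []

    def join(self, resource_manager):
        self._managers.append(resource_manager)

    def commit(self):
        # Phase one: every manager must report that it is prepared.
        if all(rm.prepare() for rm in self._managers):
            # Phase two: tell every manager to commit.
            for rm in self._managers:
                rm.commit()
            return True
        # Someone refused: abort everybody that joined.
        for rm in self._managers:
            rm.abort()
        return False


class RecordingManager:
    """Hypothetical resource manager that logs the calls it receives."""

    def __init__(self, name, log, ok=True):
        self.name, self.log, self.ok = name, log, ok

    def prepare(self):
        self.log.append((self.name, "prepare"))
        return self.ok

    def commit(self):
        self.log.append((self.name, "commit"))

    def abort(self):
        self.log.append((self.name, "abort"))
```

Whether a RecordingManager holds data, a version name, or nothing at
all makes no difference to the coordinator.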

Jeremy




From pje@telecommunity.com  Wed Jul 31 21:07:55 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 16:07:55 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <15688.16504.339771.416052@slothrop.zope.com>
References: <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com>
 <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net>
 <87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
 <5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
 <3.0.5.32.20020719120237.00898b60@telecommunity.com>
 <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com>
Message-ID: <3.0.5.32.20020731160755.01389ca0@telecommunity.com>

At 03:54 PM 7/31/02 -0400, Jeremy Hylton wrote:
>>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
>
>  PJE> At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote:
>  >> Last week, I worked out a revised transaction API for user code
>  >> and for data managers.  It's implemented in ZODB4, but is fairly
>  >> preliminary code.  I imagine we'll revise it further, but I'd
>  >> like to describe the changes briefly.
>
>  PJE> I'm not sure if this new API is in relation to the proposals on
>  PJE> this list or not, but I'm curious how this affects a few
>  PJE> things:
>
>  PJE> * The need for participants to join every transaction.  
>
>How would you like this feature to interact with custom policies for
>mapping threads to transaction ids?  If ZODB keeps with its default
>policy, it may be useful for a ZODB Connection (resource manager) to
>join every transaction run by a particular thread.  However, the
>Connection would need to stop joining at some point.

I'd just like to be able to create a transaction manager that's used for
some set of transactions, and register some set of participants to it.
Nothing fancy, really.  I'll manage my own threading issues entirely
outside of the transaction manager and participants.


>  PJE> * If a data manager can't support rollback to a savepoint, what
>  PJE>   does it return?
>
>Good question.  Here's my first guess at an answer: It returns None.
>If multiple resource managers participate in a transaction and one
>doesn't support savepoints, then the application can't rollback the
>savepoint.  The other resource managers may execute the savepoint, but
>rollback is impossible.

Perhaps it should return a NullSavePoint object, that returns False to a
"can_rollback" method.  The aggregated savepoint object would return true
for can_rollback if all its contents return true, and the rollback() method
would only run if can_rollback is true.
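
A rough sketch of that idea.  NullSavePoint and can_rollback come
from the paragraph above; ListSavePoint and AggregateSavePoint are
hypothetical names, and raising an exception when rollback is not
possible is an assumption.

```python
class NullSavePoint:
    """Returned by a data manager that cannot roll back to a savepoint."""

    def can_rollback(self):
        return False

    def rollback(self):
        raise RuntimeError("this data manager cannot roll back")


class ListSavePoint:
    """Hypothetical savepoint for a list that is only ever appended to."""

    def __init__(self, items):
        self._items = items
        self._length = len(items)

    def can_rollback(self):
        return True

    def rollback(self):
        # Drop everything appended since the savepoint was taken.
        del self._items[self._length:]


class AggregateSavePoint:
    """Combines the savepoints of all participants in a transaction."""

    def __init__(self, savepoints):
        self._savepoints = list(savepoints)

    def can_rollback(self):
        # True only if every contained savepoint can roll back.
        return all(sp.can_rollback() for sp in self._savepoints)

    def rollback(self):
        if not self.can_rollback():
            raise RuntimeError("at least one participant cannot roll back")
        for sp in self._savepoints:
            sp.rollback()
```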


>(In the case of ZODB, it can be useful to execute a savepoint
>regardless of whether it can be rolled back, because it allows
>modified objects to become ghosts.)

Yes; it could also be used to ensure that an external system such as an SQL
database reflects the current state of persistent objects, which is handy
if one must also use legacy code in the same transaction context which runs
against that data.


>  >> (The need for notify-on-read, BTW, is to support higher isolation
>  >> levels than ZODB currently supports.)
>
>  PJE> And to support delayed loading of attributes by multi-backend
>  PJE> data managers.  Although to support that, there'd need to be
>  PJE> the opportunity to override the attribute value that was read.
>
>It's possible to define a custom __getattr__ on a Persistent
>subclass.  Is that enough?

Nope.  __getattr__ is implemented by the Persistent object, not the data
manager.  I want the *data manager* to do delayed loading of attributes in
certain cases.  For example, it's a common use case for me to need data for
an object from both an LDAP and an SQL database.  However, some LDAP
attributes (such as a user's picture or the membership of an LDAP group)
are *huge*.  I'd like to avoid the overhead of loading these attributes
until/unless they're needed (because on most transactions they're not
needed).  That's why I need the ability for a data manager to implement
delayed loading of certain attributes.

Without this separation, you can't implement your Persistent objects as a
truly abstract application model, that's portable to different data
managers as backends.  A Persistent object should never have to know
details of how its storage is implemented.  In theory, as a result of this
SIG's work, people should be able to write a set of Persistent classes for
any application model, and then persist it with any sufficiently capable
data manager(s).
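
One possible shape for this, as a sketch only: the Persistent base
class forwards attribute misses to whatever data manager it is
attached to, so application classes stay ignorant of the backends.
Every name here (_p_key, load_attribute, attach, LazyDataManager) is
an assumption for illustration, and the two dicts stand in for the
cheap and the expensive backend.

```python
class Persistent:
    """Hypothetical base class: attribute misses go to the data manager."""

    _p_jar = None  # the data manager this object is attached to
    _p_key = None  # the object's key within that data manager

    def __getattr__(self, name):
        # Only called when normal attribute lookup has already failed.
        jar = self._p_jar
        if jar is None or name.startswith("_"):
            raise AttributeError(name)
        value = jar.load_attribute(self, name)
        self.__dict__[name] = value  # cache so the next read is ordinary
        return value


class LazyDataManager:
    """Loads cheap attributes eagerly and expensive ones on first read."""

    def __init__(self, eager, lazy):
        self._eager = eager  # key -> {attribute: value}, loaded on attach
        self._lazy = lazy    # key -> {attribute: value}, loaded on demand
        self.loads = []      # record of delayed loads, for demonstration

    def attach(self, obj, key):
        obj._p_jar = self
        obj._p_key = key
        obj.__dict__.update(self._eager[key])

    def load_attribute(self, obj, name):
        try:
            value = self._lazy[obj._p_key][name]
        except KeyError:
            raise AttributeError(name) from None
        self.loads.append(name)
        return value


class User(Persistent):
    """Application class: no knowledge of where its attributes come from."""
```

The User class could be attached to a different data manager without
any change to its own code.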


From shane@zope.com  Wed Jul 31 20:56:36 2002
From: shane@zope.com (Shane Hathaway)
Date: Wed, 31 Jul 2002 15:56:36 -0400 (EDT)
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <15688.15508.288534.906790@slothrop.zope.com>
Message-ID: <Pine.LNX.4.33L2.0207311546190.15641-100000@shane.zope.com>

On Wed, 31 Jul 2002, Jeremy Hylton wrote:

>   SH> I would like this interface to be called
>   SH> ITransactionParticipant.  There are many interesting kinds of
>   SH> objects that would be interested in participating in a
>   SH> transaction, and not all of them have the immediate
>   SH> responsibility of storing data.  But the names you chose for the
>   SH> methods are very clear and concise, I think.
>
> I think IResourceManager is probably better (see above).  I wish I
> could take credit for the names, but I just grabbed them from the Gray
> & Reuter book :-).

Ok, but some of the things we'd like to tie into transactions don't really
manage data/resources.  For example, "CommitVersion", "AbortVersion", and
"TransactionalUndo" objects (from ZODB 3) just listen for the "commit"
message.  Then they ask an object that really is responsible for
data/resources to do something.

I don't have the book, but my uneducated guess is that we're working with
something a little more general than what Gray and Reuter proposed.

Shane


From jeremy@alum.mit.edu  Wed Jul 31 20:54:32 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 15:54:32 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com>
References: <200207191609.g6JG91A26544@pcp02138704pcs.reston01.va.comcast.net>
	<87y9cdw37b.fsf@bidibule.brest.inqual.bzh>
	<5.1.0.14.0.20020714115819.05bc9d50@mail.telecommunity.com>
	<3.0.5.32.20020719120237.00898b60@telecommunity.com>
	<5.1.0.14.0.20020730082812.05e848c0@mail.telecommunity.com>
Message-ID: <15688.16504.339771.416052@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  PJE> At 08:36 PM 7/29/02 -0400, Jeremy Hylton wrote:
  >> Last week, I worked out a revised transaction API for user code
  >> and for data managers.  It's implemented in ZODB4, but is fairly
  >> preliminary code.  I imagine we'll revise it further, but I'd
  >> like to describe the changes briefly.

  PJE> I'm not sure if this new API is in relation to the proposals on
  PJE> this list or not, but I'm curious how this affects a few
  PJE> things:

  PJE> * The need for participants to join every transaction.  

How would you like this feature to interact with custom policies for
mapping threads to transaction ids?  If ZODB keeps with its default
policy, it may be useful for a ZODB Connection (resource manager) to
join every transaction run by a particular thread.  However, the
Connection would need to stop joining at some point.

  PJE> * Arbitrarily nested, cascading participants.  Does this support
  PJE> them?  How?  I don't see any mention of the issues in the
  PJE> interfaces.

It doesn't support them at all, as much previous discussion has
illustrated.  I'd like to address that in a separate email.

  PJE> * If a data manager can't support rollback to a savepoint, what
  PJE>   does it return?

Good question.  Here's my first guess at an answer: It returns None.
If multiple resource managers participate in a transaction and one
doesn't support savepoints, then the application can't rollback the
savepoint.  The other resource managers may execute the savepoint, but
rollback is impossible.

(In the case of ZODB, it can be useful to execute a savepoint
regardless of whether it can be rolled back, because it allows
modified objects to become ghosts.)

I'm not sure if it's useful to provide an introspection capability to
see if rollback is allowed.

  >> (The need for notify-on-read, BTW, is to support higher isolation
  >> levels than ZODB currently supports.)

  PJE> And to support delayed loading of attributes by multi-backend
  PJE> data managers.  Although to support that, there'd need to be
  PJE> the opportunity to override the attribute value that was read.

It's possible to define a custom __getattr__ on a Persistent
subclass.  Is that enough?

Jeremy



From jeremy@alum.mit.edu  Wed Jul 31 21:23:42 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 16:23:42 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
References: <3.0.5.32.20020730135832.008fa690@telecommunity.com>
	<3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <15688.18254.298450.338865@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  PJE> That's what TransactionAgents does, but that's not what I'm
  PJE> looking for per se.  I'm looking at simple data managers.  For
  PJE> example, if I make a data manager that persists a set of
  PJE> objects to an XML DOM, I might want to use it with a DOM
  PJE> persistence manager that stores XML documents in an SQL
  PJE> database.  All three "data managers" (persist->XML,
  PJE> XML->Database, SQL database) are transaction participants, with
  PJE> implied or actual ordering.

If I understand this example correctly, then there are three different
objects that implement the resource manager interface:

1. persist->XML
2. XML->Database
3. Database

It sounds like the application code only interacts with 1, and that 2
and 3 should be considered implementation details of 1.  Thus, only 1
should register with the transaction, since it's the only independent
entity.

When the transaction commits, it first calls prepare() on 1.  This
delegates the responsibility for the commit to 2, which in turn
delegates to 3.  So for 1 to return True from its prepare, 2 and 3
must also return True.

Why doesn't this work? :-)
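
The delegation described above might be sketched as follows.
ChainedManager is a hypothetical name, and each manager's own
readiness is reduced to a single flag for the example.

```python
class ChainedManager:
    """Hypothetical resource manager that delegates prepare() inward."""

    def __init__(self, name, inner=None, ok=True):
        self.name = name
        self.inner = inner  # the manager this one writes to, if any
        self.ok = ok        # whether this manager itself can commit

    def prepare(self):
        # Prepared only if this manager and everything beneath it is.
        if not self.ok:
            return False
        return self.inner is None or bool(self.inner.prepare())


database = ChainedManager("Database")
xml_to_db = ChainedManager("XML->Database", inner=database)
persist_to_xml = ChainedManager("persist->XML", inner=xml_to_db)

# Only persist_to_xml would register with the transaction.
result = persist_to_xml.prepare()
```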

Jeremy




From jeremy@alum.mit.edu  Wed Jul 31 21:26:49 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 16:26:49 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
References: <3.0.5.32.20020730135832.008fa690@telecommunity.com>
	<3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <15688.18441.175230.815465@slothrop.zope.com>

> DM2.prepare()
> DM3.prepare()
> DM1.prepare()
> 
> DM2.vote()
> DM3.vote()
> DM1.vote()

Note in the API I've proposed/implemented, there is only prepare(),
not vote().  The resource manager should return True from prepare() if
it is prepared to commit.

Jeremy


From shane@zope.com  Wed Jul 31 21:33:02 2002
From: shane@zope.com (Shane Hathaway)
Date: Wed, 31 Jul 2002 16:33:02 -0400 (EDT)
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <15688.17382.566113.907598@slothrop.zope.com>
Message-ID: <Pine.LNX.4.33L2.0207311616400.15641-100000@shane.zope.com>

On Wed, 31 Jul 2002, Jeremy Hylton wrote:

> >>>>> "SH" == Shane Hathaway <shane@zope.com> writes:
>   SH> I don't have the book, but my uneducated guess is that we're
>   SH> working with something a little more general than what Gray and
>   SH> Reuter proposed.
>
> I think that "resource manager" is a suitably generic term.  Do we
> really care whether the thing-with-a-commit-method manages an object
> or not?  I don't think it makes things clearer to distinguish between
> the overall class of resource managers and the subset that manage
> their own objects.

I'm going to defer to your book, with a final objection that "resource
manager" is terribly non-descriptive except in the context of a special
jargon.  If I'm a Python programmer with plenty of experience but no
experience in transactions, I'm going to have to read a whole book to
learn what a resource manager is.  Something like "transaction
participant", however, gives me a much better idea of the contract
between the coordinator and the participant.

Shane


From jim@zope.com  Wed Jul 31 21:33:30 2002
From: jim@zope.com (Jim Fulton)
Date: Wed, 31 Jul 2002 16:33:30 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
References: <Pine.LNX.4.33L2.0207311546190.15641-100000@shane.zope.com>
Message-ID: <3D48499A.6060507@zope.com>

Shane Hathaway wrote:
> On Wed, 31 Jul 2002, Jeremy Hylton wrote:
> 
> 
>>  SH> I would like this interface to be called
>>  SH> ITransactionParticipant.  There are many interesting kinds of
>>  SH> objects that would be interested in participating in a
>>  SH> transaction, and not all of them have the immediate
>>  SH> responsibility of storing data.  But the names you chose for the
>>  SH> methods are very clear and concise, I think.
>>
>>I think IResourceManager is probably better (see above).  I wish I
>>could take credit for the names, but I just grabbed them from the Gray
>>& Reuter book :-).
>>
> 
> Ok, but some of the things we'd like to tie into transactions don't really
> manage data/resources. 

I agree, theoretically, but


> For example, "CommitVersion", "AbortVersion", and
> "TransactionalUndo" objects (from ZODB 3) just listen for the "commit"
> message. 

These are not good examples, as these would no longer be registered with the
transaction coordinator. Rather, they would be handled internally to the
resource manager (or whatever). They are certainly also about managing data.

Jim


-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org


From jeremy@alum.mit.edu  Wed Jul 31 21:45:06 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 16:45:06 -0400
Subject: [Persistence-sig] "Straw Man" transaction API
In-Reply-To: <Pine.LNX.4.33L2.0207311616400.15641-100000@shane.zope.com>
References: <15688.17382.566113.907598@slothrop.zope.com>
	<Pine.LNX.4.33L2.0207311616400.15641-100000@shane.zope.com>
Message-ID: <15688.19538.482052.762174@slothrop.zope.com>

>>>>> "SH" == Shane Hathaway <shane@zope.com> writes:

  SH> I'm going to defer to your book, with a final objection that
  SH> "resource manager" is terribly non-descriptive except in the
  SH> context of a special jargon.

I think special jargon is appropriate for the SIG.  We'll have to have
a section of the transaction PEP that defines the terms.

  SH> If I'm a Python programmer with plenty of experience but no
  SH> experience in transactions, I'm going to have to read a whole
  SH> book to learn what a resource manager is.  Something like
  SH> "transaction participant", however, gives me a much better idea
  SH> of the contract between the coordinator and the participant.

On the other hand, I think it's fair for end-user documentation to use
less precise terminology if that makes it easier to understand.  On
the third hand, this definition is only important for people writing
transaction participants.  It doesn't seem unreasonable for them to
learn the domain jargon; even better, if we stick with widely used
jargon, people with database backend experience but no Python
experience will be at home.

Jeremy




From pje@telecommunity.com  Wed Jul 31 22:41:31 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 17:41:31 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <15688.18254.298450.338865@slothrop.zope.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
 <3.0.5.32.20020730135832.008fa690@telecommunity.com>
 <3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <3.0.5.32.20020731174131.00904c10@telecommunity.com>

At 04:23 PM 7/31/02 -0400, Jeremy Hylton wrote:
>>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:
>
>  PJE> That's what TransactionAgents does, but that's not what I'm
>  PJE> looking for per se.  I'm looking at simple data managers.  For
>  PJE> example, if I make a data manager that persists a set of
>  PJE> objects to an XML DOM, I might want to use it with a DOM
>  PJE> persistence manager that stores XML documents in an SQL
>  PJE> database.  All three "data managers" (persist->XML,
>  PJE> XML->Database, SQL database) are transaction participants, with
>  PJE> implied or actual ordering.
>
>If I understand this example correctly, then there are three different
>objects that implement the resource manager interface:
>
>1. persist->XML
>2. XML->Database
>3. Database
>
>It sounds like the application code only interacts with 1, and that 2
>and 3 should be considered implementation details of 1.  Thus, only 1
>should register with the transaction, since it's the only independent
>entity.
>
>When the transaction commits, it first calls prepare() on 1.  This
>delegates the responsibility for the commit to 2, which in turn
>delegates to 3.  So for 1 to return True from its prepare, 2 and 3
>must also return True.
>
>Why doesn't this work? :-)
>

Because 3 would be shared by other objects also being persisted to that SQL
database, for just the first thing that comes to mind.

But that's an implementation detail.  This is primarily an architectural
issue.  Data manager 1 is generic code written to work on an XML DOM.  It
shouldn't have to *know* that the DOM *is* persistent, let alone *how* it's
persisted.  You're calling for the placement of global architecture
knowledge into individual components, that should only be known at a higher
abstraction level.


From pje@telecommunity.com  Wed Jul 31 22:42:48 2002
From: pje@telecommunity.com (Phillip J. Eby)
Date: Wed, 31 Jul 2002 17:42:48 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <15688.18441.175230.815465@slothrop.zope.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
 <3.0.5.32.20020730135832.008fa690@telecommunity.com>
 <3.0.5.32.20020730150539.0089c240@telecommunity.com>
Message-ID: <3.0.5.32.20020731174248.009022d0@telecommunity.com>

At 04:26 PM 7/31/02 -0400, Jeremy Hylton wrote:
>> DM2.prepare()
>> DM3.prepare()
>> DM1.prepare()
>> 
>> DM2.vote()
>> DM3.vote()
>> DM1.vote()
>
>Note in the API I've proposed/implemented, there is only prepare(),
>not vote().  The resource manager should return True from prepare() if
>it is prepared to commit.
>

Note that this doesn't work correctly when resource managers are cascaded
and need re-flush messages, per the discussion between Shane and me.  :)
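
One possible reading of the cascading concern, as a minimal Python sketch.
Everything here is an assumption made for illustration: the DataManager
class, its write()/flush() methods, and the two driver functions are
hypothetical and are not taken from either proposed API.  The sketch only
shows that when prepare() doubles as the vote, a manager that is prepared
before its upstream neighbour can answer "yes" and then receive more data.

```python
class DataManager:
    """Hypothetical cascaded data manager: flushes writes downstream."""

    def __init__(self, name, downstream=None):
        self.name = name
        self.downstream = downstream
        self.pending = []   # writes received but not yet flushed
        self.flushed = []   # writes already flushed

    def write(self, item):
        self.pending.append(item)

    def flush(self):
        # Push every pending write to the downstream manager, if any.
        for item in self.pending:
            if self.downstream is not None:
                self.downstream.write(item)
            self.flushed.append(item)
        self.pending = []

    def prepare(self):
        self.flush()

    def vote(self):
        # Re-flush anything that arrived after our own prepare().
        self.flush()
        return not self.pending


def build_chain():
    """DM1 writes into DM2, which writes into DM3."""
    dm3 = DataManager("DM3")
    dm2 = DataManager("DM2", downstream=dm3)
    dm1 = DataManager("DM1", downstream=dm2)
    return dm1, dm2, dm3


def single_phase(managers):
    # prepare() doubles as the vote; there is no later chance to re-flush.
    votes = []
    for dm in managers:
        dm.prepare()
        votes.append(not dm.pending)
    return all(votes)


def two_phase(managers):
    # Every manager prepares first; votes are collected only afterwards.
    for dm in managers:
        dm.prepare()
    return all([dm.vote() for dm in managers])
```

With the call order quoted above (DM2, DM3, DM1) and one object written to
DM1, single_phase() reports success while DM2 still holds the unflushed
write; two_phase() gives DM2 and DM3 the chance to flush it before voting.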


From jeremy@alum.mit.edu  Wed Jul 31 23:19:50 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 18:19:50 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020731174131.00904c10@telecommunity.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
	<3.0.5.32.20020730135832.008fa690@telecommunity.com>
	<3.0.5.32.20020731174131.00904c10@telecommunity.com>
Message-ID: <15688.25222.339977.30416@slothrop.zope.com>

[Meta-comment: I'm sorry it's taking us so long to reach some kind of
understanding on this issue.  It seems like we keep talking past each
other, but I'm not sure why.]

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  [I wrote:]
  >> If I understand this example correctly, then there are three
  >> different objects that implement the resource manager interface:
  >>
  >> 1. persist->XML
  >> 2. XML->Database
  >> 3. Database
  >>
  >> It sounds like the application code only interacts with 1, and
  >> that 2 and 3 should be considered implementation details of 1.
  >> Thus, only 1 should register with the transaction, since it's the
  >> only independent entity.
  >>
  >> When the transaction commits, it first calls prepare() on 1.
  >> This delegates the responsibility for the commit to 2, which in
  >> turn delegates to 3.  So for 1 to return True from its prepare, 2
  >> and 3 must also return True.
  >>
  >> Why doesn't this work? :-)
  >>

  PJE> Because 3 would be shared by other objects also being persisted
  PJE> to that SQL database, for just the first thing that comes to
  PJE> mind.

If you call prepare() twice on a resource manager, it should return
the same answer both times, right?  If so, then it shouldn't matter if
the same resource manager is being used as a top-level component and
an internal component.  It will perform its prepare work the first
time it is called and then just return its vote the second time it is
called.
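
A minimal sketch of that idempotent behaviour, in Python.  The class name,
the per-transaction vote cache, and the can_commit flag are hypothetical
names chosen for illustration; they are assumptions, not part of the API
I have proposed.

```python
class CachingResourceManager:
    """Hypothetical resource manager whose prepare() is idempotent."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self._can_commit = can_commit
        self._votes = {}              # transaction -> cached vote
        self.prepare_work_count = 0   # how often real work was done

    def prepare(self, txn):
        # Do the prepare work only on the first call for this transaction.
        if txn not in self._votes:
            self.prepare_work_count += 1
            self._votes[txn] = self._can_commit
        # Later calls just repeat the vote already given.
        return self._votes[txn]
```

Calling prepare() once as a top-level participant and once more as an
internal component then yields the same answer, and the work runs once.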

  PJE> But that's an implementation detail.  This is primarily an
  PJE> architectural issue.

I agree that it's an architectural issue.  (It's good that we agree on
some things <wink>.)  The example above sounds like a component-based
system, where there is a compound persist->xml->database component.
The subcomponents of this entity should not be registering themselves
with the transaction manager.  A component should control all
communication of its constituent parts with other components.

  PJE> Data manager 1 is generic code written to work on an XML DOM.
  PJE> It shouldn't have to *know* that the DOM *is* persistent, let
  PJE> alone *how* it's persisted.

The description of the first component implies that it supports
persistent objects and stores them using another component that
stores XML.  That top-level component *must* know how to handle
persistent objects and transactions, as it implements those
interfaces.

  PJE> You're calling for the placement of global architecture
  PJE> knowledge into individual components, that should only be known
  PJE> at a higher abstraction level.

I thought I was arguing the opposite.  Individual components should
not all talk to the global transaction manager.  Instead, when a
component is assembled, the parts should be wired together so that
each knows who to communicate with.
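
A rough Python sketch of the wiring I have in mind.  All of the class
names, the register() method, and the ok flag are hypothetical and exist
only for illustration; this is an assumption about how such a compound
component could be assembled, not a description of existing code.

```python
class Database:
    """Innermost part (3): talks to the SQL database."""

    def __init__(self, ok=True):
        self.ok = ok

    def prepare(self, txn):
        return self.ok


class XMLToDatabase:
    """Middle part (2): stores XML by delegating to a database part."""

    def __init__(self, database):
        self.database = database

    def prepare(self, txn):
        return self.database.prepare(txn)


class PersistToXML:
    """Top-level part (1): the only one the application sees."""

    def __init__(self, xml_store):
        self.xml_store = xml_store

    def prepare(self, txn):
        return self.xml_store.prepare(txn)


class Transaction:
    """Hypothetical transaction that only knows registered managers."""

    def __init__(self):
        self.managers = []

    def register(self, manager):
        if manager not in self.managers:
            self.managers.append(manager)

    def commit(self):
        # Succeeds only if every registered manager is prepared to commit.
        return all(rm.prepare(self) for rm in self.managers)


def assemble(ok=True):
    # The parts are wired together once, when the component is built.
    return PersistToXML(XMLToDatabase(Database(ok=ok)))
```

Only the object returned by assemble() is registered with the
transaction; a False from the database part propagates up through 2 to 1.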

Jeremy


From jeremy@alum.mit.edu  Wed Jul 31 23:08:41 2002
From: jeremy@alum.mit.edu (Jeremy Hylton)
Date: Wed, 31 Jul 2002 18:08:41 -0400
Subject: [Persistence-sig] A simple Observation API
In-Reply-To: <3.0.5.32.20020731174248.009022d0@telecommunity.com>
References: <3.0.5.32.20020730150539.0089c240@telecommunity.com>
	<3.0.5.32.20020730135832.008fa690@telecommunity.com>
	<3.0.5.32.20020731174248.009022d0@telecommunity.com>
Message-ID: <15688.24553.963964.576250@slothrop.zope.com>

>>>>> "PJE" == Phillip J Eby <pje@telecommunity.com> writes:

  >> Note in the API I've proposed/implemented, there is only
  >> prepare(), not vote().  The resource manager should return True
  >> from prepare() if it is prepared to commit.
  >>

  PJE> Note that this doesn't work correctly when resource managers
  PJE> are cascaded and need re-flush messages, per the discussion
  PJE> between Shane and me.  :)

I didn't understand that discussion. :)

Jeremy