[Persistence-sig] "Straw Baby" Persistence API

Jim Fulton jim@zope.com
Mon, 22 Jul 2002 15:47:42 -0400


Phillip J. Eby wrote:
> At 01:16 PM 7/22/02 -0400, Jim Fulton wrote:
> 
>> Phillip J. Eby wrote:
>>
>>> * Remove the BTrees subpackage, and the Class, Cache, Function, and Module
>>> modules, along with the ICache interface.  Rationale: The BTrees package is
>>> only useful for a relatively small subset of possible persistence backends,
>>> and is subject to periodic data structure changes which affect applications
>>> using it.
>>
>>
>> I'm OK with taking out BTrees, however, BTrees were included in ZODB by
>> very popular demand.
> 
> 
> And they should continue to be included with ZODB. 

They don't depend on ZODB in any way.

> But IMHO their use
> is specific to persistence mechanisms which use "pickle jar"-style or 
> "shelve"-like primitive databases.  (Primitive in the sense of not 
> providing any concepts such as indexes or built-in search 
> capabilities.)  If you have a higher-level mechanism, even one as simple 
> as SleepyCat DB (aka Berkeley DB) b-trees, you're most often better off 
> using those features of the backend.

I don't agree.


> If this were not true, there'd be no need for any persistence mechanisms 
> besides ZODB, and we wouldn't be having this conversation.  :)

There are lots of other reasons for a non-ZODB persistent storage
including:

1) Need to store data in relational databases

    - because they are trusted

    - because data needs to be accessed from other apps

    - because they may scale better for some apps

2) Competition is good. :)

> (Note that I'm assuming that ZODB itself will continue to exist as an 
> independent package, providing a persistence mechanism through its 
> Connection, Database, and Storage objects.  It just shouldn't need to 
> include Persistence or Transaction any more;

Of course.

> BTrees would become
> ZODB.BTrees, or something similar.)

No, they would be separate. They don't depend on ZODB.


> 
>> You haven't given a rationale for not including the caching framework.
>> The caching framework is closely tied to persistence and, I think,
>> largely independent of data managers.
> 
> 
> IMHO the existing caches are tied to a specific caching policy, which 
> embeds many ZODB-ish assumptions.  For RDBMS work, I primarily need 
> transactional caching, where caches are cleared between transactions.  
> For that, I can use a simple WeakValueDictionary, with some code that 
> deactivates objects between transactions.
> 
> But if you think we should throw in some basic cache implementations for 
> the most common caching policies, I've no objection.  I just thought it 
> better to save argument at the present time over *which* caching 
> policies would be most common.  :)

I think that there should, at least, be a standard cache interface.
It should be possible to develop data managers and caches independently.
Maybe we could include one or two standard implementations. These could
provide useful examples for other implementations and, of course, be
useful in themselves.
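
A standard cache interface plus one simple implementation might look something like the following minimal sketch. Everything here is hypothetical: `ICache`, `TransactionalCache`, `invalidate_all()` and the `_p_deactivate()` method it calls on cached objects are illustrative names, not the ZODB cache API. The implementation follows the transactional policy Phillip describes: objects are held in a `weakref.WeakValueDictionary` and deactivated at each transaction boundary.

```python
import weakref


class ICache:
    """Hypothetical minimal interface a data manager could code against."""

    def get(self, oid, default=None):
        raise NotImplementedError

    def __setitem__(self, oid, obj):
        raise NotImplementedError

    def invalidate_all(self):
        """Called by the data manager at transaction boundaries."""
        raise NotImplementedError


class TransactionalCache(ICache):
    """Holds objects weakly and deactivates them between transactions."""

    def __init__(self):
        # Entries vanish once the application drops its last reference.
        self._objects = weakref.WeakValueDictionary()

    def get(self, oid, default=None):
        return self._objects.get(oid, default)

    def __setitem__(self, oid, obj):
        self._objects[oid] = obj

    def invalidate_all(self):
        # Snapshot the values first so the weak dictionary can't change
        # size while we iterate.  Objects stay cached (as ghosts), so
        # object identity is preserved across transactions.
        for obj in list(self._objects.values()):
            obj._p_deactivate()
```

Keeping the deactivated objects in the cache, rather than clearing it, means a later lookup of the same oid returns the same Python object, which simply reloads its state.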


...

>>> * Take out the interfaces.  :(  I'd rather this were, "leave this in, in a
>>> way such that it works whether you have Interface or not", but the reality
>>> is that a dependency in the standard library on something outside the
>>> standard library is a big no-no, and just begging for breakage as soon as
>>> there *is* an Interface package (with a new API) in the standard library.
>>
>>
>> I think that this is a very bad idea. I think the interfaces clarify things
>> quite a bit.
> 
> 
> I think maybe I was unclear.  I certainly don't think that the 
> interfaces should cease to exist, or that they should not exist as 
> documentation.  I'm referring to their inclusion as operating code, only.

So you don't want them to get imported?
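
One minimal way to get "works whether you have Interface or not" is a guarded import, so the interface definitions stay in the code without a hard dependency. This assumes the external package is importable as `Interface` and exports a base class of the same name; that module path and the `IPersistent` definition below are illustrative assumptions, not a confirmed API.

```python
try:
    # Assumed module/class name for the external Interface package.
    from Interface import Interface
except ImportError:
    class Interface:
        """Stand-in base so interface classes still import, and still
        serve as documentation, when the package is absent."""


class IPersistent(Interface):
    """Hypothetical, documentation-only interface for persistent objects."""

    def _p_deactivate():
        """Turn the object into a ghost, discarding its loaded state."""
```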


...

>> These are not a big deal to you, because you have a deep understanding and
>> interest in the machinery. They are a big deal to most people. It would
>> be *wonderful* if we could avoid this. Maybe if we had a standard persistence
>> framework, we could motivate language changes that made this cleaner. :)
> 
> 
> Interesting that you say this, considering how much adoption ZODB has 
> had in the larger Python community.  Perhaps you could be more specific 
> as to the audience you're talking about?

I was mainly referring to the handling of non-persistent mutable attributes.
This is a major stumbling block and source of errors for most ZODB users.

Having to mix in persistence is an annoyance. It would be really
cool (but hard, very hard) to get rid of them.
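
To make the non-persistent-mutable pitfall concrete: a persistent object learns about changes only through attribute assignment, so mutating a plain list in place goes unnoticed. The `Persistent` class below is a hypothetical pure-Python stand-in, not ZODB's C implementation.

```python
class Persistent:
    """Toy stand-in for a persistent base class (hypothetical, not ZODB's).

    Assigning any regular attribute marks the object as changed, which is
    how the data manager learns what to write at commit time.
    """

    def __init__(self):
        self._p_changed = False

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if not name.startswith("_p_"):
            object.__setattr__(self, "_p_changed", True)


class Account(Persistent):
    def __init__(self):
        super().__init__()
        self.history = []              # plain, non-persistent list


acct = Account()
acct._p_changed = False                # pretend the object was just committed
acct.history.append("deposit")         # in-place mutation of the list...
print(acct._p_changed)                 # False: the change goes unnoticed

acct.history = acct.history            # common workaround: reassign
print(acct._p_changed)                 # True
```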


> To get rid of these things is possible, but complex.  Getting rid of 
> Persistent while minimizing loss of generality would mean either 
> introducing proxies, or dynamically altering object types in order to 
> get the observation capability.  I'm seriously unconvinced that adding a 
> line to import Persistent, and adding a word to the definition of a few 
> application base classes, is so burdensome as to be worth the complexity 
> and fragility of either of the basic approaches to avoiding it!  (The 
> second issue could probably be addressed with an extension of the 
> solution to the first...  by adding further complexity.)

I agree that this is hard. It's really hard. I wasn't even suggesting
that we needed to solve this problem. I was merely pointing out that this
*is* a big deal for a lot of people.


> If our goal is to provide a Python core package for this in a speedy 
> timeframe -- say this summer -- I think that developing and debugging a 
> whole new way of doing things like this is probably out of the question.

Agreed. OTOH, it wouldn't hurt to ponder other alternatives, if not now,
then maybe later.

> Thing is, *we don't have to actually solve this problem*.  If we create 
> a decent base API/implementation, there's no reason people can't create 
> the proxies or class-substitution mechanisms on their own, using the 
> base implementation to do the actual persistence part.  In principle, it 
> should be possible to create such a mechanism for arbitrary data managers.

True. But maybe someone will think of a way to solve this without proxies
or alchemy?
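
For concreteness, here is roughly what the proxy approach looks like, as a hypothetical sketch (`ObservingProxy` and `Point` are illustrative, not an existing API). It notices attribute writes on an ordinary object with no mix-in, but it also shows the fragility.

```python
class ObservingProxy:
    """Hypothetical proxy that reports attribute writes on a plain object,
    so the wrapped class needs no Persistent mix-in."""

    def __init__(self, target, on_change):
        # Bypass our own __setattr__ while setting up the proxy.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_on_change", on_change)

    def __getattr__(self, name):
        # Called only when normal lookup on the proxy itself fails,
        # which is the case for every attribute of the wrapped object.
        return getattr(self._target, name)

    def __setattr__(self, name, value):
        setattr(self._target, name, value)
        self._on_change(self._target)


class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y


changed = []
p = ObservingProxy(Point(1, 2), changed.append)
p.x = 10
print(p.x, len(changed))        # 10 1
print(isinstance(p, Point))     # False
```

The proxy is not an instance of `Point`, identity checks see the proxy rather than the object, and in-place mutation of an attribute's value still goes unobserved.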


...


>>     o When the object is accessed, it should notify its data manager. Perhaps
>>       it should pass its current state.
> 
> 
> I'd like to rephrase that: it notifies *if* it has been requested to do so
> by the data manager.  The data manager may decide to
> turn on or off such notifications at will.  (In other words, I want my 
> post-getattr hook function that can modify the result of the getattr, 
> and I want it removable so I don't continue to pay in performance once 
> all my state is loaded.)

We need to think some more about this. I'd rather err on the side of
simple persistent objects and complex data managers.

I'd also like persistent objects to be as lightweight as possible.
Carrying a bunch of attributes for hooks is worrisome.
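
One way to reconcile the removable post-getattr hook with lightweight objects is to keep the hook on the data manager, so each persistent object carries only a single `_p_jar` reference. `DataManager`, `getattr_hook` and this pure-Python `Persistent` are hypothetical; a real implementation would presumably do the check in C, where an unset hook costs very little.

```python
class DataManager:
    """Hypothetical data manager owning one switchable post-getattr hook."""

    def __init__(self):
        # hook(obj, name, value) -> value, or None when no hook is wanted
        self.getattr_hook = None


class Persistent:
    """Hypothetical base: the only per-object overhead is _p_jar."""

    def __init__(self, jar):
        self._p_jar = jar

    def __getattribute__(self, name):
        value = object.__getattribute__(self, name)
        if name.startswith("_p_") or name.startswith("__"):
            return value              # never hook bookkeeping/dunder lookups
        jar = object.__getattribute__(self, "_p_jar")
        hook = jar.getattr_hook if jar is not None else None
        return value if hook is None else hook(self, name, value)


dm = DataManager()
doc = Persistent(dm)
doc.title = "draft"
dm.getattr_hook = lambda obj, name, value: value.upper() if name == "title" else value
print(doc.title)        # DRAFT
dm.getattr_hook = None  # all state loaded: switch the hook off
print(doc.title)        # draft
```

Because the hook lives on the data manager, turning it off is a single assignment, and every object that manager owns stops consulting it at once.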


> 
>>     o The persistent object calls a method on the data manager when its state
>>       needs to be loaded.
> 
> 
> As long as I still have the ability to set or remove a getattr-hook that 
> works independently of this, I'm fine.

Would different objects in the same DM have different values of the same hook?
If so, why?
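
A minimal sketch of the "load me" message: an object starts as a ghost and asks its data manager for its state on first access. `DictDataManager`, `setstate()` and the `_p_state` values are assumptions for illustration, loosely modeled on ZODB's ghost and up-to-date states. Because it uses `__getattr__`, ordinary lookups bypass it entirely once the state is in `__dict__`.

```python
GHOST, UPTODATE = "ghost", "uptodate"


class DictDataManager:
    """Hypothetical data manager whose 'storage' is a dict of oid -> state."""

    def __init__(self, storage):
        self.storage = storage
        self.loads = 0                      # counts "load me" requests

    def setstate(self, obj):
        self.loads += 1
        # Fill in the ghost's attributes directly, without going through
        # attribute lookup on obj.
        obj.__dict__.update(self.storage[obj._p_oid])


class Persistent:
    """Hypothetical base: starts as a ghost, loads state on first access."""

    def __init__(self, jar, oid):
        self._p_jar = jar
        self._p_oid = oid
        self._p_state = GHOST

    def __getattr__(self, name):
        # Only reached when normal lookup fails.  Bookkeeping attributes
        # and already-loaded objects get a plain AttributeError.
        if name.startswith("_p_") or self._p_state != GHOST:
            raise AttributeError(name)
        self._p_jar.setstate(self)          # "load me"
        self._p_state = UPTODATE
        return getattr(self, name)


dm = DictDataManager({1: {"name": "Jim"}})
obj = Persistent(dm, 1)
print(obj._p_state)          # ghost
print(obj.name)              # Jim  (first access loads the state)
print(obj.name, dm.loads)    # Jim 1  (found in __dict__, no second load)
```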


>>     o The persistent object should probably notify the data manager of any
>>       state changes.
> 
> 
> *Shrug*.  IAGNI.  (I ain't gonna need it. :)  I don't have a use case 
> for any messages but "I'm changed", "load me", and "postprocess a getattr".
> 
> For what it's worth, I'd *really* like to keep this *simple*.  Simple to 
> me means released sooner, more explicit, more reliable.  So I'd be 
> happiest if we can stick to specific use cases.

A decent cache is going to handle objects differently based on their states.
For example, a cache that deactivates objects when they haven't been used in a
while needs to know which objects are ghostifyable and needs to know when
ghostifyable objects have changed.
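
As a minimal sketch of why the cache needs to see object state: the cache below deactivates objects idle for `max_idle` seconds, but must skip ghosts (nothing to free) and changed objects (unsaved state). The `_p_state` attribute, `_p_deactivate()` method and the `Cache`/`Obj` classes are hypothetical names for illustration.

```python
import time

GHOST, UPTODATE, CHANGED = "ghost", "uptodate", "changed"


class Cache:
    """Hypothetical cache that ghostifies idle, unchanged objects."""

    def __init__(self, max_idle):
        self.max_idle = max_idle
        self._entries = {}               # oid -> [obj, last_access_time]

    def add(self, oid, obj):
        self._entries[oid] = [obj, time.monotonic()]

    def accessed(self, oid):
        # Notification sent on each access to the object.
        self._entries[oid][1] = time.monotonic()

    def sweep(self, now=None):
        """Ghostify idle, unchanged objects; return their oids."""
        if now is None:
            now = time.monotonic()
        ghosted = []
        for oid, (obj, last_used) in self._entries.items():
            # Ghosts hold no state to free; changed objects must keep
            # their state until the transaction commits or aborts.
            if obj._p_state != UPTODATE:
                continue
            if now - last_used >= self.max_idle:
                obj._p_deactivate()
                ghosted.append(oid)
        return ghosted


class Obj:
    """Hypothetical stand-in for a persistent object."""

    def __init__(self):
        self._p_state = UPTODATE
        self.data = "loaded state"

    def _p_deactivate(self):
        del self.data
        self._p_state = GHOST
```

The optional `now` argument to `sweep()` just makes the policy easy to exercise without waiting for real time to pass.
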

> I've spent a lot of time hacking around the existing packages to do 
> SQL/LDAP stuff, and others here should have strong experience using ZODB 
> for its "natural" backends and application structures.  That means we 
> should be able to get pretty concrete about what is and isn't needed.  
> In the absence of more use cases, I'm not sure what else is really 
> needed besides what we've already discussed.  Indeed, most of what I've 
> outlined has been stuff I think should be taken *out*.
> 
> To put it another way, I think we should have to justify everything we 
> want to put *in*, not what we take out.  Python standard library modules 
> are widely distributed, and have a long life.  Whatever we put in needs 
> to have a healthy life expectancy!

I don't think we should approach this effort with the assumption that the first
version is going into the standard library. I'm pretty happy with the persistence
mechanism I came up with for ZODB, but there are a lot of things I'd like to fix.

I agree that we should be rather conservative, but this is a good time to fix things.
Having done so, we should get some experience with what we've come up with before
we worry about adding it to the standard library.

Jim

-- 
Jim Fulton           mailto:jim@zope.com       Python Powered!
CTO                  (888) 344-4332            http://www.python.org
Zope Corporation     http://www.zope.com       http://www.zope.org