Object Database (ODBMS) for Python

Paul D. Fernhout pdfernhout at kurtz-fernhout.com
Fri Aug 29 12:11:51 EDT 2003


Patrick K. O'Brien wrote:
 > Jeremy Bowers <jerf at jerf.org> writes:
>>So why *isn't* it a transaction? Unless you have a good reason not
>>to, I'd suggest automatically "coercing" that into a transaction
>>instead of throwing an error.
> 
> These are some of my reasons: 1) every transaction gets pickled and
> logged before executed, so that the database can recover from a crash,
> 2) most of the other cool features depend on mutations passing through
> the extent manager for each class, 3) transparent transactions only
> seem like a good idea, 4) security is hard to enforce without explicit
> boundaries (read the Twisted docs regarding Perspective Broker), 5)
> explicit is better than implicit, especially when valuable persistent
> data is involved.
 >
> [snip]
>
>>(I'm only really entering my maturity (IMHO) as a software engineer,
>>but one of my rules of thumb for developing software for other
>>people to use is that the API can ***never*** be too easy. 
 >
> If you can figure out a way to have transparent transactions, without
> giving up on any ACID properties considered mandatory for a DBMS, I
> would love to hear about it.  Have you worked with any other object
> databases?

To cite:
   http://databases.about.com/library/weekly/aa120102a.htm
"The ACID model is one of the oldest and most important concepts of 
database theory.  It sets forward four goals that every database 
management system must strive to achieve: atomicity, consistency, 
isolation and durability.  No database that fails to meet any of these 
four goals can be considered reliable."

Well, to chime in here, in a "friendly" competition / cooperation sort 
of way, the Pointrel Data Repository System,
   http://sourceforge.net/projects/pointrel/
while not quite an object database (and admittedly its case being 
easier) has a simple API in the bare minimum use case (it has more 
complex variants). Here is an example of its use (with fragments 
inspired in response to an earlier c.l.p poster's use case a few days ago):

   from pointrel20030812 import *

   # add a first attendant -- uses built in unique ID function
   # each change will be implicitly a separate transaction
   attendantID = Pointrel_generateUniqueID()
   Pointrel_add("congress", attendantID, 'object type', 'user')
   Pointrel_add("congress", attendantID, 'name', 'Sir Galahad')

   # add a second attendant, this time as an atomic transaction
   attendantID = Pointrel_generateUniqueID()
   Pointrel_startTransaction()
   Pointrel_add("congress", attendantID, 'object type', 'user')
   Pointrel_add("congress", attendantID, 'name', 'Brian')
   Pointrel_finishTransaction()

In the first case, the changes are automatically made into transactions, 
in the second, they are lumped under the current transaction.

Note that Python objects could be added to the database, as in:

   Pointrel_add("test", 10, ["hello", "goodbye"], MyClass)

This simple API is made possible by two decisions:
* to provide a version of the API whose functions are module level 
globals and use a hidden repository (stored in _repository), which 
is defaulted in various ways when needed.
* to keep a flag in a repository of whether it is in a transaction or 
not, and if it isn't, to create a transaction on the fly (if an 
"implicit transactions allowed" option is set, which it is by default).

A more general use of the API allowing multiple repositories to be used 
by one application simultaneously is:

   repository = PointrelDataRepositorySystem(archiveName)
   repository.startTransaction()
   repository.add(context, a, b, c)
   repository.add(context, d, e, f)
   repository.finishTransaction()

The module level "Pointrel_xyz()" functions use these sorts of more 
general API calls behind the scenes.

Granted, the Pointrel System is essentially a single user single 
transaction system at the core. It (in theory, subject to bugs) supports 
atomicity (transactions), isolation (locking) and durability 
(logging&recovery). It only supports consistency by how applications use 
transactions as opposed to explicit constraints or rules maintained by 
the database, so one could argue it fails the ACID test there. (Although 
would any typical ODBMS pass consistency without extra code support? 
Does PyPerSyst have this at the database level?) And the Pointrel System 
doesn't attempt to hook into the Python language syntax, so its task 
may be much easier than PyPerSyst's.

To be clear, I'm not holding this out as "Pointrel System great" and 
"PyPerSyst not so great", since obviously the two systems do different 
things, each has its own focus, your task is perhaps harder, I don't 
fully understand everything that is going on here in your design and 
requirements, etc. What I am trying to get at is more to challenge you 
(in a friendly way) to have a very simple API in a default case by 
throwing down a pseudo-gauntlet of a simpler system API. The Pointrel 
System has gone through years of permutation on the API (mainly just by 
me) to get to the conceptual simplicity it has. And of course, now I'm 
in the process of adding more complexity on top of it (but not in it) 
where I am running into more object persistence and interface issues 
(such as the ones PyPerSyst may already solve easily). So feel free to 
say I don't understand all the issues yet. Maybe I'll learn something. ;-)

In Smalltalk, typically persistent objects may get stored and retrieved 
as proxies, which is made possible by overriding the basic storage and 
retrieval methods which are all exposed etc. Maybe Python the language 
could do with more hooks for persistence as a PEP? I know there are 
some lower level hooks for access, I'm just wondering if they are enough 
for what you may want to do with PyPerSyst to make an elegant API for 
persistent objects (perhaps better unique ID support?), where you could 
then just go:

   from persistanceSystem import *
   foo = MyClass()
   PersistanceSystem_Wrap(foo)
   # the following defaults to a transaction
   foo.x = 10
   # this makes a three change transaction
   PersistanceSystem_StartTransaction()
   foo.y = 20
   foo.z = 30
   foo.info = "I am a 3D Point"
   PersistanceSystem_EndTransaction()
   # what happens to foo on garbage collection? It persists!
   ...
   # Other code in another program
   from persistanceSystem import *
   foo = PersistanceSystem_Query(x=10, y=20, z=30)
   print foo.info # prints --> "I am a 3D Point"

That MyClass instance called foo and the related variable changes get 
stored in an ODBMS in transactions somewhere... Then I could do the same 
for the Pointrel System somehow using the same simple hooks.
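
One way such hooks might look with what Python already offers is a proxy that intercepts attribute assignment. The sketch below is only an illustration under stated assumptions: TransactionLog and Persistent are hypothetical names, the log lives in memory rather than on disk, and the proxy returns a new wrapper object instead of wrapping foo in place as imagined above.

```python
class TransactionLog:
    """Hypothetical log: groups recorded changes into transactions."""

    def __init__(self):
        self.open_batch = None    # None means no explicit transaction
        self.transactions = []

    def start(self):
        self.open_batch = []

    def end(self):
        self.transactions.append(self.open_batch)
        self.open_batch = None

    def record(self, change):
        if self.open_batch is not None:
            self.open_batch.append(change)
        else:
            # Implicit transaction holding just this one change.
            self.transactions.append([change])


class Persistent:
    """Proxy that forwards attribute access and logs every assignment."""

    def __init__(self, target, log):
        # Bypass our own __setattr__ while setting up the proxy itself.
        object.__setattr__(self, "_target", target)
        object.__setattr__(self, "_log", log)

    def __getattr__(self, name):
        # Only called when normal lookup fails, so forward to the target.
        return getattr(object.__getattribute__(self, "_target"), name)

    def __setattr__(self, name, value):
        # Record first, then apply, roughly mirroring log-before-execute.
        object.__getattribute__(self, "_log").record((name, value))
        setattr(object.__getattribute__(self, "_target"), name, value)


class MyClass:
    pass


log = TransactionLog()
foo = Persistent(MyClass(), log)
foo.x = 10            # implicit one-change transaction
log.start()
foo.y = 20
foo.z = 30
log.end()             # explicit two-change transaction
```

This covers only direct attribute assignment. Mutating a list or dict held by the object would slip past the proxy, which is one reason transparent transactions are harder than they look.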

In any case, if you can point out why such usage would be impossible 
using Python and some future version of PyPerSyst, we might be on to 
something interesting. I know in a typical Smalltalk I could easily do 
such a thing.  But then again, in most Smalltalks, 3/4 yields a fraction 
(not an int, and not a float), and when 3/4 in Smalltalk is multiplied 
by 4/3 you get 1 back again (as an int, not a rounded float), and Python 
still struggles with some basic things like this (although Python has 
many other good qualities that more than make up for such weaknesses).
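
The Smalltalk arithmetic described above is exact rational arithmetic. As a small sketch of the same calculation in Python, this assumes the standard library fractions module, which ships with later Python releases than the one current at the time of this post.

```python
from fractions import Fraction

three_quarters = Fraction(3, 4)               # stays an exact ratio, not 0 or 0.75
product = three_quarters * Fraction(4, 3)     # exact multiplication, no rounding
is_exactly_one = (product == 1)
```

Note that the result here is still a Fraction that compares equal to 1, rather than being converted back to an int as in Smalltalk.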

 >> So, what else would you like to have in a pure-Python ODBMS?

I do think PyPerSyst is a really cool concept (in memory use and disk 
checkpoints and a log). It reminds me a little of Gemstone (an ODBMS) 
for Smalltalk.

By the way, if you add support for the sorts of associative tuples which 
the Pointrel System is based on, efficiently managed, maybe I'll 
consider switching to using your system, if the API is simple enough. 
:-) Or, perhaps there is a way the Pointrel System can be extended to 
support what you might want to do (in the sense of transparent 
interaction with Python). In its use of the pickler, the Pointrel System 
does not keep a list of previously pickled objects, so it can't 
transparently pickle objects that refer to previously pickled objects in 
the repository, so that is one way that the Pointrel System can't do 
what your system does at all. (I'm not sure how to do that without, like 
PyPerSyst, keeping lots of previously pickled objects in memory at once 
for the Pickler to work with.) Also, in the Pointrel System repositories 
are sort of made up on the fly from an arbitrary collection of archives, 
where archives may be added and removed dynamically, so I don't quite 
see how to handle object persistence across a repository if 
subobjects are stored in different archives which are dropped out of the 
repository.
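
For what it's worth, the pickle module does have a hook aimed at this kind of reference problem: a Pickler subclass can define persistent_id() and an Unpickler subclass persistent_load(), so that an object already in the repository is written as an ID rather than pickled again. The sketch below is an assumption-laden illustration, not Pointrel or PyPerSyst code; Registry, Record and the "attendant-1" key are hypothetical, and the registry still keeps the referenced objects in memory.

```python
import io
import pickle


class Record:
    """Stand-in for an object already stored in the repository."""

    def __init__(self, name):
        self.name = name


class Registry:
    """Hypothetical table of already-stored objects, keyed by string ID."""

    def __init__(self):
        self._by_key = {}
        self._key_of = {}          # id(obj) -> key, for live objects

    def register(self, key, obj):
        self._by_key[key] = obj
        self._key_of[id(obj)] = key

    def key_for(self, obj):
        return self._key_of.get(id(obj))

    def lookup(self, key):
        return self._by_key[key]


class RefPickler(pickle.Pickler):
    def __init__(self, file, registry):
        super().__init__(file)
        self.registry = registry

    def persistent_id(self, obj):
        # Returning None tells pickle to serialize the object normally.
        return self.registry.key_for(obj)


class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, registry):
        super().__init__(file)
        self.registry = registry

    def persistent_load(self, key):
        return self.registry.lookup(key)


def dumps_with_refs(obj, registry):
    buffer = io.BytesIO()
    RefPickler(buffer, registry).dump(obj)
    return buffer.getvalue()


def loads_with_refs(data, registry):
    return RefUnpickler(io.BytesIO(data), registry).load()


registry = Registry()
galahad = Record("Sir Galahad")
registry.register("attendant-1", galahad)

# The dict is pickled normally; galahad is written only as "attendant-1".
data = dumps_with_refs({"speaker": galahad, "topic": "grails"}, registry)
restored = loads_with_refs(data, registry)
```

In principle lookup() could fetch the referenced object lazily from whichever archive holds it, but if that archive has been dropped from the repository the load would simply fail, so this does not settle the multi-archive question.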

My biggest issue with OO databases (including "a Smalltalk image" for 
that matter) in general is that the definition of objects changes over 
time, and on a practical basis, it might be needed to support multiple 
definitions of a class with the same name simultaneously if supporting a 
broad range of applications and somehow resolve version issues. The 
Pointrel System in itself doesn't solve that problem either, but it also 
doesn't have that problem built in at the core, since its main storage 
type is just an arbitrary binary string. I mainly added the Python 
object support just because "pickle" made it easy and fun to do the 
basics, and I thought that a limited level of transparent support might 
make it more appealing to Pythonistas and provide some extra easy 
expandability if people really wanted to easily store typed information 
as opposed to strings. (Although I think it could also bring headaches if 
people have PyPerSyst level expectations for object storage and 
retrieval when I support something more like a Newton soup entry.)
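
As a partial illustration of the class definition problem, pickle lets a class control its stored state through __getstate__() and __setstate__(), so state written under an older definition can be upgraded when loaded. The Attendant class, its fields and the _schema_version key below are hypothetical, and the sketch handles only one class evolving in place, not several definitions with the same name living side by side.

```python
class Attendant:
    """Hypothetical stored class; version 2 added the email field."""

    SCHEMA_VERSION = 2

    def __init__(self, name, email=""):
        self.name = name
        self.email = email

    def __getstate__(self):
        # Stamp the saved state with the schema version that wrote it.
        state = dict(self.__dict__)
        state["_schema_version"] = self.SCHEMA_VERSION
        return state

    def __setstate__(self, state):
        state = dict(state)
        # State saved before versioning existed counts as version 1.
        version = state.pop("_schema_version", 1)
        if version < 2:
            state.setdefault("email", "")   # field missing in old state
        self.__dict__.update(state)


# Simulate loading state that was saved under the version 1 definition.
old_state = {"name": "Brian"}
revived = Attendant.__new__(Attendant)
revived.__setstate__(old_state)

current_state = Attendant("Sir Galahad", "galahad@example.org").__getstate__()
```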

By the way, I like your overview of various related ODBMS projects here:
   http://www.orbtech.com/wiki/PythonPersistence
(maybe http://munkware.sourceforge.net/ might go there now?)
and your article at:
   http://www-106.ibm.com/developerworks/library/l-pypers.html

And I'm just starting to poke around with your PyCrust to see if it 
can't be used to support more Smalltalk like development of Python apps. 
As a hint as to what I'd like to do :-) I'm hoping to get a lot of 
mileage out of code like:
   newMethodSource = self.editText.GetValue()
   print newMethodSource
   self.expr = compile(newMethodSource, '<string>', 'exec')
   exec self.expr in self.__class__.__dict__
as opposed to reloading a whole file at once -- to support incremental 
development on live GUIs. I just need a good way to iterate through the 
fields of a GUI instance to rebind action methods to newer versions. (I 
discovered typical Python GUI toolkits have sort of the same problem as when 
using Smalltalk blocks for GUI action code, so when you override a bound 
method the old version is hung onto by the GUI event system since a 
method is referenced by pointer not by name, and I don't think Python 
has an "instance become: otherInstance" equivalent.) If it worked out 
well, such a system could then leverage the Pointrel System or PyPerSyst 
to provide version control at a fine grained level for method function 
definitions. If PyPerSyst was as transparent to use as outlined above, 
maybe it could then be used to store and retrieve hand built GUI 
instances with their hand built methods (sort of like in a Squeakish 
Smalltalk image with Morphic, but maybe better).
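
The stale bound method problem mentioned above can be sidestepped by registering callbacks that look the method up by name at call time. The following is a rough sketch in present-day Python syntax (exec as a function), with no GUI toolkit involved; install_methods, late_bound and the Button class are hypothetical names invented for the example.

```python
import types


def install_methods(cls, source):
    """Compile source and attach every function it defines to cls."""
    namespace = {}
    exec(compile(source, "<string>", "exec"), namespace)
    installed = []
    for name, value in namespace.items():
        if isinstance(value, types.FunctionType):
            setattr(cls, name, value)
            installed.append(name)
    return installed


def late_bound(instance, method_name):
    """Return a callback that resolves the method by name on each call."""
    def callback(*args, **kwargs):
        return getattr(instance, method_name)(*args, **kwargs)
    return callback


class Button:
    def on_click(self):
        return "old"


button = Button()
stale = button.on_click                  # bound method captured early
fresh = late_bound(button, "on_click")   # looked up again on every call

installed = install_methods(Button, "def on_click(self):\n    return 'new'\n")
stale_result = stale()
fresh_result = fresh()
```

Here the early-captured bound method keeps calling the old function, while the late-bound callback picks up the new definition without any need to walk the GUI instance and rebind its fields.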

So anyway, yours in friendly coopetition. :-)

--Paul Fernhout
http://www.pointrel.org






