Object Database (ODBMS) for Python
Paul D. Fernhout
pdfernhout at kurtz-fernhout.com
Fri Aug 29 12:11:51 EDT 2003
Patrick K. O'Brien wrote:
> Jeremy Bowers <jerf at jerf.org> writes:
>>So why *isn't* it a transaction? Unless you have a good reason not
>>to, I'd suggest automatically "coercing" that into a transaction
>>instead of throwing an error.
>
> These are some of my reasons: 1) every transaction gets pickled and
> logged before executed, so that the database can recover from a crash,
> 2) most of the other cool features depend on mutations passing through
> the extent manager for each class, 3) transparent transactions only
> seem like a good idea, 4) security is hard to enforce without explicit
> boundaries (read the Twisted docs regarding Perspective Broker), 5)
> explicit is better than implicit, especially when valuable persistent
> data is involved.
>
> [snip]
>
>>(I'm only really entering my maturity (IMHO) as a software engineer,
>>but one of my rules of thumb for developing software for other
>>people to use is that the API can ***never*** be too easy.
>
> If you can figure out a way to have transparent transactions, without
> giving up on any ACID properties considered mandatory for a DBMS, I
> would love to hear about it. Have you worked with any other object
> databases?
To cite:
http://databases.about.com/library/weekly/aa120102a.htm
"The ACID model is one of the oldest and most important concepts of
database theory. It sets forward four goals that every database
management system must strive to achieve: atomicity, consistency,
isolation and durability. No database that fails to meet any of these
four goals can be considered reliable."
Well, to chime in here, in a "friendly" competition / cooperation sort
of way, the Pointrel Data Repository System,
http://sourceforge.net/projects/pointrel/
while not quite an object database (and admittedly solving an easier
problem), has a simple API in the bare minimum use case (it has more
complex variants). Here is an example of its use (with fragments
written in response to an earlier c.l.p poster's use case a few days ago):
from pointrel20030812 import *
# add a first attendant -- uses built in unique ID function
# each change will implicitly be a separate transaction
attendantID = Pointrel_generateUniqueID()
Pointrel_add("congress", attendantID, 'object type', 'user')
Pointrel_add("congress", attendantID, 'name', 'Sir Galahad')
# add a second attendant, this time as an atomic transaction
attendantID = Pointrel_generateUniqueID()
Pointrel_startTransaction()
Pointrel_add("congress", attendantID, 'object type', 'user')
Pointrel_add("congress", attendantID, 'name', 'Brian')
Pointrel_finishTransaction()
In the first case, the changes are automatically made into transactions,
in the second, they are lumped under the current transaction.
Note that Python objects could be added to the database, as in:
Pointrel_add("test", 10, ["hello", "goodbye"], MyClass)
This simple API is made possible by two decisions:
* to have a version of the API function set whose functions are named
as module level globals and use a hidden repository (stored in
_repository), which is defaulted in various ways when needed.
* to keep a flag in a repository of whether it is in a transaction or
not, and if it isn't, to create a transaction on the fly (if an
"implicit transactions allowed" option is set, which it is by default).
A more general use of the API allowing multiple repositories to be used
by one application simultaneously is:
repository = PointrelDataRepositorySystem(archiveName)
repository.startTransaction()
repository.add(context, a, b, c)
repository.add(context, d, e, f)
repository.finishTransaction()
The module level "Pointrel_xyz()" functions use these sorts of more
general API calls behind the scenes.
Granted, the Pointrel System is essentially a single-user,
single-transaction system at the core. It (in theory, subject to bugs)
supports atomicity (transactions), isolation (locking) and durability
(logging and recovery). It supports consistency only through how
applications use transactions, as opposed to explicit constraints or
rules maintained by the database, so one could argue it fails the ACID
test there. (Although would any typical ODBMS pass consistency without
extra code support? Does PyPerSyst have this at the database level?)
And the Pointrel System doesn't attempt to hook into the Python language
syntax, so perhaps its task is much easier than PyPerSyst's?
To be clear, I'm not holding this out as "Pointrel System great" and
"PyPerSyst not so great", since obviously the two systems do different
things, each has its own focus, your task is perhaps harder, I don't
fully understand everything that is going on here in your design and
requirements, etc. What I am trying to get at is more to challenge you
(in a friendly way) to have a very simple API in a default case by
throwing down a pseudo-gauntlet of a simpler system API. The Pointrel
System has gone through years of permutation on the API (mainly just by
me) to get to the conceptual simplicity it has. And of course, now I'm
in the process of adding more complexity on top of it (but not in it)
where I am running into more object persistence and interface issues
(such as the ones PyPerSyst may already solve easily). So feel free to
say I don't understand all the issues yet. Maybe I'll learn something. ;-)
In Smalltalk, persistent objects typically get stored and retrieved
as proxies, which is made possible by overriding the basic storage and
retrieval methods, which are all exposed. Maybe Python the language
could do with more hooks for persistence, as a PEP? I know there are
some lower level hooks for access; I'm just wondering if they are enough
for what you may want to do with PyPerSyst to make an elegant API for
persistent objects (perhaps better unique ID support?), where you could
then just go:
from persistanceSystem import *
foo = MyClass()
PersistanceSystem_Wrap(foo)
# the following defaults to a transaction
foo.x = 10
# this makes a three change transaction
PersistanceSystem_StartTransaction()
foo.y = 20
foo.z = 30
foo.info = "I am a 3D Point"
PersistanceSystem_EndTransaction()
# what happens to foo on garbage collection? It persists!
...
# Other code in another program
from persistanceSystem import *
foo = PersistanceSystem_Query(x=10, y=20, z=30)
print foo.info # prints --> "I am a 3D Point"
That MyClass instance called foo and the related variable changes get
stored in an ODBMS in transactions somewhere... Then I could do the same
for the Pointrel System somehow using the same simple hooks.
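As a rough illustration of the kind of hook I mean, here is a sketch
that intercepts attribute assignment on a wrapped instance and records
each change, implicitly as its own transaction unless one is open. The
names ToyStore and wrap are hypothetical, nothing here is PyPerSyst or
Pointrel code, the log lives only in memory (so there is no durability
and no query support), and it is written in Python 3 syntax.

```python
class ToyStore:
    """Hypothetical in-memory change log; a real ODBMS would write to disk."""

    def __init__(self):
        self.in_transaction = False
        self.pending = []   # changes in the open transaction
        self.log = []       # committed transactions, each a list of changes

    def start(self):
        self.in_transaction = True

    def end(self):
        self.log.append(self.pending)
        self.pending = []
        self.in_transaction = False

    def record(self, change):
        if self.in_transaction:
            self.pending.append(change)
        else:
            # No open transaction: the single change becomes its own.
            self.log.append([change])


def wrap(obj, store):
    """Swap obj's class for a subclass that logs every attribute write."""
    cls = type(obj)

    def __setattr__(self, name, value):
        store.record((name, value))
        cls.__setattr__(self, name, value)

    obj.__class__ = type("Persistent" + cls.__name__, (cls,),
                         {"__setattr__": __setattr__})
    return obj


class MyClass:
    pass


store = ToyStore()
foo = wrap(MyClass(), store)

foo.x = 10                 # defaults to a one change transaction

store.start()              # explicit multi-change transaction
foo.y = 20
foo.z = 30
store.end()
```

This only catches direct attribute assignment. Mutating a list or dict
held by foo would slip past the hook, which is one of the places where
transparent persistence gets hard.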
In any case, if you can point out why such usage would be impossible
using Python and some future version of PyPerSyst, we might be on to
something interesting. I know in a typical Smalltalk I could easily do
such a thing. But then again, in most Smalltalks, 3/4 yields a fraction
(not an int, and not a float), and when 3/4 in Smalltalk is multiplied
by 4/3 you get 1 back again (as an int, not a rounded float), and Python
still struggles with some basic things like this (although Python has
many other good qualities that more than make up for such weaknesses).
>> So, what else would you like to have in a pure-Python ODBMS?
I do think PyPerSyst is a really cool concept (in memory use and disk
checkpoints and a log). It reminds me a little of Gemstone (an ODBMS)
for Smalltalk.
By the way, if you add support for the sorts of associative tuples which
the Pointrel System is based on, efficiently managed, maybe I'll
consider switching to using your system, if the API is simple enough.
:-) Or, perhaps there is a way the Pointrel System can be extended to
support what you might want to do (in the sense of transparent
interaction with Python). In its use of the pickler, the Pointrel System
does not keep a list of previously pickled objects, so it can't
transparently pickle objects that refer to previously pickled objects in
the repository; that is one way the Pointrel System can't do
what your system does at all. (I'm not sure how to do that without,
like PyPerSyst, keeping lots of previously pickled objects in memory at
once for the Pickler to work with.) Also, in the Pointrel System,
repositories are made up on the fly of an arbitrary collection of
archives, where archives may be added and removed dynamically, so I
don't quite begin to see how to handle object persistence across a
repository if subobjects are stored in different archives which are
dropped out of the repository.
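For what it's worth, the pickle module does have a hook aimed at this
sort of thing: a Pickler can return a "persistent ID" for an object
instead of pickling it inline, and an Unpickler resolves that ID when
loading. The sketch below shows the idea under simplifying assumptions:
the Record class, the key "rec-1", and the plain dict used as the store
are all made up, and the already-stored objects are still held in
memory, which is exactly the cost mentioned above. Python 3 syntax.

```python
import io
import pickle


class Record:
    """Hypothetical application object stored as its own repository entry."""

    def __init__(self, name):
        self.name = name


class RefPickler(pickle.Pickler):
    def __init__(self, file, known_ids):
        super().__init__(file)
        self.known_ids = known_ids   # maps id(obj) -> storage key

    def persistent_id(self, obj):
        # A key means "store a reference"; None means "pickle inline".
        return self.known_ids.get(id(obj))


class RefUnpickler(pickle.Unpickler):
    def __init__(self, file, store):
        super().__init__(file)
        self.store = store           # maps storage key -> live object

    def persistent_load(self, pid):
        return self.store[pid]


# One object is already "in the repository" under a key.
galahad = Record("Sir Galahad")
store = {"rec-1": galahad}
known_ids = {id(galahad): "rec-1"}

# A later object refers to it; only the key is written for galahad.
buffer = io.BytesIO()
RefPickler(buffer, known_ids).dump({"leader": galahad, "size": 3})

buffer.seek(0)
loaded = RefUnpickler(buffer, store).load()
```

On loading, loaded["leader"] is the very same object as galahad rather
than a copy, because the pickle held only the key. Resolving a key whose
archive has been dropped from the repository would fail in
persistent_load, which is the cross-archive problem in miniature.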
My biggest issue with OO databases (including "a Smalltalk image" for
that matter) in general is that the definition of objects changes over
time, and on a practical basis, it may be necessary to support multiple
definitions of a class with the same name simultaneously if supporting a
broad range of applications, and to somehow resolve version issues. The
Pointrel System in itself doesn't solve that problem either, but it also
doesn't have that problem built in at the core, since its main storage
type is just an arbitrary binary string. I mainly added the Python
object support just because "pickle" made it easy and fun to do the
basics, and I thought that a limited level of transparent support might
make it more appealing to Pythonistas and provide some extra easy
expandability if people really wanted to easily store typed information
as opposed to strings. (Although I think it could also bring headaches if
people have PyPerSyst level expectations for object storage and
retrieval when I support something more like a Newton soup entry.)
By the way, I like your overview of various related ODBMS projects here:
http://www.orbtech.com/wiki/PythonPersistence
(maybe http://munkware.sourceforge.net/ might go there now?)
and your article at:
http://www-106.ibm.com/developerworks/library/l-pypers.html
And I'm just starting to poke around with your PyCrust to see if it
can't be used to support more Smalltalk like development of Python apps.
As a hint as to what I'd like to do :-) I'm hoping to get a lot of
mileage out of code like:
newMethodSource = self.editText.GetValue()
print newMethodSource
self.expr = compile(newMethodSource, '<string>', 'exec')
exec self.expr in self.__class__.__dict__
as opposed to reloading a whole file at once -- to support incremental
development on live GUIs. I just need a good way to iterate through the
fields of a GUI instance to rebind action methods to newer versions. (I
discovered typical Python GUI toolkits have sort of the same problem as
when using Smalltalk blocks for GUI action code: when you override a
bound method, the old version is hung onto by the GUI event system, since
a method is referenced by pointer, not by name, and I don't think Python
has an "instance become: otherInstance" equivalent.) If it worked out
well, such a system could then leverage the Pointrel System or PyPerSyst
to provide version control at a fine grained level for method function
definitions. If PyPerSyst was as transparent to use as outlined above,
maybe it could then be used to store and retrieve hand built GUI
instances with their hand built methods (sort of like in a Squeakish
Smalltalk image with Morphic, but maybe better).
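To show the rebinding problem concretely, here is a small sketch with no
GUI toolkit involved. The Editor class and the install_method helper are
hypothetical, a plain variable stands in for the callback a toolkit
would hold, and it uses Python 3 syntax (exec as a function).

```python
class Editor:
    def on_click(self):
        return "old"


def install_method(cls, source):
    """Compile source and attach every function it defines to cls."""
    namespace = {}
    exec(compile(source, "<string>", "exec"), namespace)
    for name, value in namespace.items():
        if callable(value) and not name.startswith("__"):
            setattr(cls, name, value)


editor = Editor()

# What an event system typically stores: the bound method object itself.
captured = editor.on_click

# A late-binding alternative: look the method up by name on every call.
by_name = lambda: editor.on_click()

# "Edit" the method on the live class.
install_method(Editor, "def on_click(self):\n    return 'new'\n")

results = (captured(), by_name(), editor.on_click())
```

Here results comes out as ("old", "new", "new"): the captured bound
method still points at the old function, while anything that goes
through the name picks up the new one. Registering thin by-name
trampolines with the GUI, rather than the bound methods themselves, is
one possible way around having to walk every widget and rebind.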
So anyway, yours in friendly coopetition. :-)
--Paul Fernhout
http://www.pointrel.org