[DB-SIG] Re: [Webware-discuss] [ANN] SQLObject 0.1

Ian Bicking ianb@colorstudy.com
22 Oct 2002 13:35:46 -0500


On Tue, 2002-10-22 at 07:44, Kevin Jacobs wrote:
> On 22 Oct 2002, Ian Bicking wrote:
> > [...] And my vague impression is that PyDO avoids all this trouble by
> > using the dictionary interface, so it doesn't have to create methods for
> > each column -- I specifically didn't want to do something like that,
> > because I don't want the database columns to be distinguishable from other
> > methods and attributes.  Making that distinction, IMHO, is a significant
> > compromise of the whole point of a middle object.
> 
> I agree, though in my mind there are some additional constraints.  First, it
> is not desirable to pollute the namespace of your relational objects.  The
> ideal would be that no "other methods or attributes" are visible in such an
> object.  This does not imply that using a dictionary interface is a good
> thing, though it does simplify the design.  Object-like interfaces feel more
> natural, more Pythonic, and justify the extra effort to use an object
> namespace.

I don't know... I guess I feel otherwise.  In a practical example, I
have often put some data in the filesystem, and some in the database. 
For instance, I put images in the filesystem indexed to a row.  But I
don't see any reason calling code would need to know that, or
distinguish that from data kept in the database.  Similar with a lot of
derivative data, or data that has a calculated default.

> Second, minimizing the overhead required to retrieve a column value is an
> essential part of making an enterprise grade OR mapper.  

Retrieving a column value should be fairly quick -- surely not as fast
as custom C code, but I imagine fairly fast relative to the database
access.  We all know Python isn't the fastest, just usually fast enough
-- I think SQLObject should be fast enough as well.

> Plus, it is also
> desirable to encorce the data schema by disallowing the use of novel
> attributes that will not be persistent.  The property and slots features
> from Python 2.2+ provide both safety and performance.  There is an added
> bonus that each object will not allocate a per-instance dictionary, which
> makes a huge difference in the footprint and performance of applications
> that can hold many thousands of row instances.

SQLObject objects are somewhat heavy -- there's probably a half-dozen
instance variables in addition to one variable per column.  I wouldn't 
keep thousands of them around without some serious concern about the
performance.  But then again, I wouldn't want to keep thousands around
period.  It's uncommon to use a thousand rows at once (though common to
iterate through a thousand rows -- this is why we have generators). 
Right now SQLObject doesn't take advantage of expiration and generators,
but I would like to add that.  If you really do have tons of rows to
deal with, all at the same time, then SQLObject probably isn't the right
choice (at least for those tables).  You probably want to do as much in
SQL as possible in that case.

Anyway, SQLObject isn't a DBAPI wrapper like SQLDict or others.  It's
meant to make middle objects -- it's meant to be the *basis* for objects
with rich functionality.  If I never expected to add my own methods,
override behavior, or otherwise futz with the object, I would have used
something like SQLDict or your db_row -- but in my own experience I was
creating these richer objects anyway, and SQLObject grew out of that. 
Even though this is the first public release, I've been writing and
refactoring and rewriting this code for over a year.  This is something
that I have found not just useful, but compellingly useful in my
programming.  It's like when I was programming with CGI -- I kept
building mini frameworks without even trying.  

As far as safety and enforcing a schema -- I definitely don't agree with
that.  I don't want to enforce anything.  It's not Pythonic, but
moreover I don't think it's a good idea.  If a programmer wants to do
something stupid, that's their fault not mine.  But if a programmer
wants to do something smart that I didn't expect, and I keep them from
doing that, then it's my fault.  And if *I* want to do something smart
and I keep myself from doing it, then I'm creating unnecessary inner
conflict :)

> > Both MK and SQLObject currently only work with MySQL.  I hope to extend
> > SQLObject, and I think it should be very easy (actually, I hope others
> > will extend it, but I'll help them :).  There's really only one part of
> > SQLObject that need to be ported -- inserting a row, to be specific.
> 
> I'm happy to help, though I suggest you take the first steps by creating a
> backend abstraction layer.  Here are some things that such a layer would
> handle:

Maybe you can help me here -- I'm not terribly familiar with many
databases, and I don't know the degree to which they differ.

>   1) SQL literal representations and escaping (strings, identifiers, dates,
>      timestamps, blobs, etc.)

How much do these really vary?  I'd like to avoid a full-fledged layer
for this... right now I do checking for some of the reasonable builtin
types (and DateTime), and let certain objects determine for themselves
what their SQL should be.  I'm otherwise agnostic on other types, like
timestamps and blobs.  I'd really like One True SQL Quoting Function...
though some ways around that are occurring to me.

>   2) Inner/outer join syntax

I'm not even sure how this would relate to this wrapper -- right now the
only generated select statements fetch row IDs.  Selects definitely
could be optimized, but even then I'm not doing anything that fetches a
less-than-complete object, so I don't think joins would become involved.

>   3) Query limits

You mean, LIMIT on the end of a SELECT?  Again, SQLObject remains
agnostic -- you pass an object, it gets tacked on with the LIMIT, I pay
no attention to what kind of object it is.

>   4) Last-insertion-id requests

This being the obvious abstraction issue (why couldn't they have
standardized this?  Stupid, stupid, stupid...)

>   5) Large binary object interface

I avoid large binary objects :)  This would be something I'd feel
comfortable leaving out -- I'd rather let the programmer understand how
SQLObject is working, and code this on their own.  This is an advantage
of having extensible objects -- you can write your own SQL and do your
own thing, and no one needs to be the wiser.  But I don't really know
what those interfaces look like -- it might be easy to implement for all
I know.

  Ian