[Python-Dev] Re: GadflyDA in core? Or as add-on-product?

Tue, 21 Jan 2003 17:03:50 -0800

> From: Stuart Bishop
>
> On Wednesday, January 22, 2003, at 08:22  AM, Guido van Rossum wrote:
>
> >>>> My personal belief would be to include Gadfly in Python:
> >>>> 	- Provides a reason for the DB API docs to be merged into the
> >>>> 	  Python library reference
> >>>> 	- Gives Python relational DB stuff out of the box ala Java,
> >>>> 	  but with a working RDBMS as well ala nothing else I'm aware
> >>>> 	  of.
> >>>>    - Makes including GadflyDA in Zope 3 a trivial decision, since
> >>>>      its size would be negligible and the DA code itself is
> >>>>      already ZPL.
> >>>
> >>> Would you be willing to find out (from c.l.py) how much interest
> >>> there is in this?
> >>
> >> A fairly positive response from the DB SIG. The trick will be to fix
> >> the outstanding bugs or disable those features (losing the 'group
> >> by' and 'unique' SQL clauses), and to confirm and fix any departures
> >> from the DB-API 2.0 standard, as this would become a reference
> >> implementation of sorts.
> >>
> >> There is no permanent maintainer, as Richard Jones is in more of a
> >> caretaker role with the code. I'll volunteer to try and get the code
> >> into a Python release though.
> >>
> >> If fixes, documentation and tests can be organized by the end of
> >> January for alpha2, will this go out with Python 2.3 (assuming a
> >> signoff on quality by python-dev and the DB-SIG)? If not, Jim is
> >> back to deciding if he should include Gadfly with Zope3.
> >
> > Sorry for not responding before.  I'm open for doing this, but you
> > should probably probe python-dev next before you start a big coding
> > project.  How much C code is involved in Gadfly?  If it's a lot, I'm a
> > lot more reluctant, because C code usually requires much more
> > maintenance (rare is the C source file that doesn't have some hidden
> > platform dependency).
>
> Gadfly comes with kjbuckets, which is written in C. The rest is Python.
> Gadfly uses the included kjbuckets for storage if it is available, but
> happily runs without it with a performance hit. So Jython gets a
> RDBMS implementation too.

Interesting. I'm in the process of trying out Gadfly, PySQLite, and MetaKit
as embedded databases. For reference, the links are:

Gadfly
http://gadfly.sourceforge.net/

SQLite and PySQLite
http://www.hwaci.com/sw/sqlite/
http://pysqlite.sourceforge.net/

MetaKit, Mk4py, MkSQL
http://www.equi4.com/metakit/
http://www.equi4.com/metakit/python.html
http://www.mcmillan-inc.com/mksqlintro.html

All are embeddable databases, but each has its pros and cons. I can see how
Gadfly would have a lot of appeal, since it can be used as a pure Python
solution. The licensing for MetaKit probably makes it inappropriate for the
Python standard libs, but I'm sure that could be brought up with the author.
PySQLite seems to be the most mature (MetaKit users may disagree). SQLite is
certainly better documented and has a richer feature set, and as a bonus the
source code is in the public domain! PySQLite also appears to be quite fast:

http://www.hwaci.com/sw/sqlite/speed.html

Since SQLite doesn't use a memory map the way MetaKit does, it should work
equally well with small and large data sets.

Anyway, I'm probably a month away from being able to present an adequate
comparison of using each for different relational data sets. One data set I'm
looking at is roughly 800MB of data; the other is only about 256KB. I'm
looking at the smaller one first, since it also has a simpler table
structure.
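
To give a rough idea of the kind of harness such a comparison could use, here
is a minimal sketch written against the DB-API 2.0 interface only. Everything
named in it is an assumption for illustration: the function time_round_trip,
the table "sample", and the generated rows are hypothetical, and the standard
library's sqlite3 module stands in for whichever DB-API module is being
measured. It also assumes the "qmark" parameter style, which not every
module uses.

```python
import sqlite3  # stand-in; any DB-API 2.0 module could be passed in instead
import time


def time_round_trip(connect, rows):
    """Time a bulk insert and a full read through a DB-API 2.0 connection.

    `connect` is a zero-argument callable returning a connection, so the
    same harness can be pointed at different database modules.
    """
    conn = connect()
    cur = conn.cursor()
    cur.execute("CREATE TABLE sample (id INTEGER, name VARCHAR(32))")

    # Time the inserts, including the commit.
    start = time.perf_counter()
    for row in rows:
        # Assumes paramstyle == "qmark"; adjust the placeholders otherwise.
        cur.execute("INSERT INTO sample (id, name) VALUES (?, ?)", row)
    conn.commit()
    insert_seconds = time.perf_counter() - start

    # Time reading every row back.
    start = time.perf_counter()
    cur.execute("SELECT id, name FROM sample")
    fetched = cur.fetchall()
    select_seconds = time.perf_counter() - start

    conn.close()
    return len(fetched), insert_seconds, select_seconds


# Hypothetical data set: 1000 small rows in an in-memory database.
rows = [(i, "name%d" % i) for i in range(1000)]
count, insert_seconds, select_seconds = time_round_trip(
    lambda: sqlite3.connect(":memory:"), rows
)
print("rows: %d  insert: %.4fs  select: %.4fs"
      % (count, insert_seconds, select_seconds))
```

Swapping in another database should only mean passing a different connect
callable, provided the module accepts the same SQL and placeholder style.
Timings from something this small say little about an 800MB data set, so
treat it as a starting point rather than a benchmark.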

I would be interested in seeing both Gadfly and PySQLite supported in the
standard libs. I'm guessing that Gadfly needs a lot of testing and probably
bug fixes to justify including it in the 2.3 standard libs.

ka