
On Wednesday, January 22, 2003, at 08:22 AM, Guido van Rossum wrote:
Gadfly comes with kjbuckets, which is written in C. The rest is Python. Gadfly uses the included kjbuckets for storage if it is available, but happily runs without it with a performance hit. So Jython gets a RDBMS implementation too. -- Stuart Bishop <zen@shangri-la.dropbear.id.au> http://shangri-la.dropbear.id.au/

[Stuart Bishop]
So my first question is what is the license on Gadfly? I assume it is compatible with going into Python, but I thought I would ask. Next, how much of a performance hit is there without kjbuckets? I am with Guido in wanting to minimize the amount of C code put into the libraries where there is no requirement for it. And if there is a decent hit what would it take to code up something in Python to replace it? We could leave it as an option to use kjbuckets if we want. And if taking out kjbuckets is unreasonable, what license is it under? I personally would love to have an actual DB in the stdlib so if these questions get positive answers I am +1. -Brett

On Wednesday, January 22, 2003, at 11:21 AM, Brett Cannon wrote:
Use granted for any purpose without fee, provided the Copyright and permission notices appear in all copies and supporting documentation.
The fallback already is a version of kjbuckets written in pure Python. So the build process simply needs to keep going if kjbucketsmodule.c doesn't build. The regression tests during alpha and beta releases should tell us if we need to switch off kjbuckets on certain platforms, although it has already had a decent workout since Gadfly has been part of Zope since at least 2.0 (3+ years?)
And if taking out kjbuckets is unreasonable, what license is it under?
Same licence.
I personally would love to have an actual DB in the stdlib so if these questions get positive answers I am +1.
-- Stuart Bishop <zen@shangri-la.dropbear.id.au> http://shangri-la.dropbear.id.au/
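A minimal sketch of the fallback described above: try the compiled module first and drop back to a pure-Python stand-in when it is not importable. The helper `load_first_available` and the module name `kjbuckets_c_hypothetical` are illustrative assumptions, not names from Gadfly; `json` stands in for the pure-Python fallback only so that the sketch runs anywhere.

```python
import importlib


def load_first_available(candidates):
    """Return the first module in `candidates` that imports cleanly."""
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Module missing or failed to build: try the next candidate.
            continue
    raise ImportError("none of %r could be imported" % (candidates,))


# Hypothetical names: a C accelerator first, then a pure-Python stand-in.
# "json" is used here only so the sketch has something that always imports.
buckets = load_first_available(["kjbuckets_c_hypothetical", "json"])
print(buckets.__name__)
```

Callers then use `buckets` without caring which implementation was loaded; the cost of the fallback is speed, not behaviour.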

This is actually a pretty serious burden -- the list of licenses we have to keep around in all copies and docs keeps growing. :-( Maybe Aaron will assign the code to the PSF? --Guido van Rossum (home page: http://www.python.org/~guido/)

On Wednesday, January 22, 2003, at 12:08 PM, Guido van Rossum wrote:
Hmm... I assumed that I would simply cut & paste the agreement into the Gadfly documentation I'd have to adapt for the Library reference. Looks like Python licence but Aaron wants to be identified as the author ('CV-ware'). cc:'d to Aaron for the horse's mouth opinion if the email address I have is still valid.
From COPYRIGHT.txt:
The gadfly and kjbuckets source is copyrighted, but you can freely use and copy it as long as you don't change or remove the copyright:
Copyright Aaron Robert Watters, 1994
All Rights Reserved
Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies and that both that copyright notice and this permission notice appear in supporting documentation.
AARON ROBERT WATTERS DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL AARON ROBERT WATTERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
-- Stuart Bishop <zen@shangri-la.dropbear.id.au> http://shangri-la.dropbear.id.au/

On Wednesday, January 22, 2003, at 12:26 PM, Stuart Bishop wrote:
cc:'d to Aaron for the horse's mouth opinion if the email address I have is still valid.
After remembering where Aaron now works, and emailing the following to his *correct* address, I got a 'Yes, no problem'.
msg_to_aaron = """
I don't know if you saw this thread or not on Python-dev:
http://mail.python.org/pipermail/python-dev/2003-January/032295.html
Looks like Gadfly can go into the core Python distribution. One of the outstanding issues is that of the licence, which concerned Guido:
http://mail.python.org/pipermail/python-dev/2003-January/032305.html
Would you be interested in allowing the PSF to distribute Gadfly under its own licence to simplify things?
"""
So it looks like everything is fine, at least after we can extricate the keyboard Anthony managed to embed in his forehead. Speaking of which, should we proceed assuming nothing will be included until we have a replacement for kjbuckets? If this isn't a requirement and we can proceed with the existing C code (with fallback to pure python), then we might be able to get this into 2.3 alpha 2. Otherwise I can give in to my inherent laziness and assume 2.3.1 or 2.4. Do I need a PEP, given that adding this as a module seems generally accepted both in python-dev and the DB SIG?
-- Stuart Bishop <zen@shangri-la.dropbear.id.au> http://shangri-la.dropbear.id.au/

[Stuart Bishop]
If I remember correctly Guido said it is a requirement to replace kjbuckets with something else (Python version is acceptable). I know I personally would vote against letting it in with kjbuckets in C.
Only really needed if you need to convince Guido that it is a good idea or get enough of a following to get Guido to overrule himself. -Brett

No PEP is needed, but I'd like to understand more of the mechanics of adding this to the distribution. I've got no problem with adding more Python code to the standard library, but (as Brett mentioned) I'd like to keep the kjbuckets C code out unless we have a volunteer to both clean it up and maintain it. Also, I just looked at the copy of gadfly that's part of Zope, and it is about 15,000 lines! (And that's only Python code -- no C code included, nor docs.) Do we really need all that? Who is going to maintain it? Is somebody going to convert the gadfly docs (assuming they exist) into LaTeX? Or is it just going to be an undocumented pile of code that only people who happen to already know how to use it can really use? --Guido van Rossum (home page: http://www.python.org/~guido/)

On Tue, 11 Feb 2003 12:40 pm, Guido van Rossum wrote:
The C code will go away soon, thanks to Anthony's efforts in the kjbuckets python module (the conversion to the new sets implementation, amongst other enhancements).
The cleaned up version in the sourceforge project is 11k. We may be able to remove the 1.2kloc parser builder.
Who is going to maintain it?
I have no answer for this. The sourceforge project has a number of maintainers, but there are old outstanding bugs which have had no attention (some even have patches). I'm afraid it's at the bottom of my priority list at present.
Is somebody going to convert the gadfly docs (assuming they exist) into LaTeX?
I converted them to ReST as part of my cleanup, so a docutils writer which writes the python doc LaTeX format _should_ be possible (it'd be a nice-to-have for Python documentation regardless :)
It is documented already. Stuart is looking at implementing the DB-API 2.0 interface for it, so the doc will need updating at that point. That's not a mammoth task though. Richard

[Richard Jones]
That's good news.
So that's the version that is being considered for inclusion in Python?
Mine too. Unless someone volunteers, I'm strongly against adopting this code -- we can't have decaying code in the core distribution.
Agreed.
Great. --Guido van Rossum (home page: http://www.python.org/~guido/)

On Tue, 2003-02-11 at 10:14, Guido van Rossum wrote:
We probably do more harm than good if we include code that isn't being actively maintained. Its inclusion in the std library creates the impression that it is bullet-proof and well-maintained. It feels like another asyncore: a body of code the core developers don't understand too well but will be forced to learn through years of fixing subtle bugs. It would certainly be unusual and kind of neat to ship an SQL database with the standard library, but I doubt many people are expecting to find one there. SourceForge seems like a good home. Jeremy

Hm. I've not reviewed kjbuckets myself, but I've heard it's some of the hairiest C code ever written. The problem with putting that in the Python distribution is that we end up having to maintain it, whether we want to or not. So I'm -1 on adding kjbuckets. --Guido van Rossum (home page: http://www.python.org/~guido/)

Interesting. I'm in the process of trying out Gadfly, PySQLite, and MetaKit as embedded databases. For reference, the links are:
Gadfly
http://gadfly.sourceforge.net/
SQLite and PySQLite
http://www.hwaci.com/sw/sqlite/
http://pysqlite.sourceforge.net/
MetaKit, Mk4py, MkSQL
http://www.equi4.com/metakit/
http://www.equi4.com/metakit/python.html
http://www.mcmillan-inc.com/mksqlintro.html
All are embeddable databases, but they each have their pros and cons. I can see how Gadfly would have a lot of appeal since it can be used as a pure Python solution. The licensing for MetaKit probably makes it inappropriate for the Python standard libs, but I'm sure that could be brought up with the author. PySQLite seems to be the most mature (MetaKit users may disagree), certainly SQLite is better documented, has a richer feature set, and as a bonus the source code is in the public domain! PySQLite appears to be quite fast.
http://www.hwaci.com/sw/sqlite/speed.html
Since it doesn't use a memory map like MetaKit, it should work equally well with small and large data sets. Anyway, I'm probably a month away from being able to present an adequate comparison of using each for different relational datasets. One data set I'm looking at is roughly 800MB of data, the other is only about 256KB and I'm looking at the smaller one first since it also has a simpler table structure. I would be interested in seeing both Gadfly and PySQLite supported in the standard libs. I'm guessing that Gadfly needs a lot of testing and probably bug fixes to justify including it in the 2.3 standard libs.
ka

On Wed, 22 Jan 2003 12:03 pm, Kevin Altis wrote:
Gadfly has the advantage that any marshallable Python object may be stored with no mess, no fuss. Sqlite is restricted to only storing strings. Metakit supports a variety of data types, but no explicit NULL. Actually, the three support wildly different types of "unset" values:
gadfly: python's None
sqlite: sql NULL (and all its quirks ;)
metakit: no support
Gadfly has the additional benefit that any Python object may support its View interface, and thus participate in SQL queries. Pretty powerful stuff.
Since it doesn't use a memory map like MetaKit, it should work equally well with small and large data sets.
I'm not sure this is a reasonable statement to make.
Gadfly has outstanding bugs (see the sourceforge bug tracker). It has a suite of unit tests, but these are far from complete. It needs volunteers :) It'd also be nice for gadfly to support SQL "LIKE" expressions, but that also requires work under the hood by some generous volunteer :) Richard
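To illustrate the "sql NULL (and all its quirks)" remark, here is a small sketch using the `sqlite3` module that ships with later Python versions. That module is an assumption for illustration only; it is not the 2003 PySQLite release discussed in this thread, and the table and column names are made up.

```python
import sqlite3

# In-memory database, so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (name TEXT, note TEXT)")

# Python's None is stored as SQL NULL.
cur.executemany("INSERT INTO t VALUES (?, ?)", [("a", None), ("b", "set")])

# NULL comes back to Python as None.
rows = cur.execute("SELECT name, note FROM t ORDER BY name").fetchall()

# The quirk: "= NULL" never matches a row, only "IS NULL" does.
eq_null = cur.execute("SELECT count(*) FROM t WHERE note = NULL").fetchone()[0]
is_null = cur.execute("SELECT count(*) FROM t WHERE note IS NULL").fetchone()[0]

conn.close()
```

Comparing with `=` yields no matches because any comparison against NULL is itself NULL, which is the kind of surprise a store that uses plain `None` avoids.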

On Wed, 22 Jan 2003 12:03 pm, Kevin Altis wrote:
Oh, and one other thing: from way back at the start of this discussion, it was decided that performance was not going to be a major deciding factor. Sure, we can make sure the performance doesn't suck, but if you want a large database, use a real database engine :) Richard

On Wednesday, January 22, 2003, at 12:03 PM, Kevin Altis wrote:
MetaKit and PySQLite were brought up when discussing this on the DB-SIG mailing list. However, the major problem is keeping releases of these third party tools in sync with Python releases. The advantage of Gadfly is that it has been in maintenance only mode for a few years now, and can happily be uprooted and replanted in the Python CVS repository. -- Stuart Bishop <zen@shangri-la.dropbear.id.au> http://shangri-la.dropbear.id.au/

participants (6)
- Brett Cannon
- Guido van Rossum
- Jeremy Hylton
- Kevin Altis
- Richard Jones
- Stuart Bishop