Re: [Python-Dev] [issue3769] Deprecate bsddb for removal in 3.0
I think this should be deferred to Py3.1.

This decision was not widely discussed and I think it likely that some users will be surprised and dismayed. The release candidate seems to be the wrong time to yank this out, in part because of the surprise factor and in part because I think the change silently affects shelve performance, so that the impact may be significantly negative but not readily apparent.

We don't have to take this out. So why do it hastily at the last minute and without some discussion on comp.lang.python at least? If it were any other release, we would have disciplined ourselves to deprecate first and remove a generation or two later. Also, the reason for removal may yet disappear if jcrea steps in and continues to make updates.

Also, the referenced note ( http://mail.python.org/pipermail/python-dev/2008-July/081379.html ) says to "start end-of-lifing it", which I took to mean deprecate rather than remove during a release candidate.

Raymond

----- Original Message -----
From: "Benjamin Peterson" <report@bugs.python.org>
To: <python-bugs-list@python.org>
Sent: Wednesday, September 03, 2008 4:32 PM
Subject: [issue3769] Deprecate bsddb for removal in 3.0
Benjamin Peterson <musiccomposition@gmail.com> added the comment:
Also see this: http://mail.python.org/pipermail/python-3000/2008-September/014712.html
On Wed, Sep 3, 2008 at 4:41 PM, Raymond Hettinger <python@rcn.com> wrote:
I think this should be deferred to Py3.1. This decision was not widely discussed and I think it likely that some users will be surprised and dismayed.
Perhaps, but that could be said about almost any module that has been removed through the stdlib reorg.
The release candidate seems to be the wrong time to yank this out (in part because of the surprise factor) and in part because I think the change silently affects shelve performance so that the impact may be significantly negative but not readily apparent.
We don't have to take this out.
We don't have to remove anything that has gone through the stdlib reorg, so that is not a solid argument.
So why do it hastily at the last minute and without some discussion on comp.lang.python at least.
It isn't being done "hastily"; this has been planned for a while. People have just been too busy to get around to it. And we are not changing any semantics or removing something from the language which is what I view as what you don't change in an rc. So this might come down to a different opinion of what one can do during an rc.
If it were any other release, we would have disciplined ourselves to deprecate first and remove a generation or two later.
We are deprecating first in 2.6.
Also, the reason for removal may yet disappear if jcrea steps in and continues to make updates.
OK, but none of his changes have received a code review, so if we are going to go down the whole "disciplined" route about it being an rc then we should back out all of Jesus' changes for both 2.6 and 3.0, which puts us back to the same instability issues.
Also, the referenced note ( http://mail.python.org/pipermail/python-dev/2008-July/081379.html ) says to "start end-of-lifing it", which I took to mean deprecate rather than remove during a release candidate.
Well, it was in the PEP before beta2 even went out the door. -Brett
Brett Cannon wrote:
Also, the reason for removal may yet disappear if jcrea steps in and continues to make updates.
OK, but none of his changes have received a code review, so if we are going to go down the whole "disciplined" route about it being an rc then we should back out all of Jesus' changes for both 2.6 and 3.0, which puts us back to the same instability issues.
I was wondering if somebody could write a "TO DO" list of the things needed to keep bsddb in the stdlib. Then I could work on trying to comply :).

Yes, we are all very busy guys, but still...

--
Jesus Cea Avion
jcea@jcea.es - http://www.jcea.es/
jabber / xmpp:jcea@jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"Love is placing your happiness in the happiness of another" - Leibniz
On Thu, Sep 4, 2008 at 6:35 AM, Jesus Cea <jcea@jcea.es> wrote:
Brett Cannon wrote:
Also, the reason for removal may yet disappear if jcrea steps in and continues to make updates.
OK, but none of his changes have received a code review, so if we are going to go down the whole "disciplined" route about it being an rc then we should back out all of Jesus' changes for both 2.6 and 3.0, which puts us back to the same instability issues.
I was wondering if somebody could write a "TO DO" list of the things needed to keep bsddb in the stdlib. Then I could work on trying to comply :).
[Guido already made his public statement in support of removing pybsddb from 3.0, so this is more just for general benefit of Jesus to know why I think this all happened; I don't view these as points to argue over]

* Follow python-dev practices. The biggest example of this was checking in code during an rc release cycle without code review. That was stated on python-committers which you should be subscribed to and paying attention to.

* Maintain bsddb in Python and cut external releases separately. That would help make bsddb feel more like a stdlib thing instead of something that just gets dumped in our lap when we get close to a release.

* Stay on top of the buildbots. test_bsddb has been such a consistent failure on the buildbots that it has left a very sour taste in the mouths of many core developers over the years (and I mean years; Pythonlabs folks are saying how much they remember the bindings being unstable back in the day).

* Convince some folks that Sleepycat is actually doing a decent job now. As I believe Fred mentioned and you pointed out with the 4.7.0 release, Sleepycat does not always do solid releases.

* Get another committer to help you maintain the code. When Gregory stepped down from maintaining bsddb, the code languished with its traditionally flaky tests until you stepped forward. That suggests to me that no one really wants to maintain that code but you. Sure, people want the code to be there, but "want" does not translate to man-hours to keep the code in good shape.
Yes, we are all very busy guys, but still...
Yes, we are all busy, including you. And that is what makes bsddb the largest maintenance headache in the stdlib; you are a single point of failure for a chunk of code that has garnered a reputation over the years as being flaky.

And I realize the reputation is not your fault, Jesus. And I understand people wanting bsddb to be there. But from a core developer's POV, I want to keep the stdlib to code that at least a couple of core developers would be willing to work on if a bug was reported in the issue tracker; bsddb has not shown itself to be such a code base.

And just so people know, I hear the argument about keeping bsddb in 3.0 and then ripping it out in 3.1, but I'm cynical when it comes to python-dev, so I see that as a potential ploy to keep the code in and then have a year or so to argue about this all over again with no change on either side.

Another thing to keep in mind with the whole shelve/dbm.any argument is that for 3.1 there is nothing saying we can't change shelve and the dbm package to allow 3rd-party code to register with the dbm package such that bsddb can be used as needed behind the scenes.

-Brett
On Thu, Sep 4, 2008 at 11:30 AM, Brett Cannon <brett@python.org> wrote:
Another thing to keep in mind with the whole shelve/dbm.any argument is that for 3.1 there is nothing saying we can't change shelve and the dbm package to allow 3rd-party code to register with the dbm package such that bsddb can be used as needed behind the scenes.
-Brett
Exactly. That is what I think should really happen here.
On Sun, Sep 07, 2008 at 02:35:58PM -0700, Gregory P. Smith wrote:
On Thu, Sep 4, 2008 at 11:30 AM, Brett Cannon <brett@python.org> wrote:
Another thing to keep in mind with the whole shelve/dbm.any argument is that for 3.1 there is nothing saying we can't change shelve and the dbm package to allow 3rd-party code to register with the dbm package such that bsddb can be used as needed behind the scenes.
Exactly. That is what I think should really happen here.
I will try to find some spare time to do some work in this area. I am planning an API like this (in terms of Python 2.x with anydbm):

    # dbm.something module
    import anydbm
    anydbm.register('something', whichdb_test_function)

whichdb_test_function is not required - the whichdb module can provide a generic test function:

    def generic_test(filename, module_name):
        # The file belongs to the module if the module can open it.
        module = __import__(module_name)
        try:
            module.open(filename)
        except Exception:
            return False
        else:
            return True

Oleg.
--
Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru
Programmers don't die, they just GOSUB without RETURN.
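A runnable sketch of the registration idea above, written for Python 3, where the dbm package replaces anydbm. The names register, whichdb and _backends are hypothetical illustrations and not an existing stdlib API. It also assumes every backend's open() accepts an "r" flag, so that probing a file never creates one.

```python
import importlib

# Hypothetical registry: backend module name -> callable(filename) -> bool
_backends = {}


def generic_test(filename, module_name):
    """Fallback probe: the file belongs to the backend if the backend can open it."""
    module = importlib.import_module(module_name)
    try:
        db = module.open(filename, "r")  # read-only, so probing does not create files
    except Exception:
        return False
    db.close()
    return True


def register(module_name, test_function=None):
    """Register a backend; without a test function, fall back to generic_test."""
    if test_function is None:
        def test_function(filename, _name=module_name):
            return generic_test(filename, _name)
    _backends[module_name] = test_function


def whichdb(filename):
    """Return the name of the first registered backend that recognises the file."""
    for name, test in _backends.items():
        if test(filename):
            return name
    return None
```

A third-party package would call register('its.module.name') at import time, and shelve or dbm could then consult whichdb() before opening an existing file.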
On Wed, Sep 03, 2008 at 04:41:32PM -0700, Raymond Hettinger wrote:
-> I think this should be deferred to Py3.1.
->
-> This decision was not widely discussed and
-> I think it likely that some users will
-> be surprised and dismayed. The release
-> candidate seems to be the wrong time to
-> yank this out (in part because of the surprise
-> factor) and in part because I think the change
-> silently affects shelve performance so that the
-> impact may be significantly negative but not
-> readily apparent.

Related but tangential question that we were discussing on the pygr[0] mailing list -- what is the "official" word on a scalable object store in Python? We've been using bsddb, but is there an alternative? And what if bsddb is removed?

It would be very nice to have a moderately scalable (thousands to millions, if not billions) cross-platform object store backend distributed with Python. sqlite could be one choice, but I haven't used it much yet, so I don't know.

thanks,
--titus

[0] Python graph database for bioinformatics, http://code.google.com/p/pygr

--
C. Titus Brown, ctb@msu.edu
On Wed, Sep 3, 2008 at 7:56 PM, C. Titus Brown <ctb@msu.edu> wrote:
On Wed, Sep 03, 2008 at 04:41:32PM -0700, Raymond Hettinger wrote:
-> I think this should be deferred to Py3.1.
->
-> This decision was not widely discussed and
-> I think it likely that some users will
-> be surprised and dismayed. The release
-> candidate seems to be the wrong time to
-> yank this out (in part because of the surprise
-> factor) and in part because I think the change
-> silently affects shelve performance so that the
-> impact may be significantly negative but not
-> readily apparent.
Related but tangential question that we were discussing on the pygr[0] mailing list -- what is the "official" word on a scalable object store in Python? We've been using bsddb, but is there an alternative? And what if bsddb is removed?
Beyond shelve there are no official plans to add a specific object store. -Brett
    >> Related but tangential question that we were discussing on the
    >> pygr[0] mailing list -- what is the "official" word on a scalable
    >> object store in Python? We've been using bsddb, but is there an
    >> alternative? And what if bsddb is removed?

    Brett> Beyond shelve there are no official plans to add a specific
    Brett> object store.

Unless something has changed while I wasn't looking, shelve requires a concrete module under the covers: bsddb, gdbm, ndbm, dumbdbm. It's just a thin layer over one of them that makes it appear as if you can have keys which aren't strings.

Skip
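As a small sketch of that layering, using Python 3 naming (the dbm package in place of anydbm): shelve pickles each value into whichever concrete dbm module is available, so arbitrary picklable objects can be stored, while the keys themselves remain strings. The file name and the sample data are made up for illustration.

```python
import dbm
import os
import shelve
import tempfile

# Illustrative location for the shelf; any writable path would do.
path = os.path.join(tempfile.mkdtemp(), "objects")

# shelve pickles the value and hands the resulting bytes to a dbm backend.
with shelve.open(path) as shelf:
    shelf["graph"] = {"nodes": [1, 2, 3], "edges": [(1, 2), (2, 3)]}

# Ask the dbm package which concrete module ended up under the covers.
backend = dbm.whichdb(path)

# Reopening the shelf unpickles the stored object.
with shelve.open(path) as shelf:
    restored = shelf["graph"]
```

Which backend is reported depends on how the interpreter was built, which is exactly why removing one backend can silently change shelve's behaviour and performance.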
At 6:10 AM -0500 9/4/08, skip@pobox.com wrote:
Related but tangential question that we were discussing on the pygr[0] mailing list -- what is the "official" word on a scalable object store in Python? We've been using bsddb, but is there an alternative? And what if bsddb is removed?
Brett> Beyond shelve there are no official plans to add a specific Brett> object store.
Unless something has changed while I wasn't looking, shelve requires a concrete module under the covers: bsddb, gdbm, ndbm, dumbdbm. It's just a thin layer over one of them that makes it appear as if you can have keys which aren't strings.
I thought that all that was happening was that BSDDB was becoming a separate project. If one needs BSDDB with Python2.6, one installs it. Aren't there other parts of Python that require external modules, such as Tk? Using Tk requires installing it. Such things are normally packaged by each distro the same way as Python is packaged ("yum install tk bsddb").

Shipping an application to end users is a different problem. Such packages should include a private copy of Python as well as of any dependent libraries, as tested.

--
TonyN.:' <mailto:tonynelson@georgeanelson.com>
      ' <http://www.georgeanelson.com/>
On Thu, Sep 04, 2008 at 10:29:10AM -0400, Tony Nelson wrote:
-> At 6:10 AM -0500 9/4/08, skip@pobox.com wrote:
-> > >> Related but tangential question that we were discussing on the
-> > >> pygr[0] mailing list -- what is the "official" word on a scalable
-> > >> object store in Python? We've been using bsddb, but is there an
-> > >> alternative? And what if bsddb is removed?
-> >
-> > Brett> Beyond shelve there are no official plans to add a specific
-> > Brett> object store.
-> >
-> >Unless something has changed while I wasn't looking, shelve requires a
-> >concrete module under the covers: bsddb, gdbm, ndbm, dumbdbm. It's just a
-> >thin layer over one of them that makes it appear as if you can have keys
-> >which aren't strings.
->
-> I thought that all that was happening was that BSDDB was becoming a
-> separate project. If one needs BSDDB with Python2.6, one installs it.
-> Aren't there other parts of Python that require external modules, such as
-> Tk? Using Tk requires installing it. Such things are normally packaged by
-> each distro the same way as Python is packaged ("yum install tk bsddb").
->
-> Shipping an application to end users is a different problem. Such packages
-> should include a private copy of Python as well as of any dependent
-> libraries, as tested.

Why? On Mac OS X, for example, Python comes pre-installed -- not sure if it comes with Tk yet, but the next version probably will. On Windows there's a handy few-click installer that installs Tk. Is there some reason why I shouldn't be relying on those distributions??

Requiring users to install anything at all imposes a barrier to use. That barrier rises steeply in height the more packages (with versioning issues, etc.) are needed. This also increases the tech support burden dramatically.
I'm happy to be told that bsddb is too much of a maintenance burden for Python 2.6/3.0 to have -- especially since it's gone from 3.0 now ;) -- but I don't think the arguments that *it won't matter that it's not there* have been very credible. There's a BIG difference between things that come with Python and things that are add-ons. Right now, I'm teaching an intro programming course using Python. It doesn't seem like the students are going to need to install *anything* other than base Python in order to play with full networking libraries & sqlite databases, among other features. And, for me and for them, that's really great. I don't think the convenience of "batteries *included*" should be underestimated. --t -- C. Titus Brown, ctb@msu.edu
At 7:37 AM -0700 9/4/08, C. Titus Brown wrote:
On Thu, Sep 04, 2008 at 10:29:10AM -0400, Tony Nelson wrote: ... -> Shipping an application to end users is a different problem. Such packages -> should include a private copy of Python as well as of any dependent -> libraries, as tested.
Why? On Mac OS X, for example, Python comes pre-installed -- not sure if it comes with Tk yet, but the next version probably will. On Windows there's a handy few-click installer that installs Tk. Is there some reason why I shouldn't be relying on those distributions??
Yes. An application is tested with one version of Python and one version of its libraries. When MOSX updates Python or some other library, you are relying on their testing of your application. Unless you are Adobe or similarly large they didn't do that testing. Perhaps you have noticed the threads about installing a new Python release over the Python that came with an OS, and how bad an idea that is? This is the same issue, from the other side.
Requiring users to install anything at all imposes a barrier to use. That barrier rises steeply in height the more packages (with versioning issues, etc.) are needed. This also increases the tech support burden dramatically. ...
Precisely why one needs to ship a single installer that installs the complete application, including Python and any other libraries it needs.

--
TonyN.:' <mailto:tonynelson@georgeanelson.com>
      ' <http://www.georgeanelson.com/>
On Thu, Sep 04, 2008 at 11:01:35AM -0400, Tony Nelson wrote:
-> At 7:37 AM -0700 9/4/08, C. Titus Brown wrote:
-> >On Thu, Sep 04, 2008 at 10:29:10AM -0400, Tony Nelson wrote:
-> ...
-> >-> Shipping an application to end users is a different problem. Such packages
-> >-> should include a private copy of Python as well as of any dependent
-> >-> libraries, as tested.
-> >
-> >Why? On Mac OS X, for example, Python comes pre-installed -- not sure
-> >if it comes with Tk yet, but the next version probably will. On Windows
-> >there's a handy few-click installer that installs Tk. Is there some
-> >reason why I shouldn't be relying on those distributions??
->
-> Yes. An application is tested with one version of Python and one version
-> of its libraries. When MOSX updates Python or some other library, you are
-> relying on their testing of your application. Unless you are Adobe or
-> similarly large they didn't do that testing. Perhaps you have noticed the
-> threads about installing a new Python release over the Python that came
-> with an OS, and how bad an idea that is? This is the same issue, from the
-> other side.

I have to say I've never had problems with a stock install of Python on either Mac OS X or Windows (shockingly enough :). I think this is good advice for applications that rely on external libraries, but I just don't see any problems with relying on Python 2.5 to contain all the things that normally come with Python 2.5.

It seems like you're pushing a pretty sharp dichotomy (trichotomy?) --

- Python library/core developers should compile it all.

- Python app developers can rely on what they install from binaries themselves, but not rely on it to be present on anyone else's machine or OS.

- End users should be given a complete clean install of Python in a different location for each application they're using, even if those applications depend only on the stdlib.
This seems surprisingly complicated to me (and unnecessary, in my limited experience) -- but it does validate my decade-old decision to avoid writing end-user applications in Python, sadly enough. It ends up being less work to distribute and support a C/C++ app on Windows and Mac OS X, for crikey's sake! --t -- C. Titus Brown, ctb@msu.edu
I have to say I've never had problems with a stock install of Python on either Mac OS X or Windows (shockingly enough :). I think this is good
I agree. I just use the stock Python on OS X and Windows. And it seems to work well for my rather large and complicated (PIL, PyLucene, Medusa, ReportLab, SSL, email-4) application. Clearly Windows, with its somewhat complicated PATH and DLL issues, might be problematic, but I haven't seen that yet.
advice for applications that rely on external libraries, but I just don't see any problems with relying on Python 2.5 to contain all the things that normally come with Python 2.5. It seems like you're pushing a pretty sharp dichotomy (trichotomy?) --
Yeah, but this is just some random guy on the Python mailing list (Tony, I apologize for not knowing who you are). No need to take it too seriously.
but it does validate my decade-old decision to avoid writing end-user applications in Python, sadly enough.
Well, I don't do that either, but it's because of Python's lack of a decent built-in GUI toolkit. Sad. Bill
On Sep 4, 2008, at 8:10 AM, C. Titus Brown wrote:
I have to say I've never had problems with a stock install of Python on either Mac OS X or Windows (shockingly enough :). I think this is good advice for applications that rely on external libraries, but I just don't see any problems with relying on Python 2.5 to contain all the things that normally come with Python 2.5.
There can be subtle differences between a "stock" Python and the system Python on Mac OS X 10.5. For example, Mac OS X compiles against EditLine instead of GNU Readline. From "man python" on Mac OS X:

"""
The Python interpreter supports editing of the current input line and history substitution, similar to facilities found in the Korn shell and the GNU Bash shell. However, rather than being implemented using the GNU Readline library, this Python interpreter uses the BSD EditLine library editline(3) with a GNU Readline emulation layer.
...
For example, the rlcompleter module, which defines a completion function for the readline modules, works correctly with the EditLine libraries, but needs to be initialized somewhat differently:
...
"""

Fairly rare that you'd trip over this minor difference though - EditLine is more a problem on Mac OS X when trying to compile your own Python, since you need to install and link against GNU Readline.

However, all does not seem to be right with the bsddb module on the system Python 2.5 on Mac OS X 10.5:

    $ /usr/bin/python
    Python 2.5.1 (r251:54863, Jan 17 2008, 19:35:17)
    [GCC 4.0.1 (Apple Inc. build 5465)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import bsddb
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/bsddb/__init__.py", line 51, in <module>
        import _bsddb
    ImportError: No module named _bsddb
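The rlcompleter initialization difference mentioned in the man page excerpt can be handled in a startup script roughly as follows. This is a sketch, not Apple's elided text: it relies on the readline module's docstring mentioning "libedit" when the EditLine emulation is in use, and tab_binding is a hypothetical helper name.

```python
import rlcompleter  # noqa: F401  (importing it installs the completion function)

try:
    import readline
except ImportError:  # e.g. platforms without a readline module
    readline = None


def tab_binding(doc):
    """Pick the tab-completion binding command for the library named in the docstring."""
    if doc and "libedit" in doc:
        return "bind ^I rl_complete"  # EditLine (libedit) syntax
    return "tab: complete"  # GNU Readline syntax


if readline is not None:
    readline.parse_and_bind(tab_binding(readline.__doc__))
```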
-On [20080905 12:34], Kevin Teague (kevin@bud.ca) wrote:
However, all does not seem to be right with the bsddb module on the system Python 2.5 on Mac OS X 10.5:
import bsddb [snip] ImportError: No module named _bsddb
The bsddb module is built separately from Python within FreeBSD's ports. I think Apple did the same for Mac OS X.

    ports/databases/py-bsddb
    ports/databases/py-bsddb3

So for a fair number of systems the 'bsddb' module being an external package/dependency is already true.

--
Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
Honesty is the first chapter of the book of wisdom...
Kevin Teague wrote:
There can be subtle differences between a "stock" Python and the system Python on Mac OS X 10.5.
Also there can be different versions of Python installed in different versions of MacOSX. So if you distribute an app that relies on the system Python, at the least you have to test it against all the Python versions it's likely to encounter, and possibly provide different versions of the app for different versions of MacOSX. Another thing to keep in mind is that if you use something like py2app or py2exe, it doesn't include the whole Python distribution, just the stdlib modules the app actually uses. So it's not as bad as including a whole Python installation for every app. -- Greg
[C. Titus Brown]
I'm happy to be told that bsddb is too much of a maintenance burden for Python 2.6/3.0 to have -- especially since it's gone from 3.0 now ;) -- but I don't think the arguments that *it won't matter that it's not there* have been very credible.
Not credible, not widely discussed, not tested in a beta ... No alternative provided, no deprecation period before it disappears ... The usual deliberative process has been completely bypassed. Raymond
I don't think the convenience of "batteries *included*" should be underestimated.
Yeah, but bsddb is one of those exploding batteries. I've used it for years, and have had lots and lots of problems with it. Having SQLite in there is great; now we need implementations of anydbm and shelve which use it. Bill
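A minimal sketch of what such an SQLite-backed store could look like, using only the stdlib sqlite3 module. SQLiteDict and its kv table are hypothetical names for illustration; the class is not tuned for performance or concurrent access.

```python
import sqlite3
from collections.abc import MutableMapping


class SQLiteDict(MutableMapping):
    """Hypothetical dbm-style mapping of bytes keys to bytes values, stored in SQLite."""

    def __init__(self, filename):
        self._conn = sqlite3.connect(filename)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv (key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        self._conn.execute(
            "INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)", (key, value)
        )
        self._conn.commit()

    def __delitem__(self, key):
        cursor = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cursor.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        # Fetch all keys up front so callers may modify the mapping while iterating.
        rows = self._conn.execute("SELECT key FROM kv").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()
```

Because shelve.Shelf accepts any dict-like object with bytes keys and values, shelve.Shelf(SQLiteDict("store.sqlite")) would give a shelf on top of this class without touching shelve itself.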
On Thu, Sep 04, 2008 at 09:25:43AM -0700, Bill Janssen wrote:
Yeah, but bsddb is one of those exploding batteries. I've used it for years, and have had lots and lots of problems with it. Having SQLite in there is great; now we need implementations of anydbm and shelve which use it.
What sort of problems? When I've used BerkeleyDB in the past, it always took a fair bit of experimenting & searching to figure out the right combination of flags to use (and the BerkeleyDB docs were very low-level), but once that was done it seemed to work OK. Incorporating Jesus's docs will help users with that issue; I'm willing to work on that before 2.6final.

I think the primary annoyance is the instability of the bsddb tests, and the resulting bad effect on buildbot's usefulness (as we all just get accustomed to having a patchwork of red randomly mixed in). So why not just strip down the test cases we run to avoid the problematic tests? That won't help Jesus debug on platforms he can't access, but we could re-enable those tests after 2.7 or provide a different buildbot target that runs the entire suite.

--amk
I thought that all that was happening was that BSDDB was becoming a separate project. If one needs BSDDB with Python2.6, one installs it.
No, not in the way you mean it.
Aren't there other parts of Python that require external modules, such as Tk?
It's different. BSDDB (the Sleepycat-then-Oracle implementation) never was part of Python; this hasn't changed. What *has* changed is that the bsddb module (i.e. the Python wrapper) is now not part of Python anymore, either, due to it being maintained separately. This is as if Tkinter was removed from Python. Regards, Martin
Brett Cannon wrote:
Related but tangential question that we were discussing on the pygr[0] mailing list -- what is the "official" word on a scalable object store in Python? We've been using bsddb, but is there an alternative? And what if bsddb is removed?
Beyond shelve there are no official plans to add a specific object store.
If you are storing millions of objects, you'd better use a transactional storage, able to survive disk-full conditions or code/computer crashes.

I will maintain "bsddb" as a separate package (downloadable via PyPI) whatever the fate of bsddb in the Python standard library may be. So bsddb is a pretty safe bet, even if you need to install it separately.

Compared to sqlite, you don't need to know SQL, you can fine-tune it (for example, using ACI instead of ACID, deciding store by store), and you can do replication and distributed transactions (useful, for example, if your storage is bigger than a single machine's capacity, as in my case). If you combine Berkeley DB with Durus, for example, all of this is abstracted and you simply use "regular" Python objects.

If you use bsddb, please consider subscribing to the pybsddb mailing list. It has pretty low traffic and you can guide bsddb evolution (for example, prioritize Berkeley DB binding additions).

--
Jesus Cea Avion
jcea@jcea.es - http://www.jcea.es/
On Thu, Sep 04, 2008 at 03:23:22PM +0200, Jesus Cea wrote:
-> Brett Cannon wrote:
-> >> Related but tangential question that we were discussing on the pygr[0]
-> >> mailing list -- what is the "official" word on a scalable object store
-> >> in Python? We've been using bsddb, but is there an alternative? And
-> >> what if bsddb is removed?
-> >
-> > Beyond shelve there are no official plans to add a specific object store.
->
-> If you are storing millions of objects, you'd better use a transactional
-> storage, able to survive disk-full conditions or code/computer crashes.

We're using a write-once-read-many pattern of access, and it is simply a cache of a separate file (that remains around), so no, we don't better use a transactional storage :).

-> I will maintain "bsddb" as a separate package (downloadable via PyPI)
-> whatever the fate of bsddb in the Python standard library may be. So bsddb
-> is a pretty safe bet, even if you need to install it separately.

Since I/we want to distribute pygr to end-users, this is really not a pleasant prospect. Also often the installation of Python itself goes much more smoothly than the installation of separately compiled binary packages, for all the obvious reasons (compiler/OS versions, lib versions, etc. etc.)

-> Compared to sqlite, you don't need to know SQL, you can fine-tune it (for
-> example, using ACI instead of ACID, deciding store by store), and you
-> can do replication and distributed transactions (useful, for example, if
-> your storage is bigger than a single machine's capacity, as in my case). If
-> you combine Berkeley DB with Durus, for example, all of this is
-> abstracted and you simply use "regular" Python objects.

I agree. I like bsddb for just this reason and I'd like to continue being able to use it! I think that there are many reasons why having such a thing in the stdlib is really useful and I also think it's worth exploring the ramifications of taking it out...

--t
--
C. Titus Brown, ctb@msu.edu
C. Titus Brown wrote:
On Thu, Sep 04, 2008 at 03:23:22PM +0200, Jesus Cea wrote: -> Brett Cannon wrote: -> >> Related but tangential question that we were discussing on the pygr[0] -> >> mailing list -- what is the "official" word on a scalable object store -> >> in Python? We've been using bsddb, but is there an alternative? And -> >> what if bsddb is removed? -> > -> > Beyond shelve there are no official plans to add a specific object store. -> -> If you are storing million of objects, you'd better use a transactional -> storage, able to survive diskfulls or code/computer crashes.
We're using a write-once-read-many pattern of access, and it is simply a cache of a separate file (that remains around), so no, we don't better use a transactional storage :).
If you can recreate the database in case of problems, and it is mostly reads, then I would suggest gdbm. I personally hate SQL and the "SQL fits all" mentality, and the mindset/impedance mismatch between Python and objects on one side and the SQL world on the other, but sure, the sqlite module could fit the bill too... if you don't mind mixing two languages and two logics in your code and your brain :).
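A sketch of that "rebuildable, mostly-read cache" pattern using the generic dbm package from Python 3, which picks gdbm or another backend depending on what is installed. The open_cache function and its rebuild callback are hypothetical names; rebuild is assumed to yield (key, value) pairs of bytes derived from the original source file.

```python
import dbm


def open_cache(cache_path, rebuild):
    """Open the cache read-only, regenerating it from the source data if needed."""
    try:
        return dbm.open(cache_path, "r")
    except dbm.error:
        # Missing or unrecognised cache: recreate it from scratch, since the
        # original data is still around and no transactional safety is needed.
        with dbm.open(cache_path, "n") as db:
            for key, value in rebuild():
                db[key] = value
        return dbm.open(cache_path, "r")
```

The write happens once, on the first call; later calls only ever open the file read-only.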
-> I will maintain "bsddb" as a separate package (downloadable via PyPI) whatever the fate of bsddb in the Python standard library may be. So bsddb is a pretty safe bet, even if you need to install it separately.
Since I/we want to distribute pygr to end-users, this is really not a pleasant prospect. Also often the installation of Python itself goes much more smoothly than the installation of separately compiled binary packages, for all the obvious reasons (compiler/OS versions, lib versions, etc. etc.)
I agree. I can check the library with Solaris 10 and several flavors of Linux, but I'm particularly worried about Windows support. I'm unable to provide precompiled libs, and 99.999% of windows users don't know what a "compiler thing" is, let alone being able to compile anything by themselves.
I agree. I like bsddb for just this reason and I'd like to continue being able to use it! I think that there are many reasons why having such a thing in the stdlib is really useful and I also think it's worth exploring the ramifications of taking it out...
+Inf
-- Jesus Cea Avion, jcea@jcea.es - http://www.jcea.es/ - jabber / xmpp:jcea@jabber.org
"Things are not so easy"
"My name is Dump, Core Dump"
"El amor es poner tu felicidad en la felicidad de otro" - Leibniz
On Thu, Sep 04, 2008 at 07:01:47PM +0200, Jesus Cea wrote:
-> C. Titus Brown wrote:
-> > Since I/we want to distribute pygr to end-users, this is really not a
-> > pleasant prospect. Also often the installation of Python itself goes
-> > much more smoothly than the installation of separately compiled binary
-> > packages, for all the obvious reasons (compiler/OS versions, lib
-> > versions, etc. etc.)
->
-> I agree. I can check the library with Solaris 10 and several flavors of
-> Linux, but I'm particularly worried about Windows support. I'm unable to
-> provide precompiled libs, and 99.999% of windows users don't know what a
-> "compiler thing" is, let alone being able to compile anything by themselves.
I believe I might be able to help you with this. More off-list, in a few weeks; if anyone else needs full Windoze access, Watch This Space, as they say. (Yes, I know access is not enough -- you really want someone to be paying attention on Windows, too. I'm working on a project or two there; access to large quantities of talented students is opening up some ideas :)
--titus
-- C. Titus Brown, ctb@msu.edu
> Compared to sqlite, you don't need to know SQL, you can finetuning > (for example, using ACI instead of ACID, deciding store by store), and > you can do replication and distributed transactions (useful, for > example, if your storage is bigger than a single machine capacity, > like my case). If you combine Berkeley DB with Durus, for example, all > of this is abstracted and you simply use "regular" python objects. Titus> I agree. I like bsddb for just this reason and I'd like to Titus> continue being able to use it! I think that there are many Titus> reasons why having such a thing in the stdlib is really useful Titus> and I also think it's worth exploring the ramifications of taking Titus> it out... I suggested in another message (perhaps on another thread) that maybe a dbm.sqlite module would be worth having. It would have a dict-ish API like the other dict-on-disk modules but use the sqlite module to read (SELECT) and write (INSERT and UPDATE) key/value pairs from the underlying database. Skip
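To make the dbm.sqlite idea concrete, here is a minimal sketch of a dict-ish wrapper over the sqlite3 module. It is only an illustration, not the code from any tracker issue: the class name `SqliteDict` and the table name `kv` are invented for the example, and it assumes a modern Python 3 with bytes keys and values, as the other dbm modules use. Reads go through SELECT and writes through UPDATE or INSERT, as described above.

```python
import sqlite3
from collections.abc import MutableMapping


class SqliteDict(MutableMapping):
    """Hypothetical dict-on-disk wrapper; names here are illustrative only."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS kv "
            "(key BLOB PRIMARY KEY, value BLOB NOT NULL)"
        )
        self._conn.commit()

    def __getitem__(self, key):
        # Read path: a single SELECT on the primary key.
        row = self._conn.execute(
            "SELECT value FROM kv WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            raise KeyError(key)
        return row[0]

    def __setitem__(self, key, value):
        # Write path: UPDATE an existing row, INSERT if there was none.
        cur = self._conn.execute(
            "UPDATE kv SET value = ? WHERE key = ?", (value, key)
        )
        if cur.rowcount == 0:
            self._conn.execute(
                "INSERT INTO kv (key, value) VALUES (?, ?)", (key, value)
            )
        self._conn.commit()

    def __delitem__(self, key):
        cur = self._conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        if cur.rowcount == 0:
            raise KeyError(key)
        self._conn.commit()

    def __iter__(self):
        # Fetch all keys up front so callers may modify while iterating.
        rows = self._conn.execute("SELECT key FROM kv").fetchall()
        return iter([key for (key,) in rows])

    def __len__(self):
        return self._conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0]

    def close(self):
        self._conn.close()
```

Committing after every write keeps the sketch simple; a real module would probably batch writes into larger transactions for speed.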
Doesn't SQLite still have a 4 GB cap? I'd personally prefer an open source solution (if that's Berkeley so be it but there's plenty out there... MySQL for one)
never mind about the limit... I was thinking SQL Express
On Thu, Sep 4, 2008 at 1:07 PM, Jeff Hall <hall.jeff@gmail.com> wrote:
Doesn't SQLite still have a 4 GB cap?
I'd personally prefer an open source solution (if that's Berkeley so be it but there's plenty out there... MySQL for one)
-- Haikus are easy Most make very little sense Refrigerator
On Thu, Sep 04, 2008 at 01:07:23PM -0400, Jeff Hall wrote:
Doesn't SQLite still have a 4 GB cap?
http://sqlite.org/limits.html Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
On Thu, Sep 4, 2008 at 10:03 AM, <skip@pobox.com> wrote:
Compared to sqlite, you don't need to know SQL, you can finetuning (for example, using ACI instead of ACID, deciding store by store), and you can do replication and distributed transactions (useful, for example, if your storage is bigger than a single machine capacity, like my case). If you combine Berkeley DB with Durus, for example, all of this is abstracted and you simply use "regular" python objects.
Titus> I agree. I like bsddb for just this reason and I'd like to Titus> continue being able to use it! I think that there are many Titus> reasons why having such a thing in the stdlib is really useful Titus> and I also think it's worth exploring the ramifications of taking Titus> it out...
I suggested in another message (perhaps on another thread) that maybe a dbm.sqlite module would be worth having. It would have a dict-ish API like the other dict-on-disk modules but use the sqlite module to read (SELECT) and write (INSERT and UPDATE) key/value pairs from the underlying database.
I offered to write one of these a couple months ago as an option for some users who would otherwise think to use bsddb, dbm, or anydbm. Few people saw that anything like that would be useful, detractors stating that the expansive options available in bsddb (most of which I didn't realize existed) made it effectively irreplaceable to the vast majority of people who actually use bsddb for anything nontrivial. There wasn't even feigned interest in those use-cases involving "trivial" disk-persistent dictionaries (of which 100% of my uses have involved over the last 10 years). - Josiah
me> I suggested in another message (perhaps on another thread) that
me> maybe a dbm.sqlite module would be worth having.
http://bugs.python.org/issue3783
Skip
On Thu, Sep 4, 2008 at 5:02 PM, <skip@pobox.com> wrote:
me> I suggested in another message (perhaps on another thread) that me> maybe a dbm.sqlite module would be worth having.
I did a similar thing today. I can post my version later today. - Josiah
-On [20080904 16:22], C. Titus Brown (ctb@msu.edu) wrote:
I agree. I like bsddb for just this reason and I'd like to continue being able to use it! I think that there are many reasons why having such a thing in the stdlib is really useful and I also think it's worth exploring the ramifications of taking it out...
And having to install bsddb from an external source disables your use, how? -- Jeroen Ruigrok van der Werven <asmodai(-at-)in-nomine.org> / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B Infinite Dreams, I can't deny them, Infinity is hard to comprehend...
On Thu, Sep 04, 2008 at 03:23:22PM +0200, Jesus Cea wrote:
Compared to sqlite, you don't need to know SQL, you can finetuning (for example, using ACI instead of ACID, deciding store by store), and you can do replication and distributed transactions (useful, for example, if your storage is bigger than a single machine capacity, like my case).
Let me raise the glove. Compared to bsddb:
-- SQLite is public domain; the licensing terms of Berkeley DB[1] are not friendly to commercial applications: "Our open source license ... permits use of Berkeley DB in open source projects or in applications that are not distributed to third parties." I am not sure if using of PyBSDDB in commercial applications is considered "using of Berkeley DB in open source projects";
-- SQLite has a pretty stable API and a pretty stable on-disk format; for bsddb one needs to do dump/reload on every major release;
-- SQLite implements a subset of SQL - a powerful query language;
-- SQLite is extensible - one can write his/her own functions and aggregates, e.g.; PySQLite allows to write these functions in Python; PySQLite also allows to write data conversion functions that converts between Python and SQL data types;
-- a program can attach a few databases at once thus distributing loads between a number of disks, including network mounts.
[1] http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licens...
If you combine Berkeley DB with Durus, for example, all of this is abstracted and you simply use "regular" python objects.
Durus (and ZODB) has an index of all objects, the index is stored in memory AFAIK - a real problem if one has millions of objects. Does bsddb help to mitigate the problem? Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
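For readers who have not used the extensibility and attach features listed above, here is a small sketch using the standard sqlite3 module of a modern Python 3. The function name `py_reverse`, the aggregate `longest`, the schema alias `extra` and the table names are all made up for the illustration; two in-memory databases stand in for files on different disks.

```python
import sqlite3


class Longest:
    """Aggregate written in Python: keeps the longest string it is fed."""

    def __init__(self):
        self.best = ""

    def step(self, value):
        if value is not None and len(value) > len(self.best):
            self.best = value

    def finalize(self):
        return self.best


conn = sqlite3.connect(":memory:")

# Register a scalar function and an aggregate implemented in Python.
conn.create_function("py_reverse", 1, lambda s: s[::-1])
conn.create_aggregate("longest", 1, Longest)

# Attach a second database; in practice this could be a file on another disk.
conn.execute("ATTACH DATABASE ':memory:' AS extra")
conn.execute("CREATE TABLE main.words (w TEXT)")
conn.execute("CREATE TABLE extra.words (w TEXT)")
conn.executemany("INSERT INTO main.words VALUES (?)", [("db",), ("bsddb",)])
conn.executemany("INSERT INTO extra.words VALUES (?)", [("sqlite",)])

reversed_word = conn.execute("SELECT py_reverse('bsddb')").fetchone()[0]

# One query spanning both attached databases, using the Python aggregate.
longest_word = conn.execute(
    "SELECT longest(w) FROM ("
    "SELECT w FROM main.words UNION ALL SELECT w FROM extra.words)"
).fetchone()[0]
```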
Oleg Broytmann wrote:
-- SQLite is public domain; the licensing terms of Berkeley DB[1] are not friendly to commercial applications: "Our open source license ... permits use of Berkeley DB in open source projects or in applications that are not distributed to third parties." I am not sure if using of PyBSDDB in commercial applications is considered "using of Berkeley DB in open source projects";
I can't comment on this. I'm not a lawyer.
-- SQLite has a pretty stable API and a pretty stable on-disk format; for bsddb one needs to do dump/reload on every major release;
Not at all. The worst thing you would need to do is a "db_upgrade", an in-place operation. Lately it is pretty harmless and fast (like upgrading the log format, not the database file format). A stable fileformat is useful for long term support, but an evolving format allows improvements. Following your reasoning, Python should be kept in the 1.0 era, for compatibility's sake.
-- SQLite implements a subset of SQL - a powerful query language;
Yes, a declarative language completely unrelated to Python.
-- SQLite is extensible - one can write his/her own functions and aggregates, e.g.; PySQLite allows to write these functions in Python; PySQLite also allows to write data conversion functions that converts between Python and SQL data types;
bsddb 4.7.4 (available next month) will allow subclassing DB/DBEnv, etc. objects, so you can implement the logic you wish there. Until then, you can do proxy/delegation (that is the way I'm doing 3.0 compatibility, BTW).
-- a program can attach a few databases at once thus distributing loads between a number of disks, including network mounts.
That is an OS issue. Any program gets the benefit. The problem is not disk capacity. Any modern machine can scale disk without bound via iSCSI, for example (god bless ZFS!). The issue is replication for redundancy, load sharing and high availability. These things are available in bsddb 4.7.3 (that is, in Python 2.6). How do you scale traffic demand in SQLite? I can keep adding machines to serve read requests, without sharing any disk between them. I can launch 64 bsddb processes on a single 64 CPU machine to manage (read/write) a single shared database. I don't know if SQLite can do that. Berkeley DB can.
Durus (and ZODB) has an index of all objects, the index is stored in memory AFAIK - a real problem if one has millions of objects. Does bsddb help to mitigate the problem?
The latest Durus does not have that issue, but you can always use another project of mine: Berkeley DB Backend Storage Engine for DURUS http://www.jcea.es/programacion/durus-berkeleydbstorage.htm
This code (with some private enhancements for replication and distributed transactions) manages a nearly 200 terabyte Durus repository without breaking a sweat (~2^35 objects stored there). In this particular instance, distributed transactions allow me to partition data between several machines, with no sharing between them, and replication allows redundancy and routing read requests to a less loaded machine (writes go to the "master" machine; replication from there is transparent).
SQLite is a good product. I just dislike the "SQL fits all" model, as I already said in another message. Using a SQL storage to save persistent Python objects is ugly and the SQL language is of no use there. You just need "something" safe, scalable and configurable that can give you opaque objects when an OID (Object ID) is presented. Compare a terabyte Python "shelve" object, with its ease of use, transparency, etc., with keeping the objects in a SQL database server. Just my opinion, of course.
-- Jesus Cea Avion
On Thu, Sep 04, 2008 at 07:40:28PM +0200, Jesus Cea wrote:
A stable fileformat is useful for long term support, but an evolving format allows improvements.
Once I upgraded Python on a Windows computer... I think it was a 2.2 to 2.3 upgrade - and all my bsddb databases stopped working. I cannot call this an "improvement". I didn't have db_upgrade on that computer (or I didn't know about it). Does PyBSDDB have db_upgrade in the distribution? Does the Python distribution have db_upgrade?
Following your reasoning, Python should be kept in the 1.0 era, for compatibility's sake.
Python? No. But persistent data structures? Yes! How many different pickle data formats have there been since Python 1.0? What is the oldest pickle format modern Python can read? (Just using pickle as an example.)
-- SQLite implements a subset of SQL - a powerful query language;
Yes, a declarative language completely unrelated to Python.
Sometimes being unrelated to Python is an advantage. Written in C, optimized for its tasks, the implementation of the query language certainly can outperform Python.
Using a SQL storage to save persistent Python objects is ugly
No more ugly than any other storage. A matter of taste, I think. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
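On the pickle question raised above, a quick way to check it on whatever Python is at hand is to round-trip an object through every protocol the pickle module offers. This is only a sketch; the sample `record` and the helper name `round_trip_all_protocols` are invented for the example. On current Python 3 releases protocol 0, the original text format, is still among the protocols that can be written and read back.

```python
import pickle

# Sample data, made up for the illustration.
record = {"name": "bsddb", "versions": [2.5, 2.6], "removed_in_3_0": True}


def round_trip_all_protocols(obj):
    """Return {protocol: True/False} for a dumps/loads round trip."""
    results = {}
    for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
        data = pickle.dumps(obj, protocol=protocol)
        results[protocol] = pickle.loads(data) == obj
    return results


results = round_trip_all_protocols(record)
```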
Oleg Broytmann wrote:
Once I upgraded Python on a Windows computer... I think it was a 2.2 to 2.3 upgrade - and all my bsddb databases stopped working. I cannot call this an "improvement". I didn't have db_upgrade on that computer (or I didn't know about it). Does PyBSDDB have db_upgrade in the distribution? Does the Python distribution have db_upgrade?
I can't comment about bsddb status before my maintenance era (since March or so). But current release: http://www.jcea.es/programacion/pybsddb_doc/db.html#upgrade
"""
[jcea@tesalia Modules]$ grep -i upgrade *
_bsddb.c:DB_upgrade(DBObject* self, PyObject* args)
_bsddb.c: if (!PyArg_ParseTuple(args,"s|i:upgrade", &filename, &flags))
_bsddb.c: err = self->db->upgrade(self->db, filename, flags);
_bsddb.c: MAKE_ENTRY(nupgrade);
_bsddb.c: {"upgrade", (PyCFunction)DB_upgrade, METH_VARARGS},
_bsddb.c: ADD_INT(d, DB_UPGRADE);
_bsddb.c: ADD_INT(d, DB_LOCK_UPGRADE);
_bsddb.c: ADD_INT(d, DB_LOCK_UPGRADE_WRITE);
_bsddb.c: ADD_INT(d, DB_LOCK_UPGRADE);
"""
Following your reasoning, Python should be kept in the 1.0 era, for compatibility's sake.
Python? No. But persistent data structures? Yes! How many different pickle data formats have there been since Python 1.0? What is the oldest pickle format modern Python can read? (Just using pickle as an example.)
Modern bsddb can read relic bsddb data; just do a "db_upgrade".
-- SQLite implements a subset of SQL - a powerful query language; Yes, a declarative language completely unrelated to Python.
Sometimes being unrelated to Python is an advantage.
No argument here. My brain just works that way :).
-- Jesus Cea Avion
On Thu, Sep 4, 2008 at 11:03 AM, Oleg Broytmann <phd@phd.pp.ru> wrote:
On Thu, Sep 04, 2008 at 07:40:28PM +0200, Jesus Cea wrote:
A stable fileformat is useful for long term support, but an evolving format allows improvements.
Once I upgraded Python on a Windows computer... I think it was a 2.2 to 2.3 upgrade - and all my bsddb databases stopped working. I cannot call this an "improvement". I didn't have db_upgrade on that computer (or I didn't know about it). Does PyBSDDB have db_upgrade in the distribution? Does the Python distribution have db_upgrade?
Unfortunately that is a bad example and should be blamed on python and not berkeleydb: Going from python 2.2 -> 2.3 was when the bsddb module was renamed to bsddb185 and the bsddb3 module was included as bsddb. It was an incompatible change because the underlying library was entirely replaced with something much much different despite happening to share the name and being made to support the same API. You could probably have built the bsddb185 module and loaded your data from that and rewritten it using the new bsddb module. The db upgrade API is available in the bsddb.db module. I never got around to writing the automatic on disk format upgrade code for the bsddb/__init__.py API. It wouldn't have solved the 2.2->2.3 problem but it would remove the need to know about db_upgrade in all future versions. fwiw, a precompiled berkeleydb with db_upgrade.exe and friends has always been available from sleepycat/oracle so a solution exists even if it wasn't nicely documented in a "reading your data when upgrading python HOWTO."
Following your reasoning, Python should be kept in the 1.0 era, for compatibility's sake.
Python? No. But persistent data structures? Yes! How many different pickle data formats have there been since Python 1.0? What is the oldest pickle format modern Python can read? (Just using pickle as an example.)
Python by itself won't solve your long term data warehousing needs. The SQLite project's on disk format has already changed at least once (sqlite2 -> sqlite3) and no doubt it could change again in the future. The lesson for python: when that happens let's write the code to make the transition between formats trivial. -gps
On Sun, Sep 07, 2008 at 11:34:37AM -0700, Gregory P. Smith wrote:
You could probably have built the bsddb185 module and loaded your data from that and rewritten it using the new bsddb module.
I built bsddb185, loaded old data, exported it to... I don't remember now, but I clearly remember I stopped using bsddb.
The lesson for python: when that happens let's write the code to make the transition between formats trivial.
For me the lesson is different - do not include modules in the stdlib that rely on unstable 3rd party libraries. I consider bsddb unstable. sqlite is more stable, but PySQLite... there are many minor releases between Python releases; my humble opinion is it'd be better to have one external PySQLite module than two (PySQLite and sqlite3). Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
Oleg Broytmann wrote:
On Sun, Sep 07, 2008 at 11:34:37AM -0700, Gregory P. Smith wrote:
You could probably have built the bsddb185 module and loaded your data from that and rewritten it using the new bsddb module.
I built bsddb185, loaded old data, exported it to... I don't remember now, but I clearly remember I stopped using bsddb.
The lesson for python: when that happens let's write the code to make the transition between formats trivial.
For me the lesson is different - do not include modules in the stdlib that rely on unstable 3rd party libraries. I consider bsddb unstable. sqlite is more stable, but PySQLite... there are many minor releases between Python releases; my humble opinion is it'd be better to have one external PySQLite module than two (PySQLite and sqlite3).
Unfortunately this advice should have been taken several years ago. The fact is that there are almost certainly Python users who rely on the presence of the bsddb module for production work, and simply removing it without deprecation is bound to upset those users. I'm particularly concerned that it appears that normal procedures have been circumvented to enable its removal from 3.0. Since we have at least one developer committed to ongoing support that seems both harsh and unnecessary. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
On Sun, Sep 7, 2008 at 3:43 PM, Steve Holden <steve@holdenweb.com> wrote:
Oleg Broytmann wrote:
On Sun, Sep 07, 2008 at 11:34:37AM -0700, Gregory P. Smith wrote:
You could probably have built the bsddb185 module and loaded your data from that and rewritten it using the new bsddb module.
I built bsddb185, loaded old data, exported it to... I don't remember now, but I clearly remember I stopped using bsddb.
The lesson for python: when that happens let's write the code to make the transition between formats trivial.
For me the lesson is different - do not include modules in the stdlib that rely on unstable 3rd party libraries. I consider bsddb unstable. sqlite is more stable, but PySQLite... there are many minor releases between Python releases; my humble opinion is it'd be better to have one external PySQLite module than two (PySQLite and sqlite3).
Unfortunately this advice should have been taken several years ago. The fact is that there are almost certainly Python users who rely on the presence of the bsddb module for production work, and simply removing it without deprecation is bound to upset those users.
Those users would first have to port their code to Python 3.0. That task is a lot larger than dealing with a separate download of bsddb. It is not being removed from 2.6.
I'm particularly concerned that it appears that normal procedures have been circumvented to enable its removal from 3.0. Since we have at least one developer committed to ongoing support that seems both harsh and unnecessary.
3.0 breaks a lot of things. Most of the library reorg may have been discussed more than this particular removal, but that doesn't mean that changes won't come as a surprise for most users. In this case, a completely compatible module is available as a 3rd party download. That's a lot less severe than the complete abandonment that is the fate of many other modules. It's just a matter of source code packaging. Vendors can completely remove the difference in their packaging of the binaries. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
>> Unfortunately this advice should have been taken several years
>> ago. The fact is that there are almost certainly Python users who
>> rely on the presence of the bsddb module for production work, and
>> simply removing it without deprecation is bound to upset those users.
Guido> Those users would first have to port their code to Python
Guido> 3.0. That task is a lot larger than dealing with a separate
Guido> download of bsddb. It is not being removed from 2.6.
Perhaps 2.7 and 3.1 should have a conversion function at the dbm package level which will convert a database from one format to another, e.g.:
dbm.convert(srcdb, dstdb, dstfmt)
The format of srcdb should be discoverable from the file itself.
Skip
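A rough sketch of what such a conversion function could look like, written against the dbm package as it exists in modern Python 3. No `dbm.convert` exists in the standard library; the standalone `convert` below, its default for `dstfmt`, and the choice to pass the destination format as a module name are assumptions made for this illustration. The source format is discovered with `dbm.whichdb`.

```python
import dbm
import importlib


def convert(srcdb, dstdb, dstfmt="dbm.dumb"):
    """Copy every key/value pair from srcdb into a new dstdb.

    dstfmt is the name of a dbm submodule (an assumption of this sketch),
    e.g. "dbm.dumb" or "dbm.gnu". Returns the number of keys copied.
    """
    srcfmt = dbm.whichdb(srcdb)
    if not srcfmt:
        # None: the file cannot be read; "": the format was not recognised.
        raise ValueError("cannot determine the format of %r" % (srcdb,))

    src = importlib.import_module(srcfmt).open(srcdb, "r")
    try:
        # "n" always creates a new, empty destination database.
        dst = importlib.import_module(dstfmt).open(dstdb, "n")
        try:
            copied = 0
            for key in src.keys():
                dst[key] = src[key]
                copied += 1
            return copied
        finally:
            dst.close()
    finally:
        src.close()
```

If the source format's module is not installed (for instance a Berkeley DB file on an interpreter without bsddb), the import fails and nothing is written, which is exactly the situation under discussion in this thread.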
On Thu, Sep 4, 2008 at 7:33 AM, Oleg Broytmann <phd@phd.pp.ru> wrote:
SQLite is public domain; the licensing terms of Berkeley DB[1] are not friendly to commercial applications: "Our open source license ... permits use of Berkeley DB in open source projects or in applications that are not distributed to third parties." I am not sure if using of PyBSDDB in commercial applications is considered "using of Berkeley DB in open source projects";
Wow, I hadn't realized that it was such a restrictive license. When I see "Berkeley" I think "BSD license". -- Curt Hagenlocher curt@hagenlocher.org
Curt Hagenlocher wrote:
On Thu, Sep 4, 2008 at 7:33 AM, Oleg Broytmann <phd@phd.pp.ru> wrote:
SQLite is public domain; the licensing terms of Berkeley DB[1] are not friendly to commercial applications: "Our open source license ... permits use of Berkeley DB in open source projects or in applications that are not distributed to third parties." I am not sure if using of PyBSDDB in commercial applications is considered "using of Berkeley DB in open source projects";
Wow, I hadn't realized that it was such a restrictive license. When I see "Berkeley" I think "BSD license".
Well of course nowadays when you see "SleepyCat" you need to be thinking "Oracle". They have dabbled in open source, but I didn't get the impression that the support for Linux was particularly wholehearted, for example. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/
On Thu, Sep 4, 2008 at 7:33 AM, Oleg Broytmann <phd@phd.pp.ru> wrote:
On Thu, Sep 04, 2008 at 03:23:22PM +0200, Jesus Cea wrote:
Compared to sqlite, you don't need to know SQL, you can finetuning (for example, using ACI instead of ACID, deciding store by store), and you can do replication and distributed transactions (useful, for example, if your storage is bigger than a single machine capacity, like my case).
Let me raise the glove. Compared to bsddb:
-- SQLite is public domain; the licensing terms of Berkeley DB[1] are not friendly to commercial applications: "Our open source license ... permits use of Berkeley DB in open source projects or in applications that are not distributed to third parties." I am not sure if using of PyBSDDB in commercial applications is considered "using of Berkeley DB in open source projects";
FWIW, many years ago when I asked sleepycat about this (long before oracle bought them) they said that python was considered to be the application. Using berkeleydb via python for a commercial application did not require a berkeleydb license. But my legal advice is worth as much as the paper it's printed on. Always ask your own lawyer and oracle about such things. -gps
On Sep 7, 2008, at 12:04 PM, Gregory P. Smith wrote:
FWIW, many years ago when I asked sleepycat about this (long before oracle bought them) they said that python was considered to be the application. Using berkeleydb via python for a commercial application did not require a berkeleydb license.
They also posted a FAQ on their web site which included that statement, including specifically declaring that using BerkeleyDB via Python for a commercial product did not require a commercial licence. Oh, look, it is still there: http://www.oracle.com/technology/software/products/berkeley-db/htdocs/licensing.html
"""
Q. Do I have to pay for a Berkeley DB license to use it in my Perl or Python scripts?
A. No, you may use the Berkeley DB open source license at no cost. The Berkeley DB open source license requires that software that uses Berkeley DB be freely redistributable. In the case of Perl or Python, that software is Perl or Python, and not your scripts. Any scripts you write are your property, including scripts that make use of Berkeley DB. None of the Perl, Python or Berkeley DB licenses place any restrictions on what you may do with them.
"""
Regards, Zooko
--- http://allmydata.org -- Tahoe, the Least-Authority Filesystem http://allmydata.com -- back up all your files for $5/month
On Thu, Sep 4, 2008 at 1:41 AM, Raymond Hettinger <python@rcn.com> wrote:
The release candidate seems to be the wrong time to yank this out (in part because of the surprise factor) and in part because I think the change silently affects shelve performance so that the impact may be significantly negative but not readily apparent.
I do not use bsddb directly, but I use shelve which on Linux usually takes advantage of bsddb. Does removing bsddb mean that I will not be able to read shelve files written with Python 2.5 with Python 3.0? That would be quite disturbing to me.
Michele> I do not use bsddb directly, but I use shelve which on Linux
Michele> usually takes advantage of bsddb. Does removing bsddb mean that
Michele> I will not be able to read shelve files written with Python 2.5
Michele> with Python 3.0? That would be quite disturbing to me.
Correctamundo.
Skip
Michele Simionato wrote:
I do not use bsddb directly, but I use shelve which on Linux usually takes advantage of bsddb. Does removing bsddb mean that I will not be able to read shelve files written with Python 2.5 with Python 3.0? That would be quite disturbing to me.
Seems so. If bsddb is actually unavailable in Python 3.0, you would need to download/install it from PyPI. I'm committed to keeping bsddb alive, one way or another :). I'm thinking that if bsddb is discarded in Python 3.0, shelve will probably drop it too, so installing bsddb externally will not "magically" make it available to 3.0 shelve. I can't comment on the python-dev plans here.
PS: Remember that if you are installing bsddb as a separate package, its name will be "bsddb3", not "bsddb".
-- Jesus Cea Avion
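Before migrating, it can help to find out which backend an existing shelve file actually uses. The sketch below does that with `dbm.whichdb` from modern Python 3; the helper name `open_existing_shelf` is invented here. As far as I know, current Python 3 releases report Berkeley DB hash files under a module name that the standard library no longer ships, so the helper treats any backend that cannot be imported as "needs a separate install" rather than assuming a particular name.

```python
import dbm
import importlib
import shelve


def open_existing_shelf(path):
    """Return (backend_name, read-only shelf) for an existing shelve file.

    Raises a descriptive error when the backend is unknown or not
    importable on this interpreter.
    """
    backend = dbm.whichdb(path)
    if backend is None:
        raise FileNotFoundError("no readable database at %r" % (path,))
    if backend == "":
        raise ValueError("unrecognised database format: %r" % (path,))
    try:
        importlib.import_module(backend)
    except ImportError:
        raise RuntimeError(
            "%r was written with %r, which is not available here; "
            "a separately installed package may be needed" % (path, backend)
        )
    return backend, shelve.open(path, flag="r")
```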
Raymond Hettinger wrote:
I think this should be deferred to Py3.1. This decision was not widely discussed and I think it likely that some users will be surprised and dismayed. The release candidate seems to be the wrong time to yank this out (in part because of the surprise factor) and in part because I think the change silently affects shelve performance so that the impact may be significantly negative but not readily apparent.
I don't use Python for database work myself, but something I am somewhat disappointed to lose is the presence of a moderately complicated package within the Python distribution itself which is making use of the various 2-to-3 migration tools to support both Python 2.x and 3.x with a single mixed Python-and-C code base (along with automatic conversion via 2to3). While that will still be visible to some degree due to the presence of the 2.x version of the bsddb code in Python 2.6, I don't think it will be quite the same as it would have been with the 3.x version also being readily available as part of the standard 3.0 install. Regardless, given that the removal of bsddb from the 3.0 branch is now a done deal in svn, I suggest that all we can do is monitor how much feedback we get indicating that the need to download bsddb as a separate install is a significant hindrance to Py3k migrations. If the feedback indicates it is necessary, restoring the module for 3.1 certainly isn't an impossibility, but in the meantime there *are* real benefits to decoupling the maintenance cycles (those wanting to get the most out of Jesus's ongoing work in exposing more of the bsddb API are probably going to want to be using the external pybsddb releases anyway rather than waiting however many months it ends up being until we get to 2.7/3.1). There's also a bit of a second shot at this for bsddb supporters, in that some of the "omnibus" Python distribution providers like ActiveState and Enthought may choose to include pybsddb once they start releasing bundles for Python 3.0. 
As far as the corporate scenarios go: if a corporate environment is *so* screwed up as to simultaneously permit a migration from Python 2.x to 3.0 of an internal project that relies on bsddb, while at the same time banning those performing the migration from installing the 3.0 compatible version of pybsddb, then they have much bigger problems than which modules and packages we are choosing to include in the standard library. In my experience, restrictive corporate environments are far more likely to just disallow migrations to 3.0 altogether (and in many cases, the decision makers disallowing such migrations probably won't be wrong - the case for migrating an existing internal app to 3.0 is far, far weaker than that for using 3.0 for new development).
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
http://www.boredomandlaziness.org
Nick Coghlan wrote:
While that will still be visible to some degree due to the presence of the 2.x version of the bsddb code in Python 2.6, I don't think it will be quite the same as it would have been with the 3.x version also being readily available as part of the standard 3.0 install.
Since the intention in 2.6 seems to be to mark this module as deprecated, I guess the presence of 2.x bsddb in stock Python will end in 2.7. Moreover, I'm working right now on improving the 2.x/3.x conversion code in pybsddb. I think this code will be available in bsddb 4.7.4, and it will not be integrated into Python 2.6 (which will include 4.7.3.minor releases, if we keep the criterion of "only stability and security fixes in 2.6.x"). If the idea is to keep bsddb alive in 2.x, I don't see the point of not keeping the 3.0 version, because the issues used to justify the removal persist: I'm the only maintainer, there is little code review, there are buildbot issues, etc. (I would like a comprehensive list, to be able to improve those deficiencies). In fact, if we keep bsddb in 2.x, the pressure to keep it in 3.x will be higher.
Regardless, given that the removal of bsddb from the 3.0 branch is now a done deal in svn, I suggest that all we can do is monitor how much
Any version control system can revert that with a single command :). All I can say is that the current bsddb code (in my personal repository) passes all tests with a currently compiled python3.0 binary, called with the "-bb" flag (the "-bb" flag was something I learned about yesterday).
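For readers unfamiliar with the "-bb" flag mentioned above: in Python 3 it turns bytes/str mix-ups, such as comparing a bytes object with a str, into errors instead of letting them pass silently. The following is only a minimal sketch of that behaviour, not part of the bsddb test suite; the one-line snippet and the helper name run_snippet are made up for illustration, and it assumes an interpreter that still accepts "-bb".

```python
import subprocess
import sys

# Hypothetical one-liner: compares a bytes key with a str key, the kind of
# bytes/str mix-up that is easy to leave behind when porting 2.x code.
SNIPPET = "print(b'key' == 'key')"


def run_snippet(flags):
    """Run SNIPPET in a child interpreter started with the given flags."""
    return subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )


# Without the flag the comparison is silently False.
plain = run_snippet([])

# With -bb the same comparison raises BytesWarning and the process fails.
strict = run_snippet(["-bb"])

print("plain :", plain.returncode, plain.stdout.strip())
print("strict:", strict.returncode, strict.stderr.strip().splitlines()[-1])
```

A test suite that passes under "-bb" therefore gives some evidence that the code does not rely on such silent bytes/str comparisons.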
but in the meantime there *are* real benefits to decoupling the maintenance cycles (those wanting to get the most out of Jesus's ongoing work in exposing more of the bsddb API are probably going to want to be using the external pybsddb releases anyway rather than waiting however many months it ends up being until we get to 2.7/3.1).
The cycles have actually been decoupled since I took over bsddb maintenance (I've released a new version per month). So the release cycles are not an issue. The main issue here is 3.0 support, which I have worked on over the last couple of months. It is done now. It couldn't be done faster because I was learning 3.0 internals on the fly (there are NO docs about C module migration; my experience there could be valuable) and 3.0 was a moving target (it still is). For example, when I left for summer holiday bsddb worked flawlessly in Python 3.0b2. It failed in 3.0b3 because of the threading renames done in Python 3.0. So, Python 3.0 is not waiting for bsddb to be ready, because it already is (since yesterday). And future Python releases won't suffer because we won't have any other major architectural reengineering of Python for a long, long time (I hope!). That is, future Python releases would take whatever bsddb is available at that time. No wait. No dependent release cycles. With my current release schedule of "a release per month", I can track Python's evolution with little effort. For example, Python 2.5 to 2.6 was pretty painless, even with the "PyBytes_*" ugliness.
As far as the corporate scenarios go: if a corporate environment is *so* screwed up as to simultaneously permit a migration from Python 2.x to 3.0 of an internal project that relies on bsddb, while at the same time banning those performing the migration from installing the 3.0 compatible version of pybsddb, then they have much bigger problems than which modules and packages we are choosing to include in the standard library.
Agreed. I was thinking about bsddb removal in 2.7.
-- Jesus Cea Avion | jcea@jcea.es - http://www.jcea.es/ | jabber / xmpp:jcea@jabber.org
"Things are not so easy" | "My name is Dump, Core Dump" | "Love is placing your happiness in the happiness of another" - Leibniz
participants (22)
- "Martin v. Löwis"
- A.M. Kuchling
- Bill Janssen
- Brett Cannon
- C. Titus Brown
- Curt Hagenlocher
- Greg Ewing
- Gregory P. Smith
- Guido van Rossum
- Jeff Hall
- Jeroen Ruigrok van der Werven
- Jesus Cea
- Josiah Carlson
- Kevin Teague
- Michele Simionato
- Nick Coghlan
- Oleg Broytmann
- Raymond Hettinger
- skip@pobox.com
- Steve Holden
- Tony Nelson
- zooko