Integrate BeautifulSoup into stdlib?
I haven't seen a lot of discussion on this - maybe I didn't search hard enough - but what are people's thoughts on including BeautifulSoup in stdlib? It's small, fast, and pretty widely-liked by the people who know about it. Someone mentioned that web scraping needs are infrequent. My argument is that people ask questions about them less because they feel they can just reinvent the wheel really easily using urllib and regexes. It seems like this is similar to the CSV problem from a while back actually, with everyone implementing their own parsers.

We do have HTMLParser, but that doesn't handle malformed pages well, and just isn't as nice as BeautifulSoup.

In a not-entirely-unrelated vein, has there been any discussion on just throwing all of Mechanize into stdlib?

BeautifulSoup: http://www.crummy.com/software/BeautifulSoup/
mechanize: http://wwwsearch.sourceforge.net/mechanize/

Regards, Vaibhav Mallya
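For concreteness, basic use of the stdlib parser looks something like the sketch below (written against the Python 3 module name `html.parser`; in 2.x the module was called `HTMLParser`). The `LinkCollector` class is purely illustrative:

```python
from html.parser import HTMLParser  # Python 3 name; the 2.x module was HTMLParser


class LinkCollector(HTMLParser):
    """Collect href attributes from <a> tags as the page is fed in."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


parser = LinkCollector()
parser.feed('<html><body><a href="http://example.com">x</a></body></html>')
print(parser.links)  # ['http://example.com']
```

This works fine on well-formed markup; the complaints in this thread are about what happens when the page is malformed.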
On Mon, Mar 2, 2009 at 04:23, Vaibhav Mallya <vaibhavmallya@gmail.com>wrote:
I haven't seen a lot of discussion on this - maybe I didn't search hard enough - but what are people's thoughts on including BeautifulSoup in stdlib? It's small, fast, and pretty widely-liked by the people who know about it. Someone mentioned that web scraping needs are infrequent. My argument is that people ask questions about them less because they feel they can just reinvent the wheel really easily using urllib and regexes. It seems like this is similar to the CSV problem from a while back actually, with everyone implementing their own parsers.
We do have HTMLParser, but that doesn't handle malformed pages well, and just isn't as nice as BeautifulSoup.
In a not-entirely-unrelated vein, has there been any discussion on just throwing all of Mechanize into stdlib?
Discussions of including modules in the standard library only occur when the module creators step forward to offer to support the modules. To my knowledge, neither the creators of BeautifulSoup nor those of Mechanize have come forward to offer to manage the code in Python's standard library. -Brett
Vaibhav Mallya wrote:
We do have HTMLParser, but that doesn't handle malformed pages well, and just isn't as nice as BeautifulSoup.
Interesting, given that BeautifulSoup is built on HTMLParser ;-) Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
At 2:56 PM +0000 3/4/09, Chris Withers wrote:
Vaibhav Mallya wrote:
We do have HTMLParser, but that doesn't handle malformed pages well, and just isn't as nice as BeautifulSoup.
Interesting, given that BeautifulSoup is built on HTMLParser ;-)
In BeautifulSoup >= 3.1, yes. Before that (<= 3.07a), it was based on the more robust sgmllib.SGMLParser. The current BeautifulSoup can't handle '<foo a="bc"b="cd">', while the earlier SGMLParser-based versions can. I don't know quite how common that missing space is in the wild, but I've personally made HTML with that problem. Maybe this is the only problem with using HTMLParser instead of SGMLParser; I don't know. In the meantime, if I have a need for BeautifulSoup in Python 3.x, I'll port sgmllib and use the older BeautifulSoup. -- TonyN. <mailto:tonynelson@georgeanelson.com> <http://www.georgeanelson.com/>
On Mar 4, 2009, at 9:56 AM, Chris Withers wrote:
Vaibhav Mallya wrote:
We do have HTMLParser, but that doesn't handle malformed pages well, and just isn't as nice as BeautifulSoup.
Interesting, given that BeautifulSoup is built on HTMLParser ;-)
I think html5lib would be a better candidate for an improved HTML parser in the stdlib than BeautifulSoup. James
James Y Knight schrieb:
On Mar 4, 2009, at 9:56 AM, Chris Withers wrote:
Vaibhav Mallya wrote:
We do have HTMLParser, but that doesn't handle malformed pages well, and just isn't as nice as BeautifulSoup.
Interesting, given that BeautifulSoup is built on HTMLParser ;-)
I think html5lib would be a better candidate for an improved HTML parser in the stdlib than BeautifulSoup.
I second that. Georg
On Mar 4, 2009, at 12:32 PM, James Y Knight wrote:
I think html5lib would be a better candidate for an improved HTML parser in the stdlib than BeautifulSoup.
While we're talking about alternatives, Ian Bicking appears to swear by lxml: <http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-...
Cheers, -- Ivan Krstić <krstic@solarsail.hcs.harvard.edu> | http://radian.org
Ivan Krstić wrote:
On Mar 4, 2009, at 12:32 PM, James Y Knight wrote:
I think html5lib would be a better candidate for an improved HTML parser in the stdlib than BeautifulSoup.
While we're talking about alternatives, Ian Bicking appears to swear by lxml:
<http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/>
I second that. ;) And, BTW, I wouldn't mind getting lxml into the stdlib either. Stefan
On Thu, Mar 5, 2009 at 2:39 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
Ivan Krstić wrote:
On Mar 4, 2009, at 12:32 PM, James Y Knight wrote:
I think html5lib would be a better candidate for an improved HTML parser in the stdlib than BeautifulSoup.
While we're talking about alternatives, Ian Bicking appears to swear by lxml:
<http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/>
I second that. ;)
And, BTW, I wouldn't mind getting lxml into the stdlib either.
No matter how beautiful and fast lxml is, it has one downside when it comes to installing it into the stdlib: it is based on large, complex 3rd party libraries, libxml2 and libxslt.

Based on the sad example of BerkeleyDB, which was initially welcomed into the stdlib but more recently booted out for reasons having to do with the release cycle of the external dependency and other issues typical for large external dependencies, I think we should be very careful about including it in the standard library. Instead, let's hope Linux distros pick it up (and if anyone knows how to encourage that, let us know). -- --Guido van Rossum (home page: http://www.python.org/~guido/)
On Mar 5, 2009, at 12:32 PM, Guido van Rossum wrote:
Instead, let's hope Linux distros pick it up (and if anyone knows how to encourage that, let us know).
Gentoo: emerge lxml
Ubuntu (and probably Debian): apt-get install python-lxml

Guido, do you know where your time machine keys are? :)

Barry
On Thu, Mar 05, 2009 at 01:30:25PM -0500, Barry Warsaw wrote:
Ubuntu (and probably Debian): apt-get install python-lxml
Tested in Debian: yes, the incantation works. Oleg. -- Oleg Broytmann http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.
Hi Guido, I'm happy to see you jump onto this. Guido van Rossum wrote:
No matter how beautiful and fast lxml is, it has one downside when it comes to installing it into the stdlib: it is based on large, complex 3rd party libraries, libxml2 and libxslt.
I actually had a recent discussion with other lxml developers and we were fast to agree that that would be the main problem. lxml itself is only some 18K lines of Cython code (which translates into 180K lines of C code) and less than 7K lines of Python code, but libxml2 and libxslt add up to about 230K lines of C code just by themselves. That is definitely far from trivial, and it's hard to guarantee that bugs in these libraries will never lead to security holes in a Python release, for example.

Still, it does provide an awful lot of things that the stdlib currently fails to deliver in one way or another, some even completely. XPath, XSLT, XML validation and (above all) real-world HTML parsing come to mind. I definitely stopped counting the posts on c.l.py about HTMLParser not being able to parse a specific web page.

It's good that (c)ElementTree is part of the stdlib, and it's also good that there is a rather smooth upgrade path towards lxml. But lxml is by itself becoming more and more a critical dependency of web related packages and applications, simply because it provides everything in one tool. And even if I wasn't the author of lxml, I would have a hard time feeling happy if a real-world HTML parser was added to the stdlib that provided a totally different interface than the best (and fastest) XML library that the stdlib currently has.
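As background for the XPath point: the stdlib ElementTree understands only a restricted XPath subset through find()/findall() (the exact subset has grown over releases), while full XPath needs a third-party library such as lxml. A small sketch with made-up sample data:

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    "<catalog>"
    "<book lang='en'><title>A</title></book>"
    "<book lang='de'><title>B</title></book>"
    "</catalog>"
)

# The stdlib supports a limited XPath subset: paths, wildcards,
# and simple predicates like [@attrib='value'].
english = doc.findall(".//book[@lang='en']")
print([b.find("title").text for b in english])  # ['A']

# Full XPath -- functions like count(), axes like following-sibling::,
# arbitrary expressions -- is not available here without lxml.
```

The subset covers a lot of everyday lookups, which is part of why the "smooth upgrade path" to lxml matters.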
Instead, let's hope Linux distros pick it up (and if anyone knows how to encourage that, let us know).
At least all Debian based distros (such as Ubuntu) have it available. Not the latest, greatest version, but that will come. That said, it's never been a real problem to EasyInstall lxml directly from PyPI onto any decent Linux distribution. MacOS-X is a far more tricky target here, not to say a word about Windows (C-compiler? anyone?). I would expect that even if lxml itself was in the stdlib, Linux distributions would (want to) build it against their system libraries. Static builds would only be required on MacOS-X and Windows. Stefan
On Thu, Mar 5, 2009 at 11:22 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
I'm happy to see you jump onto this.
I'm not sure why you say that -- all I am doing is advising *against* inclusion.
Guido van Rossum wrote:
No matter how beautiful and fast lxml is, it has one downside when it comes to installing it into the stdlib: it is based on large, complex 3rd party libraries, libxml2 and libxslt.
I actually had a recent discussion with other lxml developers and we were fast to agree that that would be the main problem. lxml itself is only some 18K lines of Cython code (which translates into 180K lines of C code) and less than 7K lines of Python code, but libxml2 and libxslt add up to about 230K lines of C code just by themselves. That is definitely far from trivial and it's hard to guarantee that bugs in these libraries will never lead to security holes in a Python release, for example.
Still, it does provide an awful lot of things that the stdlib currently fails to deliver in one way or another, some even completely. XPath, XSLT, XML validation and (above all) real-world HTML parsing come to mind. I definitely stopped counting the posts on c.l.py about HTMLParser not being able to parse a specific web page.
There's *waaaay* too much stuff in the XML world to ever hope to have comprehensive support in the stdlib. Heck, XmlPlus hasn't even been incorporated into the stdlib.
It's good that (c)ElementTree is part of the stdlib, and it's also good that there is a rather smooth upgrade path towards lxml.
And yet it worries me that lxml claims to be "mostly compatible" with ElementTree. What's keeping it from being completely (backwards) compatible?
But lxml is by itself becoming more and more a critical dependency of web related packages and applications, simply because it provides everything in one tool.
That depends on how XML-centric your thinking is. Personally I *don't* like putting everything in XML, and so far I have been able to keep my code 99% XML-free.
And even if I wasn't the author of lxml, I would have a hard time feeling happy if a real-world HTML parser was added to the stdlib that provides a totally different interface than the best (and fastest) XML library that the stdlib currently has.
That sounds like a completely different argument and one you should have with the proponents of inclusion of that other parser. I can only assume you're talking about html5lib or BeautifulSoup. I have no knowledge of any of these, and prefer to stay out of that discussion.
Instead, let's hope Linux distros pick it up (and if anyone knows how to encourage that, let us know).
At least all Debian based distros (such as Ubuntu) have it available. Not the latest, greatest version, but that will come. That said, it's never been a real problem to EasyInstall lxml directly from PyPI onto any decent Linux distribution. MacOS-X is a far more tricky target here, not to say a word about Windows (C-compiler? anyone?).
I would expect that even if lxml itself was in the stdlib, Linux distributions would (want to) build it against their system libraries. Static builds would only be required on MacOS-X and Windows.
And that in itself is one of the main arguments against inclusion in the stdlib, since it adds a whole new level of complexity to the compatibility matrix. E.g. assume that some newer version of libxml2 has a new feature. You can wrap that feature with an API in lxml, but now you require that newer libxml2 version as a dependency. Since the distros don't support that, they either are prevented from providing the corresponding newer version of Python, or you will have to make the lxml code conditional on the presence or absence of that API. The latter is preferable, but now it means that Python users can't rely on that API being present even if they have the right version of Python. It's a mess.

Requiring a 3rd party download makes this cleaner, because you decouple the libxml2/lxml versioning from the Python version. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
On Thu, Mar 5, 2009 at 11:22 AM, Stefan Behnel wrote:
I'm happy to see you jump onto this.
I'm not sure why you say that -- all I am doing is advising *against* inclusion.
I understand that. But what's a discussion worth if everyone just nods in agreement? :)
Guido van Rossum wrote: There's *waaaay* too much stuff in the XML world to ever hope to have comprehensive support in the stdlib.
Definitely. But lxml was born because some Dutch guy thought that there was way too little easy-to-master XML support in the overall Python world. http://codespeak.net/lxml/intro.html There is some space to look for a trade-off here.
It's good that (c)ElementTree is part of the stdlib, and it's also good that there is a rather smooth upgrade path towards lxml.
And yet it worries me that lxml claims to be "mostly compatible" with ElementTree. What's keeping it from being completely (backwards) compatible?
The underlying tree model. An Element in lxml.etree knows its parent, which isn't the case in ET. That's the main difference. Most people call that a feature of lxml, but it's fundamental, and it does have the implication that you can't keep the same Element object in more than one place. Some other (minor) differences are described here: http://codespeak.net/lxml/dev/compatibility.html
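The missing-parent-pointer difference is concrete: with the stdlib ElementTree, code that needs a node's parent typically builds a child-to-parent map in one pass (a common workaround, not an official API), whereas lxml exposes getparent() directly. A sketch of the stdlib side:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<a><b><c/></b><d/></a>")

# Stdlib Elements carry no reference to their parent, so the usual
# workaround is a child -> parent dictionary built in one traversal:
parent_map = {child: parent for parent in root.iter() for child in parent}

c = root.find(".//c")
print(parent_map[c].tag)  # 'b'

# In lxml.etree the same information is a method call away:
#     c.getparent()   # lxml only; not available in the stdlib
```

Note the map goes stale as soon as the tree is modified, which is exactly the kind of bookkeeping lxml's tree model avoids.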
But lxml is by itself becoming more and more a critical dependency of web related packages and applications, simply because it provides everything in one tool.
That depends on how XML-centric your thinking is. Personally I *don't* like putting everything in XML, and so far I have been able to keep my code 99% XML-free.
That's totally fine. I used Python for years without ever feeling the need to deploy any of the dbm databases in my projects. Nor curses, nor tk. And lxml.objectify only supports pickle because one of the developers thought it was a good idea to pickle trees. And yet all of these modules are part of the stdlib, and I bet there are a whole lot of applications by now that wouldn't work without them.
I would expect that even if lxml itself was in the stdlib, Linux distributions would (want to) build it against their system libraries. Static builds would only be required on MacOS-X and Windows.
And that in itself is one of the main arguments against inclusion in the stdlib, since it adds a whole new level of complexity to the compatibility matrix. E.g. assume that some newer version of libxml2 has a new feature.
That happens. So far, I have managed to keep lxml backwards compatible over more than three years of libxml2 releases. However:
You can wrap that feature with an API in lxml, but now you require that newer libxml2 version as a dependency. Since the distros don't support that they either are prevented from providing the corresponding newer version of Python or you will have to make the lxml code conditional on the presence or absence of that API. The latter is preferable, but now it means that Python users can't rely on that API being present even if they have the right version of Python. It's a mess. Requiring a 3rd party download makes this cleaner, because you decouple the libxml2/lxml versioning from the Python version.
A good example is actually (once again) parsing broken HTML. libxml2 handles this a lot better since 2.6.21, so if you use 2.6.20, you will simply not get the same results as with a later version. I do see the point you are making here. Even if lxml gets mature and static, that doesn't necessarily apply to the external libraries it uses. However, I should note that exactly the same argument also applies to sqlite3 and gdbm, which, again, are in the stdlib today, with sqlite3 being a fairly recent addition. Stefan
Stefan, I recommend that you give up pushing for lxml in the stdlib. There are many complex factors to be weighed but in the balance I am not comfortable with it, and continued argumentation is not going to change that. Sorry, -- --Guido van Rossum (home page: http://www.python.org/~guido/)
I do see the point you are making here. Even if lxml gets mature and static, that doesn't necessarily apply to the external libraries it uses. However, I should note that exactly the same argument also applies to sqlite3 and gdbm, which, again, are in the stdlib today, with sqlite3 being a fairly recent addition.
Fortunately, it is possible for users to just replace the sqlite DLL in a Python installation, with no need of recompiling anything. Regards, Martin
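Martin's point is observable from Python itself: the sqlite3 module reports the version of the SQLite library it is actually running against, which is exactly what changes when the DLL is swapped without touching Python (the version values in the comments are illustrative, not fixed):

```python
import sqlite3

# Version of the SQLite library linked/loaded at runtime. Replacing
# the sqlite3 DLL under a Windows Python install changes this value
# without recompiling the _sqlite3 wrapper extension.
print(sqlite3.sqlite_version)        # e.g. '3.39.4' -- depends on the library
print(sqlite3.sqlite_version_info)   # same, as a tuple, e.g. (3, 39, 4)
```

This decoupling of the wrapper from the library is what makes the drop-in DLL upgrade workable.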
Martin v. Löwis wrote:
I do see the point you are making here. Even if lxml gets mature and static, that doesn't necessarily apply to the external libraries it uses. However, I should note that exactly the same argument also applies to sqlite3 and gdbm, which, again, are in the stdlib today, with sqlite3 being a fairly recent addition.
Fortunately, it is possible for users to just replace the sqlite DLL in a Python installation, with no need of recompiling anything.
Interesting. I assume you are referring to Windows here, right? Does that "just work" because the DLL is in the same directory? That would be a nice feature for lxml, too. We could just make the libxml2 and libxslt DLLs package data under Windows in that case. Stefan
Stefan Behnel wrote:
Martin v. Löwis wrote:
I do see the point you are making here. Even if lxml gets mature and static, that doesn't necessarily apply to the external libraries it uses. However, I should note that exactly the same argument also applies to sqlite3 and gdbm, which, again, are in the stdlib today, with sqlite3 being a fairly recent addition. Fortunately, it is possible for users to just replace the sqlite DLL in a Python installation, with no need of recompiling anything.
Interesting. I assume you are referring to Windows here, right? Does that "just work" because the DLL is in the same directory?
I have no idea, but my WinXP .../Python30/ install has

DLLs/_sqlite3.pyd 52K
DLLs/sqlite3.dll 557K
libs/_sqlite3.lib 2K

For whatever reason, most other things do not have all three files. I do not know whether upgrades (like 3.0.0 to 3.0.1) would clobber other things added here.
That would be a nice feature for lxml, too. We could just make the libxml2 and libxslt DLLs package data under Windows in that case.
On Fri, Mar 6, 2009 at 9:54 AM, Terry Reedy <tjreedy@udel.edu> wrote:
Stefan Behnel wrote:
Martin v. Löwis wrote:
I do see the point you are making here. Even if lxml gets mature and static, that doesn't necessarily apply to the external libraries it uses. However, I should note that exactly the same argument also applies to sqlite3 and gdbm, which, again, are in the stdlib today, with sqlite3 being a fairly recent addition.
Fortunately, it is possible for users to just replace the sqlite DLL in a Python installation, with no need of recompiling anything.
Interesting. I assume you are referring to Windows here, right? Does that "just work" because the DLL is in the same directory?
No, it is expected to "just work" because sqlite3 is (presumably) very careful about backwards compatibility, and because the Windows DLL API (just like the shared library API in Linux and other systems) is designed to allow substitution of newer versions. The linkage requirements are roughly that all entry points into a DLL (or shared library) that are referenced by the caller (in this case the wrapper extension module) are supported in the new version, and have the same signature and semantics.
I have no idea, but my WinXP .../Python30/ install has
DLLs/_sqlite3.pyd 52K
This is the wrapper extension module.
DLLs/sqlite3.dll 557K
This is sqlite3 itself. I am presuming that the phrase "replace the sqlite DLL" above refers to this one -- although the same argument actually holds for the .pyd file, which is also a DLL (despite its different extension).
libs/_sqlite3.lib 2K
I think this is a summary of the entry points into one of the above DLLs for the benefit of other code wanting to link against it, but I'm not sure.
For whatever reason, most other things do not have all three files.
You only see a .pyd and a .dll when there's a Python wrapper extension *and* an underlying 3rd party library.
I do not know whether upgrades (like 3.0.0 to 3.0.1) would clobber other things added here.
It would, but not in a harmful way.
That would be a nice feature for lxml, too. We could just make the libxml2 and libxslt DLLs package data under Windows in that case.
-- --Guido van Rossum (home page: http://www.python.org/~guido/)
Guido van Rossum wrote:
On Fri, Mar 6, 2009 at 9:54 AM, Terry Reedy <tjreedy@udel.edu> wrote:
No, it is expected to "just work" because sqlite3 is (presumably) very careful about backwards compatibility, and because the Windows DLL API (just like the shared library API in Linux and other systems) is designed to allow substitution of newer versions. The linkage requirements are roughly that all entry points into a DLL (or shared library) that are referenced by the caller (in this case the wrapper extension module) are supported in the new version, and have the same signature and semantics.
I have no idea, but my WinXP .../Python30/ install has
DLLs/_sqlite3.pyd 52K
This is the wrapper extension module.
DLLs/sqlite3.dll 557K
This is sqlite3 itself. I am presuming that the phrase "replace the sqlite DLL" above refers to this one -- although the same argument actually holds for the .pyd file, which is also a DLL (despite its different extension).
libs/_sqlite3.lib 2K
I think this is a summary of the entry points into one of the above DLLs for the benefit of other code wanting to link against it, but I'm not sure.
For whatever reason, most other things do not have all three files.
You only see a .pyd and a .dll when there's a Python wrapper extension *and* an underlying 3rd party library.
Thanks, I understand now.
I do not know whether upgrades (like 3.0.0 to 3.0.1) would clobber other things added here.
It would, but not in a harmful way.
By 'clobber', I meant 'delete', and I do not see how that would not be harmful ;-). I don't know whether the installer creates a new directory (and deletes the old), clears and reuses the old, or merely replaces individual files. tjr
On Fri, Mar 6, 2009 at 11:08 AM, Terry Reedy <tjreedy@udel.edu> wrote:
I do not know whether upgrades (like 3.0.0 to 3.0.1) would clobber other things added here.
It would, but not in a harmful way.
By 'clobber', I meant 'delete', and I do not see how that would not be harmful ;-). I don't know whether the installer creates a new directory (and deletes the old), clears and reuses the old, or merely replaces individual files.
I see. I didn't realize you were talking about adding your own files to these directories. I have no idea; the best way to find out is to experiment. I could see the default policy of Windows installers go either way. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
I see. I didn't realize you were talking about adding your own files to these directories. I have no idea; the best way to find out is to experiment. I could see the default policy of Windows installers go either way.
An upgrade installation removes all old files it installed (the old MSI is still present to know what these files are), then installs new files. Microsoft intended version resources to be used in the upgrade, so the upgrade would only have to replace the files that got a new version (rather than having to do uninstall-then-install). Unfortunately, that is incapable of upgrading .py files. So Microsoft added md5 (I think) hashes that can be used to detect files that don't need upgrade. I tested it, and it was *very* slow, so I reverted to the current procedure. In any case, any additional files present will remain untouched. They will also remain on uninstallation - so uninstallation might not be able to remove all folders that installation originally created. Regards, Martin
DLLs/sqlite3.dll 557K
This is sqlite3 itself. I am presuming that the phrase "replace the sqlite DLL" above refers to this one
Correct.
-- although the same argument actually holds for the .pyd file
Not quite. You can download Windows binaries for newer sqlite versions from sqlite.org, so you don't need a compiler to update sqlite (which you likely would if _sqlite3.pyd would need to be replaced). So you can "bypass" Python and its release process for updates to sqlite.
libs/_sqlite3.lib 2K
I think this is a summary of the entry points into one of the above DLLs for the benefit of other code wanting to link against it, but I'm not sure.
Correct. I don't know why I include them in the MSI - they are there because they were also shipped with the Wise installer. I see no use - nobody should be linking against an extension module.
I do not know whether upgrades (like 3.0.0 to 3.0.1) would clobber other things added here.
It would, but not in a harmful way.
If the user had upgraded sqlite, upgrading Python would undo that, though. So one would have to re-upgrade sqlite afterwards. Regards, Martin
On Fri, Mar 6, 2009 at 22:10, "Martin v. Löwis" <martin@v.loewis.de> wrote:
libs/_sqlite3.lib 2K
I think this is a summary of the entry points into one of the above DLLs for the benefit of other code wanting to link against it, but I'm not sure.
Correct. I don't know why I include them in the MSI - they are there because they were also shipped with the Wise installer. I see no use - nobody should be linking against an extension module.
They even cause trouble. Just yesterday I (well, not me: the pypy translation process) was caught by the presence of the "bz2.lib" file, which pypy found there, just because the linker lists c:\python25\LIBs before other directories. Of course the real bz2.lib, which defines the compression routines, was installed somewhere else, and compilation failed. -- Amaury Forgeot d'Arc
Interesting. I assume you are referring to Windows here, right? Does that "just work" because the DLL is in the same directory?
Correct. Also, because changes to SQLite don't change the API, just the implementation. Regards, Martin
2009/3/5 Guido van Rossum <guido@python.org>:
On Thu, Mar 5, 2009 at 2:39 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
And, BTW, I wouldn't mind getting lxml into the stdlib either.
No matter how beautiful and fast lxml is, it has one downside when it comes to installing it into the stdlib: it is based on large, complex 3rd party libraries, libxml2 and libxslt.
And it depends on Cython, which is wonderful normally, but maybe difficult to deal with in language evolution since we wouldn't have direct control over the C sources. -- Regards, Benjamin
Benjamin Peterson wrote:
2009/3/5 Guido van Rossum <guido@python.org>:
On Thu, Mar 5, 2009 at 2:39 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
And, BTW, I wouldn't mind getting lxml into the stdlib either. No matter how beautiful and fast lxml is, it has one downside when it comes to installing it into the stdlib: it is based on large, complex 3rd party libraries, libxml2 and libxslt.
And it depends on Cython, which is wonderful normally, but maybe difficult to deal with in language evolution since we wouldn't have direct control over the C sources.
I see the point, although I think that this can be dealt with by a) using a specific, stable release version of Cython for a specific Python release, so that this Cython version can be bug fixed if required (it's implemented in Python, after all) or b) adding Cython to the stdlib and building with that Stefan
On Thu, Mar 5, 2009 at 12:52, Stefan Behnel <stefan_ml@behnel.de> wrote:
Benjamin Peterson wrote:
2009/3/5 Guido van Rossum <guido@python.org>:
On Thu, Mar 5, 2009 at 2:39 AM, Stefan Behnel <stefan_ml@behnel.de> wrote:
And, BTW, I wouldn't mind getting lxml into the stdlib either. No matter how beautiful and fast lxml is, it has one downside when it comes to installing it into the stdlib: it is based on large, complex 3rd party libraries, libxml2 and libxslt.
And it depends on Cython, which is wonderful normally, but maybe difficult to deal with in language evolution since we wouldn't have direct control over the C sources.
I see the point, although I think that this can be dealt with by
a) using a specific, stable release version of Cython for a specific Python release, so that this Cython version can be bug fixed if required (it's implemented in Python, after all)
So including Cython source in the stdlib and then check in the generated C code? I don't think that adding another build dependency for the stdlib, especially for one already with several external dependencies itself, is a good idea.
or
b) adding Cython to the stdlib and building with that
That's an entirely separate discussion (for which my initial answer is to not consider it until it has stabilized to a 1.0 release). -Brett
Hi, Brett Cannon wrote:
On Thu, Mar 5, 2009 at 12:52, Stefan Behnel wrote:
Benjamin Peterson wrote:
it depends on Cython, which is wonderful normally, but maybe difficult to deal with in language evolution since we wouldn't have direct control over the C sources. I see the point, although I think that this can be dealt with by
a) using a specific, stable release version of Cython for a specific Python release, so that this Cython version can be bug fixed if required (it's implemented in Python, after all)
So including Cython source in the stdlib and then check in the generated C code?
Did I give the impression that a) was my preferred solution? ;)
b) adding Cython to the stdlib and building with that
That's an entirely separate discussion (for which my initial answer is to not consider it until it has stabilized to a 1.0 release).
Yes, that *is* an entirely separate discussion - for which my initial answer is to consider it as soon as it is in a state where the compiler is good enough to be useful and the language it compiles is stable enough to be future proof. The language is almost Python, and the core syntax extensions (compared to Python 2.6/3.0) haven't changed for a couple of releases (except for the buffer syntax, which I personally don't consider core but really nice to have). The official goal for a 1.0 release is to compile Python programs, BTW. I don't think the stdlib needs to wait for that. Stefan
Guido van Rossum wrote:
Based on the sad example of BerkeleyDB, which was initially welcomed into the stdlib but more recently booted out for reasons having to do with the release cycle of the external dependency and other issues typical for large external dependencies, I think we should be very careful with including it in the standard library.
Yes. My experience of these kinds of libraries (bdb, lxml, etc) is that having them in the Python stdlib is a "bad thing". Why? Because Python (quite rightly, as I'm being convinced!) has a very conservative policy on changes in third-point releases. This, however, means you end up getting "stuck" with a release of something like lxml that you can't upgrade to get new features, because you, say, use a Debian-packaged Python which only upgrades when Debian next decides to do a release...

In light of this, what I'd love to see (but sadly can't really help with, and am not optimistic about happening) is for:

- python to grow a decent, cross platform, package management system
- the standard library to actually shrink to a point where only libraries that are not released elsewhere are included

I'd be interested to know how many users of python also felt this way ;-)

Chris
In light of this, what I'd love to see (but sadly can't really help with, and am not optimistic about happening) is for:
- python to grow a decent, cross platform, package management system
- the standard library to actually shrink to a point where only libraries that are not released elsewhere are included
I'd be interested to know how many users of python also felt this way ;-)
I don't like the standard library to shrink. It's good that batteries are included. Regards, Martin
Martin v. Löwis wrote:
In light of this, what I'd love to see (but sadly can't really help with, and am not optimistic about happening) is for:
- python to grow a decent, cross platform, package management system
- the standard library to actually shrink to a point where only libraries that are not released elsewhere are included
I'd be interested to know how many users of python also felt this way ;-)
I don't like the standard library to shrink. It's good that batteries are included.
I have mixed feelings. It is great that the batteries are included, but some batteries are showing their age or not maintained (who maintains IDLE? - does the calendar module really warrant being in the standard library? - imaplib is really not useful and IMAPClient which isn't in the standard library is much better [1]). If a library is well maintained then there seems to be little point in moving it into the standard library as it may actually be harder to maintain, and if a library has no active developers then do we really want it in the standard library... On the other hand there are some standard tools that a significant portion of the community use (Python Imaging Library and the PyWin32 extensions for example) that aren't in the standard library. I think other developers have similar mixed feelings, or at least there are enough people on both sides of the fence that it is very hard to make changes. Perhaps this is the way it should be. Michael [1] http://freshfoo.com/wiki/CodeIndex
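(For context on the imaplib complaint above: the stdlib module is a thin wrapper over the wire protocol, so even simple tasks mean parsing raw byte strings by hand. A hedged sketch; the `unseen_ids` helper, host and credentials are illustrative, not any real API:)

```python
import imaplib

def unseen_ids(conn):
    # imaplib returns the protocol almost verbatim: SEARCH yields a
    # (status, data) tuple where data is e.g. [b'1 2 3'], and the
    # caller splits the raw bytes by hand.
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        raise RuntimeError("SEARCH failed: %r" % status)
    return data[0].split()

# Typical use against a real server (placeholders throughout):
# conn = imaplib.IMAP4_SSL("imap.example.com")
# conn.login("user", "password")
# conn.select("INBOX")
# for msg_id in unseen_ids(conn):
#     ...
```

Higher-level wrappers like IMAPClient exist precisely because every caller ends up writing this kind of parsing glue.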
-- http://www.ironpythoninaction.com/ http://www.voidspace.org.uk/blog
I have mixed feelings. It is great that the batteries are included, but some batteries are showing their age or not maintained (who maintains IDLE? - does the calendar module really warrant being in the standard library? - imaplib is really not useful and IMAPClient which isn't in the standard library is much better [1]).
I certainly agree that little-used modules should be removed (by means of a proper deprecation procedure). I do think it is fairly important that IDLE remains included (whether or not gpolo takes ownership of it). As for imaplib: I can't comment on the library itself, since I never use it. However, given the number of bug reports that we receive, it seems there is a fairly significant following for it.
If a library is well maintained then there seems to be little point in moving it into the standard library as it may actually be harder to maintain
It depends. For quickly evolving libraries, it might be harder to maintain them in the core, as you can't release as quickly as you desire. In other cases, it simplifies maintenance: whenever a systematic API change is made, all standard library modules typically get updated by whoever makes the systematic change. That is more productive than having each individual maintainer figure out what to change in response. However, I don't think of the maintainer point of view that much: it's rather the end users (i.e. application developers) whom I worry about: Should we remove regular expressions from Python, just because the library doing it is unmaintained?
On the other hand there are some standard tools that a significant portion of the community use (Python Imaging Library and the PyWin32 extensions for example) that aren't in the standard library.
I continue to hold the same position: if the authors of those respective libraries would like to contribute them to the core (and eventually drop the out-of-core forks), then I would be happy to let them do that. Of course, Guido's caution regarding external dependencies still applies: if there are strong external dependencies, the library would have to be really important to the community to still be included.
I think other developers have similar mixed feelings, or at least there are enough people on both sides of the fence that it is very hard to make changes. Perhaps this is the way it should be.
I think so, yes. Decisions will be made on a case-by-case basis, going either direction one time or the other. Regards, Martin
Martin v. Löwis <martin@v.loewis.de> wrote:
If a library is well maintained then there seems to be little point in moving it into the standard library as it may actually be harder to maintain
It depends. For quickly evolving libraries, it might be harder to maintain them in the core, as you can't release as quickly as you desire. In other cases, it simplifies maintenance: whenever a systematic API change is added, all standard library modules get typically updated by whoever makes the systematic change. That is more productive than having each individual maintainer to figure out what to change in response.
This is a complicated issue. But two sub-threads seem to be about (1) modules dependent (or wrapping) a large, complicated third-party library under active development, and (2) hard-to-routinely-test modules, like imaplib. Bill
Michael Foord wrote:
I have mixed feelings. It is great that the batteries are included, but some batteries are showing their age or not maintained (who maintains IDLE? - does the calendar module really warrant being in the standard library? - imaplib is really not useful and IMAPClient which isn't in the standard library is much better [1]).
Wow, interesting case in point ;-) I actually stumbled into imaplib, found it not useful and gave up and solved the problem a completely different way. Had I not tripped over it, I might have found IMAPClient! Chris -- Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Martin v. Löwis wrote:
In light of this, what I'd love to see (but sadly can't really help with, and am not optimistic about happening) is for:
- python to grow a decent, cross platform, package management system
- the standard library to actually shrink to a point where only libraries that are not released elsewhere are included
I'd be interested to know how many users of python also felt this way ;-)
I don't like the standard library to shrink. It's good that batteries are included.
Perhaps we could encourage more "jumbo" distributions, like Enthought's and ActiveState's. I suspect many people would rather be able to maintain their Python functionality as a single product. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Want to know? Come to PyCon - soon! http://us.pycon.org/
Perhaps we could encourage more "jumbo" distributions, like Enthought's and ActiveState's. I suspect many people would rather be able to maintain their Python functionality as a single product.
I think the concept of "jumbo distribution" has been lost. I mentioned it to one of the Enthought people (sorry, forgot who exactly), and he said he had never heard the term. Looking back, it seems that you have to be a commercial enterprise to produce such a thing. There is the python.org distribution, maintained by many volunteers, and then there are the two (?) free-commercial distributions (ActivePython, and EPD). I'm skeptical that there can be motivation for creating another "community" jumbo distribution - why would anybody put effort into maintaining it? You don't get much credit for release engineering - except from fellow release engineers. In addition, you have the Linux distributions, which you can also count as jumbo Python distributions (and also jumbo Perl, jumbo Java, jumbo LISP - at least for Debian :-). Again, many of these are commercially based, although there still seems to be space for multiple community distributions (Debian, Gentoo). This is precisely the reason why I want Python to continue including its batteries. If we give that up, it will not come back, and users get frustrated that they have to collect stuff from many places, and that the stuff doesn't fit together completely. What that means for BeautifulSoup, I don't know. First, its authors would have to offer to contribute it, and then experts should also agree that this would be a useful inclusion. Some apparently say that html5lib would be a better choice. If that's the thing that is currently on release 0.11, then I think we should take no action at this point - I don't want to include anything that has version 0.11. Regards, Martin
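(As an aside grounding the original question: a short sketch of what the existing battery does with tag soup. The `TitleGrabber` class is illustrative only; the point is that HTMLParser simply tokenizes what it sees, with no tree-building or repair, which is the gap BeautifulSoup and html5lib fill.)

```python
from html.parser import HTMLParser

class TitleGrabber(HTMLParser):
    # Collect text seen between <title> and </title>.
    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

p = TitleGrabber()
p.feed("<html><head><title>Hello<body><p>unclosed")  # tag soup
p.close()
# p.title is now "Hellounclosed": nothing ever closed <title>, so the
# stray trailing text was swallowed into it -- no repair happened.
```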
This is precisely the reason why I want Python to continue including its batteries. If we give that up, it will not come back, and users get frustrated that they have to collect stuff from many places, and that the stuff doesn't fit together completely.
I concur. Raymond
Steve Holden wrote:
Perhaps we could encourage more "jumbo" distributions, like Enthought's and ActiveState's. I suspect many people would rather be able to maintain their Python functionality as a single product.
I think you'll find it split. People who use and love things like zc.buildout do so because they want to free package maintainers to do their own release cycles and not have individual packages held back by the need to release the "whole project" in one go. However, yes, I'm sure there are just as many people who want to install "just one thing" and have it all there (although they'll be sadly disappointed when they realise whatever it is they need (lxml, PIL, xlrd, xlwt) isn't there). That said, a decent package management system in the core *and* the jumbo installers you mention would likely keep both camps happy ;-) Chris
Martin v. Löwis wrote:
In light of this, what I'd love to see (but sadly can't really help with, and am not optimistic about happening) is for:
- python to grow a decent, cross platform, package management system
- the standard library to actually shrink to a point where only libraries that are not released elsewhere are included
I'd be interested to know how many users of python also felt this way ;-)
I don't like the standard library to shrink. It's good that batteries are included.
If a decent package management system *was* included, this wouldn't be an issue. Chris
2009/3/13 Chris Withers <chris@simplistix.co.uk>:
If a decent package management system *was* included, this wouldn't be an issue..
Remember that a "decent package management system" needs to handle filling in all the forms and arranging approvals to get authorisation for packages when you download them. And no, I'm *not* joking. People in a locked-down corporate environment really do benefit from just having to get the OK for "Python", and then knowing that they have all they need. Even ignoring the above, your "decent package management system" has to deal with systems with no internet connectivity - just copying the Python installer onto my pen drive to put on my Mum's laptop so I can write some admin jobs for her, is a lot easier than having to pick and choose from PyPI what to download before I start. -1 on slimming down the stdlib. (OTOH, I've no problem with seeing an improved package system - just don't use it as an excuse to cripple the stdlib!) Paul
Paul Moore wrote:
2009/3/13 Chris Withers <chris@simplistix.co.uk>:
If a decent package management system *was* included, this wouldn't be an issue..
Remember that a "decent package management system" needs to handle filling in all the forms and arranging approvals to get authorisation for packages when you download them.
And no, I'm *not* joking. People in a locked-down corporate environment really do benefit from just having to get the OK for "Python", and then knowing that they have all they need.
You are plainly joking: nothing in Python should know or care about the various bureaucratic insanities in some workplaces. Given the *existing* stdlib and network connectivity, nothing any corporate security blackshirt can do will prevent an even moderately-motivated person from executing arbitrary code downloaded from elsewhere. In that case, what is the point in trying to help those who impose such craziness?
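(Tres's point can be made concrete. A minimal sketch, in Python 3 spelling, of how little the stock library leaves for a lockdown policy to prevent; `run_remote` and the example URL are illustrative placeholders, not anything in the stdlib:)

```python
import urllib.request

def run_remote(url):
    # With nothing but the stdlib, fetch arbitrary source text and
    # execute it in-process -- no third-party installer required.
    source = urllib.request.urlopen(url).read().decode("utf-8")
    namespace = {}
    exec(source, namespace)
    return namespace

# run_remote("http://example.com/some_script.py")  # placeholder URL
```

Blocking a package installer while shipping this capability is exactly the "security theatre" described later in the thread.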
Even ignoring the above,
Which the language and library should do.
your "decent package management system" has to deal with systems with no internet connectivity - just copying the Python installer onto my pen drive to put on my Mum's laptop so I can write some admin jobs for her, is a lot easier than having to pick and choose from PyPI what to download before I start.
Nobody is arguing that there should be *no* batteries in the stdlib: we are talking about the ones which are leaking acid already, or which might get there soon due to lack of maintenance.
-1 on slimming down the stdlib. (OTOH, I've no problem with seeing an improved package system - just don't use it as an excuse to cripple the stdlib!)
Part of this discussion is about not *expanding* the stdlib: given a reasonable packaging story, leaving a given component out of the library is a defensible choice. Tres. -- =================================================================== Tres Seaver +1 540-429-0999 tseaver@palladion.com Palladion Software "Excellence by Design" http://palladion.com
Tres Seaver wrote:
Paul Moore wrote:
2009/3/13 Chris Withers <chris@simplistix.co.uk>:
If a decent package management system *was* included, this wouldn't be an issue.. Remember that a "decent package management system" needs to handle filling in all the forms and arranging approvals to get authorisation for packages when you download them.
And no, I'm *not* joking. People in a locked-down corporate environment really do benefit from just having to get the OK for "Python", and then knowing that they have all they need.
You are plainly joking: nothing in Python should know or care about the various bureaucratic insanities in some workplaces. Given the *existing* stdlib and network connectivity, nothing any corporate security blackshirt can do will prevent an even moderately-motivated person from executing arbitrary code downloaded from elsewhere. In that case, what is the point in trying to help those who impose such craziness?
I (and most people, I presume) would not run an arbitrary program downloaded from somewhere else on a corporate server that holds a lot of important customer data, even when there is no technical or even bureaucratic restriction. Maybe I would sneak around on a workstation, but definitely not on the server, especially if I love my job and want to keep it (I'm a student, though, so that applies to me in the future).
Lie Ryan wrote:
Tres Seaver wrote:
2009/3/13 Chris Withers <chris@simplistix.co.uk>:
If a decent package management system *was* included, this wouldn't be an issue.. Remember that a "decent package management system" needs to handle filling in all the forms and arranging approvals to get authorisation for packages when you download them.
And no, I'm *not* joking. People in a locked-down corporate environment really do benefit from just having to get the OK for "Python", and then knowing that they have all they need.
You are plainly joking: nothing in Python should know or care about the various bureaucratic insanities in some workplaces. Given the *existing* stdlib and network connectivity, nothing any corporate security blackshirt can do will prevent an even moderately-motivated person from executing arbitrary code downloaded from elsewhere. In that case, what is the point in trying to help those who impose such craziness?
I (and most people, I presume) would not run arbitrary program downloaded from somewhere else on a corporate server that holds many important customer data even when there is no technical or even bureaucratic restriction, maybe I will sneak around on a workstation but definitely not on the server especially if I love my job and want to keep it (I'm a student though so that applies to me in the future).
I'm not arguing that employees should violate their employers' policies: I'm arguing that Python itself shouldn't try to cater to such policies. Note that I'm not talking about running code pushed on me by malware authors, either: I'm talking about "ordinary" software development activities like using a script from a cookbook, or using a well-tested and supported library, rather than NIH. Given that the out-of-the-box Python install already has facilities for retrieving text over the net and executing that text, the notion of "locking down" a machine to include only the bits installed in the stock Python install is just "security theatre;" such a machine shouldn't have Python installed at all (nor a C compiler, etc.) Tres.
Tres Seaver wrote:
Lie Ryan wrote:
Tres Seaver wrote:
2009/3/13 Chris Withers <chris@simplistix.co.uk>:
If a decent package management system *was* included, this wouldn't be an issue.. Remember that a "decent package management system" needs to handle filling in all the forms and arranging approvals to get authorisation for packages when you download them.
And no, I'm *not* joking. People in a locked-down corporate environment really do benefit from just having to get the OK for "Python", and then knowing that they have all they need. You are plainly joking: nothing in Python should know or care about the various bureaucratic insanities in some workplaces. Given the *existing* stdlib and network connectivity, nothing any corporate security blackshirt can do will prevent an even moderately-motivated person from executing arbitrary code downloaded from elsewhere. In that case, what is the point in trying to help those who impose such craziness?
I (and most people, I presume) would not run an arbitrary program downloaded from somewhere else on a corporate server that holds a lot of important customer data even when there is no technical or even bureaucratic restriction, maybe I will sneak around on a workstation but definitely not on the server especially if I love my job and want to keep it (I'm a student though so that applies to me in the future).
I'm not arguing that employees should violate their employers' policies: I'm arguing that Python itself shouldn't try to cater to such policies.
Basically you're saying: Python is designed not to work in such an environment.
Note that I'm not talking about running code pushed on me by malware authors, either: I'm talking about "ordinary" software development activities like using a script from a cookbook, or using a well-tested and supported library, rather than NIH.
Some companies have /very/ strict policies on running anything on a live server, including scripts you write yourself. The problem is that if the script goes awry, it might disturb the stability or even security of the server.
Given that the out-of-the-box Python install already has facilities for retrieving text over the net and executing that text, the notion of "locking down" a machine to include only the bits installed in the stock Python install is just "security theatre;" such a machine shouldn't have Python installed at all (nor a C compiler, etc.)
When the server administrator is already freaked out about adding a script developed by an in-house employee, what about adding an external module? Of course, all of this does not (usually) apply to a regular workstation. A messed-up workstation only means a reinstall; a messed-up server may cost the company its reputation.
Lie Ryan wrote:
Tres Seaver wrote:
I'm not arguing that employees should violate their employers' policies: I'm arguing that Python itself shouldn't try to cater to such policies.
Basically you're saying: Python is designed not to work on such environment.
No, I'm saying that it isn't Python's responsibility to enable that kind of policy. If it happens to be good *for Python* to have a package installation / upgrade machinery (the real point of the discussion), then it will be up to the paranoid admin to figure out how to disable that feature: it isn't the problem of the Python developers. There are real costs to "batteries included," especially for modules which don't get used much. One such cost is that an unused module tends to bitrot over time; another is that the presence of a module in the stdlib may "shadow" other, better modules / packages which are not in the stdlib. Those costs need to be balanced against the undoubted benefits, when making the choice to add or remove a module from the stdlib.
Note that I'm not talking about running code pushed on me by malware authors, either: I'm talking about "ordinary" software development activities like using a script from a cookbook, or using a well-tested and supported library, rather than NIH.
Some companies have /very/ strict policies on running anything on live server, including scripts you write yourself. The problem is if the script goes awry, it might disturb the stability or even security of the server.
I understand that such policies exist, and why. However, I don't think their existence is a relevant constraint on the design or implementation of Python.
Given that the out-of-the-box Python install already has facilities for retrieving text over the net and executing that text, the notion of "locking down" a machine to include only the bits installed in the stock Python install is just "security theatre;" such a machine shouldn't have Python installed at all (nor a C compiler, etc.)
When the server administrator is already freaked out about adding a script developed by an in-house employee, what about adding an external module?
An admin who is that paranoid shouldn't be giving people shell access, either: given shell access, network connectivity, and the existing stdlib, the admin's policy is unenforceable as a technical measure. Even if the user can't create a file anywhere on the filesystem, the interpreter prompt is enough to allow the user to evade the policy. Heck, the *bash* prompt is enough to wreck it, e.g. using "here documents," even without network connectivity. As an aside: anybody who is installing packages from PyPI on a production box (rather than using an index under their own control) isn't sufficiently paranoid: it can and does happen that people re-upload changed packages to PyPI without changing the version, for instance, not to mention removing older releases. Tres.
Lie Ryan wrote:
Some companies have /very/ strict policies on running anything on a live server, including scripts you write yourself. The problem is that if the script goes awry, it might disturb the stability or even security of the server.
Yes, "we" as a profession write software and have responsibilities. Get over it. It's what dev servers, UAT, backups and DR are for... I see no relation between this and packaging other than that any packaging story needs to support private distribution servers.
When the server administrator is already freaked out about adding a script developed by an in-house employee, what about adding an external module?
Then he's a muppet, plain and simple. If he's not, then he will have tested the whole setup beforehand and got signoff from the developers and users who are responsible for doing so. All of this has very little to do with packaging. Chris
Tres Seaver wrote:
Given that the out-of-the-box Python install already has facilities for retrieving text over the net and executing that text, the notion of "locking down" a machine to include only the bits installed in the stock Python install is just "security theatre;" such a machine shouldn't have Python installed at all (nor a C compiler, etc.)
Indeed, in the real world this locking down is done at the firewall level. As for packaging in this scenario, that's what private package servers are for... Chris
2009/3/13 Tres Seaver <tseaver@palladion.com>:
Paul Moore wrote:
2009/3/13 Chris Withers <chris@simplistix.co.uk>:
If a decent package management system *was* included, this wouldn't be an issue..
Remember that a "decent package management system" needs to handle filling in all the forms and arranging approvals to get authorisation for packages when you download them.
And no, I'm *not* joking. People in a locked-down corporate environment really do benefit from just having to get the OK for "Python", and then knowing that they have all they need.
You are plainly joking: nothing in Python should know or care about the various bureaucratic insanities in some workplaces.
I am not. What I *am* saying (obliquely, I admit) is that for a package management system to be "decent enough" for stripping down the stdlib to not be an issue, it has to address these points (which clearly it can't). In other words, the problems inherent in restricting the stdlib aren't ones which are solvable by a package management system. Paul.
Paul Moore wrote:
I am not. What I *am* doing is saying (obliquely, I admit) is that for a package management system to be "decent enough" for stripping down the stdlib to not be an issue, it has to address these points (which clearly it can't).
Sure it can, either by supporting "offline bundles" or by having sets of packages that are marked as "Python Approved!" or some such and so all have the same license. Chris
2009/3/23 Chris Withers <chris@simplistix.co.uk>:
Paul Moore wrote:
I am not. What I *am* doing is saying (obliquely, I admit) is that for a package management system to be "decent enough" for stripping down the stdlib to not be an issue, it has to address these points (which clearly it can't).
Sure it can, either by supporting "offline bundles" or by having sets of packages that are marked as "Python Approved!" or some such and so all have the same license.
OK, I'll drop out of the discussion at this point. We clearly have such different experience that we aren't understanding each others' points - and the misunderstandings aren't moving the discussion forwards. Paul
Tres Seaver wrote:
You are plainly joking: nothing in Python should know or care about the various bureaucratic insanities in some workplaces. Given the *existing* stdlib and network connectivity, nothing any corporate security blackshirt can do will prevent an even moderately-motivated person from executing arbitrary code downloaded from elsewhere. In that case, what is the point in trying to help those who impose such craziness?
Network connectivity isn't a given, even today. So yes, there are environments that are secure (i.e. no network connectivity), and there are environments where developers are trusted (shock, horror) to actually follow company policy and get all licenses vetted by their Contracts group before installing downloaded software on company machines. Given that even some of the core developers work in environments like that, then yes, I believe Python can and should take reasonable steps to enable its use in such situations. And the most reasonable step Python can take on that front is to continue to provide a relatively powerful standard library *even if* a flexible and otherwise useful package management approach is added at some stage. If someone else decides to create a MinimalPython which consists solely of something like easy_install and whatever is needed to run it (i.e. the opposite of the "fat" bundles from folks like ActiveState and Enthought), then more power to them. But I don't believe the official releases from python.org should go that way. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------
Nick Coghlan wrote:
Tres Seaver wrote:
You are plainly joking: nothing in Python should know or care about the various bureaucratic insanities in some workplaces. Given the *existing* stdlib and network connectivity, nothing any corporate security blackshirt can do will prevent an even moderately-motivated person from executing arbitrary code downloaded from elsewhere. In that case, what is the point in trying to help those who impose such craziness?
Network connectivity isn't a given, even today. So yes, there are environments that are secure (i.e. no network connectivity), and there are environments where developers are trusted (shock, horror) to actually follow company policy and get all licenses vetted by their Contracts group before installing downloaded software on company machines.
Given that even some of the core developers work in environments like that, then yes, I believe Python can and should take reasonable steps to enable its use in such situations.
And the most reasonably step Python can take on that front is to continue to provide a relatively powerful standard library *even if* a flexible and otherwise useful package management approach is added at some stage.
My inclination would be to leave the stdlib largely as is, except that occasionally I would argue for ripping out a particular obsolete / bitrotted module. A couple of other points: - Absent a sufficiently powerful package management system, the pressure to add modules to the stdlib (or keep them) is higher, because it is harder for *all* Python users to add them, or replace them if dropped. - The choice to add or remove a module to / from the stdlib should be made on the merits of the module, without regard to the kind of specialized deployment policies you outline. - Changing the stdlib in a new release of Python is probably irrelevant for the kind of environments you allude to, as there is likely as much review involved in approving a new version of Python as there was in approving it in the first place: of the few I know of today, all are still running Python 2.1.x and / or 2.2.x for this reason.
If someone else decides to create a MinimalPython which consists solely of something like easy_install and whatever is needed to run it (i.e. the opposite of the "fat" bundles from folks like ActiveState and Enthought), then more power to them. But I don't believe the official releases from python.org should go that way.
Note that I am *not* advocating scrubbing / exploding the stdlib.

Tres.
--
Tres Seaver +1 540-429-0999 tseaver@palladion.com
Palladion Software "Excellence by Design" http://palladion.com
Tres, for some reason every time you reply to the list, you send TWO copies instead of one: To: python-dev@python.org CC: Python Dev <python-dev@python.org> Could you please fix that? -- Steven D'Aprano
Steven D'Aprano wrote:
Tres, for some reason every time you reply to the list, you send TWO copies instead of one:
To: python-dev@python.org CC: Python Dev <python-dev@python.org>
Could you please fix that?
I can try: I normally post via gmane, and leave python-dev CC'ed so that folks who read via the list don't have their replies to me fall off list (which happens on some lists, anyway). I will trim the CC in the future, and hope for the best.

Tres.
Tres Seaver wrote:
Steven D'Aprano wrote:
Tres, for some reason every time you reply to the list, you send TWO copies instead of one:
To: python-dev@python.org CC: Python Dev <python-dev@python.org>
Could you please fix that?
I can try: I normally post via gmane, and leave python-dev CC'ed so that folks who read via the list don't have their replies to me fall off list (which happens on some lists, anyway).
I will trim the CC in the future, and hope for the best.
That's what works best for me, anyway. Stefan
Nick Coghlan wrote:
Network connectivity isn't a given, even today.
Indeed, now that is an important consideration. Packaging systems need to support "offline" modes. Buildout already does...
If someone else decides to create a MinimalPython which consists solely of something like easy_install and whatever is needed to run it (i.e. the opposite of the "fat" bundles from folks like ActiveState and Enthought), then more power to them. But I don't believe the official releases from python.org should go that way.
My frustration is that some of the big standard libraries are locked to Python releases, meaning they carry around bugs for longer and are harder to contribute to than necessary...

Chris
--
Simplistix - Content Management, Zope & Python Consulting - http://www.simplistix.co.uk
Chris Withers wrote:
Nick Coghlan wrote:
Network connectivity isn't a given, even today.
Indeed, now that is an important consideration. Packaging systems need to support "offline" modes. Buildout already does...
If someone else decides to create a MinimalPython which consists solely of something like easy_install and whatever is needed to run it (i.e. the opposite of the "fat" bundles from folks like ActiveState and Enthought), then more power to them. But I don't believe the official releases from python.org should go that way.
My frustration is that some of the big standard libraries are locked to python releases meaning they carry around bugs for longer and are harder to contribute to than necessary...
Possibly so, but there are conflicting requirements and Python can't satisfy them all without getting more complex. Some people want an "all batteries and kitchen sink included" distro that they can treat as a single component for configuration control purposes. Others, like you, want the libraries to be separated out to allow separate fixes. regards Steve -- Steve Holden +1 571 484 6266 +1 800 494 3119 Holden Web LLC http://www.holdenweb.com/ Want to know? Come to PyCon - soon! http://us.pycon.org/
Steve Holden wrote:
Some people want an "all batteries and kitchen sink included" distro that they can treat as a single component for configuration control purposes. Others, like you, want the libraries to be separated out to allow separate fixes.
Yes, but while the "batteries included" option can be rolled from the "no batteries" version, the reverse is not true. The current package management systems can't even figure out whether a version of a standard library module is "what came with Python 2.x.y" rather than a bugfixed version that was installed later (pyunit springs to mind as a potential candidate here...)

cheers,

Chris
On Fri, 13 Mar 2009 at 09:58, Chris Withers wrote:
Martin v. Löwis wrote:
In light of this, what I'd love to see (but sadly can't really help with, and am not optimistic about happening) is for:
- python to grow a decent, cross platform, package management system
- the standard library to actually shrink to a point where only libraries that are not released elsewhere are included
I'd be interested to know how many users of python also felt this way ;-)
I don't like the standard library to shrink. It's good that batteries are included.
If a decent package management system *was* included, this wouldn't be an issue..
I disagree. One of the jobs I've had is release management for internal software projects that depend on various external pieces. Release integration tested against specific versions of those external packages, and those were the packages that needed to wind up on the system when the release was installed.

I've done systems depending on both perl and python, and let me tell you, python is way, _way_ easier to manage. With python, I have a dependency on a particular python version, and then maybe one or two add-on packages. With perl, I have perl, and then I have a gadzillion cpan modules. I don't know how many a gadzillion is, because what I wound up doing was making a local copy of the cpan archive, checking that in to the repository, and writing up some scripts that made sure I pulled the actual install from my cpan snapshot, and supported the developers in updating that snapshot when we were building a new version. (Nor was that the only problem with perl... what idiot decided it was OK to interactively prompt for things during a batch install process?! And without providing any way to script the answers, at least that I could find!)

So I'm +1 for keeping the Python stdlib as comprehensive as sensible. (Please note that last word... I've no objection to pruning things that are no longer serving a useful purpose, or that are better managed outside the core.)

--
R. David Murray http://www.bitdance.com
R. David Murray wrote:
On Fri, 13 Mar 2009 at 09:58, Chris Withers wrote:
Martin v. Löwis wrote:
In light of this, what I'd love to see (but sadly can't really help with, and am not optimistic about happening) is for:
- python to grow a decent, cross platform, package management system
- the standard library to actually shrink to a point where only libraries that are not released elsewhere are included
I'd be interested to know how many users of python also felt this way ;-)
I don't like the standard library to shrink. It's good that batteries are included.
If a decent package management system *was* included, this wouldn't be an issue..
I disagree. One of the jobs I've had is release management for internal software projects that depend on various external pieces. Release integration tested against specific versions of those external packages, and those were the packages that needed to wind up on the system when the release was installed. I've done systems depending on both perl and python, and let me tell you, python is way, _way_ easier to manage. With python, I have a dependency on a particular python version, and then maybe one or two add on packages. With perl, I have perl, and then I have a gadzillion cpan modules. I don't know how many a gadzillion is, because what I wound up doing was making a local copy of the cpan archive, checking that in to the repository, and writing up some scripts that made sure I pulled the actual install from my cpan snapshot and supported the developers in updating that snapshot when we were building a new version. (Nor was that the only problem with perl....what idiot decided it was OK to interactively prompt for things during a batch install process?! And without providing any way to script the answers, at least that I could find!)
So I'm +1 for keeping the Python stdlib as comprehensive as sensible. (Please note that last word...I've no objection to pruning things that are no longer serving a useful purpose, or that are better managed outside the core.)
Just for clarity, when I said a "jumbo distribution" I meant one with all necessary libraries to run and support a specified set of python functionalities. The point is *not* to add other libraries (which invites version creep and undermines stability) but to have everything you need for a given (core plus) set of functionality.

I believe my original message acknowledged that this wouldn't be everyone's cup of tea, but if a "good"* set of applications were analyzed and a distribution built to support just those (presumably Python) subsystems, you would have a good core that you can augment with the system-installed Python if you are so minded.

A jumbo shouldn't try to replace the system-installed Python because hopefully different Python (jumbo) distributions would coexist on the same system, supporting those Python elements for which their configuration is acceptable. I would not expect core developers to have to give the jumbos much mind, except in so far as they might require support for (slightly?) different versions of Python.

A 1.5.2 process can talk to a 3.1 process without any problems at all as long as the application protocol is unambiguous. Why this insistence on trying to do everything with a single interpreter? Sure, it might use more resources to have three different versions of PIL in use from three different jumbos, but let's cross that bridge when we come to it.

Naturally, in Python there are already several environments that will compute the required library subset necessary to support an application, though at present they do it only across a single Python version and application. However, writing a simple GUI or command-line program to call the other Python modules will give you a single analyzable module tree. You don't even have to distribute the GUI if you don't want ...

So I don't see "jumbo" as replacing "batteries included".
More like "comes with 14v 300AH accumulator and support for domain name and email services" or "suitable for GeoDjango virtuals under VirtualBox with NAT addressing". For non-Python stuff (e.g. PostgreSQL, though SQLite is plenty good enough for experiments) I think the API has to be stable enough to accommodate some version variations.

regards
Steve

* This is not a democracy: use your own prejudices to decide.
R. David Murray wrote:
I disagree. One of the jobs I've had is release management for internal software projects that depend on various external pieces. Release integration tested against specific versions of those external packages, and those were the packages that needed to wind up on the system when the release was installed. I've done systems depending on both perl and python, and let me tell you, python is way, _way_ easier to manage. With python, I have a dependency on a particular python version, and then maybe one or two add on packages.
Well, python already has tools available to do exactly this: buildout from a private egg repository will do exactly what you're after.

However, it's built on top of setuptools, which is flawed, and it's not blessed as "official core python", so there's lots of room for improvement!

Chris
Chris Withers <chris <at> simplistix.co.uk> writes:
Well, python already has tools available to do exactly this.: buildout from a private egg repository will do exactly what you're after.
However, its built on top of setuptools, which is flawed, and it's not blessed as "official core python", so there's lots of room for improvement!
Could you explain how buildout is an improvement over other systems? Its documentation seems full of generic wording ("parts" etc.) that I can't make sense of. Regards Antoine.
Antoine Pitrou wrote:
Chris Withers <chris <at> simplistix.co.uk> writes:
Well, python already has tools available to do exactly this.: buildout from a private egg repository will do exactly what you're after.
However, its built on top of setuptools, which is flawed, and it's not blessed as "official core python", so there's lots of room for improvement!
Could you explain how buildout is an improvement over other systems? Its documentation seems full of generic wording ("parts" etc.) that I can't make sense of.
It has a couple of differentiators from a "stock" distutils or setuptools-based installation:

- Distributions are compiled and installed as eggs, but in a directory which is neither on the sys.path nor one of those marked as a 'site' directory. zc.buildout *does* use the dependency information, if present in setup.py, to fetch dependent distributions (like easy_install with the --multi-version option).

- Scripts generated from the eggs get a generated prologue which sets up the sys.path expressing the requirements spelled out for that script.

- It externalizes much of the "how to build it" information out of 'setup.py' into a separate "INI-style" configuration file.

- It uses "recipes" as extensions, which enable a lot of tasks which are unsupported or poorly supported by distutils / setuptools (e.g., installing non-Python software using "configure-make-make install", generating config files, etc.)

Tres.
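[For readers who haven't seen the "INI-style" configuration file Tres describes, a minimal buildout.cfg might look roughly like this. The part name, egg names, versions, and repository URL below are illustrative only, not taken from the thread; zc.recipe.egg is buildout's standard recipe for installing eggs and generating scripts.]

```ini
; Hypothetical minimal buildout.cfg -- names and URLs are examples.
[buildout]
parts = app
; fetch distributions from a private egg repository rather than PyPI
find-links = http://eggs.example.com/

[app]
; zc.recipe.egg installs the listed eggs and generates console scripts
; whose prologue pins sys.path to exactly these requirements
recipe = zc.recipe.egg
eggs =
    myapp ==1.2
    BeautifulSoup ==3.0.7a
```

As I understand it, running bin/buildout against such a file installs the pinned eggs outside site-packages and generates the sys.path-setting scripts Tres mentions, which is what makes the repeatable, version-pinned deployments discussed earlier in the thread practical.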
I don't like the standard library to shrink. It's good that batteries are included.
If a decent package management system *was* included, this wouldn't be an issue..
You can prove anything with a false premise... I believe that a package management system that is decent cannot possibly be included in Python (IOW, any packaging system included in Python cannot be decent enough to allow removal of things from the standard library).

Regards,
Martin
participants (26)
- "Martin v. Löwis"
- Amaury Forgeot d'Arc
- Antoine Pitrou
- Barry Warsaw
- Benjamin Peterson
- Bill Janssen
- Brett Cannon
- Chris Withers
- Georg Brandl
- Guido van Rossum
- Ivan Krstić
- James Y Knight
- Lie Ryan
- Michael Foord
- Nick Coghlan
- Oleg Broytmann
- Paul Moore
- R. David Murray
- Raymond Hettinger
- Stefan Behnel
- Steve Holden
- Steven D'Aprano
- Terry Reedy
- Tony Nelson
- Tres Seaver
- Vaibhav Mallya