
On 3/17/06, A.M. Kuchling <amk@amk.ca> wrote:
Thought: We should drop all of httplib, urllib, urllib2, and ftplib, and instead adopt some third-party library for HTTP/FTP/whatever, write a Python wrapper, and use it instead. (The only such library I know of is libcurl, but doubtless there are other candidates; see http://curl.haxx.se/libcurl/competitors.html for a list.)
Rationale:
HTTP client-side support is pretty complicated. HTTP itself has many corners (httplib.py alone is 1420 lines long, and urllib2 is 1300 lines).
There are many possible permutations of proxying, SSL on/off, and authentication. We probably haven't tested every permutation, and probably lack the volunteer effort to test them all. If you search for 'http' in the bug tracker, you find about 16 bugs submitted for httplib/urllib/urllib2, most of them for one permutation or another.
With a third-party library, the work of maintaining RFC compliance falls to someone else.
A third-party library might support more features than we have time to implement.
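To give a rough sense of how quickly the proxying/SSL/authentication permutations multiply, here is a minimal sketch that enumerates them. The option names below are illustrative assumptions, not a list of what httplib/urllib/urllib2 actually support; a real client has more variants per axis.

```python
from itertools import product

# Illustrative option sets only (assumed for this sketch); real HTTP
# clients support more variants on each axis than are listed here.
PROXY = ("direct", "http-proxy")
SSL = ("plain", "ssl")
AUTH = ("none", "basic", "digest")


def configurations():
    """Return every (proxy, ssl, auth) combination a test suite would have to cover."""
    return list(product(PROXY, SSL, AUTH))


def describe(config):
    """Format one combination as a short human-readable label."""
    proxy, ssl, auth = config
    return "proxy=%s ssl=%s auth=%s" % (proxy, ssl, auth)
```

Calling `len(configurations())` shows that even these three small axes already produce 2 * 2 * 3 = 12 distinct setups, each of which would need its own test.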
A downside: these libraries would be in C, and might be the source of security bugs. Python code may be buggy, but probably won't fall prey to buffer overflows. We'd also have to keep in sync with the library.
There is also the issue that PyPy could have problems, since its developers have always preferred that we keep pure Python versions of modules around when possible (I assume IronPython has .NET Internet libraries to use).
Similar arguments could be made for a server-side solution, but here I have no idea what we might choose. A server-side HTTP implementation plus a WSGI gateway might be all that Python 3000 needs.
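For readers unfamiliar with the split being described, here is a minimal sketch of an HTTP server paired with a WSGI gateway, using the standard library's wsgiref module. The names `app` and `serve_once`, and the host and port defaults, are assumptions made for this example; it is not a proposal for what Python 3000 should ship.

```python
from wsgiref.simple_server import make_server


def app(environ, start_response):
    """A trivial WSGI application: it only sees the environ dict, never a socket."""
    body = b"hello from a WSGI application\n"
    headers = [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(body))),
    ]
    start_response("200 OK", headers)
    return [body]


def serve_once(host="127.0.0.1", port=8000):
    """Handle a single request, then shut down (hypothetical helper for this sketch)."""
    # make_server supplies the HTTP implementation and the WSGI gateway;
    # the application above stays independent of both.
    with make_server(host, port, app) as httpd:
        httpd.handle_request()
```

The application is an ordinary callable, so any server that speaks WSGI could host it unchanged; calling `serve_once()` blocks until one request arrives.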
Good idea? Dumb idea?
Possibly good. We have the precedent of zlib, expat, etc. The key is probably that the license be compatible with ours (which libcurl's seems to be: an MIT/X derivative).
I know that, having fixed urllib bugs, I sure wouldn't mind if I didn't have to read another RFC on URL formats. =)
But maybe this also poses a larger question of where we want to take the stdlib for Py3K. Ignoring its needed cleanup and the nesting of the namespace, do we want to try to use more external tools by importing them and writing a Pythonic wrapper? Or do we want to keep more things under our control and go with the status quo? Or do we want to really prune down the stdlib and rely more on dynamic downloading à la the Cheeseshop and setuptools?
I support the first, even though it creates problems for PyPy, since it should allow us to have more quality code with less effort on our part. I also support the second, since we seem to be able to pull it off. For the third option I would want to be very careful about what is and is not included, since Python's "batteries included" approach is an important part of Python and I would not want that to suffer.
-Brett