Py3K thought: use external library for client-side HTTP

Thought: We should drop all of httplib, urllib, urllib2, and ftplib, and instead adopt some third-party library for HTTP/FTP/whatever, write a Python wrapper, and use it instead. (The only such library I know of is libcurl, but doubtless there are other candidates; see http://curl.haxx.se/libcurl/competitors.html for a list.)
Rationale:
* HTTP client-side support is pretty complicated. HTTP itself has many corners (httplib.py alone is 1420 lines long, and urllib2 is 1300 lines).
* There are many possible permutations of proxying, SSL on/off, and authentication. We probably haven't tested every permutation, and probably lack the volunteer effort to test them all. If you search for 'http' in the bug tracker, you find about 16 or so bugs submitted for httplib/urllib/urllib2, most of them for one permutation or another.
* With a third-party library, the work of maintaining RFC compliance falls to someone else.
* A third-party library might support more features than we have time to implement.
A downside: these libraries would be in C, and might be the source of security bugs. Python code may be buggy, but probably won't fall prey to buffer overflow. We'd also have to keep in sync with the library.
Similar arguments could be made for a server-side solution, but here I have no idea what we might choose. A server-side HTTP implementation + a WSGI gateway might be all that Python 3000 needs.
Good idea? Dumb idea?
--amk
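
To make the permutation point above concrete, here is a minimal sketch (not part of the original message) that enumerates a client test matrix. The message names only the three dimensions (proxying, SSL on/off, authentication); the specific option values and the helper name `client_configurations` are illustrative assumptions.

```python
from itertools import product

# Assumed, illustrative option values. The original message names only the
# three dimensions (proxying, SSL on/off, authentication), not these values.
PROXY_MODES = ("direct", "http-proxy")
SSL_MODES = (False, True)
AUTH_SCHEMES = ("none", "basic", "digest")


def client_configurations():
    """Return every (proxy, ssl, auth) combination a test suite would need to cover."""
    return list(product(PROXY_MODES, SSL_MODES, AUTH_SCHEMES))


if __name__ == "__main__":
    configs = client_configurations()
    for proxy, use_ssl, auth in configs:
        print(f"proxy={proxy:<10} ssl={use_ssl!s:<5} auth={auth}")
    print(len(configs), "combinations")
```

Even with these few assumed values the matrix has 2 x 2 x 3 = 12 cells, and each additional option (another proxy type, another authentication scheme) multiplies the count rather than adding to it.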

A.M. Kuchling wrote:
Good idea? Dumb idea?
I have no idea about urllib, but httplib certainly should stay, and its bugs should get fixed (and will get fixed, over time).
It is very easy to write a httplib application, since you can follow its source code. It would be much more complicated to use a native library.
Regards, Martin

On 3/17/06, A.M. Kuchling amk@amk.ca wrote:
Thought: We should drop all of httplib, urllib, urllib2, and ftplib, and instead adopt some third-party library for HTTP/FTP/whatever, write a Python wrapper, and use it instead. (The only such library I know of is libcurl, but doubtless there are other candidates; see http://curl.haxx.se/libcurl/competitors.html for a list.)
Rationale:
HTTP client-side support is pretty complicated. HTTP itself has many corners (httplib.py alone is 1420 lines long, and urllib2 is 1300 lines).
There are many possible permutations of proxying, SSL on/off, and authentication. We probably haven't tested every permutation, and probably lack the volunteer effort to test them all. If you search for 'http' in the bug tracker, you find about 16 or so bugs submitted for httplib/urllib/urllib2, most of them for one permutation or another.
With a third-party library, the work of maintaining RFC compliance falls to someone else.
A third-party library might support more features than we have time to implement.
A downside: these libraries would be in C, and might be the source of security bugs. Python code may be buggy, but probably won't fall prey to buffer overflow. We'd also have to keep in sync with the library.
There is also the issue that PyPy could have problems since they have always preferred we keep pure Python versions of stuff around when possible (I assume IronPython has .NET Internet libraries to use).
Similar arguments could be made for a server-side solution, but here I have no idea what we might choose. A server-side HTTP implementation + a WSGI gateway might be all that Python 3000 needs.
Good idea? Dumb idea?
Possibly good. We have the precedent of zlib, expat, etc. The key is probably that the license is compatible with ours (which libcurl's seems to be: MIT/X derivative).
I know that having fixed urllib bugs I sure wouldn't mind if I didn't have to read another RFC on URL formats. =)
But maybe this also poses a larger question of where for Py3K we want to take the stdlib. Ignoring its needed cleanup and nesting of the namespace, do we want to try to use more external tools by importing them and writing a Pythonic wrapper? Or do we want to not do that and try to keep more things under our control and go with the status quo? Or do we want to really prune down the stdlib and use more dynamic downloading ala Cheeseshop and setuptools?
I support the first even though it makes problems for PyPy since it should allow us to have more quality code with less effort on our part. I also support the second since we seem to be able to pull it off. For the third option I would want to be very careful with what is and is not included since Python's "batteries included" solution is an important part of Python and I would not want that to suffer.
-Brett

On Fri, 17 Mar 2006, Brett Cannon wrote:
On 3/17/06, A.M. Kuchling amk@amk.ca wrote:
Thought: We should drop all of httplib, urllib, urllib2, and ftplib, and instead adopt some third-party library for HTTP/FTP/whatever, write a Python wrapper, and use it instead. (The only such library I
[...]
But maybe this also poses a larger question of where for Py3K we want to take the stdlib. Ignoring its needed cleanup and nesting of the namespace, do we want to try to use more external tools by importing them and writing a Pythonic wrapper? Or do we want to not do that and try to keep more things under our control and go with the status quo? Or do we want to really prune down the stdlib and use more dynamic downloading ala Cheeseshop and setuptools?
[...]
Do we have any idea yet what sort of timescale we're talking about for Python 3.0 (or should I call it Py3K still)?
I have a personal interest in these particular modules, but the questions that seem to need answering first are more general ones about the stdlib post-3.0. Brett asks some good questions.
ISTM that another important question must be: What do each of the small set of people like yourself (Brett), Andrew, Martin, Georg, Raymond (etc.!) who bear most of the burden of maintaining the stdlib at present, intend to do after Python 3.0 is out? I assume that it would only be useful to drop parts of the stdlib in this way if that group of people were then to stop working on them. That makes sense, but I don't want to make assumptions about what each of the group of people referred to above intend to do post-3.0:
a. Drop 2.x right away to concentrate on developing and maintaining the 3.0 stdlib (and/or the 3.0 interpreter)?
b. Spend at least some effort maintaining 2.x for a few years?
c. Carry on maintaining 2.x for a few years?
d. Ignore 3.x and continue with 2.x indefinitely?
e. Watch and see how the Python community at large responds to 3.0?
f. Wait and see what you feel like doing at the time?
g. Some combination of the above?
h. Quit Python to take up pig farming?
These sorts of questions are often quite hard to answer, I understand, because many people often want to see what everybody else will do before making up their minds. But I guess people who post here frequently are less likely to do that than are the rest of us sheep ;-)
[BTW, I assume much of the stdlib will remain essentially the same (if not without backwards-incompatibilities), one hopes people will step in to backport 3.0 fixes (and perhaps forward-port: I make no judgement about which of 2.x and 3.x will have the larger user community in the short or long term). People will presumably be more motivated to do that than currently, since I assume many people will not port all (or any) of their code to 3.0.]
John

On 3/19/06, John J Lee jjl@pobox.com wrote:
On Fri, 17 Mar 2006, Brett Cannon wrote:
On 3/17/06, A.M. Kuchling amk@amk.ca wrote:
Thought: We should drop all of httplib, urllib, urllib2, and ftplib, and instead adopt some third-party library for HTTP/FTP/whatever, write a Python wrapper, and use it instead. (The only such library I
[...]
But maybe this also poses a larger question of where for Py3K we want to take the stdlib. Ignoring its needed cleanup and nesting of the namespace, do we want to try to use more external tools by importing them and writing a Pythonic wrapper? Or do we want to not do that and try to keep more things under our control and go with the status quo? Or do we want to really prune down the stdlib and use more dynamic downloading ala Cheeseshop and setuptools?
[...]
Do we have any idea yet what sort of timescale we're talking about for Python 3.0 (or should I call it Py3K still)?
Py3K. It's shorter, and since Python 3.0 is still just a PEP and Guido's neurons, it has not really materialized yet as an upcoming version of Python. =)
I have a personal interest in these particular modules, but the questions that seem to need answering first are more general ones about the stdlib post-3.0. Brett asks some good questions.
ISTM that another important question must be: What do each of the small set of people like yourself (Brett), Andrew, Martin, Georg, Raymond (etc.!) who bear most of the burden of maintaining the stdlib at present, intend to do after Python 3.0 is out? I assume that it would only be useful to drop parts of the stdlib in this way if that group of people were then to stop working on them. That makes sense, but I don't want to make assumptions about what each of the group of people referred to above intend to do post-3.0:
a. Drop 2.x right away to concentrate on developing and maintaining the 3.0 stdlib (and/or the 3.0 interpreter)?
b. Spend at least some effort maintaining 2.x for a few years?
c. Carry on maintaining 2.x for a few years?
d. Ignore 3.x and continue with 2.x indefinitely?
e. Watch and see how the Python community at large responds to 3.0?
f. Wait and see what you feel like doing at the time?
g. Some combination of the above?
h. Quit Python to take up pig farming?
Py3K will most likely be just another release of Python with a lot of changes. The final 2.x release will be maintained for a while just because we always maintain the last stable release while the next version is being developed. But since the 2.x series will be depended upon by people for quite a while I suspect we will continue to patch it and release it as long as Anthony is willing to do micro releases and developers plan to continue to backport fixes.
Personally, I plan to help to maintain the 2.x series, but once Python 3.0 becomes a reality, it won't be my focus. One would hope that bugs in the 2.x series will get closed up over time and will require less and less maintenance. But backporting might be a problem from 3.x to 2.x because of fundamental differences of how things are structured on top of people just losing interest in 2.x since it isn't bleeding edge.
These sorts of questions are often quite hard to answer, I understand, because many people often want to see what everybody else will do before making up their minds. But I guess people who post here frequently are less likely to do that than are the rest of us sheep ;-)
[BTW, I assume much of the stdlib will remain essentially the same (if not without backwards-incompatibilities), one hopes people will step in to backport 3.0 fixes (and perhaps forward-port: I make no judgement about which of 2.x and 3.x will have the larger user community in the short or long term). People will presumably be more motivated to do that than currently, since I assume many people will not port all (or any) of their code to 3.0.]
Well, I don't know if the stdlib will stay the same. It will definitely get pruned down and cleaned up (I wouldn't be shocked if we have a Great Renaming like the C codebase did way back in the day). So I have no clue where the stdlib will go compared to 2.x.
-Brett

John J Lee wrote:
a. Drop 2.x right away to concentrate on developing and maintaining the 3.0 stdlib (and/or the 3.0 interpreter)?
I expect the same to happen as with all previous releases: the current and the previous release (say, 3.0 and 2.5) are maintained; anything older is unmaintained. So when 3.1 is released, 2.x is dead.
Regards, Martin

I was sort of hoping that Python would approach Py3K asymptotically... :-).
PEP 328, for instance, talks about Python 2.5, 2.6, 2.7.
Bill

Participants (5): Martin v. Löwis, A.M. Kuchling, Bill Janssen, Brett Cannon, John J Lee