[Twisted-Python] HTTP client should be more tolerant?
As initial disclaimer, PLEASE change your FAQ about bug reporting - nobody really wants to correctly report bug by first joining on some random mailing list (OR allow moderated non-member postings, but we all know how much spam THAT will drag in). Anyway.. According to the HTTP RFCs, CRLF is the valid line termination sequence. However, the appendix (section 19.3 in old HTTP/1.1, too lazy to dig up more recent one) states as follows: The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR. Some UNIXish servers, while breaking RFC, seem to work on browsers. I'd say following this advice would be good, and therefore tuned my twisted http client to be 'tolerant' (all major and minor browsers that I am aware of are). Changes required? web/http.py: @@ -314,6 +314,7 @@ length = None firstLine = 1 __buffer = '' + delimiter = '\n' def sendCommand(self, command, path): self.transport.write('%s %s HTTP/1.0\r\n' % (command, path)) @@ -325,6 +326,7 @@ self.transport.write('\r\n') def lineReceived(self, line): + if line and line[-1] == '\r': line = line[:-1] if self.firstLine: self.firstLine = 0 l = line.split(None, 2) I'm also considering implementing HTTP/1.1 client. Has anyone done anything about it? Thoughts? 'Give up, bad idea'? I also rewrote the proxy for my local use a bit, mostly to make it more component-oriented AND correct (currently HTTP/1.1 clients do not get persistent service, which is inefficient). Unfortunately changes are mixed among some other code, but has there been any work on proxy recently? I might have interest in submitting patch or two to make it more correct in some things. -Markus -- A: "You can do wonderful things in software if you get rid of the assumption that Windows == Shit." B: "But likewise you could do wonderful things in hardware if you got rid of the assumption that Gravity == 9.81 metres per second squared." C: "But that's an invalid comparison. You can get rid of the gravity problem quite easily by a change of location -- but do you think Windows would get any better if you shot it to the moon?" -- rec.humor.funny post by ermel@gmx.de (Erik Meltzer)
On Feb 3, 2005, at 10:36 AM, Markus Stenberg wrote:
According to the HTTP RFCs, CRLF is the valid line termination sequence. However, the appendix (section 19.3 in old HTTP/1.1, too lazy to dig up more recent one) states as follows:
The line terminator for message-header fields is the sequence CRLF. However, we recommend that applications, when parsing such headers, recognize a single LF as a line terminator and ignore the leading CR.
Some UNIXish servers, while breaking RFC, seem to work on browsers. I'd say following this advice would be good, and therefore tuned my twisted http client to be 'tolerant' (all major and minor browsers that I am aware of are).
OMG. *What* evil server does that?
I'm also considering implementing HTTP/1.1 client. Has anyone done anything about it? Thoughts? 'Give up, bad idea'?
Thought about it, but still working on the new HTTP server. (see twisted.web2 in svn trunk). Hopefully a new client could reuse much of the work for the new server (esp. header parsing/generation).
I also rewrote the proxy for my local use a bit, mostly to make it more component-oriented AND correct (currently HTTP/1.1 clients do not get persistent service, which is inefficient). Unfortunately changes are mixed among some other code, but has there been any work on proxy recently? I might have interest in submitting patch or two to make it more correct in some things.
A proxy should only be a small glue layer between the server and client, so once those two are working well, a correctly operating (non-caching) proxy should be pretty easy. A caching proxy, of course, is a whole other project. James
James Y Knight <foom@fuhm.net> writes:
On Feb 3, 2005, at 10:36 AM, Markus Stenberg wrote: <snip>
Some UNIXish servers, while breaking RFC, seem to work on browsers. I'd say following this advice would be good, and therefore tuned my twisted http client to be 'tolerant' (all major and minor browsers that I am aware of are). OMG. *What* evil server does that?
Offhand I can remember only some old ones, i.e. they have been fixed since (some 'minimal web servers' in assorted languages were guilty of this, mostly in the HTTP/1.0 and before days). But many, many CGI scripts are broken that way still (because if you code on UNIX, and dump headers using say, perl's print, you don't naturally use \r\n unless you really know the standard, and unfortunately many CGI authors don't seem to. And web servers usually dump the results out directly => corruption spreads.
I'm also considering implementing HTTP/1.1 client. Has anyone done anything about it? Thoughts? 'Give up, bad idea'? Thought about it, but still working on the new HTTP server. (see twisted.web2 in svn trunk). Hopefully a new client could reuse much of the work for the new server (esp. header parsing/generation).
One could imagine that given correct header parsing/generation and request parsing code (chunked de/encoding for example needs to be implemented in both ends), it would be trivial. I'll take a look ;)
I also rewrote the proxy for my local use a bit, mostly to make it more component-oriented AND correct (currently HTTP/1.1 clients do not get persistent service, which is inefficient). Unfortunately changes are mixed among some other code, but has there been any work on proxy recently? I might have interest in submitting patch or two to make it more correct in some things. A proxy should only be a small glue layer between the server and client, so once those two are working well, a correctly operating (non-caching) proxy should be pretty easy. A caching proxy, of course, is a whole other project.
True. The tricky part is making it modular so that it can be extended without say, subclassing whole set of classes (as is needed to change the current twisted.web.proxy). Obviously, less clean hack of mutating the static variables of classes that point to subclasses works to limited degree too, but I avoid stuff like that.
James
-Markus
-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Markus Stenberg wrote: | As initial disclaimer, PLEASE change your FAQ about bug reporting - nobody | really wants to correctly report bug by first joining on some random | mailing list (OR allow moderated non-member postings, but we all know how | much spam THAT will drag in). It may be that nobody wants to, but apparently most people are willing to. Bug reports go to the mailing list first in case they are really misunderstandings, which is in fact the case much of the time. Having absolutely everything go to the bug tracker would simply clutter the tracker and make it useless. This is not an unusual practice for a large-scale open source project. That said, the change you propose probably could have gone into the tracker as a patch. We don't have any mechanism to prevent you from doing so. You just have to use your best judgement. C -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.2.1 (MingW32) Comment: Using GnuPG with Thunderbird - http://enigmail.mozdev.org iD8DBQFCAk8h3A5SrXAiHQcRApbsAJ4pe1HsXzSD3O/2uTcG7pNj0y2JFACfbLPy DacboLJj4QFDguqzVKTV+vs= =Gzj/ -----END PGP SIGNATURE-----
On Thu, 2005-02-03 at 08:19 -0800, Cory Dodt wrote:
Bug reports go to the mailing list first in case they are really misunderstandings, which is in fact the case much of the time. Having absolutely everything go to the bug tracker would simply clutter the tracker and make it useless.
I disagree. The FAQ in 2.0 (subversion trunk) says bugs should go to tracker, and mailing list in urgent cases (with reference to bug # in tracker). Bugs should never go only to the mailing list, as they can get easily get lost. Of course, a clarification or a question is not something that should go in the tracker, but anything that has a patch attached pretty much ought to go in the tracker.
On Thu, Feb 03, 2005 at 05:36:19PM +0200, Markus Stenberg wrote:
As initial disclaimer, PLEASE change your FAQ about bug reporting - nobody really wants to correctly report bug by first joining on some random mailing list (OR allow moderated non-member postings, but we all know how much spam THAT will drag in).
It doesn't seem to be common knowledge, but Mailman allows you to subscribe to a list, then set your subscription as "no mail", so that you are allowed to post (because you're a subscriber), but you don't get a flood of mailing list traffic in your inbox. (But yes, the FAQ in SVN already says to use the tracker at http://twistedmatrix.com/bugs/) -Andrew.
Markus Stenberg wrote:
Changes required? web/http.py:
@@ -314,6 +314,7 @@ length = None firstLine = 1 __buffer = '' + delimiter = '\n'
def sendCommand(self, command, path): self.transport.write('%s %s HTTP/1.0\r\n' % (command, path))
That affects self.sendLine() too, causing your modified version to potentially send LF-only lines. Not what you wanted.
Tommi Virtanen <tv@twistedmatrix.com> writes:
Changes required? web/http.py: @@ -314,6 +314,7 @@ length = None firstLine = 1 __buffer = '' + delimiter = '\n' def sendCommand(self, command, path): self.transport.write('%s %s HTTP/1.0\r\n' % (command, path)) That affects self.sendLine() too, causing your modified version to
Markus Stenberg wrote: potentially send LF-only lines. Not what you wanted.
At least in 1.3 the HTTP stuff doesn't use sendLine for some ineffable reason. Probably not in the old stuff in svn either, but don't have checked out copy at home ;) wonder why sendCommand/Header, endHeaders are implemented using transport.write.. -Markus
Markus Stenberg wrote:
That affects self.sendLine() too, causing your modified version to potentially send LF-only lines. Not what you wanted. At least in 1.3 the HTTP stuff doesn't use sendLine for some ineffable reason. Probably not in the old stuff in svn either, but don't have checked out copy at home ;) wonder why sendCommand/Header, endHeaders are implemented using transport.write..
Not nice to rely on that -- I'd call your patch a time bomb, waiting to explode. The idea is nice.
participants (6)
-
Andrew Bennetts
-
Cory Dodt
-
Itamar Shtull-Trauring
-
James Y Knight
-
Markus Stenberg
-
Tommi Virtanen