It looks like a definite bug. I have *no* idea, tho, why it is doing that... I did quite a bit of testing with chunked replies. Admittedly, though, I didn't stack up requests like you've done in your test.

I'm wrapping up mod_dav at the moment, so I don't really have time to look deeply into this. Mebbe next week?

Regarding the pipeline request thing. I think it would probably be best to just drop the whole "hold the previous response and wait for it to be closed" thing. I don't know why that is in there; probably a leftover (converted) semantic from the old-style HTTP() class. I'd be quite fine just axing it and allowing the client to shove ten requests down the pipe before pulling the first response back out.

Oh. Wait. Maybe that was it. You can't read the "next" response until the first one has been read. Well... no need to block putting new responses; we just need to create a way to "get the next reply" and/or "can I get the next reply yet?"

Cheers,
-g

p.s. Moshe also had a short list of review items. I read thru them, but not with the code in hand to understand some of his specifics.

On Wed, 31 May 2000, Jeremy Hylton wrote:
"GS" == Greg Stein <gstein@lyra.org> writes:
GS> [ and recall my email last week that I've updated httplib.py and
GS> posted it to my web pages; it is awaiting review for integration
GS> into the Python core; it still needs docs and more testing
GS> scenarios, tho
I've been looking at the httplib code, and I found what may be a bug. Not sure, because I'm not sure how the API works for pipelined requests.
I've got some test code that looks a bit like this:
def test_new_interface_series(urls):
    paths = []
    the_host = None
    for url in urls:
        host, path = get_host_and_path(url)
        if the_host is None:
            the_host = host
        else:
            assert host == the_host
        paths.append(path)

    conn = httplib.HTTPConnection(the_host)
    for path in paths:
        conn.request('GET', path, headers={'User-Agent': 'httplib/Python'})
    for path in paths:
        errcode, errmsg, resp = conn.getreply()
        buf = resp.read()
        if errcode == 200:
            print errcode, resp.headers
        else:
            print errcode, `errmsg`, resp
        print resp.getheader('Content-Length'), len(buf)
        print repr(buf[:40])
        print repr(buf[-40:])
        print
    conn.close()

test_new_interface_series(['http://www.python.org/',
                           'http://www.python.org/pics/PyBanner054.gif',
                           'http://www.python.org/pics/PythonHi.gif',
                           'http://www.python.org/Jobs.html',
                           'http://www.python.org/doc/',
                           'http://www.python.org/doc/current/',
                           ])
The second loop that reads the replies gets fouled up after a couple of responses. I added even more debugging and found that the first line of the corrupted response is
'ontent-Type: text/html\015\012'
It looks like some part of the program is consuming too much input. I haven't been able to figure out what part yet. Hoping that you might have some good ideas.
Thinking about this issue, I came up with a potential API problem. You must read the body after calling getreply and before calling getreply a second time. This kind of implicit requirement is a bit tricky. It would help if the implementation could raise an error if this happens. It might be even better if it just worked, although it seems a bit too magical.
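For instance, something along these lines (a sketch of the behavior I mean; all the names here are invented, not from your httplib):

    class ResponseNotReadError(Exception):
        """getreply() called before the previous response body was read."""

    class _Response:
        def __init__(self, data):
            self.__data = data
        def read(self):
            data, self.__data = self.__data, None
            return data
        def isclosed(self):
            return self.__data is None

    class Connection:
        def __init__(self, replies):
            self.__replies = list(replies)
            self.__pending = None              # response whose body is unread
        def getreply(self):
            if self.__pending is not None and not self.__pending.isclosed():
                raise ResponseNotReadError()   # fail loudly, don't mis-parse
            self.__pending = _Response(self.__replies.pop(0))
            return 200, 'OK', self.__pending

    conn = Connection(['first body', 'second body'])
    errcode, errmsg, resp = conn.getreply()
    resp.read()                                # must happen before the next call
    errcode, errmsg, resp = conn.getreply()    # fine; would raise without the read()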
Jeremy
--
Greg Stein, http://www.lyra.org/
I found the problem. Sneaky...

sock.makefile() does a dup() on the file descriptor, then opens a FILE* with that. See it coming yet? ...

FILE* is a buffered thingy. stdio chunked in a block of data on the dup'd file descriptor. When we went to grab another chunk on the *original* descriptor, we missed input [that is now sitting in the FILE* buffer].

Answer: change the .makefile() in getreply() to:

    file = self.sock.makefile('rb', 0)

This problem is going to affect the original httplib, too. IMO, we're about to replace the sucker, so no worries...
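You can see the effect in isolation with a socketpair (a toy demo, where the OS provides socketpair(); the same thing happens with a real HTTP connection):

    import socket

    # Two connected sockets stand in for client and server.
    parent, child = socket.socketpair()
    child.send('line one\r\nline two\r\n')

    # A buffered wrapper (plain makefile('rb')) may slurp *both* lines
    # into its private stdio buffer; a second wrapper on the same socket
    # would then find nothing left to read. Unbuffered wrappers read
    # only what they are asked for, so nothing gets stranded:
    f1 = parent.makefile('rb', 0)
    print f1.readline()     # 'line one\r\n'

    f2 = parent.makefile('rb', 0)
    print f2.readline()     # 'line two\r\n' -- not lost in f1's buffer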
Cheers,
-g

On Fri, 2 Jun 2000, Greg Stein wrote:
> It looks like a definite bug. I have *no* idea, tho, why it is doing that...
> [...]
--
Greg Stein, http://www.lyra.org/
On Sat, 3 Jun 2000, Greg Stein wrote:
> [...]
> This problem is going to affect the original httplib, too. IMO, we're about to replace the sucker, so no worries...
Oh... actually it won't affect the original since that doesn't pipeline requests.

--
Greg Stein, http://www.lyra.org/
"GS" == Greg Stein <gstein@lyra.org> writes:
GS> I found the problem. Sneaky... sock.makefile() does a dup() on
GS> the file descriptor, then opens a FILE* with that. See it coming
GS> yet? ...

Bingo! I was suspicious of all these dup'd file descriptors, but missed the connection to a FILE* object.

[In a previous message you wrote:]
> Regarding the pipeline request thing. I think it would probably be best to just drop the whole "hold the previous response and wait for it to be closed" thing.
> [...]
> Oh. Wait. Maybe that was it. You can't read the "next" response until the first one has been read. Well... no need to block putting new responses; we just need to create a way to "get the next reply" and/or "can I get the next reply yet?"
Maybe I should clarify the concern I had here. I think we're on the same page, but I'm not sure.

The problem with pipelined requests is that the client must be sure to read all of response I before it can call getreply to get response I+1.

I imagine that it could add a lot of complexity to user code to implement this requirement, e.g. when multiple threads are sharing a single connection. It would be good if the library could do something reasonable about the multiplexing. In the absence of making it just work, the library could raise an error that makes clear what has gone wrong -- that the client isn't using the interface properly.
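To make the threading point concrete, here is roughly the sort of thing user code ends up writing today (a sketch, not anything in the posted httplib):

    import threading

    class SerializedConnection:
        # Crude but safe: one lock serializes each request/reply pair, so
        # threads can share the connection without tripping over the
        # read-the-body-first requirement. It gives up pipelining entirely.

        def __init__(self, conn):
            self.__conn = conn
            self.__lock = threading.Lock()

        def fetch(self, method, path):
            self.__lock.acquire()
            try:
                self.__conn.request(method, path)
                errcode, errmsg, resp = self.__conn.getreply()
                body = resp.read()   # body fully consumed before unlocking
                return errcode, errmsg, body
            finally:
                self.__lock.release()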
Jeremy

On Mon, 5 Jun 2000, Jeremy Hylton wrote:
"GS" == Greg Stein <gstein@lyra.org> writes: ... Oh. Wait. Maybe that was it. You can't read the "next" response until the first one has been read. Well... no need to block putting new responses; we just need to create a way to "get the next reply" and/or "can I get the next reply yet?"
Maybe I should clarify the concern I had here. I think we're on the same page, but I'm not sure.
The problem with pipelined requests is that the client must be sure to read all of response I before it can call getreply to get response I+1.
Actually, you can issue a getreply() after you've read the prior response's headers, but before you completely read its body. Once you have the header, then you know whether the connection will remain open or not. Assuming it *will* remain open, then you can go ahead and do a request/reply sequence. If the connection is going to close, then you have to fail at request time.
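In code, the rule looks roughly like this (sketching against the new-style interface from your test; treating a missing Connection header as "keep it open" is my assumption about how the decision gets made):

    def fetch_next(conn, resp, next_path):
        # `conn` and `resp` follow the new-style interface from the
        # test code earlier in this thread.
        if (resp.getheader('Connection') or '') != 'close':
            # Connection stays open: safe to issue the next request now,
            # even though resp's body hasn't been completely read.
            conn.request('GET', next_path)
            resp.read()              # finish off the previous body
            return conn.getreply()   # then pull the next reply
        else:
            # The server will close the connection after this response;
            # a new request on this connection has to fail.
            resp.read()
            return None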
> I imagine that it could add a lot of complexity to user code to implement this requirement, e.g. when multiple threads are sharing a single connection. It would be good if the library could do something reasonable about the multiplexing. In the absence of making it just work, the library could raise an error that makes clear what has gone wrong -- that the client isn't using the interface properly.
I'm working through this stuff right now. It is a bit tricky to get it right *and* have it clear. I'm concentrating on the latter as much as possible.

At the moment, HTTPResponse instances can be created without problems. I'm locating the "can you issue a request [and get a response]" logic in the connection object itself.

Another detail that I'm trying to work through is where the connection is allowed to get the HTTPResponse to read the HTTP header. Reading off the network could block, so we need to be a bit more careful about what methods can block (if any).

In any case, the current httplib (at www.lyra.org/greg/python/) has got just about everything. The next checkin will deal with this pipelining issue.
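To give a feel for where I'm headed, the bookkeeping might look something like this (a sketch only; the names and checks are placeholders, not what will actually get checked in):

    class CannotSendRequest(Exception):
        """The connection is known to be closing; a new request must fail."""

    class ResponseNotReady(Exception):
        """No outstanding request to produce a reply for."""

    class SketchConnection:
        # Bookkeeping only -- the network code is elided. The point is
        # where the "may I issue a request / may I get a reply" checks live.

        def __init__(self):
            self.__outstanding = 0   # requests sent, replies not yet returned
            self.__will_close = 0    # set once a reply announces a close

        def request(self, method, path):
            if self.__will_close:
                # fail at request time, per the keep-alive rule above
                raise CannotSendRequest()
            self.__outstanding = self.__outstanding + 1
            # ... put the request line and headers on the wire here ...

        def getreply(self):
            if self.__outstanding == 0:
                raise ResponseNotReady()
            self.__outstanding = self.__outstanding - 1
            # ... read the status line and headers here; if they say the
            # connection is closing, set self.__will_close = 1 so that
            # any further request() raises instead of corrupting state.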
"GS" == Greg Stein <gstein@lyra.org> writes:
GS> QUESTION #2: I *really* want to get rid of the HTTPS() class. It
GS> is introducing excessive complexity, with the only purpose being
GS> compatibility against the post-1.5.2 CVS.

GS> Anyone? Thoughts on removal?

I've got two answers. I don't particularly like the old-style HTTP interface, so I'm happy to see it replaced with a better one. I say who cares about HTTPS.

On the other hand, there is a lot of existing code that uses the old interface. It would be good if that code could be modified to use the new SSL interface without having to also re-write the code to use the new http interface. Perhaps we should keep it to provide a future upgrade path for all the legacy code.

I could probably be convinced that the amount of effort to change from the old interface to the new interface is fairly small. If you're going to make one change to the code anyway, might as well start using the modern interface, too. Is there anyone who actually has http code to maintain that has an opinion?
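For comparison, here is roughly what the change amounts to (the first half is the 1.5.2 interface; the second half follows the test code from earlier in this thread, so the exact signatures may differ in the final version):

    import httplib

    # Old-style (1.5.2) interface:
    h = httplib.HTTP('www.python.org')
    h.putrequest('GET', '/')
    h.putheader('Accept', 'text/html')
    h.endheaders()
    errcode, errmsg, headers = h.getreply()
    body = h.getfile().read()

    # Roughly the same exchange with the new-style interface:
    conn = httplib.HTTPConnection('www.python.org')
    conn.request('GET', '/', headers={'Accept': 'text/html'})
    errcode, errmsg, resp = conn.getreply()
    body = resp.read()
    conn.close()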
Jeremy