[Python-Dev] httplib (was: Adding LDAP to the Python core... ?!)

Greg Stein gstein@lyra.org
Fri, 2 Jun 2000 01:43:13 -0700 (PDT)


It looks like a definite bug. I have *no* idea, tho, why it is doing
that... I did quite a bit of testing with chunked replies. Admittedly,
though, I didn't stack up requests like you've done in your test. I'm
wrapping up mod_dav at the moment, so I don't really have time to look
deeply into this. Mebbe next week?

Regarding the pipeline request thing. I think it would probably be best to
just drop the whole "hold the previous response and wait for it to be
closed" thing. I don't know why that is in there; probably a leftover
(converted) semantic from the old-style HTTP() class. I'd be quite fine
just axing it and allowing the client to shove ten requests down the pipe
before pulling the first response back out.

Oh. Wait. Maybe that was it. You can't read the "next" response until the
first one has been read. Well... no need to block putting new requests;
we just need to create a way to "get the next reply" and/or "can I get the
next reply yet?"
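
A rough sketch of what such an interface might look like, assuming
nothing about the real httplib internals. The names SketchPipeline,
SketchResponse, ResponseNotRead and reply_ready are all hypothetical,
and the `fetch` callable stands in for the socket so the example runs
without a network. Requests are queued without blocking; replies come
back in order, and asking for the next one before the previous body has
been read raises an error instead of corrupting the stream.

```python
from collections import deque


class ResponseNotRead(Exception):
    """Hypothetical error: next reply requested before the previous body was read."""


class SketchResponse:
    # Hypothetical stand-in for a response object; not httplib's real class.
    def __init__(self, body):
        self._body = body
        self.closed = False

    def read(self):
        # Consuming the whole body frees the connection for the next reply.
        data, self._body = self._body, b""
        self.closed = True
        return data


class SketchPipeline:
    # Hypothetical wrapper: request() never blocks; replies are handed out in order.
    def __init__(self, fetch):
        self._fetch = fetch        # callable(path) -> bytes, stands in for the socket
        self._pending = deque()    # paths requested but not yet answered
        self._current = None       # most recently handed-out response

    def request(self, path):
        # Queue freely: no check on the state of earlier responses.
        self._pending.append(path)

    def reply_ready(self):
        # "Can I get the next reply yet?"
        previous_done = self._current is None or self._current.closed
        return bool(self._pending) and previous_done

    def getreply(self):
        # "Get the next reply", refusing to skip over an unread body.
        if self._current is not None and not self._current.closed:
            raise ResponseNotRead("read the previous response body first")
        if not self._pending:
            raise IndexError("no outstanding requests")
        self._current = SketchResponse(self._fetch(self._pending.popleft()))
        return self._current


# Usage: shove several requests down the pipe, then pull replies back in order.
demo = SketchPipeline(lambda path: ("body of " + path).encode())
for demo_path in ("/", "/doc/", "/Jobs.html"):
    demo.request(demo_path)
while demo.reply_ready():
    print(demo.getreply().read())
```

Whether an unread body should raise or be drained silently is a design
choice; the sketch raises, since that makes the ordering requirement
explicit rather than implicit.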

Cheers,
-g

p.s. Moshe also had a short list of review items. I read thru them, but
not with the code in hand to understand some of his specifics.


On Wed, 31 May 2000, Jeremy Hylton wrote:
> >>>>> "GS" == Greg Stein <gstein@lyra.org> writes:
> 
>   GS> [ and recall my email last week that I've updated httplib.py and
>   GS> posted it to my web pages; it is awaiting review for integration
>   GS> into the Python core; it still needs docs and more testing
>   GS> scenarios, tho ]
> 
> I've been looking at the httplib code, and I found what may be a bug.
> Not sure, because I'm not sure how the API works for pipelined
> requests. 
> 
> I've got some test code that looks a bit like this:
> 
> def test_new_interface_series(urls):
>     paths = []
>     the_host = None
>     for url in urls:
>         host, path = get_host_and_path(url)
>         if the_host is None:
>             the_host = host
>         else:
>             assert host == the_host
>         paths.append(path)
>         
>     conn = httplib.HTTPConnection(the_host)
>     for path in paths:
>         conn.request('GET', path, headers={'User-Agent': 'httplib/Python'})
>     for path in paths:
>         errcode, errmsg, resp = conn.getreply()
>         buf = resp.read()
>         if errcode == 200:
>             print errcode, resp.headers
>         else:
>             print errcode, `errmsg`, resp
>         print resp.getheader('Content-Length'), len(buf)
>         print repr(buf[:40])
>         print repr(buf[-40:])
>         print
>     conn.close()
> 
> test_new_interface_series(['http://www.python.org/',
>                            'http://www.python.org/pics/PyBanner054.gif',
>                            'http://www.python.org/pics/PythonHi.gif',
>                            'http://www.python.org/Jobs.html',
>                            'http://www.python.org/doc/',
>                            'http://www.python.org/doc/current/',
>                            ])
> 
> The second loop that reads the replies gets fouled up after a couple
> of responses.  I added even more debugging and found that the first
> line of the corrupted response is
> 
> > 'ontent-Type: text/html\015\012'
> 
> It looks like some part of the program is consuming too much input.  I
> haven't been able to figure out what part yet.  Hoping that you might
> have some good ideas.
> 
> Thinking about this issue, I came up with a potential API problem.
> You must read the body after calling getreply and before calling
> getreply a second time.  This kind of implicit requirement is a bit
> tricky.  It would help if the implementation could raise an error if
> this happens.  It might be even better if it just worked, although it
> seems a bit too magical.
> 
> Jeremy
> 

-- 
Greg Stein, http://www.lyra.org/