[Python-Dev] Re: httplib (was: Adding LDAP to the Python core... ?!)

Sat, 3 Jun 2000 15:53:18 -0700 (PDT)

I found the problem. Sneaky...

sock.makefile() does a dup() on the file descriptor, then opens a FILE*
with that. See it coming yet? ...

FILE* is a buffered thingy. stdio read a whole block of data through the
dup'd file descriptor into its own buffer. When we went to grab another
chunk from the *original* descriptor, we missed the input [that is now
sitting in the FILE* buffer].

Answer: make the file unbuffered, so it never reads ahead of what was asked
for. Change the .makefile() call in getreply() to:

    file = self.sock.makefile('rb', 0)
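
The effect can be reproduced outside httplib. The following is only a
minimal sketch, written for present-day Python 3, where makefile() buffers
in an io.BufferedReader rather than a C FILE*, but the read-ahead behaves
the same way. The helper name leftover_on_socket and the two-line payload
are made up for illustration, and the sketch assumes the whole payload has
already arrived when the first line is read (true for a local socketpair).

```python
import select
import socket

def leftover_on_socket(bufsize):
    """Read one line through makefile(), then see what the raw socket still holds."""
    a, b = socket.socketpair()
    try:
        a.sendall(b"first line\r\nsecond line\r\n")
        f = b.makefile('rb', bufsize)
        line = f.readline()
        # Poll with a zero timeout so a drained socket does not block recv().
        readable, _, _ = select.select([b], [], [], 0)
        rest = b.recv(1024) if readable else b""
        f.close()
        return line, rest
    finally:
        a.close()
        b.close()

# Default buffering: the file object swallows both lines on the first read,
# so nothing is left for a direct recv() on the socket.
buffered = leftover_on_socket(-1)

# Buffering 0 (the fix above): only the requested line is consumed.
unbuffered = leftover_on_socket(0)

print(buffered)
print(unbuffered)
```

In the buffered case the second line is not lost; it is sitting in the file
object's buffer, which is exactly where a later read on the original
descriptor cannot see it.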

This problem is going to affect the original httplib, too. IMO, we're
about to replace the sucker, so no worries...

Cheers,
-g

On Fri, 2 Jun 2000, Greg Stein wrote:
> It looks like a definite bug. I have *no* idea, tho, why it is doing
> that... I did quite a bit of testing with chunked replies. Admittedly,
> though, I didn't stack up requests like you've done in your test. I'm
> wrapping up mod_dav at the moment, so I don't really have time to look
> deeply into this. Mebbe next week?
> 
> Regarding the pipeline request thing. I think it would probably be best to
> just drop the whole "hold the previous response and wait for it to be
> closed" thing. I don't know why that is in there; probably a leftover
> (converted) semantic from the old-style HTTP() class. I'd be quite fine
> just axing it and allowing the client to shove ten requests down the pipe
> before pulling the first response back out.
> 
> Oh. Wait. Maybe that was it. You can't read the "next" response until the
> first one has been read. Well... no need to block putting new requests;
> we just need to create a way to "get the next reply" and/or "can I get the
> next reply yet?"
> 
> Cheers,
> -g
> 
> p.s. Moshe also had a short list of review items. I read thru them, but
> not with the code in hand to understand some of his specifics.
> 
> 
> On Wed, 31 May 2000, Jeremy Hylton wrote:
> > >>>>> "GS" == Greg Stein <gstein@lyra.org> writes:
> > 
> >   GS> [ and recall my email last week that I've updated httplib.py and
> >   GS> posted it to my web pages; it is awaiting review for integration
> >   GS> into the Python core; it still needs docs and more testing
> >   GS> scenarios, tho
> > 
> > I've been looking at the httplib code, and I found what may be a bug.
> > Not sure, because I'm not sure how the API works for pipelined
> > requests. 
> > 
> > I've got some test code that looks a bit like this:
> > 
> > def test_new_interface_series(urls):
> >     paths = []
> >     the_host = None
> >     for url in urls:
> >         host, path = get_host_and_path(url)
> >         if the_host is None:
> >             the_host = host
> >         else:
> >             assert host == the_host
> >         paths.append(path)
> >         
> >     conn = httplib.HTTPConnection(the_host)
> >     for path in paths:
> >         conn.request('GET', path, headers={'User-Agent': 'httplib/Python'})
> >     for path in paths:
> >         errcode, errmsg, resp = conn.getreply()
> >         buf = resp.read()
> >         if errcode == 200:
> >             print errcode, resp.headers
> >         else:
> >             print errcode, `errmsg`, resp
> >         print resp.getheader('Content-Length'), len(buf)
> >         print repr(buf[:40])
> >         print repr(buf[-40:])
> >         print
> >     conn.close()
> > 
> > test_new_interface_series(['http://www.python.org/',
> >                         'http://www.python.org/pics/PyBanner054.gif',
> >                         'http://www.python.org/pics/PythonHi.gif',
> >                         'http://www.python.org/Jobs.html',
> >                         'http://www.python.org/doc/',
> >                         'http://www.python.org/doc/current/',
> >                            ])
> > 
> > The second loop that reads the replies gets fouled up after a couple
> > of responses.  I added even more debugging and found that the first
> > line of the corrupted response is
> > 
> > > 'ontent-Type: text/html\015\012'
> > 
> > It looks like some part of the program is consuming too much input.  I
> > haven't been able to figure out what part yet.  Hoping that you might
> > have some good ideas.
> > 
> > Thinking about this issue, I came up with a potential API problem.
> > You must read the body after calling getreply and before calling
> > getreply a second time.  This kind of implicit requirement is a bit
> > tricky.  It would help if the implementation could raise an error if
> > this happens.  It might be even better if it just worked, although it
> > seems a bit too magical.
> > 
> > Jeremy
> > 
> 
> -- 
> Greg Stein, http://www.lyra.org/

-- 
Greg Stein, http://www.lyra.org/