[Python-Dev] Python 3.0 urllib fails with chunked HTTP responses

Guido van Rossum guido at python.org
Thu Dec 18 18:27:42 CET 2008

It sounds like the self-closing is an implementation detail, meant to
make sure the socket is closed as early as possible (which I suppose
is a good thing if there's a server waiting for the final ACK on the
other side). Perhaps it should not use close() but something slightly
lower level that affects the socket directly?
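
A rough sketch of that idea, to make the difference concrete. Everything
here is made up for illustration (the Response class, the self_close flag,
the _finish helper, and a BytesIO standing in for the socket); it is not
the actual http.client code. With self_close=True the object calls its own
close() after the last byte, and the next readline() made by the iterator
hits a closed io object. With self_close=False only the underlying file is
released, so the next read reports a plain EOF and iteration ends normally.

```python
import io


class Response(io.RawIOBase):
    """Hypothetical response-like reader; NOT http.client's implementation."""

    def __init__(self, payload, self_close):
        self.fp = io.BytesIO(payload)   # stands in for the socket file
        self.remaining = len(payload)   # stands in for Content-Length
        self.self_close = self_close

    def readable(self):
        return True

    def _finish(self):
        # Called once the last byte of the body has been read.
        if self.self_close:
            self.close()                # marks the io object itself closed
        else:
            self.fp.close()             # release only the underlying resource
            self.fp = None

    def readinto(self, buf):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        if self.fp is None:
            return 0                    # resource released: ordinary EOF
        n = self.fp.readinto(buf)
        self.remaining -= n
        if self.remaining == 0:
            self._finish()
        return n


# Releasing only the underlying file: iteration terminates cleanly.
print(list(Response(b"a\nb\n", self_close=False)))

# Calling close() on ourselves: the iterator's final readline() fails.
try:
    list(Response(b"a\nb\n", self_close=True))
except ValueError as exc:
    print("ValueError:", exc)
```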

--Guido van Rossum (home page: http://www.python.org/~guido/)

On Thu, Dec 18, 2008 at 5:22 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
> On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum <guido at python.org> wrote:
>> The inheritance from io.RawIOBase seems fine.
> There is a small problem with the interaction between HTTPResponse and
> RawIOBase, but I think the problem is more on the http side.  You may
> recall that the HTTP code has a habit of closing the connection for
> you.  In a variety of cases, once you've read the last bytes of the
> response, the HTTPResponse object calls its own close() method.  This
> interacts poorly with RawIOBase, because it raises a ValueError for
> any operation on a closed io object.  This prevents iterators from
> working correctly.  The iterator implementation expects the final call
> to readline() to return an empty string and converts that to a
> StopIteration.  Instead, it's seeing a ValueError that propagates out.
> It's always been odd to me that the connection closed itself.  It's
> going to be tricky to fix the current bug (chunked responses) and keep
> the self-closing behavior, but I worry that changing the self-closing
> behavior too dramatically isn't appropriate for a bug fix.  Will look
> some more at this tomorrow.
> Jeremy
>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>> On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>> I have a patch that appears to fix this bug
>>> http://bugs.python.org/file12361/urllib-chunked.diff
>>> but I'm not sure about its interaction with the io module and
>>> RawIOBase.  Is there a new IO expert who could take a look at it for
>>> me?
>>> Jeremy
>>> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>> This bug is pretty serious, because urllib will insert garbage into
>>>> the application-visible data for a chunked response.  It simply
>>>> ignores the fact that it's reading a chunked response and includes the
>>>> chunked header data as payload data.  The original bug was reported in
>>>> September, but no one noticed it.  It was reported again recently.
>>>> http://bugs.python.org/issue3761
>>>> http://bugs.python.org/issue4631
>>>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but
>>>> that's not my call.
>>>> Jeremy
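
For context on the bug described in the oldest message quoted above: in a
chunked response, each piece of the body is preceded by a line giving its
size in hexadecimal, and the body ends with a zero-size chunk. A reader
that ignores this framing hands the size lines to the application as if
they were payload. The following is a minimal sketch of decoding that
framing, assuming a binary file-like object; the function name read_chunked
is hypothetical, and this is neither the patch linked above nor the code
in http.client.

```python
import io


def read_chunked(fp):
    """Decode a chunked HTTP body from a binary file-like object (sketch)."""
    body = bytearray()
    while True:
        size_line = fp.readline()
        # The size is hexadecimal; anything after ';' is a chunk extension
        # and is ignored here.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break
        body += fp.read(size)
        fp.readline()  # consume the CRLF that follows the chunk data
    # Skip any trailer lines up to the terminating blank line.
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    return bytes(body)


raw = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"

# Reading the raw bytes as-is leaves the size lines mixed into the data.
print(raw)

# Decoding the framing recovers only the payload.
print(read_chunked(io.BytesIO(raw)))
```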
