[Python-Dev] Python 3.0 urllib fails with chunked HTTP responses

Jeremy Hylton jeremy at alum.mit.edu
Thu Dec 18 20:10:28 CET 2008

On Thu, Dec 18, 2008 at 12:27 PM, Guido van Rossum <guido at python.org> wrote:
> It sounds like the self-closing is an implementation detail, meant to
> make sure the socket is closed as early as possible (which I suppose
> is a good thing if there's a server waiting for the final ACK on the
> other side). Perhaps it should not use close() but something slightly
> lower level that affects the socket directly?

That's what I'm thinking, too.  I had 10 minutes last night after the
kids went to bed, and my first attempt didn't work :-).
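
A minimal sketch of the interaction described in the quoted message below, and of one reading of Guido's "something slightly lower level" idea. It is a toy, not http.client's HTTPResponse and not the patch on the tracker. SelfClosingResponse, its release_only flag and the _release() helper are hypothetical names. An io.BytesIO stands in for the socket file, and readinto() raises ValueError once the object is closed, to mimic what closed io objects do. With release_only=False the object calls close() after the last byte, so the iterator's next readline() hits that ValueError instead of getting an empty result. With release_only=True only the underlying stream is dropped, so readline() returns b"" and iteration ends with StopIteration.

```python
import io


class SelfClosingResponse(io.RawIOBase):
    """Hypothetical stand-in for an HTTP response body of known length.

    After the last byte is read it "self-closes" in one of two ways:
    release_only=False calls close(), marking the io object itself closed;
    release_only=True only drops the underlying stream (lower-level idea).
    """

    def __init__(self, data, release_only=False):
        super().__init__()
        self._fp = io.BytesIO(data)      # stands in for the socket file
        self._remaining = len(data)      # like a Content-Length countdown
        self._release_only = release_only

    def readable(self):
        return True

    def readinto(self, b):
        if self.closed:                  # mimic closed io objects
            raise ValueError("I/O operation on closed file.")
        if self._fp is None:             # stream already released: EOF
            return 0
        chunk = self._fp.read(min(len(b), self._remaining))
        n = len(chunk)
        b[:n] = chunk
        self._remaining -= n
        if self._remaining == 0:         # last byte consumed: self-close
            if self._release_only:
                self._release()
            else:
                self.close()
        return n

    def _release(self):
        # Close only the underlying stream; self.closed stays False.
        if self._fp is not None:
            self._fp.close()
            self._fp = None

    def close(self):
        self._release()
        super().close()


body = b"line 1\nline 2\n"

try:
    list(SelfClosingResponse(body))
except ValueError as exc:
    print("close() at EOF:", exc)

print(list(SelfClosingResponse(body, release_only=True)))
```

In the second variant the socket is still given up as soon as the data is exhausted, but the io object stays open until the caller closes it, so the iterator protocol sees a normal end of file.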


> --Guido van Rossum (home page: http://www.python.org/~guido/)
> On Thu, Dec 18, 2008 at 5:22 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>> On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum <guido at python.org> wrote:
>>> The inheritance from io.RawIOBase seems fine.
>> There is a small problem with the interaction between HTTPResponse and
>> RawIOBase, but I think the problem is more on the http side.  You may
>> recall that the HTTP code has a habit of closing the connection for
>> you.  In a variety of cases, once you've read the last bytes of the
>> response, the HTTPResponse object calls its own close() method.  This
>> interacts poorly with RawIOBase, because it raises a ValueError for
>> any operation on a closed io object.  This prevents iterators from
>> working correctly.  The iterator implementation expects the final call
>> to readline() to return an empty string and converts that to a
>> StopIteration.  Instead, it's seeing a ValueError that propagates out.
>> It's always been odd to me that the connection closed itself.  It's
>> going to be tricky to fix the current bug (chunked responses) and keep
>> the self-closing behavior, but I worry that changing the self-closing
>> behavior too dramatically isn't appropriate for a bug fix.  Will look
>> some more at this tomorrow.
>> Jeremy
>>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>> On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>> I have a patch that appears to fix this bug
>>>> http://bugs.python.org/file12361/urllib-chunked.diff
>>>> but I'm not sure about its interaction with the io module and
>>>> RawIOBase.  Is there a new IO expert who could take a look at it for
>>>> me?
>>>> Jeremy
>>>> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>>> This bug is pretty serious, because urllib will insert garbage into
>>>>> the application-visible data for a chunked response.  It simply
>>>>> ignores the fact that it's reading a chunked response and includes the
>>>>> chunked header data as payload data.  The original bug was reported in
>>>>> September, but no one noticed it.  It was reported again recently.
>>>>> http://bugs.python.org/issue3761
>>>>> http://bugs.python.org/issue4631
>>>>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but
>>>>> that's not my call.
>>>>> Jeremy
>>>> _______________________________________________
>>>> Python-Dev mailing list
>>>> Python-Dev at python.org
>>>> http://mail.python.org/mailman/listinfo/python-dev
>>>> Unsubscribe: http://mail.python.org/mailman/options/python-dev/guido%40python.org

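For reference, a sketch of what decoding the chunked transfer coding (RFC 2616, section 3.6.1) involves. Each chunk is a hexadecimal size line, optionally carrying ";extension" parameters, followed by that many bytes and a CRLF; a zero-size chunk ends the body, optionally followed by trailer headers and a blank line. The bug reported in issues 3761 and 4631 amounts to passing those size lines and CRLFs through to the application as payload. decode_chunked() is a hypothetical helper that assumes well-formed input from a binary file-like object; it is not the http.client implementation or the patch linked in the thread.

```python
import io


def decode_chunked(fp):
    """Decode an HTTP/1.1 chunked body from a binary file-like object.

    Hypothetical helper: assumes well-formed input, no error recovery.
    """
    parts = []
    while True:
        size_line = fp.readline()                     # e.g. b"7;name=x\r\n"
        if not size_line:
            raise ValueError("unexpected end of chunked body")
        # Drop any chunk extensions, then parse the hex size.
        size = int(size_line.split(b";", 1)[0].strip(), 16)
        if size == 0:
            break                                     # last-chunk
        data = fp.read(size)
        if len(data) != size:
            raise ValueError("truncated chunk")
        parts.append(data)
        fp.read(2)                                    # CRLF after chunk data
    # Skip optional trailer headers up to the terminating blank line.
    while True:
        line = fp.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    return b"".join(parts)


raw = b"5\r\nHello\r\n7;name=x\r\n, world\r\n0\r\n\r\n"
print(decode_chunked(io.BytesIO(raw)))
```

Read without decoding, the same bytes would reach the application with the "5\r\n" and "7;name=x\r\n" framing still embedded in the data, which is the garbage described in the original report.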