[Python-Dev] Python 3.0 urllib fails with chunked HTTP responses
Jeremy Hylton
jeremy at alum.mit.edu
Thu Dec 18 20:10:28 CET 2008
On Thu, Dec 18, 2008 at 12:27 PM, Guido van Rossum <guido at python.org> wrote:
> It sounds like the self-closing is an implementation detail, meant to
> make sure the socket is closed as early as possible (which I suppose
> is a good thing if there's a server waiting for the final ACK on the
> other side). Perhaps it should not use close() but something slightly
> lower level that affects the socket directly?
That's what I'm thinking, too. I had 10 minutes last night after the
kids went to bed, and my first attempt didn't work :-).
Jeremy
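
The closed-object behaviour described in the quoted message below can be seen with any io object. The following is only a sketch: it uses io.BytesIO as a stand-in for the response object, and read_after_eof is a hypothetical helper, not anything from the http code. It shows that an open stream returns empty bytes at end of data, which is what an iterator turns into StopIteration, while a stream that has already been closed raises ValueError instead.

```python
import io


def read_after_eof(close_first):
    """Read two lines, then try one more read past the end of the data.

    close_first=True mimics (as an assumption) an object that closes
    itself as soon as the last bytes have been handed out.
    """
    stream = io.BytesIO(b"first\nsecond\n")
    stream.readline()
    stream.readline()
    if close_first:
        stream.close()
    try:
        # On an open stream this returns b"" at end of data.
        return stream.readline()
    except ValueError as exc:
        # On a closed stream the io layer refuses the operation.
        return exc


still_open = read_after_eof(close_first=False)
already_closed = read_after_eof(close_first=True)
```

Here still_open is the empty bytes object and already_closed is the ValueError that would otherwise propagate out of the loop.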
>
> --Guido van Rossum (home page: http://www.python.org/~guido/)
>
>
>
> On Thu, Dec 18, 2008 at 5:22 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>> On Wed, Dec 17, 2008 at 1:05 PM, Guido van Rossum <guido at python.org> wrote:
>>> The inheritance from io.RawIOBase seems fine.
>>
>> There is a small problem with the interaction between HTTPResponse and
>> RawIOBase, but I think the problem is more on the http side. You may
>> recall that the HTTP code has a habit of closing the connection for
>> you. In a variety of cases, once you've read the last bytes of the
>> response, the HTTPResponse object calls its own close() method. This
>> interacts poorly with RawIOBase, because it raises a ValueError for
>> any operation on a closed io object. This prevents iterators from
>> working correctly. The iterator implementation expects the final call
>> to readline() to return an empty string and converts that to a
>> StopIteration. Instead, it's seeing a ValueError that propagates out.
>>
>> It's always been odd to me that the connection closed itself. It's
>> going to be tricky to fix the current bug (chunked responses) and keep
>> the self-closing behavior, but I worry that changing the self-closing
>> behavior too dramatically isn't appropriate for a bug fix. Will look
>> some more at this tomorrow.
>>
>> Jeremy
>>
>>> --Guido van Rossum (home page: http://www.python.org/~guido/)
>>>
>>>
>>>
>>> On Mon, Dec 15, 2008 at 11:19 AM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>> I have a patch that appears to fix this bug
>>>> http://bugs.python.org/file12361/urllib-chunked.diff
>>>> but I'm not sure about its interaction with the io module and
>>>> RawIOBase. Is there a new IO expert who could take a look at it for
>>>> me?
>>>>
>>>> Jeremy
>>>>
>>>> On Sun, Dec 14, 2008 at 11:06 PM, Jeremy Hylton <jeremy at alum.mit.edu> wrote:
>>>>> This bug is pretty serious, because urllib will insert garbage into
>>>>> the application-visible data for a chunked response. It simply
>>>>> ignores the fact that it's reading a chunked response and includes the
>>>>> chunked header data as payload data. The original bug was reported in
>>>>> September, but no one noticed it. It was reported again recently.
>>>>>
>>>>> http://bugs.python.org/issue3761
>>>>> http://bugs.python.org/issue4631
>>>>>
>>>>> I suspect we'd want to get a 3.0.1 out as soon as this is fixed, but
>>>>> that's not my call.
>>>>>
>>>>> Jeremy
>>>>>
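
For readers unfamiliar with chunked transfer encoding, the sketch below shows why ignoring it puts garbage into the payload. It is a minimal illustration and not the patch referenced above: decode_chunked and the sample wire bytes are hypothetical, and the function assumes the whole body is already in memory. Each chunk is preceded by a hexadecimal size line and followed by CRLF, and a chunk of size zero ends the body.

```python
import io


def decode_chunked(raw):
    """Return the payload of a chunked body, dropping the framing."""
    stream = io.BytesIO(raw)
    body = bytearray()
    while True:
        size_line = stream.readline()
        if not size_line:
            raise ValueError("truncated chunked body")
        # The size is hexadecimal; anything after ';' is a chunk extension.
        size_text = size_line.split(b";", 1)[0].strip().decode("ascii")
        size = int(size_text, 16)
        if size == 0:
            break
        chunk = stream.read(size)
        if len(chunk) != size:
            raise ValueError("truncated chunk")
        body += chunk
        stream.readline()  # consume the CRLF that follows the chunk data
    # Skip optional trailer lines up to the terminating blank line.
    while True:
        line = stream.readline()
        if line in (b"\r\n", b"\n", b""):
            break
    return bytes(body)


# Sample data made up for this example: two chunks, then the end marker.
wire = b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"
payload = decode_chunked(wire)
```

A reader that treats wire as plain data hands the application the size lines ("4", "5", "0") and the extra CRLFs mixed in with the content, whereas payload contains only the two chunks joined together.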