[Python-Dev] Bad interaction of __index__ and sequence repeat
Travis Oliphant
oliphant.travis at ieee.org
Fri Jul 28 18:15:51 CEST 2006
Nick Coghlan wrote:
> David Hopwood wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-( It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>> >>> s = 'x' * (2**100) # works!
>>> >>> len(s)
>>> 2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying w.__index__
>>> in order to call v->sq_repeat. However, __index__ is defined to clip
>>> the result to fit in a Py_ssize_t.
>>
>> Clipping the result sounds like it would *never* be a good idea. What
>> was the rationale for that? It should throw an exception.
>
> A simple demonstration of the clipping behaviour that works on
> machines with limited memory:
>
> >>> (2**100).__index__()
> 2147483647
> >>> (-2**100).__index__()
> -2147483648
>
> PEP 357 doesn't even mention the issue, and the comment on long_index
> in the code doesn't give a rationale - it just notes that the function
> clips the result.
I can't recall a rationale, so it was probably never a clear one, and the
clipping should be treated as a bug. The fact that the PEP doesn't discuss
it means it wasn't thought through. I think I had the vague idea that
.__index__() should always succeed, but this example shows the problem
with that notion.
>
> I'm inclined to call it a bug, too, but I've cc'ed Travis to see if he
> can shed some light on the question - the implementation of long_index
> explicitly suppresses the overflow error generated by
> _long_as_ssize_t, so the current behaviour appears to be deliberate.
If it was deliberate, it was a hurried decision, and one that should be
re-thought and probably changed. I think the idea came from the fact
that out-of-bounds slicing returns empty lists; since __index__ was
developed primarily to allow integer-like objects to be used in slicing,
it adopted that behavior. In fact, the comment above _long_index appears
to contain words from the comment above _PyEval_SliceIndex, which shows
the direct borrowing of the idea. But _long_index is clearly the wrong
place to handle the situation, since it is used by more than just the
slicing code. An error return is already handled by the
_PyEval_SliceIndex code anyway.
I say it's a bug that should be fixed. Don't clear the error, raise it.
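To illustrate the distinction, here is a rough Python model, not the actual
C code. It assumes a 32-bit Py_ssize_t, and the names SSIZE_MAX, SSIZE_MIN,
index_clipped, index_checked and slice_index are hypothetical, made up for
this sketch only:

```python
# Hypothetical model of a 32-bit Py_ssize_t; none of these names are CPython API.
SSIZE_MAX = 2**31 - 1
SSIZE_MIN = -2**31


def index_clipped(n):
    """Model of the behaviour under discussion: silently clip to the range."""
    return max(SSIZE_MIN, min(SSIZE_MAX, n))


def index_checked(n):
    """Model of the proposed behaviour: raise instead of clipping."""
    if not SSIZE_MIN <= n <= SSIZE_MAX:
        raise OverflowError("value does not fit in Py_ssize_t")
    return n


def slice_index(n):
    """Slicing wants clipping, so the slicing code handles the error itself."""
    try:
        return index_checked(n)
    except OverflowError:
        return SSIZE_MAX if n > 0 else SSIZE_MIN
```

In this model only the slicing caller turns an overflow into a clipped
value; any other caller, such as sequence repeat, would see the error.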
-Travis