[Python-Dev] Bad interaction of __index__ and sequence repeat
Travis Oliphant
oliphant.travis at ieee.org
Mon Jul 31 20:28:09 CEST 2006
Nick Coghlan wrote:
> Nick Coghlan wrote:
>> Armin Rigo wrote:
>>> Hi,
>>>
>>> There is an oversight in the design of __index__() that only just
>>> surfaced :-( It is responsible for the following behavior, on a 32-bit
>>> machine with >= 2GB of RAM:
>>>
>>> >>> s = 'x' * (2**100) # works!
>>> >>> len(s)
>>> 2147483647
>>>
>>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>>> order to call v->sq_repeat. However, __index__ is defined to clip the
>>> result to fit in a Py_ssize_t. This means that the above problem exists
>>> with all sequences, not just strings, given enough RAM to create such
>>> sequences with 2147483647 items.
>>>
>>> For reference, in 2.4 we correctly get an OverflowError.
>>>
>>> Argh! What should be done about it?
>>
>> I've now got a patch on SF that aims to fix this properly [1].
>
> I revised this patch to further reduce the code duplication associated
> with the indexing code in the standard library.
>
> The patch now has three new functions in the abstract C API:
>
> PyNumber_Index (used in a dozen or so places)
> - raises IndexError on overflow
> PyNumber_AsSsize_t (used in 3 places)
> - raises OverflowError on overflow
> PyNumber_AsClippedSsize_t() (used once, by _PyEval_SliceIndex)
> - clips to PY_SSIZE_T_MIN/MAX on overflow
>
> All 3 have an int * output argument allowing type errors to be flagged
> directly to the caller rather than through PyErr_Occurred().
>
> Of the 3, only PyNumber_Index is exposed through the operator module.
>
> Probably the most interesting thing now would be for Travis to review
> it, and see whether it makes things easier to handle for the Numeric
> scalar types (given the amount of code the patch deleted from the
> builtin and standard library data types, hopefully the benefits to
> Numeric will be comparable).
I noticed most of the checks for PyInt were removed in the patch. If I
remember correctly, I left these in for "optimization." Other than
that, I think the patch is great.

As far as helping with NumPy, I think it will help to be able to remove
the special checks for all the different integer types. But this has not
yet been done in the NumPy code.
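
For anyone skimming the thread, here is a rough pure-Python sketch of the
three overflow policies Nick lists above. It is not the patch itself: the
helper names (index_strict, as_ssize_t, as_clipped_ssize_t) are made up for
illustration, sys.maxsize stands in for PY_SSIZE_T_MAX, and the int * output
argument for type errors is not modelled (a TypeError simply propagates from
operator.index).

```python
import operator
import sys

# Stand-ins for PY_SSIZE_T_MAX / PY_SSIZE_T_MIN (assumption: sys.maxsize
# matches the platform's Py_ssize_t range, as it does on CPython).
SSIZE_MAX = sys.maxsize
SSIZE_MIN = -sys.maxsize - 1


def _fits(n):
    """Return True if n is representable as a Py_ssize_t."""
    return SSIZE_MIN <= n <= SSIZE_MAX


def index_strict(obj):
    """Hypothetical analogue of the 'raises IndexError on overflow' policy."""
    n = operator.index(obj)  # calls obj.__index__(); TypeError if absent
    if not _fits(n):
        raise IndexError("index does not fit in Py_ssize_t")
    return n


def as_ssize_t(obj):
    """Hypothetical analogue of the 'raises OverflowError on overflow' policy."""
    n = operator.index(obj)
    if not _fits(n):
        raise OverflowError("value does not fit in Py_ssize_t")
    return n


def as_clipped_ssize_t(obj):
    """Hypothetical analogue of the 'clips to MIN/MAX' policy (slice bounds)."""
    n = operator.index(obj)
    return max(SSIZE_MIN, min(SSIZE_MAX, n))


class Huge:
    """Toy object whose __index__ is far outside the Py_ssize_t range."""

    def __index__(self):
        return 2 ** 100
```

With these, as_clipped_ssize_t(Huge()) silently yields SSIZE_MAX, which is
fine for slice bounds but is exactly what made 'x' * (2**100) "work" when
the same clipping was applied to sequence repeat. A repeat count wants the
as_ssize_t behaviour instead, so the caller sees an OverflowError.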
-Travis