[Python-Dev] Bad interaction of __index__ and sequence repeat
Nick Coghlan
ncoghlan at iinet.net.au
Sat Jul 29 18:55:13 CEST 2006
Nick Coghlan wrote:
> Armin Rigo wrote:
>> Hi,
>>
>> There is an oversight in the design of __index__() that only just
>> surfaced :-( It is responsible for the following behavior, on a 32-bit
>> machine with >= 2GB of RAM:
>>
>> >>> s = 'x' * (2**100) # works!
>> >>> len(s)
>> 2147483647
>>
>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>> order to call v->sq_repeat. However, __index__ is defined to clip the
>> result to fit in a Py_ssize_t. This means that the above problem exists
>> with all sequences, not just strings, given enough RAM to create such
>> sequences with 2147483647 items.
>>
>> For reference, in 2.4 we correctly get an OverflowError.
>>
>> Argh! What should be done about it?
>
> I've now got a patch on SF that aims to fix this properly [1].

I revised this patch to further reduce the code duplication associated with
the indexing code in the standard library.

The patch now has three new functions in the abstract C API:

   PyNumber_Index (used in a dozen or so places)
     - raises IndexError on overflow
   PyNumber_AsSsize_t (used in 3 places)
     - raises OverflowError on overflow
   PyNumber_AsClippedSsize_t (used once, by _PyEval_SliceIndex)
     - clips to PY_SSIZE_T_MIN/MAX on overflow

All 3 have an int * output argument allowing type errors to be flagged
directly to the caller rather than through PyErr_Occurred().
Of the 3, only PyNumber_Index is exposed through the operator module.
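
To make the three overflow policies concrete, here is a rough pure-Python
model of them. This is only a sketch, not the C code from the patch: the
function names index_strict, as_ssize_t and as_clipped_ssize_t are
hypothetical stand-ins, the int * output argument is not modelled, and the
bounds assume a 32-bit Py_ssize_t (which matches the 2147483647 in Armin's
example).

```python
import operator

# Assumption: 32-bit Py_ssize_t bounds, as on the machine in Armin's example.
PY_SSIZE_T_MAX = 2**31 - 1
PY_SSIZE_T_MIN = -PY_SSIZE_T_MAX - 1


def _fits(n):
    """Return True if n is representable in the assumed Py_ssize_t range."""
    return PY_SSIZE_T_MIN <= n <= PY_SSIZE_T_MAX


def index_strict(obj):
    """Hypothetical model of the PyNumber_Index policy: IndexError on overflow."""
    n = operator.index(obj)  # raises TypeError if obj has no __index__
    if not _fits(n):
        raise IndexError("index does not fit in Py_ssize_t")
    return n


def as_ssize_t(obj):
    """Hypothetical model of the PyNumber_AsSsize_t policy: OverflowError."""
    n = operator.index(obj)
    if not _fits(n):
        raise OverflowError("value does not fit in Py_ssize_t")
    return n


def as_clipped_ssize_t(obj):
    """Hypothetical model of the PyNumber_AsClippedSsize_t policy: clip."""
    n = operator.index(obj)
    return max(PY_SSIZE_T_MIN, min(n, PY_SSIZE_T_MAX))
```

Under this model, a repeat count such as 2**100 would be rejected by
as_ssize_t rather than silently shortened, while only the slicing path
(as_clipped_ssize_t) keeps the clipping behaviour.
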
Probably the most interesting thing now would be for Travis to review it, and
see whether it makes things easier to handle for the Numeric scalar types
(given the amount of code the patch deleted from the builtin and standard
library data types, hopefully the benefits to Numeric will be comparable).
Cheers,
Nick.
[1] http://www.python.org/sf/1530738
--
Nick Coghlan | ncoghlan at gmail.com | Brisbane, Australia
---------------------------------------------------------------
http://www.boredomandlaziness.org