[Python-Dev] Bad interaction of __index__ and sequence repeat

Nick Coghlan ncoghlan at iinet.net.au
Sat Jul 29 18:55:13 CEST 2006


Nick Coghlan wrote:
> Armin Rigo wrote:
>> Hi,
>>
>> There is an oversight in the design of __index__() that only just
>> surfaced :-(  It is responsible for the following behavior, on a 32-bit
>> machine with >= 2GB of RAM:
>>
>>     >>> s = 'x' * (2**100)       # works!
>>     >>> len(s)
>>     2147483647
>>
>> This is because PySequence_Repeat(v, w) works by applying w.__index__ in
>> order to call v->sq_repeat.  However, __index__ is defined to clip the
>> result to fit in a Py_ssize_t.  This means that the above problem exists
>> with all sequences, not just strings, given enough RAM to create such
>> sequences with 2147483647 items.
>>
>> For reference, in 2.4 we correctly get an OverflowError.
>>
>> Argh!  What should be done about it?
> 
> I've now got a patch on SF that aims to fix this properly [1].

I have revised this patch to further reduce the duplication in the standard 
library's indexing code.

The patch now has three new functions in the abstract C API:

   PyNumber_Index (used in a dozen or so places)
     - raises IndexError on overflow
   PyNumber_AsSsize_t (used in 3 places)
     - raises OverflowError on overflow
   PyNumber_AsClippedSsize_t (used once, by _PyEval_SliceIndex)
     - clips to PY_SSIZE_T_MIN/MAX on overflow

All 3 have an int * output argument allowing type errors to be flagged 
directly to the caller rather than through PyErr_Occurred().

Of the 3, only PyNumber_Index is exposed through the operator module.
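
To make the three overflow policies concrete, here is a rough pure-Python 
model of them. It is only a sketch, not the C code in the patch: the function 
names are hypothetical, sys.maxsize is assumed to stand in for 
PY_SSIZE_T_MAX, and the int * output argument is not modelled (type errors 
simply propagate as TypeError from operator.index).

```python
import operator
import sys

# Assumption: sys.maxsize plays the role of PY_SSIZE_T_MAX on this build.
SSIZE_T_MAX = sys.maxsize
SSIZE_T_MIN = -sys.maxsize - 1


def index_or_index_error(obj):
    """Hypothetical model of the first policy: IndexError on overflow."""
    value = operator.index(obj)  # TypeError if obj has no __index__
    if not SSIZE_T_MIN <= value <= SSIZE_T_MAX:
        raise IndexError("index does not fit in Py_ssize_t")
    return value


def as_ssize_t(obj):
    """Hypothetical model of the second policy: OverflowError on overflow."""
    value = operator.index(obj)
    if not SSIZE_T_MIN <= value <= SSIZE_T_MAX:
        raise OverflowError("value does not fit in Py_ssize_t")
    return value


def as_clipped_ssize_t(obj):
    """Hypothetical model of the third policy: clip instead of raising."""
    value = operator.index(obj)
    return max(SSIZE_T_MIN, min(value, SSIZE_T_MAX))
```

In this model, sequence repeat would go through as_ssize_t, so a count of 
2**100 raises OverflowError as it did in 2.4, while slice bounds would go 
through as_clipped_ssize_t, so an oversized bound is clipped to the end of the 
sequence.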

Probably the most interesting thing now would be for Travis to review it, and 
see whether it makes things easier to handle for the Numeric scalar types 
(given the amount of code the patch deleted from the builtin and standard 
library data types, hopefully the benefits to Numeric will be comparable).

Cheers,
Nick.

[1] http://www.python.org/sf/1530738


-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
---------------------------------------------------------------
             http://www.boredomandlaziness.org

