[Patches] [Patch #103158] Don't do unsafe arithmetic in xrange object

noreply@sourceforge.net noreply@sourceforge.net
Mon, 15 Jan 2001 09:10:47 -0800


Patch #103158 has been updated. 

Project: python
Category: core (C code)
Status: Open
Submitted by: greg_ball
Assigned to : gvanrossum
Summary: Don't do unsafe arithmetic in xrange object

Follow-Ups:

Date: 2001-Jan-15 09:10
By: greg_ball

Comment:
If we only allow lengths that will fit into a C int,

x = xrange(sys.maxint)

would fail on 64-bit platforms.  (and that code is always going to get
written by people on 32-bit platforms.) Making some sort of special case
for this particular idiom would not make the code simpler, I'm guessing.

One obvious simple way to support it is by truncating the length to INT_MAX
as is done in _PyEval_SliceIndex. 
Then len() and negative indexing can produce consistent
non-garbage results, if not what the user might naively expect.  In Python
2.0, you get garbage.  Using my patch, you get an exception.  But nobody
actually uses those features, in all probability.

Of course the natural extension of this approach is to allow any python
integer arguments at all to xrange() (which Tim doesn't want to do) and
truncate as necessary.  But there is no need to go there; we can just leave
bltin_xrange alone.

As an aside, how important for performance is the use of C ints rather than
C longs in the Python internals on 64 bit machines?  16 bit machines?

-------------------------------------------------------

Date: 2001-Jan-12 12:11
By: greg_ball

Comment:
The new version of the patch makes the code a bit less complicated.  (Only
error messages have changed in terms of behaviour; regression test run
under linux and OSF1.)  If it looks ok you can ignore the rest of this
comment...

I originally made totlen an int because it is the return value from
range_length, and the argument to  PyList_New().  The current patch makes
totlen a long, but the ocde doesn't get hugely less complicated.

The code would get simpler if it rejected objects
whose length doesn't fit into a int, but that seemed too easy and there is
a small chance someone's code might break. On the other hand by rejecting
objects whose stop
can't be correctly calculated, I _might_ have broken code already.  Break
just means "now raises exception" though, nothing insidious.


-------------------------------------------------------

Date: 2001-Jan-12 12:58
By: tim_one

Comment:
I'm aware of people who actually do

x = xrange(sys.maxint)

and that should continue to work.  Python 2.0 already complains about more
than that:

>>> xrange(-1, sys.maxint)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: xrange() has more than sys.maxint items
>>>

and I sure don't want to see that extended (unless it's part of a
*sweeping* crusade to eradicate the visibile differences between ints and
longs).

-------------------------------------------------------

Date: 2001-Jan-12 12:44
By: gvanrossum

Comment:
In theory, there's a use for objects whose length doesn't fit in an int:
for i in xrange(...) is routinely used for loops that expect to break well
before the end but don't want to give an arbitrary upper bound.

But I agree with Tim that this serves no real purpose.

I think that if anything, the xrange() object should be scaled down.  No
repetition, no concationation, no slicing.  the only real use ought to be
"for i in xrange(...)".  But of course, *that* is bound to break somebody's
code somewhere.  So let's not go there.

But let's not complicate it any further.  Prevent breakage, but don't
stretch the functionality beyond what it currently does.

-------------------------------------------------------

Date: 2001-Jan-12 12:35
By: tim_one

Comment:
Please do reject objects whose length doesn't fit into an int.  I'll bet a
dollar that no real code in the world will break as a result.  I feel
confident making that bet, because if someone's code *does* break as a
result, it's worth a dollar to me to see Guido lecture them that their code
was already broken <wink>.

Really, it's very Pythonic to gripe about a problem just as soon as it
crops up.  It's not doing anyone a favor to delay griping until the object
is used:  at that point, it may be very difficult to figure out how the
object *got* damaged.  There's simply no payback in allowing creation of an
unusable xrange object, and, as you say, life is simpler if you don't allow
it.

-------------------------------------------------------

Date: 2001-Jan-10 10:28
By: gvanrossum

Comment:
I wonder why you made totlen an int.  This seems to complicate the code
quite a bit.  I'd suggest making it a long.  Then you have to test whether
it's > INT_MAX only in tolist() -- in that case you should raise
MemoryError.

Other than that, it looks decent!

-------------------------------------------------------

Date: 2001-Jan-09 06:31
By: greg_ball

Comment:
This is a replacement for patch 103129,
as suggested by tim_one.

Possibly interesting aspects of the patch:

- xrange objects with overflowing lengths are not gratutiously rejected. 
An error is raised only if the length becomes relevant, under len() or
negative indexing.  Bad lengths are stored as -1. Objects with bad lengths
allow any non-negative integer index.

- On 64 bit platforms, sequence multiplication seems to cause an unsafe
cast from long to int.  This patch does not address that issue.

Without my patch, on i686 Linux:
Python 2.0 (#25, Jan  2 2001, 12:18:07)
[GCC 2.95.3 19991030 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.
>>> from sys import maxint
>>> x = xrange(maxint) * 2
>>> len(x)
-2
>>> x[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SystemError: error return without exception set
>>> y = xrange(maxint/2)*3
>>> len(y)
-1073741827
>>> y[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SystemError: error return without exception set                            
   
>>> y.tolist()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SystemError: listobject.c:33: bad argument to internal function
>>> m = maxint >> 16
>>> xrange(0,maxint,m)
xrange(0, -2147450883, 32767)
>>> (xrange(1)*maxint)*maxint
xrange(1)
>>>  

With my patch
Python 2.0 (#35, Jan  6 2001, 12:54:41)
[GCC 2.95.3 19991030 (prerelease)] on linux2
Type "copyright", "credits" or "license" for more information.             
   
>>> from sys import maxint
>>> x = xrange(maxint) * 2
>>> len(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: xrange object too long
>>> x[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: xrange object too long
>>> y = xrange(maxint/2)*3
>>> len(y)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: xrange object too long
>>> y.tolist()
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: xrange object too long
>>> m = maxint >> 16
>>> xrange(0,maxint,m)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: integer addition
>>> (xrange(1)*maxint) * maxint
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: integer multiplication
>>>  

Without patch on Digital Unix
Python 2.0 (#5, Jan  6 2001, 15:42:58) [C] on osf1V4
Type "copyright", "credits" or "license" for more information.
>>> from sys import maxint
>>> m=maxint >> 32
>>> x = xrange(maxint)*2
>>> x
(xrange(9223372036854775807) * 2)
>>> len(x)
-2
>>> x[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
SystemError: error return without exception set
>>> xrange(0,maxint,m)
xrange(0, -9223372034707292163, 2147483647)
>>> xrange(1)*maxint
xrange(0)
>>> 

With patch
Python 2.0 (#7, Jan  6 2001, 15:52:21) [C] on osf1V4
Type "copyright", "credits" or "license" for more information.
>>> from sys import maxint
>>> m=maxint >> 32
>>> x = xrange(maxint)*2
>>> x
(xrange(9223372036854775807) * 2)
>>> len(x)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: xrange object too long
>>> x[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: xrange object too long
>>> xrange(0,maxint,m)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
OverflowError: integer addition
>>> xrange(1)*maxint
xrange(0)
>>> 

-------------------------------------------------------

-------------------------------------------------------
For more info, visit:

http://sourceforge.net/patch/?func=detailpatch&patch_id=103158&group_id=5470