[Python-Dev] hey, who broke the array module?

Trent Mick trentm@activestate.com
Mon, 15 May 2000 14:09:58 -0700


I broke it with my patches to test overflow for some of the PyArg_Parse*()
formatting characters. The upshot of testing for overflow is that now those
formatting characters ('b', 'h', 'i', 'l') enforce signed-ness or
unsigned-ness as appropriate (you have to know if the value is signed or
unsigned to know what limits to check against for overflow). Two
possibilities presented themselves:

1. Enforce 'b' as unsigned char (the common usage) and the rest as signed
values (short, int, and long). If you want a signed char, or an unsigned
short you have to work around it yourself.

2. Add formatting characters or modifiers for signed and unsigned versions of
all the integral type to PyArg_Parse*() in getargs.c

Guido prefered the former because (my own interpretation of the reasons) it
covers the common case and keeps the clutter and feature creep down. It is
debatable whether or not we really need signed and unsigned for all of them.
See the following threads on python-dev and patches:
  make 'b' formatter an *unsigned* char
  issues with int/long on 64bit platforms - eg stringobject (PR#306) 
  make 'b','h','i' raise overflow exception
  
Possible code breakage is the drawback.


[Fredrik Lundh wrote]:
> sigh.  never resync the CVS repository until you've fixed all
> bugs in your *own* code ;-)

Sorry, I guess. The test suite did not catch this so it is hard for me to
know that the bug was raised. My patches adds tests for these to the test
suite.

> 
> in 1.5.2:
> 
> >>> array.array("h", [65535])
> array('h', [-1])
> 
> >>> array.array("H", [65535])
> array('H', [65535])
> 
> in the current CVS version:
> 
> >>> array.array("h", [65535])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> OverflowError: signed short integer is greater than maximum
> 
> okay, this might break some existing code -- but one
> can always argue that such code were already broken.

Yes.

> 
> on the other hand:
> 
> >>> array.array("H", [65535])
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> OverflowError: signed short integer is greater than maximum
> 
> oops.
> 
oops. See my patch that fixes this for 'H', and 'b', and 'I', and 'L'.


> dunno if the right thing would be to add support for various kinds
> of unsigned integers to Python/getargs.c, or to hack around this
> in the array module...
> 
My patch does the latter and that would be my suggestion because:
(1) Guido didn't like the idea of adding more formatters to getargs.c (see
above)
(2) Adding support for unsigned and signed versions in getargs.c could be
confusing because the formatting characters cannot be the same as in the
array module because 'L' is already used for LONG_LONG types in
PyArg_Parse*().
(3) KISS and the common case. Keep the number of formatters for
PyArg_Parse*() short and simple. I would presume that the common case user
does not really need the extra support.


Trent


-- 
Trent Mick
trentm@activestate.com