[Python-Dev] Re: [Patches] make 'b','h','i' raise overflow exception

Mon, 08 May 2000 22:24:50 -0400

[Trent]
> > > Changes the 'b', 'h', and 'i' formatters in PyArg_ParseTuple to raise an
> > > Overflow exception if they overflow (previously they just silently
> > > overflowed).

[Guido]
> > There's one issue with this: I believe the 'b' format is mostly used
> > with unsigned character arguments in practice.
> > However on systems
> > with default signed characters, CHAR_MAX is 127 and values 128-255 are
> > rejected.  I'll change the overflow test to:
> > 
> > 	else if (ival > CHAR_MAX && ival >= 256) {
> > 
> > if that's okay with you.

[Trent]
> Okay, I guess. Two things:
> 
> 1. In a way this defeats the main purpose of the checks. Now a silent overflow
> could happen for a signed byte value over CHAR_MAX. The only way to
> automatically do the bounds checking is if the exact type is known, i.e.
> different formatters for signed and unsigned integral values. I don't know if
> this is desired (is it?). The obvious choice of 'u' prefixes to specify
> unsigned is obviously not an option.

The struct module uses upper case for unsigned.  I think this is
overkill here, and would add a lot of code (if applied systematically)
that would rarely be used.

> Another option might be to document 'b' as for unsigned chars and 'h', 'i',
> 'l' as signed integral values and then set the bounds checks ([0, UCHAR_MAX]
> for 'b')  appropriately. Can we clamp these formatters so? I.e. we would be
> limiting the user to unsigned or signed depending on the formatter. (Which
> again, means that it would be nice to have different formatters for signed
> and unsigned.) I think that the bounds checking is false security unless
> these restrictions are made.

I like this: 'b' is unsigned, the others are signed.

> 2. The above aside, I would be more inclined to change the line in question to:
> 
>    else if (ival > UCHAR_MAX) {
> 
> as this is more explicit about what is being done.

Agreed.

> > Another issue however is that there are probably cases where an 'i'
> > format is used (which can't overflow on 32-bit architectures) but
> > where the int value is then copied into a short field without an
> > additional check...  I'm not sure how to fix this except by a complete
> > inspection of all code...  Not clear if it's worth it.
> 
> Yes, a complete code inspection seems to be the only way. That is some of
> what I am doing. Again, I have two questions:
> 
> 1. There are a fairly large number of downcasting cases in the Python code
> (not necessarily tied to PyArg_ParseTuple results). I was wondering if you
> think a generalized check on each such downcast would be advisable. This
> would take the form of some macro that would do a bounds check before doing
> the cast. For example (a common one is the cast of strlen's size_t return
> value to int, because Python strings use int for their length, this is a
> downcast on 64-bit systems):
> 
>   size_t len = strlen(s);
>   obj = PyString_FromStringAndSize(s, len);
> 
> would become
>   
>   size_t len = strlen(s);
>   obj = PyString_FromStringAndSize(s, CAST_TO_INT(len));
> 
> CAST_TO_INT would ensure that 'len'did not overflow and would raise an
> exception otherwise.
> 
> Pros:
> 
> - should never have to worry about overflows again
> - easy to find (given MSC warnings) and easy to code in (staightforward)
> 
> Cons:
> 
> - more code, more time to execute
> - looks ugly
> - have to check PyErr_Occurred every time a cast is done

How would the CAST_TO_INT macro signal an erro?  C doesn't have
exceptions.  If we have to add checks, I'd prefer to write

  size_t len = strlen(s);
  if (INT_OVERFLOW(len))
     return NULL; /* Or whatever is appropriate in this context */
  obj = PyString_FromStringAndSize(s, len);

> I would like other people's opinion on this kind of change. There are three
> possible answers:
> 
>   +1 this is a bad change idea because...<reason>
>   -1 this is a good idea, go for it
>   +0 (mostly likely) This is probably a good idea for some case where the
>      overflow *could* happen, however the strlen example that you gave is
> 	 *not* such a situation. As Tim Peters said: 2GB limit on string lengths
> 	 is a good assumption/limitation.

-0

> 2. Microsofts compiler gives good warnings for casts where information loss
> is possible. However, I cannot find a way to get similar warnings from gcc.
> Does anyone know if that is possible. I.e.
> 
> 	int i = 123456;
> 	short s = i;  // should warn about possible loss of information
> 
> should give a compiler warning.

Beats me :-(

--Guido van Rossum (home page: http://www.python.org/~guido/)