[Python-Dev] Re: [Patches] make 'b','h','i' raise overflow exception
Guido van Rossum
guido@python.org
Mon, 08 May 2000 22:24:50 -0400
[Trent]
> > > Changes the 'b', 'h', and 'i' formatters in PyArg_ParseTuple to raise an
> > > Overflow exception if they overflow (previously they just silently
> > > overflowed).
[Guido]
> > There's one issue with this: I believe the 'b' format is mostly used
> > with unsigned character arguments in practice.
> > However on systems
> > with default signed characters, CHAR_MAX is 127 and values 128-255 are
> > rejected. I'll change the overflow test to:
> >
> > else if (ival > CHAR_MAX && ival >= 256) {
> >
> > if that's okay with you.
[Trent]
> Okay, I guess. Two things:
>
> 1. In a way this defeats the main purpose of the checks. Now a silent overflow
> could happen for a signed byte value over CHAR_MAX. The only way to
> automatically do the bounds checking is if the exact type is known, i.e.
> different formatters for signed and unsigned integral values. I don't know if
> this is desired (is it?). The obvious choice of 'u' prefixes to specify
> unsigned is obviously not an option.
The struct module uses upper case for unsigned. I think this is
overkill here, and would add a lot of code (if applied systematically)
that would rarely be used.
> Another option might be to document 'b' as for unsigned chars and 'h', 'i',
> 'l' as signed integral values and then set the bounds checks ([0, UCHAR_MAX]
> for 'b') appropriately. Can we clamp these formatters so? I.e. we would be
> limiting the user to unsigned or signed depending on the formatter. (Which
> again, means that it would be nice to have different formatters for signed
> and unsigned.) I think that the bounds checking is false security unless
> these restrictions are made.
I like this: 'b' is unsigned, the others are signed.
> 2. The above aside, I would be more inclined to change the line in question to:
>
> else if (ival > UCHAR_MAX) {
>
> as this is more explicit about what is being done.
Agreed.
> > Another issue however is that there are probably cases where an 'i'
> > format is used (which can't overflow on 32-bit architectures) but
> > where the int value is then copied into a short field without an
> > additional check... I'm not sure how to fix this except by a complete
> > inspection of all code... Not clear if it's worth it.
>
> Yes, a complete code inspection seems to be the only way. That is some of
> what I am doing. Again, I have two questions:
>
> 1. There are a fairly large number of downcasting cases in the Python code
> (not necessarily tied to PyArg_ParseTuple results). I was wondering if you
> think a generalized check on each such downcast would be advisable. This
> would take the form of some macro that would do a bounds check before doing
> the cast. For example (a common one is the cast of strlen's size_t return
> value to int, because Python strings use int for their length, this is a
> downcast on 64-bit systems):
>
> size_t len = strlen(s);
> obj = PyString_FromStringAndSize(s, len);
>
> would become
>
> size_t len = strlen(s);
> obj = PyString_FromStringAndSize(s, CAST_TO_INT(len));
>
> CAST_TO_INT would ensure that 'len'did not overflow and would raise an
> exception otherwise.
>
> Pros:
>
> - should never have to worry about overflows again
> - easy to find (given MSC warnings) and easy to code in (staightforward)
>
> Cons:
>
> - more code, more time to execute
> - looks ugly
> - have to check PyErr_Occurred every time a cast is done
How would the CAST_TO_INT macro signal an erro? C doesn't have
exceptions. If we have to add checks, I'd prefer to write
size_t len = strlen(s);
if (INT_OVERFLOW(len))
return NULL; /* Or whatever is appropriate in this context */
obj = PyString_FromStringAndSize(s, len);
> I would like other people's opinion on this kind of change. There are three
> possible answers:
>
> +1 this is a bad change idea because...<reason>
> -1 this is a good idea, go for it
> +0 (mostly likely) This is probably a good idea for some case where the
> overflow *could* happen, however the strlen example that you gave is
> *not* such a situation. As Tim Peters said: 2GB limit on string lengths
> is a good assumption/limitation.
-0
> 2. Microsofts compiler gives good warnings for casts where information loss
> is possible. However, I cannot find a way to get similar warnings from gcc.
> Does anyone know if that is possible. I.e.
>
> int i = 123456;
> short s = i; // should warn about possible loss of information
>
> should give a compiler warning.
Beats me :-(
--Guido van Rossum (home page: http://www.python.org/~guido/)