[Python-Dev] Re: [Patches] make 'b','h','i' raise overflow exception

Trent Mick trentm@activestate.com
Tue, 9 May 2000 13:05:53 -0700


[Trent]
> > Another option might be to document 'b' as for unsigned chars and 'h', 'i',
> > 'l' as signed integral values and then set the bounds checks ([0,
> > UCHAR_MAX]
> > for 'b')  appropriately. Can we clamp these formatters so? I.e. we would be
> > limiting the user to unsigned or signed depending on the formatter. (Which
> > again, means that it would be nice to have different formatters for signed
> > and unsigned.) I think that the bounds checking is false security unless
> > these restrictions are made.
[guido]
> 
> I like this: 'b' is unsigned, the others are signed.

Okay I will submit a patch for this them. 'b' formatter will limit values to
[0, UCHAR_MAX].

> [Trent]
> > 1. There are a fairly large number of downcasting cases in the
> > Python code (not necessarily tied to PyArg_ParseTuple results). I
> > was wondering if you think a generalized check on each such
> > downcast would be advisable. This would take the form of some macro
> > that would do a bounds check before doing the cast. For example (a
> > common one is the cast of strlen's size_t return value to int,
> > because Python strings use int for their length, this is a downcast
> > on 64-bit systems):
> >
> >   size_t len = strlen(s);
> >   obj = PyString_FromStringAndSize(s, len);
> >
> > would become
> >
> >   size_t len = strlen(s);
> >   obj = PyString_FromStringAndSize(s, CAST_TO_INT(len));
> >
> > CAST_TO_INT would ensure that 'len'did not overflow and would raise an
> > exception otherwise.
> 
> [Guido]
> > How would the CAST_TO_INT macro signal an erro?  C doesn't have
> > exceptions.  If we have to add checks, I'd prefer to write
> >
> >   size_t len = strlen(s);
> >   if (INT_OVERFLOW(len))
> >      return NULL; /* Or whatever is appropriate in this context */
> >   obj = PyString_FromStringAndSize(s, len);
> 
[Tim]
> Of course we have to add checks -- strlen doesn't return an int!  It hasn't
> since about a year after Python was first written (ANSI C changed the rules,
> and Python is long overdue in catching up -- if you want people to stop
> passing multiple args to append, set a good example in our use of C <0.5
> wink>).
>
> The
> check isn't needed anywhere we can prove a priori that it could never fail
> (in which case we're not assuming anything), but it's always needed when we
> can't so prove (in which case skipping the check would be a bad
> asssuption).
> In the absence of any context, your strlen example above definitely needs
> the check.
>

Okay, I just wanted a go ahead that this kind of thing was desired. I will
try to find the points where these overflows *can* happen and then I'll add
checks in a manner closer to Guido syntax above.

> 
> [Trent]
> > I would like other people's opinion on this kind of change.
> > There are three possible answers:
> 
> Please don't change the rating scheme we've been using:  -1 is a veto, +1 is
> a hurrah, -0 and +0 are obvious <ahem>.
> 
> >   +1 this is a bad change idea because...<reason>
> >   -1 this is a good idea, go for it
> 
Whoa, sorry Tim. I mixed up the +/- there. I did not intend to change the
voting system.

[Tim]
> An alternative would be to promote the size member from int to size_t;
> that's no actual change on the 32-bit machines Guido generally assumes
> without realizing it, and removes an arbitrary (albeit defensible)
> limitation on some 64-bit machines at the cost of (just possibly, due to
> alignment vagaries) boosting var objects' header size on the latter.
> 
I agree with Guido that this is too big an immediate change. I'll just try to
find and catch the possible overflows.


Thanks,
Trent

-- 
Trent Mick
trentm@activestate.com