[Python-Dev] Re: [Patches] make 'b','h','i' raise overflow exception

Tim Peters tim_one@email.msn.com
Tue, 9 May 2000 02:54:51 -0400

> 1. There are a fairly large number of downcasting cases in the
> Python code (not necessarily tied to PyArg_ParseTuple results). I
> was wondering if you think a generalized check on each such
> downcast would be advisable. This would take the form of some macro
> that would do a bounds check before doing the cast. For example (a
> common one is the cast of strlen's size_t return value to int,
> because Python strings use int for their length, this is a downcast
> on 64-bit systems):
>   size_t len = strlen(s);
>   obj = PyString_FromStringAndSize(s, len);
> would become
>   size_t len = strlen(s);
>   obj = PyString_FromStringAndSize(s, CAST_TO_INT(len));
> CAST_TO_INT would ensure that 'len'did not overflow and would raise an
> exception otherwise.

> How would the CAST_TO_INT macro signal an erro?  C doesn't have
> exceptions.  If we have to add checks, I'd prefer to write
>   size_t len = strlen(s);
>   if (INT_OVERFLOW(len))
>      return NULL; /* Or whatever is appropriate in this context */
>   obj = PyString_FromStringAndSize(s, len);

Of course we have to add checks -- strlen doesn't return an int!  It hasn't
since about a year after Python was first written (ANSI C changed the rules,
and Python is long overdue in catching up -- if you want people to stop
passing multiple args to append, set a good example in our use of C <0.5

> I would like other people's opinion on this kind of change.
> There are three possible answers:

Please don't change the rating scheme we've been using:  -1 is a veto, +1 is
a hurrah, -0 and +0 are obvious <ahem>.

>   +1 this is a bad change idea because...<reason>
>   -1 this is a good idea, go for it

That one, except spelled +1.

>   +0 (mostly likely) This is probably a good idea for some case
> where the overflow *could* happen, however the strlen example that
> you gave is *not* such a situation. As Tim Peters said: 2GB limit on
> string lengths is a good assumption/limitation.

No, it's a defensible limitation, but it's *never* a valid assumption.  The
check isn't needed anywhere we can prove a priori that it could never fail
(in which case we're not assuming anything), but it's always needed when we
can't so prove (in which case skipping the check would be a bad asssuption).
In the absence of any context, your strlen example above definitely needs
the check.

An alternative would be to promote the size member from int to size_t;
that's no actual change on the 32-bit machines Guido generally assumes
without realizing it, and removes an arbitrary (albeit defensible)
limitation on some 64-bit machines at the cost of (just possibly, due to
alignment vagaries) boosting var objects' header size on the latter.

correctness-doesn't-happen-by-accident-ly y'rs  - tim