
Guido van Rossum wrote:
The email below is a serious bug report. A quick analysis shows that UserString.count() calls the count() method on a string object, which calls PyArg_ParseTuple() with the format string "O|ii". The 'i' format code truncates integers. It probably should raise an overflow exception instead. But that would still cause the test to fail -- just in a different way (more explicit). Then the string methods should be fixed to use long ints instead -- and then something else would probably break...
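For concreteness, the failure looks something like this on a 64-bit platform where sizeof(long) == 8 but sizeof(int) == 4, so sys.maxint == 2**63 - 1 no longer fits in a C int (a minimal sketch; the exact wrong answer depends on how str.count() interprets the truncated, now-negative end index):

    import sys
    from UserString import UserString

    u = UserString('abcabcabc')
    # UserString.count() defaults end to sys.maxint and passes it down to
    # the string count() method, whose "O|ii" format truncates 2**63 - 1
    # to its low 32 bits (-1 as a signed int), so 'end' no longer means
    # "end of string".
    print u.count('abc')    # expected 3; the truncated end position
                            # yields a wrong count instead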
MAL wrote:
All uses in stringobject.c and unicodeobject.c use INT_MAX together with integers, so there's no problem on that side of the fence ;-)
Since strings and Unicode objects (as well as most, if not all, other builtin sequence types) use integers to describe the length of the object, the correct default value should thus be something like sys.maxlen, which then gets set to INT_MAX.
I'd suggest adding sys.maxlen and then modifying UserString.py, re.py and sre_parse.py accordingly.
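A minimal sketch of what the UserString.py change might look like (sys.maxlen is hypothetical here; the interpreter would set it to INT_MAX):

    import sys

    class UserString:
        def __init__(self, string):
            self.data = string
        # The default end position is now the largest possible object
        # length rather than the largest Python integer, so it always
        # fits in a C int and the 'i' format code never truncates it.
        def count(self, sub, start=0, end=sys.maxlen):
            return self.data.count(sub, start, end)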
Guido wrote:
Hm, I'm not so sure. It would be much better if passing sys.maxint would just WORK... Since that's what people have been doing so far.
Possible solutions (I give 4 of them):

1. The 'i' format code could raise an overflow exception, and the
PyArg_ParseTuple() call in string_count() could catch it and truncate to
INT_MAX (reasoning that any overflow of the end position of a string can be
bound to INT_MAX, because that is the limit for any string in Python).

Pros:
- This "would just WORK" for usage of sys.maxint.

Cons:
- This overflow exception catching should then reasonably be propagated to
  other similar functions (like string.endswith(), etc).
- We have to assume that the exception raised in the
  PyArg_ParseTuple(args, "O|ii:count", &subobj, &i, &last) call is for the
  second integer (i.e. 'last'). This is subtle and ugly.

Pro or Con:
- Do we want to start raising overflow exceptions for other conversion
  formats (i.e. 'b' and 'h' and 'l'; the latter *can* overflow on Win64,
  where sizeof(long) < sizeof(void*))? I think this is a good idea in
  principle but may break code (even if it *does* identify bugs in that
  code).

2. Just change the definitions of the UserString methods to pass a variable
length argument list instead of default value parameters. For example,
change UserString.count() from:

    def count(self, sub, start=0, end=sys.maxint):
        return self.data.count(sub, start, end)

to:

    def count(self, *args):
        return self.data.count(*args)

The result is that the default value for 'end' is now set by string_count()
rather than by the UserString implementation:
    >>> from UserString import UserString
    >>> s = 'abcabcabc'
    >>> u = UserString('abcabcabc')
    >>> s.count('abc')
    3
    >>> u.count('abc')
    3
Pros:
- Easy change.
- Fixes the immediate bug.
- This is a safer way to copy the string behaviour in UserString anyway
  (is it not?).

Cons:
- Does not fix the general problem of the (common?) usage of sys.maxint to
  mean INT_MAX rather than the actual LONG_MAX (this matters on 64-bit
  Unices).
- The UserString code is no longer really self-documenting.

3. As MAL suggested: add something like sys.maxlen (set to INT_MAX), which
makes explicit the logical difference from sys.maxint (set to LONG_MAX);
see the short illustration at the end of this message:
- sys.maxint == "the largest value a Python integer can hold"
- sys.maxlen == "the largest value for the length of an object in Python
  (e.g. length of a string, length of an array)"

Pros:
- More explicit, in that it separates two distinct meanings for sys.maxint
  (which now makes a difference on 64-bit Unices).
- The code changes should be fairly straightforward.

Cons:
- Places in the code that still use sys.maxint where they should use
  sys.maxlen will unknowingly be overflowing ints and bringing about this
  very bug.
- Something else for coders to know about.

4. Add something like sys.maxlen, but set it to SIZET_MAX (cf. the ANSI
size_t type). It is probably not a biggie, but Python currently makes the
assumption that strings never exceed INT_MAX in length. While this
assumption is not likely to be proven false, it technically could be on
64-bit systems. As well, when you start compiling on Win64 (where
sizeof(int) == sizeof(long) < sizeof(size_t)) you are going to be annoyed
by hundreds of warnings about implicit casts from size_t (64 bits) to int
(32 bits) for every strlen, str*, fwrite, and sizeof call that you make.

Pros:
- IMHO logically more correct.
- Might clean up some subtle bugs.
- Cleans up annoying and disconcerting warnings.
- Will probably mean less pain down the road as 64-bit systems (esp. Win64)
  become more prevalent.

Cons:
- Lots of coding changes.
- As Guido said: "and then something else would probably break". (Though on
  current 32-bit systems there should be no effective change; only 64-bit
  systems should be affected and, I would hope, the effect would be a
  cleanup.)

I apologize for not being succinct. Note that I am volunteering here.
Opinions and guidance please.

Trent
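P.S. To make the distinction in option 3 concrete, on a 64-bit Unix (sizeof(long) == 8, sizeof(int) == 4) the two values would compare like this (sys.maxlen being hypothetical, of course):

    >>> import sys
    >>> sys.maxint    # LONG_MAX: largest value a Python plain integer holds
    9223372036854775807
    >>> sys.maxlen    # INT_MAX: largest length of any Python object
    2147483647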