[Python-Dev] int vs ssize_t in unicode

Neal Norwitz nnorwitz at gmail.com
Fri Apr 14 09:10:30 CEST 2006


On 4/13/06, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> Neal Norwitz wrote:
> > I just grepped for INT_MAX and there's a ton of them still (well 83 in
> > */*.c).  Some aren't an issue like posixmodule.c, those are
> > _SC_INT_MAX.  marshal is probably ok, but all uses should be verified.
> >  Really all uses of {INT,LONG}_{MIN,MAX} should be verified and
> > converted to PY_SSIZE_T_{MIN,MAX} as appropriate.
>
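
To illustrate why a leftover INT_MAX check or (int) cast matters once
lengths are Py_ssize_t, here is a minimal sketch.  It assumes a 32-bit C
int, and as_int32 and fits_ssize_t are hypothetical helper names used
only for illustration; sys.maxsize is the interpreter's PY_SSIZE_T_MAX.

```python
import sys

INT_MAX = 2**31 - 1  # assumption: C int is 32 bits wide


def as_int32(n):
    """Wrap n the way a cast to a 32-bit signed int typically would."""
    return (n + 2**31) % 2**32 - 2**31


def fits_ssize_t(n):
    """True if n is representable as Py_ssize_t in this interpreter."""
    return -sys.maxsize - 1 <= n <= sys.maxsize


# Compare a length at INT_MAX with one just past it.
for length in (INT_MAX, INT_MAX + 1):
    print(length, as_int32(length), fits_ssize_t(length))
```

On a 64-bit build the second length still fits in Py_ssize_t, but it
no longer survives a round trip through int.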

BTW, it would be great if someone could try to put together some tests
for bigmem machines.  I'll add it to the todo wiki.  The tests should
be broken up by those that require 2+ GB of memory, those that take
4+, etc.  Many people won't have boxes with that much memory.

The test cases should test all methods (don't forget slicing
operations) at boundary points, particularly just before and after
2GB.  Strings are probably the easiest.  There's unicode too.  Lists
and dicts are good to test as well, but will take more than 16 GB of
RAM, so those can be pushed out some.
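
A rough sketch of how such boundary tests might be organized.  The
helper names (boundary_sizes, memory_tier, check_bytes_at) and the tier
list are assumptions made for illustration, not an existing test API,
and the tiering ignores interpreter overhead.

```python
TWO_GIB = 2**31  # first length that no longer fits in a signed 32-bit int


def boundary_sizes(boundary=TWO_GIB, delta=1):
    """Sizes just before, at, and just after the boundary."""
    return [boundary - delta, boundary, boundary + delta]


def memory_tier(size_bytes, tiers_gb=(2, 4, 8, 16)):
    """Smallest tier (in GB) that can hold size_bytes, or None."""
    for gb in tiers_gb:
        if size_bytes <= gb * 2**30:
            return gb
    return None


def check_bytes_at(size):
    """Exercise a few methods and slices on a byte string of `size`."""
    s = b"a" * size
    assert len(s) == size
    assert s[size - 1:] == b"a"           # slice ending at the boundary
    assert len(s[:size - 1]) == size - 1  # slice just short of it
    assert s.count(b"a") == size
    assert s.find(b"b") == -1
    return True


# Group the boundary sizes by the memory tier they would need.
for size in boundary_sizes():
    print(size, memory_tier(size))
```

A runner could then call check_bytes_at(size) only for sizes whose
tier the machine can satisfy, and skip the rest.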

I have some machines available for testing.

> I replaced all the trivial ones; the remaining ones are (IMO) more
> involved, or correct. In particular:
>
> - collectionsmodule: deque is still restricted to 2GiB elements
> - cPickle: pickling does not support huge strings (and probably
>   shouldn't); likewise marshal
> - _sre is still limited to INT_MAX completely

I've got outstanding changes somewhere to clean up _sre.

> - not sure why the mbcs codec is restricted to INT_MAX; somebody
>   should check the Win64 API whether the restriction can be
>   removed (most likely, it can)
> - PyObject_CallFunction must be duplicated for PY_SSIZE_T_CLEAN,
>   then exceptions.c can be extended.

My new favorite static analysis tool is grep:

grep '(int)' */*.c | egrep -v 'sizeof(int)' | wc -l
     418

I know a bunch of those aren't problematic, but a bunch are.  Same
with long casts.
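
One caveat on the pipeline above: in egrep, unescaped parentheses are
grouping operators, so 'sizeof(int)' matches the text "sizeofint" and
not "sizeof(int)".  The reported count may therefore still include
sizeof(int) lines.  A minimal Python sketch of the intended filter
(count_int_casts is a hypothetical helper, not part of any tool):

```python
import re

CAST = re.compile(r"\(int\)")            # literal "(int)"
SIZEOF = re.compile(r"sizeof\(int\)")    # literal "sizeof(int)"


def count_int_casts(lines):
    """Count lines containing (int) but not sizeof(int)."""
    return sum(
        1 for line in lines
        if CAST.search(line) and not SIZEOF.search(line)
    )


sample = [
    "x = (int)len;",
    "n = sizeof(int);",
    "y = (int)sizeof(int);",
    "z = 1;",
]
print(count_int_casts(sample))

# Python's re treats unescaped parentheses as a group too, so the
# unescaped pattern fails to match the text it was meant to exclude.
print(re.search(r"sizeof(int)", "n = sizeof(int);"))
```

Like grep -v, this drops a whole line if it mentions sizeof(int)
anywhere, so a real cast on the same line is not counted.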

n