Martin,
In Include/ucnhash.h I notice some ints and wonder whether they should be Py_ssize_t. It looks like they are just name lengths, so they should be pretty short.
But in Objects/unicodeobject.c, I notice a bunch of ints and casts to int and wonder whether they should be changed to Py_ssize_t (or the casts removed):
235: assert(length<INT_MAX); unicode->length = (int)length;
376, 404: int i;
1366: (seems like this could be a 64-bit value) int nneeded;
(I stopped at this point; there are probably more.)
Modules/unicodedata.c (lots of ints, not sure if they are a problem)
494: isize = PyUnicode_GET_SIZE(input);
n
Neal Norwitz wrote:
> In Include/ucnhash.h I notice some ints and wonder whether they should be Py_ssize_t. It looks like they are just name lengths, so they should be pretty short.
Right: int size is below 100; the name length of a Unicode character is below 40 (I believe). OTOH, changing them to Py_ssize_t wouldn't hurt, either.
> 235: assert(length<INT_MAX); unicode->length = (int)length;
Right: I just changed it. It may date back to a version of the patch where I only changed the signatures of the functions, but not the object layout.
> 376, 404: int i;
Right, changed.
> 1366: (seems like this could be a 64-bit value) int nneeded;
Right; also, the safe downcast can go away.
> (I stopped at this point; there are probably more.)
I looked through the entire file, and fixed all I could find.
> Modules/unicodedata.c (lots of ints, not sure if they are a problem)
Most of them are not, since most functions deal only with Unicode characters, character names, and character numbers (which are below 2**21).
> 494: isize = PyUnicode_GET_SIZE(input);
This is a problem, though; I fixed it.
Regards, Martin
On 4/12/06, "Martin v. Löwis" <martin@v.loewis.de> wrote:
> > 235: assert(length<INT_MAX); unicode->length = (int)length;
> Right: I just changed it. It may date back to a version of the patch where I only changed the signatures of the functions, but not the object layout.
I just grepped for INT_MAX and there's a ton of them still (well, 83 in */*.c). Some aren't an issue, like those in posixmodule.c, which are _SC_INT_MAX. marshal is probably OK, but all uses should be verified. Really, all uses of {INT,LONG}_{MIN,MAX} should be verified and converted to PY_SSIZE_T_{MIN,MAX} as appropriate.
There are only a few {INT,LONG}_MIN and 22 LONG_MAX.
I'll try to review these soon unless someone beats me to it.
n
Neal Norwitz wrote:
> I just grepped for INT_MAX and there's a ton of them still (well, 83 in */*.c). Some aren't an issue, like those in posixmodule.c, which are _SC_INT_MAX. marshal is probably OK, but all uses should be verified. Really, all uses of {INT,LONG}_{MIN,MAX} should be verified and converted to PY_SSIZE_T_{MIN,MAX} as appropriate.
I replaced all the trivial ones; the remaining ones are (IMO) more involved, or correct. In particular:
- collectionsmodule: deque is still restricted to 2GiB elements
- cPickle: pickling does not support huge strings (and probably shouldn't); likewise marshal
- _sre is still limited to INT_MAX completely
- not sure why the mbcs codec is restricted to INT_MAX; somebody should check the Win64 API whether the restriction can be removed (most likely, it can)
- PyObject_CallFunction must be duplicated for PY_SSIZE_T_CLEAN, then exceptions.c can be extended.
Regards, Martin
On 4/13/06, "Martin v. Löwis" <martin@v.loewis.de> wrote:
> Neal Norwitz wrote:
> > I just grepped for INT_MAX and there's a ton of them still (well, 83 in */*.c). Some aren't an issue, like those in posixmodule.c, which are _SC_INT_MAX. marshal is probably OK, but all uses should be verified. Really, all uses of {INT,LONG}_{MIN,MAX} should be verified and converted to PY_SSIZE_T_{MIN,MAX} as appropriate.
BTW, it would be great if someone could try to put together some tests for bigmem machines. I'll add it to the todo wiki. The tests should be broken up by those that require 2+ GB of memory, those that take 4+, etc. Many people won't have boxes with that much memory.
The test cases should test all methods (don't forget slicing operations) at boundary points, particularly just before and after 2 GB. Strings are probably the easiest; there's Unicode too. Lists and dicts are good, but will take more than 16 GB of RAM, so those can be pushed out some.
I have some machines available for testing.
> I replaced all the trivial ones; the remaining ones are (IMO) more involved, or correct. In particular:
> - collectionsmodule: deque is still restricted to 2GiB elements
> - cPickle: pickling does not support huge strings (and probably shouldn't); likewise marshal
> - _sre is still limited to INT_MAX completely
I've got outstanding changes somewhere to clean up _sre.
> - not sure why the mbcs codec is restricted to INT_MAX; somebody should check the Win64 API whether the restriction can be removed (most likely, it can)
> - PyObject_CallFunction must be duplicated for PY_SSIZE_T_CLEAN, then exceptions.c can be extended.
My new favorite static analysis tool is grep:
grep '(int)' */*.c | egrep -v 'sizeof\(int\)' | wc -l
418
I know a bunch of those aren't problematic, but a bunch are. Same with long casts.
n