Expansion of the range of small integers

Currently CPython preallocates the small integers from -5 up to 256 inclusive at interpreter startup. This reduces memory consumption and the creation time of integers in that range; in particular it affects the speed of short enumerations. By increasing the range to the portable maximum (from -32767 to 32767 inclusive), we can speed up longer enumerations as well.

Microbenchmarks:

  ./python -m timeit "for i in range(10000): pass"
  ./python -m timeit -s "a=[0]*10000" "for i, x in enumerate(a): pass"
  ./python -m timeit -s "a=[0]*10000" "i=0" "for x in a: i+=1"
  ./python -m timeit -s "a=[0]*10000" "for i in range(len(a)): x=a[i]"

Results:

  non-patched  patched
  530 usec     337 usec   57%
  1.06 msec    811 usec   31%
  1.34 msec    1.13 msec  19%
  1.42 msec    1.22 msec  16%

Shortcomings:

1) Memory consumption increases by a constant 1-1.5 MB, or half of that if the range is expanded only in the positive direction. This is not a problem on most modern computers, but it would be better if the NSMALLPOSINTS and NSMALLNEGINTS parameters were configurable at build time.

2) Python startup time grows a little. I was not able to measure the difference; it is too small.
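The current cache bounds can be probed from pure Python. This is only a sketch relying on a CPython implementation detail (cached integers are shared singleton objects), so results vary by interpreter:

```python
def is_cached(n):
    # Construct the value twice at runtime so the compiler cannot share
    # a single constant; object identity then implies a cache hit.
    return int(str(n)) is int(str(n))

# Probe just inside and just outside the documented -5..256 window.
print([n for n in (-6, -5, 0, 256, 257) if is_cached(n)])
# On stock CPython: [-5, 0, 256]
```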

On 17/09/12 22:41, Serhiy Storchaka wrote:
[...]
There is an additional consequence of this proposed change. I'm not sure if this counts as an argument for, or against, the change, but beginners and even some experienced programmers often find the difference between identity and equality hard to deal with. Caching of small integers already blurs the distinction:

py> a = 42
py> b = 42
py> a is b
True

Extending that behaviour up to 32767 will further blur the distinction.

-- 
Steven
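The equality/identity divergence at the cache boundary can be shown in a script as well (a sketch; `int(str(...))` forces runtime construction so the compiler cannot fold the two occurrences into one shared constant):

```python
a = int("42")
b = int("42")
print(a == b, a is b)    # True True: 42 comes from the small-int cache

c = int("1000")
d = int("1000")
print(c == d, c is d)    # True False: 1000 is built fresh each time
```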

On 17.09.12 16:09, Steven D'Aprano wrote:
Extending that behaviour up to 32767 will further blur the distinction.
This is not an argument either for or against. Beginners will always find something to be confused by.
"I think it's better to give users the rope they want than to try and prevent them from hanging themselves, since otherwise they'll just use the power cords as ropes and electrocute themselves". (GvR)

On Mon, 17 Sep 2012 15:41:23 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:
See also http://bugs.python.org/issue10044
1) Memory consumption increases by constant 1-1.5 MB.
That sounds a bit annoying. Is it for a 32-bit or 64-bit build?

Regards

Antoine.

-- 
Software development and contracting: http://pro.pitrou.net

On 17.09.12 16:29, Antoine Pitrou wrote:
See also http://bugs.python.org/issue10044
This is interesting. But the main trick causes undefined behavior.
1) Memory consumption increases by constant 1-1.5 MB.
That sounds a bit annoying. Is it for a 32-bit or 64-bit build?
For a 32-bit build it is 14*(2**16-257-5) = 913836 B = 0.87 MiB; for a 64-bit build it should be twice as large (1.74 MiB). That is, if you expand the range to the portable maximum.
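The arithmetic behind those figures can be spelled out (a sketch, assuming as the thread does that a small-int object takes 14 bytes on a 32-bit build and twice that on 64-bit, and that -5..256 are already preallocated):

```python
obj_size = 14                            # assumed bytes per int, 32-bit build
already_cached = 257 + 5                 # -5 .. 256 are preallocated today
new_entries = 2**16 - already_cached     # entries added by the expansion
extra = obj_size * new_entries
print(extra, round(extra / 2**20, 2))    # 913836 0.87
```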

Oops, sorry, I have included a few extra minor changes in the patch. Here is the correct patch.

On 9/17/2012 8:41 AM, Serhiy Storchaka wrote:
In 2.x before 3.0, the range was about -5 to 10 or so ;-). It was expanded when bytes were added. It might be interesting to instrument the int allocator to count allocations of ints up to, say, 10000 in real apps.
They are -- by patching as you did ;-). The general philosophy seems to be to discourage user tuning by not making it too easy.
2) A little bit larger Python start time. I was not able to measure the difference, it is too small.
What is hard to guess is the effect on cache hits and misses in real apps.

-- 
Terry Jan Reedy

Am 17.09.2012 19:49, schrieb Terry Reedy:
In 2.x before 3.0, the range was about -5 to 10 or so ;-). It was expanded when bytes were added.
2.2 had -1 to 99. The numbers grew slowly to -5 to 256 in 2.6.
It might be interesting to instrument the int allocator to count allocations of ints up to say 10000 in real apps.
You can easily test how different settings affect the interpreter with some compiler flags:

  CFLAGS="-DCOUNT_ALLOCS=1 -DNSMALLPOSINTS=10000 -DNSMALLNEGINTS=10000" ./configure && make

COUNT_ALLOCS slows down the interpreter a bit. It prints some stats at shutdown.
What is hard to guess is the effect on cache hits and misses in real apps.
Real apps may even save memory when they use lots of ints > 256.

Christian

On 9/17/12, Serhiy Storchaka <storchaka@gmail.com> wrote:
On the other hand, you'll add to the memory pressure; instead of putting the small integers and some other important objects in a single page that probably stays loaded, there will be several different pages to page in and out if they are useful, and to waste startup time if they aren't.

-jJ
