[Python-Dev] Activating pymalloc

Tim Peters tim.one@comcast.net
Sat, 16 Mar 2002 18:57:13 -0500


[martin@v.loewis.de]
> I also wonder whether it might be sensible to space the size classes
> 16 bytes apart, thus reducing the number of classes to 16.

[Tim asks why, and back to Martin]

> Reducing the overhead memory allocated from the system. With 64 pools,
> and each pool having 4k, on average, the allocator will waste 128k.

Ah.  Vladimir learned to steer clear of "simplifying assumptions" early on,
and his

    http://starship.python.net/crew/vlad/pymalloc/

pages are still a masterpiece of trying to extract signal from noise.
Mostly he left behind lots of noise reports and left the signal extraction
to the reader <wink>.

> ...
> If you have a size class of 256 bytes, you will waste 224 bytes at the
> end of each page, since 32 bytes of pool_header go into the beginning
> of each page, and the last 224 bytes won't satisfy a 256 byte request.
> So on a 4k page, the size class 256 currently wastes, on average, 224
> + 15 * 4 = 284 bytes (assuming that the average object wastes 4 bytes),
> or 7%. So it may be worthwhile to increase the pool size.

I primarily care about speed, so I'm not inclined to spend much time
worrying about worst-case 6.25% overhead:

>>> for n in range(8, 257, 8):
...     leftover = (4096 - 32) % n
...     percent = (leftover + 32) / 40.96
...     print "%3d %3d %5.2f" % (n, leftover, percent)
...
  8   0  0.78
 16   0  0.78
 24   8  0.98
 32   0  0.78
 40  24  1.37
 48  32  1.56
 56  32  1.56
 64  32  1.56
 72  32  1.56
 80  64  2.34
 88  16  1.17
 96  32  1.56
104   8  0.98
112  32  1.56
120 104  3.32
128  96  3.13
136 120  3.71
144  32  1.56
152 112  3.52
160  64  2.34
168  32  1.56
176  16  1.17
184  16  1.17
192  32  1.56
200  64  2.34
208 112  3.52
216 176  5.08
224  32  1.56
232 120  3.71
240 224  6.25
248  96  3.13
256 224  6.25

Most of these are tiny as-is, and setting the cutoff at 144 would remove the
three worst cases.

Note that I don't *object* to thinking about fiddling bucket sizes and steps
and number of classes, it's just that cutting already-small overhead
percentages isn't a priority to me.