obmalloc is very good at allocating small (up to ~224 bytes) memory blocks.
But the current SMALL_REQUEST_THRESHOLD (512) seems too large to me.

```
>>> pool_size = 4096 - 48  # 48 is pool header size
>>> for bs in range(16, 513, 16):
...     n, r = pool_size//bs, pool_size%bs + 48
...     # columns: block size, blocks per pool, overhead bytes, overhead %
...     print(bs, n, r, 100*r/4096)
...
16 253 48 1.171875
32 126 64 1.5625
48 84 64 1.5625
64 63 64 1.5625
80 50 96 2.34375
96 42 64 1.5625
112 36 64 1.5625
128 31 128 3.125
144 28 64 1.5625
160 25 96 2.34375
176 23 48 1.171875
192 21 64 1.5625
208 19 144 3.515625
224 18 64 1.5625
240 16 256 6.25
256 15 256 6.25
272 14 288 7.03125
288 14 64 1.5625
304 13 144 3.515625
320 12 256 6.25
336 12 64 1.5625
352 11 224 5.46875
368 11 48 1.171875
384 10 256 6.25
400 10 96 2.34375
416 9 352 8.59375
432 9 208 5.078125
448 9 64 1.5625
464 8 384 9.375
480 8 256 6.25
496 8 128 3.125
512 7 512 12.5
```

There are two problems here.
First, pool overhead is at most about 3.5% up to 224 bytes.
But it becomes 6.25% at 240 bytes, 8.6% at 416 bytes, 9.4% at 464
bytes, and 12.5% at 512 bytes.
Second, some size classes have the same number of memory blocks.
Classes 272 and 288 both have 14 blocks. 320 and 336 both have 12 blocks.
This reduces the utilization of pools. The problem becomes bigger on 32-bit platforms.
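
To see every case of the second problem at once, here is a minimal sketch that groups size classes by blocks per pool. The function name and its parameters are hypothetical, and the 48-byte header and 16-byte class spacing are simply taken from the table above.

```python
from collections import defaultdict

def shared_block_counts(pool_size=4096, header_size=48, align=16, threshold=512):
    """Group size classes by how many blocks fit in one pool.

    Returns only the groups where two or more classes share a count.
    header_size and align are assumptions copied from the table above.
    """
    groups = defaultdict(list)
    for bs in range(align, threshold + 1, align):
        groups[(pool_size - header_size) // bs].append(bs)
    return {n: sizes for n, sizes in groups.items() if len(sizes) > 1}

for n, sizes in shared_block_counts().items():
    print(n, "blocks per pool:", sizes)
```

With 4KiB pools this reports six groups (for example, 416, 432 and 448 all get 9 blocks). Calling shared_block_counts(16 * 1024) returns an empty dict under the same assumptions.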
Increasing pool size is one obvious way to fix these problems.
I think a 16KiB pool size and a 2MiB (the huge page size on x86) arena size
are a sweet spot for recent web servers (typically about 32 threads and
64GiB of memory), but I have no evidence for it.
We need a reference application and scenario to benchmark.
pyperformance is not good for measuring memory usage of complex
applications.

    >>> header_size = 48
    >>> pool_size = 16*1024
    >>> for bs in range(16, 513, 16):
    ...     n = (pool_size - header_size) // bs
    ...     r = (pool_size - header_size) % bs + header_size
    ...     print(bs, n, r, 100 * r / pool_size)
    ...
    16 1021 48 0.29296875
    32 510 64 0.390625
    48 340 64 0.390625
    64 255 64 0.390625
    80 204 64 0.390625
    96 170 64 0.390625
    112 145 144 0.87890625
    128 127 128 0.78125
    144 113 112 0.68359375
    160 102 64 0.390625
    176 92 192 1.171875
    192 85 64 0.390625
    208 78 160 0.9765625
    224 72 256 1.5625
    240 68 64 0.390625
    256 63 256 1.5625
    272 60 64 0.390625
    288 56 256 1.5625
    304 53 272 1.66015625
    320 51 64 0.390625
    336 48 256 1.5625
    352 46 192 1.171875
    368 44 192 1.171875
    384 42 256 1.5625
    400 40 384 2.34375
    416 39 160 0.9765625
    432 37 400 2.44140625
    448 36 256 1.5625
    464 35 144 0.87890625
    480 34 64 0.390625
    496 32 512 3.125
    512 31 512 3.125

Another way to fix these problems is to shrink SMALL_REQUEST_THRESHOLD
to 256 and trust malloc to work well for medium-sized memory blocks.
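
As a rough check of the threshold idea, this sketch computes the worst per-pool overhead over all size classes up to a given threshold. worst_overhead is a hypothetical helper, and it again assumes the 48-byte header and 16-byte class spacing used above. It only models pool overhead inside obmalloc. It says nothing about how malloc behaves for the requests that would fall through to it.

```python
def worst_overhead(threshold, pool_size=4096, header_size=48, align=16):
    """Largest per-pool overhead (in percent) over size classes up to threshold.

    Overhead is the pool header plus the unused tail of the pool.
    header_size and align are assumptions, not values read from CPython.
    """
    usable = pool_size - header_size
    return max(
        100 * (usable % bs + header_size) / pool_size
        for bs in range(align, threshold + 1, align)
    )

for threshold in (256, 512):
    print(threshold, worst_overhead(threshold))
```

For 4KiB pools this gives 6.25% at a threshold of 256 versus 12.5% at 512, so lowering the threshold halves the worst case but does not bring it back to the ~3.5% seen up to 224 bytes.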
--
Inada Naoki <songofacandy@gmail.com>