New subject: pymalloc killer

April 4, 2002

      [me]
...
...
...
In other words, the lastoffset acts as an upper bound watermark.
[Tim]
...
I added a bunch of low-level overviews (arean mgmt, pool 
mgmt, block mgmt) as comment blocks over the last week, and noted
the possibility for this in one of them.  Another possibility is
to get rid of the runtime divisions, and the pool_header capacity
member, by exploiting that
pool->capacity = (POOL_SIZE - POOL_OVERHEAD) / size;
is invariant across all pools with the same szidx; i.e., all 
references to pool->capacity *could* be replaced by
capacity[pool->szidx],  where "capacity" is a new small (at most
32 uints) file-static const vector computed at compile-time.
The comparison in the "try to extend the free list" code is executed very
frequently, so probing a static vector this way will slow down the thing
noticeably. I've tried that. We are very much running after speed here.

[me]
...
...
I didn't want to do that optimization before, because it would have
resulted in a bigger pool header and waste of space. Now it's ok.
[Tim]
...
So now the tradeoff is a little muddier:  If I skipped the 
multiplication optimization, but did the division optimization,
each pool would suddenly gain 8 more bytes (on 32-bit boxes) to use
for blocks.  If I did both, the pool_header size would remain as it
is now, "halfway between" multiple-of-8-byte sizes.  How do you judge
the tradeoffs now (do only one, or do both)?
Do both, despite that the multiplication optimization is more important.
(for one division we have #blocks multiplications for a full pool)

1) I think that adding back one uint slot (lastoffset) is not a problem
   (and we end up with the perfect balance of having a pool header layout
    with 4 pointers and 4 integers <wink>).

2) Now, with the advent of lastoffset, we can easily get rid off the
   division as well. For this, we switch to offset calculus for figuring
   out whether the free list can be extended. This means that 'capacity'
   becomes 'highoffset'. The 'highoffset' is the highwater not to be
   exceeded by more than 'size' bytes by 'lastoffest'.

   a) the test becomes then:  if (pool->lastoffset <= pool->highoffset)
                                  ...
                                  pool->lastoffset += size;
                                  pool->freeblock = (block *)pool + \
                                                    pool->lastoffset;

   b) in the pool_init code:  pool->lastoffset = POOL_OVERHEAD + size;
                              pool->highoffset = POOL_SIZE - size - size;

   and you'll need to adjust the code accordingly at the places where
   you're using 'capacity' per se (and adapt the comments, of course).

This should pretty much settle these murky optimizations for now.

Cheers,
Vladimir

RE: [Python-Dev] pymalloc killer

Marangozov, Vladimir (Vladimir)

Tim Peters

Tim Peters

tags

participants (2)