[Python-Dev] pymalloc killer

Marangozov, Vladimir (Vladimir) vmarangozov@optimay.com
Thu, 4 Apr 2002 10:57:30 +0200


[me]
> >...
> > In other words, the lastoffset acts as an upper bound watermark.

[Tim]
> I added a bunch of low-level overviews (arean mgmt, pool 
> mgmt, block mgmt) as comment blocks over the last week, and noted
> the possibility for this in one of them.  Another possibility is
> to get rid of the runtime divisions, and the pool_header capacity
> member, by exploiting that
> 
> 	pool->capacity = (POOL_SIZE - POOL_OVERHEAD) / size;
> 
> is invariant across all pools with the same szidx; i.e., all 
> references to pool->capacity *could* be replaced by
> capacity[pool->szidx],  where "capacity" is a new small (at most
> 32 uints) file-static const vector computed at compile-time.

The comparison in the "try to extend the free list" code is executed very
frequently, so probing a static vector this way will slow down the thing
noticeably. I've tried that. We are very much running after speed here.

[me]
> > I didn't want to do that optimization before, because it would have
> > resulted in a bigger pool header and waste of space. Now it's ok.

[Tim]
> So now the tradeoff is a little muddier:  If I skipped the 
> multiplication optimization, but did the division optimization,
> each pool would suddenly gain 8 more bytes (on 32-bit boxes) to use
> for blocks.  If I did both, the pool_header size would remain as it
> is now, "halfway between" multiple-of-8-byte sizes.  How do you judge
> the tradeoffs now (do only one, or do both)?

Do both, despite that the multiplication optimization is more important.
(for one division we have #blocks multiplications for a full pool)

1) I think that adding back one uint slot (lastoffset) is not a problem
   (and we end up with the perfect balance of having a pool header layout
    with 4 pointers and 4 integers <wink>).

2) Now, with the advent of lastoffset, we can easily get rid off the
   division as well. For this, we switch to offset calculus for figuring
   out whether the free list can be extended. This means that 'capacity'
   becomes 'highoffset'. The 'highoffset' is the highwater not to be
   exceeded by more than 'size' bytes by 'lastoffest'.

   a) the test becomes then:  if (pool->lastoffset <= pool->highoffset)
                                  ...
                                  pool->lastoffset += size;
                                  pool->freeblock = (block *)pool + \
                                                    pool->lastoffset;

   b) in the pool_init code:  pool->lastoffset = POOL_OVERHEAD + size;
                              pool->highoffset = POOL_SIZE - size - size;

   and you'll need to adjust the code accordingly at the places where
   you're using 'capacity' per se (and adapt the comments, of course).

This should pretty much settle these murky optimizations for now.

Cheers,
Vladimir