One of the hopes of having a custom allocator for Python was to be able to get rid off all free lists. For some reason that never happened. Not sure why. People were probably too busy with adding new features to the language at the time ;-)
Probably not. It's more that the free lists still outperformed pymalloc.
Something you could try to make PyMalloc perform better for the builtin types is to check the actual size of the allocated PyObjects and then make sure that PyMalloc uses arenas large enough to hold a good quantity of them, e.g. it's possible that the float types fall into the same arena as some other type and thus don't have enough "room" to use as free list.
I don't think any improvements can be gained here. PyMalloc carves out pools of 4096 bytes from an arena when it runs out of blocks for a certain size class, and then keeps a linked list of pools of the same size class. So when many float objects get allocated, you'll have a lot of pools of the float type's size class. IOW, PyMalloc has always enough room. Regards, Martin