[Neal Norwitz]
It's not just size. Architectures may require data aligned on 4-, 8-, or 16-byte addresses for optimal performance, depending on data type. IIRC, malloc aligns by 8 (not sure if that was a particular arch or very common).
Just very common. Because malloc has no idea what the pointer it returns will be used for, it needs to satisfy the strictest alignment requirement for any type exposed by the C language. As an extreme example, when I worked at Kendall Square Research, our hardware supported atomic locking on a "subpage" basis, and HW subpage addresses were all those divisible by 128. Subpages were exposed via C extensions, so the KSR malloc had to return 128-byte aligned pointers.
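For anyone who wants to check this on their own box, here is a rough sketch in C11. The helper names is_aligned and malloc_is_max_aligned are made up for illustration, and it assumes a C11 compiler so that max_align_t and _Alignof are available. It only tests one block that is large enough to hold a max_align_t, since some allocators use a weaker alignment for smaller requests.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if p is a multiple of alignment.
   alignment must be a nonzero power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: allocate one block big enough for a max_align_t
   and report whether malloc returned an address suitable for the
   strictest fundamental alignment. Returns 0 if malloc fails. */
static int malloc_is_max_aligned(void)
{
    void *p = malloc(sizeof(max_align_t));
    int ok = (p != NULL) && is_aligned(p, _Alignof(max_align_t));
    free(p);
    return ok;
}
```

On a platform like the KSR one described above, the analogous check would be is_aligned(p, 128), since that malloc had to cover the subpage extension as well.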
I don't know if pymalloc handles alignment.
pymalloc ensures 8-byte alignment. This is one plausible reason to keep the current int free list: an int object struct holds three 4-byte members on most boxes (type pointer, refcount, and the int's value), and the int freelist code uses exactly 12 bytes for each on most boxes. To keep 8-byte alignment, pymalloc would have to hand out a 16-byte chunk per int object, wasting a fourth of the space (pymalloc always rounds up a requested size to a multiple of 8, and ensures the address returned is 8-byte aligned).
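The arithmetic behind the 12-versus-16 figure can be sketched in a few lines of C. This is not pymalloc's actual code, just the usual round-up-to-a-multiple idiom; ALIGNMENT, round_up, and padding_for are names invented for this illustration, and the value 8 is taken from the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment, from the 8-byte figure quoted above. */
#define ALIGNMENT 8u

/* Round nbytes up to the next multiple of ALIGNMENT.
   The bit trick only works when ALIGNMENT is a power of two. */
static size_t round_up(size_t nbytes)
{
    assert((ALIGNMENT & (ALIGNMENT - 1)) == 0);
    return (nbytes + ALIGNMENT - 1) & ~(size_t)(ALIGNMENT - 1);
}

/* Bytes of padding that such rounding adds to a request. */
static size_t padding_for(size_t nbytes)
{
    return round_up(nbytes) - nbytes;
}
```

For a 12-byte int object, round_up(12) is 16 and padding_for(12) is 4, so 4 of every 16 bytes handed out would be padding, which is the one-fourth waste mentioned above.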