
[Martin v. Loewis]
> You should consider that malloc overhead is often 16 bytes per object. Given that PyUnicodeObject is 24 bytes in 2.2, system malloc will allocate 48 bytes per Unicode object on modern architectures. I would think 100% overhead *is* a big argument.
> If you relate this to the actual data, it gets worse: A Unicode string of length 1 would still require 32 bytes on an allocator that aligns to 16.
I think that's unusual -- 8-byte alignment is most common even on 64-bit boxes. KSR had to align to 128-byte boundaries, but there's a reason KSR died <wink -- alas, gross alignment requirements weren't really it>.
> Therefore, to store 2 bytes of real data, you need 80 bytes of memory.
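The 48 + 32 = 80 figure can be reproduced with a small sketch. It assumes a toy model of the system malloc (a fixed 16-byte per-block overhead, with the total rounded up to the alignment); real allocators vary, and the names round_up and malloc_footprint are made up for illustration.

```python
def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def malloc_footprint(request, overhead=16, align=16):
    """Bytes consumed by one block under the toy model: the request plus a
    fixed per-block overhead, rounded up to the alignment (both assumed)."""
    return round_up(request + overhead, align)

header = malloc_footprint(24)  # the 24-byte PyUnicodeObject from 2.2
data = malloc_footprint(2)     # buffer for one 2-byte character
print(header, data, header + data)  # 48 32 80

# The same model with the more common 8-byte alignment:
print(malloc_footprint(24, align=8) + malloc_footprint(2, align=8))  # 64
```

Under this model, dropping to 8-byte alignment trims the total from 80 to 64 bytes, so the fixed per-block overhead is still the dominant cost.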
> I don't know how much overhead pymalloc adds, though; I believe it is significantly less expensive.
Yes, much less. On a 32-bit box, using the current #define's, and ignoring "arena" overhead(*), pymalloc uses 32 bytes per 4096 for bookkeeping. The remaining 4064 bytes can all be user data, but subject to 8-byte alignment, and to how many whole chunks of a given size can fit in 4064 bytes. For the PyUnicodeObject example, 8-byte alignment is without cost, and for the rest
>>> divmod(4096 - 32, 24)
(169, 8)
That is, pymalloc can get 169 PyUnicodeObjects out of a 4KB "page", with 32 bytes for bookkeeping, and 8 bytes left over (unused) -- total overhead is about 1%.

(*) pymalloc gets "arenas" from the system malloc, where an arena is currently 256KB. Up to (worst case) 4KB of that is lost to align the start address to a 4KB boundary, and there's also the comparatively trivial (compared to 4KB!) overhead from the system malloc.
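The per-page arithmetic generalizes to other object sizes. A minimal sketch, assuming the figures quoted above (4096-byte page, 32 bytes of bookkeeping, 8-byte alignment) and ignoring arena overhead; the function name page_usage is made up for illustration and is not part of pymalloc.

```python
def page_usage(obj_size, page=4096, bookkeeping=32, align=8):
    """Return (objects per page, unused bytes, overhead fraction) for one
    pymalloc-style page, using the figures quoted above as assumptions."""
    # Each chunk is the object size rounded up to the alignment.
    chunk = (obj_size + align - 1) // align * align
    count, unused = divmod(page - bookkeeping, chunk)
    # Overhead is everything in the page that is not requested object bytes.
    overhead = (page - count * obj_size) / page
    return count, unused, overhead

count, unused, overhead = page_usage(24)
print(count, unused, f"{overhead:.2%}")  # 169 8 0.98%
```

For sizes that aren't a multiple of 8, the rounding inside each chunk counts as overhead too, so the fraction grows; page_usage(20) also yields 169 objects per page, but wastes 4 bytes in each.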