[Python-Dev] Subclassing varying length types (What's a PyStructSequence ?)

Sun, 2 Dec 2001 05:36:24 -0500

[Martin v. Loewis]
> You should consider that malloc overhead is often 16 bytes per
> object. Given that PyUnicodeObject is 24 bytes in 2.2, system malloc
> will allocate 48 bytes per Unicode object on modern architectures.  I
> would think 100% overhead *is* a big argument.
>
> If you relate this to the actual data, it gets worse: A Unicode string
> of length 1 would still require 32 bytes on an allocator that aligns
> to 16.

I think that's unusual -- 8-byte alignment is most common even on 64-bit
boxes.  KSR had to align to 128-byte boundaries, but there's a reason KSR
died <wink -- alas, gross alignment requirements weren't really it>.

> Therefore, to store 2 bytes of real data, you need 80 bytes of
> memory.
>
> I don't know how much overhead pymalloc adds, though; I believe it is
> significantly less expensive.

Yes, much less.  On a 32-bit box, using the current #define's, and ignoring
"arena" overhead(*), pymalloc uses 32 bytes per 4096 for bookkeeping.  The
remaining 4064 bytes can all be user data, but subject to 8-byte alignment,
and to how many whole chunks of a given size can fit in 4064 bytes.  For the
PyUnicodeObject example, 8-byte alignment is without cost, and for the rest:

>>> divmod(4096 - 32, 24)
(169, 8)
>>>

That is, pymalloc can get 169 PyUnicodeObjects out of a 4KB "page", with 32
bytes for bookkeeping, and 8 bytes left over (unused) -- total overhead is
about 1%.
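
The same arithmetic can be generalized to other object sizes.  What follows
is only a rough sketch, not pymalloc's actual code: the function names and
default parameters are made up for illustration, using the figures from this
thread (4096-byte pool, 32 bytes of bookkeeping, 8-byte alignment for
pymalloc; 16 bytes of header and 16-byte alignment for the system malloc
described in the quoted text).

```python
def round_up(n, alignment):
    """Round n up to the next multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def pool_overhead(obj_size, pool_size=4096, header=32, alignment=8):
    """Model one pymalloc pool; all defaults are the figures quoted above.

    Returns (objects_per_pool, leftover_bytes, overhead_fraction), where
    the overhead counts every pool byte that is not requested user data.
    """
    block = round_up(obj_size, alignment)
    count, leftover = divmod(pool_size - header, block)
    overhead = (pool_size - count * obj_size) / pool_size
    return count, leftover, overhead


def malloc_footprint(obj_size, header=16, alignment=16):
    """Model the quoted system malloc: a header plus rounding up."""
    return round_up(obj_size + header, alignment)


print(pool_overhead(24))                          # (169, 8, 0.009765625)
print(malloc_footprint(24), malloc_footprint(2))  # 48 32
```

For the 24-byte PyUnicodeObject this reproduces the 169 objects and 8
leftover bytes above, and the 48 + 32 = 80 bytes from the quoted malloc
estimate.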

(*) pymalloc gets "arenas" from the system malloc, where an arena is
currently 256KB.  Up to (worst case) 4KB of that is lost to align the start
address to a 4KB boundary, and there's also the comparatively trivial
(compared to 4KB!) overhead from the system malloc.
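
To put a number on the arena footnote, here is a small model of the
alignment loss.  It assumes only what is stated above (256KB arenas carved
into 4KB pools starting at a 4KB boundary); the function name and the start
addresses are hypothetical, and the real allocator may do its bookkeeping
differently.

```python
ARENA_SIZE = 256 * 1024   # arena size quoted above
POOL_SIZE = 4096          # 4KB pool ("page") size quoted above


def usable_pools(start_address):
    """Whole pools that fit in an arena beginning at start_address."""
    # Bytes skipped so that the first pool starts on a POOL_SIZE boundary.
    skipped = -start_address % POOL_SIZE
    return (ARENA_SIZE - skipped) // POOL_SIZE


print(usable_pools(0))         # 64: arena already aligned, nothing lost
print(usable_pools(8))         # 63: one pool's worth lost to alignment
print(POOL_SIZE / ARENA_SIZE)  # 0.015625: worst-case fraction lost
```

So even in the worst case the arena layer costs about 1.6% on top of the
per-pool figure.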