RE: [Python-Dev] Subclassing varying length types (What's a PyStructSequence ?)

Dec. 2, 2001

      [Martin v. Loewis]
> ...
> You should consider that malloc overhead is often 16 bytes per
> object. Given that PyUnicodeObject is 24 bytes in 2.2, system malloc
> will allocate 48 bytes per Unicode object on modern architectures.  I
> would think 100% overhead *is* a big argument.
> If you relate this to the actual data, it gets worse: A Unicode string
> of length 1 would still require 32 bytes on an allocator that aligns
> to 16.

I think that's unusual -- 8-byte alignment is most common even on 64-bit
boxes.  KSR had to align to 128-byte boundaries, but there's a reason KSR
died <wink -- alas, gross alignment requirements weren't really it>.

> ...
> Therefore, to store 2 bytes of real data, you need 80 bytes of
> memory.
> I don't know how much overhead pymalloc adds, though; I believe it is
> significantly less expensive.

Yes, much less.  On a 32-bit box, using the current #define's, and ignoring
"arena" overhead(*), pymalloc uses 32 bytes per 4096 for bookkeeping.  The
remaining 4064 bytes can all be user data, but subject to 8-byte alignment,
and to how many whole chunks of a given size can fit in 4064 bytes.  For the
PyUnicodeObject example, 8-byte alignment is without cost, and for the rest

>>> divmod(4096 - 32, 24)
(169, 8)

That is, pymalloc can get 169 PyUnicodeObjects out of a 4KB "page", with 32
bytes for bookkeeping, and 8 bytes left over (unused) -- total overhead is
about 1%.
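
The arithmetic in this exchange can be replayed in a few lines. This is only a sketch: the 16-byte malloc header, the rounding to 16, and the helper names (`round_up`, `malloc_cost`, `pymalloc_page`) are assumptions built from the figures quoted above, not measurements of any real allocator.

```python
def round_up(n, align):
    """Round n up to the next multiple of align."""
    return (n + align - 1) // align * align

def malloc_cost(payload, header=16, align=16):
    """Bytes consumed by one block under the assumed system-malloc model."""
    return round_up(payload + header, align)

def pymalloc_page(obj_size, page=4096, bookkeeping=32, align=8):
    """Return (objects per page, overhead fraction) for one object size."""
    chunk = round_up(obj_size, align)
    count, leftover = divmod(page - bookkeeping, chunk)
    return count, (bookkeeping + leftover) / page

header_block = malloc_cost(24)   # PyUnicodeObject struct: 24 bytes requested
data_block = malloc_cost(2)      # one 2-byte character of real data
count, overhead = pymalloc_page(24)

print(header_block, data_block, header_block + data_block)
print(count, round(overhead * 100, 2))
```

Under these assumptions the system-malloc model reproduces the 48 + 32 = 80 bytes from the quoted text, and the pymalloc model reproduces the 169 objects per page with roughly 1% lost.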

(*) pymalloc gets "arenas" from the system malloc, where an arena is
currently 256KB.  Up to (worst case) 4KB of that is lost to align the start
address to a 4KB boundary, and there's also the comparatively trivial
(compared to 4KB!) overhead from the system malloc.
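
A rough way to see the arena alignment cost, assuming only the 256KB arena and 4KB page sizes stated in the footnote. The function name and the example start addresses are hypothetical, and the real pymalloc bookkeeping may differ in detail.

```python
ARENA = 256 * 1024   # arena size requested from the system malloc
PAGE = 4096          # pymalloc page size

def usable_pages(start_address):
    """Return (whole aligned pages, bytes lost) for an arena at start_address."""
    offset = -start_address % PAGE        # bytes skipped to reach a 4KB boundary
    pages = (ARENA - offset) // PAGE      # whole pages that still fit
    return pages, ARENA - pages * PAGE

print(usable_pages(0))        # arena happens to start on a page boundary
print(usable_pages(8))        # arena starts 8 bytes past a boundary
print(PAGE / ARENA * 100)     # worst-case loss as a percentage of the arena
```

In this model a misaligned arena gives up one page out of 64, which is where the "up to 4KB" figure comes from.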

Tim Peters