[Python-Dev] The memo of pickle

Guido van Rossum guido@python.org
Fri, 09 Aug 2002 14:03:10 -0400


> [Guido]
> > Maybe we should just drop indirect interning then.  It can save 31
> > bits per string object, right?  How to collect those savings?

[Tim]
> Make the flag a byte instead of a pointer and it will save 3 or 7
> bytes (depending on native pointer size) "on average".  Note,
> assuming a 32-bit box: since pymalloc 8-byte aligns, the smallest
> footprint a string object can have now is 24 bytes, 20 of which are
> consumed by bookkeeping overheads (type pointer, refcount, ob_size,
> ob_shash, ob_sinterned).  Strings through length 3 fit in this size
> (one byte is needed for the trailing \0 we always put in ob_sval[]).
> Saving 3 bytes wouldn't actually change the memory burden of the
> smallest string object, but would allow all strings of lengths 4, 5
> and 6 to consume 8 fewer bytes than at present (assuming compilers
> are happy not to pad between a char member and char[] member).
> That's probably a significant savings for many string-slinging apps
> (count the number of words of lengths 4, 5 and 6 in this msg (even
> <wink> benefits <wink>)).
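The size arithmetic above can be checked with a small sketch. It models the 32-bit box described in the message and assumes, as stated there, that every header field is 4 bytes, that pymalloc rounds each request up to a multiple of 8, and that no padding is inserted between a 1-byte flag and the character array. The names `OLD_HEADER`, `NEW_HEADER` and `footprint` are illustrative only.

```python
# Assumptions (32-bit box, as in the message): each header field is 4 bytes,
# pymalloc rounds every request up to a multiple of 8 bytes, and the compiler
# adds no padding between a 1-byte flag and the char[] that follows it.
ALIGN = 8
FIELD = 4  # bytes per header field on a 32-bit box

# type pointer, refcount, ob_size, ob_shash, ob_sinterned (a pointer)
OLD_HEADER = 5 * FIELD       # 20 bytes
# the same fields, but with ob_sinterned shrunk to a 1-byte flag
NEW_HEADER = 4 * FIELD + 1   # 17 bytes


def footprint(length: int, header: int) -> int:
    """Bytes pymalloc hands out for a string of `length` chars plus its trailing NUL."""
    raw = header + length + 1
    # Round up to the next multiple of ALIGN.
    return (raw + ALIGN - 1) // ALIGN * ALIGN


if __name__ == "__main__":
    for n in range(9):
        old, new = footprint(n, OLD_HEADER), footprint(n, NEW_HEADER)
        print(f"len={n}: {old} -> {new} bytes (saves {old - new})")
```

Running it shows lengths 0 through 3 staying at 24 bytes either way, lengths 4, 5 and 6 dropping from 32 to 24 bytes, and length 7 and up unchanged at the next size class.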

This means a change in the string object lay-out, which breaks binary
compatibility (the PyString_AS_STRING macro depends on this).
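Why the layout change breaks binary compatibility can be sketched with `ctypes`: a macro that expands to a direct struct-member access has the offset of `ob_sval` compiled into every extension that uses it. The two structures below are hypothetical models of the 32-bit header before and after shrinking the interning pointer to a byte; `c_uint32` stands in for every pointer-sized or int/long field so the offsets match a 32-bit build on any host, and the field name `ob_sinterned_flag` is made up for illustration.

```python
import ctypes


# Hypothetical model of the current 32-bit string object header.
class OldStringLayout(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_uint32),
        ("ob_type", ctypes.c_uint32),       # type pointer
        ("ob_size", ctypes.c_uint32),
        ("ob_shash", ctypes.c_uint32),
        ("ob_sinterned", ctypes.c_uint32),  # pointer to the interned string
        ("ob_sval", ctypes.c_char * 1),     # character data starts here
    ]


# Hypothetical model with the interning pointer replaced by a 1-byte flag.
class NewStringLayout(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_uint32),
        ("ob_type", ctypes.c_uint32),
        ("ob_size", ctypes.c_uint32),
        ("ob_shash", ctypes.c_uint32),
        ("ob_sinterned_flag", ctypes.c_char),  # hypothetical byte-sized flag
        ("ob_sval", ctypes.c_char * 1),
    ]


old_off = OldStringLayout.ob_sval.offset
new_off = NewStringLayout.ob_sval.offset
print(f"ob_sval offset: old={old_off}, new={new_off}")
```

An extension compiled against the old header would keep reading character data at byte 20, which under the new layout is already 3 bytes into the string.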

I don't mind biting this bullet, but it means we have to increment the
API version, and perhaps the warning about API version mismatches
should become an error if an extension built with an API version from
before this change is detected.
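The proposed policy, an error for extensions built before the layout change and a warning for any other mismatch, can be sketched as follows. This is not CPython's actual check; the version numbers and the names `CURRENT_API_VERSION`, `LAYOUT_CHANGE_API_VERSION` and `check_extension_api` are hypothetical and exist only to show the two branches.

```python
import warnings

# Hypothetical version numbers, for illustration only.
CURRENT_API_VERSION = 3
# First API version that uses the new string object layout (hypothetical).
LAYOUT_CHANGE_API_VERSION = 2


def check_extension_api(module_name: str, module_api: int) -> None:
    """Reject extensions built before the layout change; warn on other mismatches."""
    if module_api < LAYOUT_CHANGE_API_VERSION:
        # Binary-incompatible: the compiled-in struct offsets are wrong.
        raise ImportError(
            f"{module_name}: built with C API version {module_api}, "
            f"this Python requires at least {LAYOUT_CHANGE_API_VERSION}"
        )
    if module_api != CURRENT_API_VERSION:
        # Layout-compatible, but not an exact match: keep the old warning.
        warnings.warn(
            f"C API version mismatch for module {module_name}: "
            f"this Python has API version {CURRENT_API_VERSION}, "
            f"module has version {module_api}",
            RuntimeWarning,
        )
```

Keeping the warning for versions at or after the layout change preserves the existing lenient behaviour for extensions that are merely out of date, while refusing to load those whose compiled struct offsets can no longer be right.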

Oren, how's that patch coming along? :-)

--Guido van Rossum (home page: http://www.python.org/~guido/)