[Guido]
Maybe we should just drop indirect interning then. It can save 31 bits per string object, right? How to collect those savings?
[Tim]
Make the flag a byte instead of a pointer and it will save 3 or 7 bytes (depending on native pointer size) "on average". Note, assuming a 32-bit box: since pymalloc 8-byte aligns, the smallest footprint a string object can have now is 24 bytes, 20 of which are consumed by bookkeeping overheads (type pointer, refcount, ob_size, ob_shash, ob_sinterned). Strings through length 3 fit in this size (one byte is needed for the trailing \0 we always put in ob_sval[]). Saving 3 bytes wouldn't actually change the memory burden of the smallest string object, but would allow all strings of lengths 4, 5 and 6 to consume 8 fewer bytes than at present (assuming compilers are happy not to pad between a char member and a char[] member). That's probably a significant savings for many string-slinging apps (count the number of words of lengths 4, 5 and 6 in this msg (even <wink> benefits <wink>)).
This means a change in the string object layout, which breaks binary compatibility (the PyString_AS_STRING macro depends on this). I don't mind biting this bullet, but it means we have to increment the API version, and perhaps the warning about API version mismatches should become an error if an extension with an API version from before this change is detected. Oren, how's that patch coming along? :-)
--Guido van Rossum (home page: http://www.python.org/~guido/)