[Python-Dev] String interning
Martin v. Loewis
martin@v.loewis.de
01 Jul 2002 09:03:22 +0200
Oren Tirosh <oren-py-d@hishome.net> writes:
> In stringobject.c most references to ob_sinterned are to initialize it. The
> only place that uses it is string_hash: if ob_sinterned is not NULL it uses
> the hash of the string it points to instead of the current string object.
This is not true: PyString_InternInPlace has
    if ((t = s->ob_sinterned) != NULL) {
which checks whether the string being interned had been interned
before.
> Summary: As far as I can tell, indirectly interned strings are redundant.
> Without them the ob_sinterned field is effectively a boolean flag.
>
> Can anyone explain why interning is implemented the way it is? Can anyone
> explain why Mac/Python/macimport.c is messing with ob_sinterned?
I'm not sure what meaning you would associate with the boolean
flag. If this is meant to denote "this is an interned string", then

    if ((t = s->ob_sinterned) != NULL) {
        if (t == (PyObject *)s)
            return;

would become

    if (s->ob_isinterned) return;
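To see the difference, I added

    if ((t = s->ob_sinterned) != NULL) {
        if (t == (PyObject *)s)
            return;
        fprintf(stderr, "reinterning\n");

Whenever that code prints "reinterning", the argument is an indirectly
interned string: ob_sinterned already points to the interned copy, so
the function can efficiently intern the argument. With your change it
could not, since a flag does not say where the interned copy is.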
I agree that this is very rare, but in the test suite, it triggers 5
times in test_descr.
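
The difference can be illustrated with a toy model. This is only a
sketch, not the code in stringobject.c: toy_str, toy_table, toy_intern
and toy_lookups are all hypothetical names, and a small linear table
stands in for the interned dictionary. It shows that a pointer field
lets a second interning request on an indirectly interned string be
answered without another table search.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a string object; NOT the real PyStringObject layout. */
typedef struct toy_str {
    const char *text;
    struct toy_str *sinterned;  /* canonical copy, or NULL if never interned */
} toy_str;

#define TOY_MAX 16
static toy_str *toy_table[TOY_MAX];  /* stand-in for the interned dict */
static size_t toy_count;
static int toy_lookups;              /* number of slow-path table searches */

/* Return the canonical object whose text equals s->text. */
static toy_str *toy_intern(toy_str *s)
{
    size_t i;

    /* Fast path: s was interned before, directly (points to itself)
       or indirectly (points to an equal string interned earlier). */
    if (s->sinterned != NULL)
        return s->sinterned;

    /* Slow path: search the table for an equal string. */
    toy_lookups++;
    for (i = 0; i < toy_count; i++) {
        if (strcmp(toy_table[i]->text, s->text) == 0) {
            s->sinterned = toy_table[i];  /* now indirectly interned */
            return toy_table[i];
        }
    }

    /* Not found: s becomes the canonical copy, if there is room. */
    if (toy_count == TOY_MAX)
        return s;                         /* table full: leave s unmarked */
    toy_table[toy_count++] = s;
    s->sinterned = s;
    return s;
}
```

With two distinct objects holding "spam", interning the first makes it
the canonical copy, and interning the second costs one search and
records the pointer. Interning the second again takes the fast path.
With only a boolean flag, that last call would have to search again to
find the canonical copy.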
> The size of all string objects can be reduced by 3 bytes.
That is not true. Taking a 32-bit architecture, and considering that
each string has 16 bytes minimum storage (without ob_sinterned), and
taking into account the 8-byte clustering of pymalloc, we get

    stringsize  current-storage  new-storage  savings
        0             24             24          0
        1             24             24          0
        2             24             24          0
        3             24             24          0
        4             32             24          8
        5             32             24          8
        6             32             24          8
        7             32             32          0

So the size reduction depends on the actual length of the strings;
it's 3 bytes only on average, assuming a uniform distribution of
string sizes.
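
The table can be reproduced with a few lines of C. This is a sketch
under assumptions of mine that happen to match the figures above: the
16-byte header and 8-byte rounding come from the text, while the 4-byte
pointer, the 1-byte flag replacing it, and the 1-byte terminating NUL
are my reading of how the numbers arise. All names are hypothetical.

```c
#include <stddef.h>

/* Assumed 32-bit layout; only HEADER and ALIGN are stated in the text. */
enum {
    HEADER  = 16,  /* string object fields without ob_sinterned */
    POINTER = 4,   /* the ob_sinterned pointer (assumed 4 bytes) */
    FLAG    = 1,   /* a one-byte flag replacing the pointer (assumed) */
    NUL     = 1,   /* terminating NUL byte of the string data (assumed) */
    ALIGN   = 8    /* pymalloc rounds requests to multiples of 8 */
};

/* Round n up to the next multiple of ALIGN. */
static size_t round_up(size_t n)
{
    return (n + ALIGN - 1) / ALIGN * ALIGN;
}

/* Storage for a string of length len with the pointer field. */
static size_t current_storage(size_t len)
{
    return round_up(HEADER + POINTER + len + NUL);
}

/* Storage for a string of length len with a flag instead. */
static size_t new_storage(size_t len)
{
    return round_up(HEADER + FLAG + len + NUL);
}

static size_t savings(size_t len)
{
    return current_storage(len) - new_storage(len);
}

/* Mean savings over lengths 0..7, i.e. one full 8-byte cycle. */
static double average_savings(void)
{
    size_t len, total = 0;

    for (len = 0; len < 8; len++)
        total += savings(len);
    return (double)total / 8.0;
}
```

Under these assumptions, three of the eight lengths in a cycle save 8
bytes each and the rest save nothing, which is where the average of
24 / 8 = 3 bytes comes from.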
Regards,
Martin