[Python-Dev] Interning string subtype instances
Josiah Carlson
jcarlson at uci.edu
Thu Feb 15 05:18:57 CET 2007
Greg Ewing <greg.ewing at canterbury.ac.nz> wrote:
>
> Josiah Carlson wrote:
>
> > Assuming that dictionaries and the hash algorithm for strings is not
> > hopelessly broken, I believe that one discovers quite quickly when two
> > strings are not equal.
>
> You're probably right, since if there's a hash collision,
> the colliding strings probably differ in the first char
> or two.
>
> Interning will speed up the case where they are equal,
> where otherwise you would have to examine every character.
Except that the scanning of the data needs to happen at some point
anyways. You first allocate the new string, copy the data into it,
compare the content against something in the interned dictionary, and if
there is a match, deallocate the new object and return the interned
object, otherwise return the new object.
This is also the case when one uses the "just give me a string, I'll
fill in the data myself" for more or less user-implemented things like
''.join(lst).
> Still, there could be extension modules out there
> somewhere that make use of PyString_CHECK_INTERNED,
> so there's a slight chance that home-grown interning
> might not give the same results in all cases.
Perhaps. I just never intern. I've found that I'm much happier
assuming that the Python runtime will do good enough.
- Josiah
More information about the Python-Dev
mailing list