Internals of interning strings

Bernhard Herzog herzog at online.de
Sun Mar 26 06:11:27 EST 2000


"Jason Stokes" <jstok at bluedog.apana.org.au> writes:

[discussion of intern() internals. After
	a = "A completely new string that I haven't used before"
	b = a
	c = intern(a)
	d = "A completely new string that I haven't used before"
	e = intern(d)
 Now a's and d's ob_sinterned point to a.
]

> And if "d"
> is hashed, the hash function returns the cached hash value of the object
> currently pointed to by "a".

This seems unnecessary. When a string is interned, it is looked up in a
dictionary and therefore its hash value has to be computed. Hash values
of strings are cached in the string object, so when a string that has
been interned is hashed, its hash value is already known without any
further computation, and there's no need to look at a for this at all.
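
To illustrate the point, here is a toy model in Python. It is only a
sketch and not the actual C implementation: CachedHashStr, toy_intern
and the _interned dictionary are hypothetical names made up for this
example, and the hash_computations counter exists only to make the
caching visible.

```python
class CachedHashStr:
    """Toy stand-in for a string object that caches its own hash."""

    def __init__(self, text):
        self.text = text
        self._hash = None            # no hash cached yet
        self.hash_computations = 0   # counts real computations only

    def __hash__(self):
        if self._hash is None:
            self.hash_computations += 1
            self._hash = hash(self.text)
        return self._hash

    def __eq__(self, other):
        return isinstance(other, CachedHashStr) and self.text == other.text


_interned = {}  # hypothetical table of interned strings


def toy_intern(s):
    # The dictionary lookup needs hash(s), so s caches its hash here.
    return _interned.setdefault(s, s)


a = CachedHashStr("A completely new string that I haven't used before")
d = CachedHashStr("A completely new string that I haven't used before")
c = toy_intern(a)
e = toy_intern(d)

print(c is a, e is a)        # True True
hash(d)                      # answered from d's own cache
print(d.hash_computations)   # 1
```

In this model d computes its hash exactly once, during the lookup that
interning requires, and never consults a afterwards.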


There were several replies to your post, but no one has answered your
actual question, as far as I can tell:

> Anyway, the question is: is this the only reason for the extra
> entry "ob_sinterned" in the PyString struct?  That is, a couple of
> optimisations, costing an extra 4 bytes per string object?

That's something I've also wondered about. How likely is it that one and
the same string object will be interned several times? That seems to be
the only thing that's really optimized.
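
A toy model of that case, again only a sketch based on the behaviour
described in this thread rather than on the real C code. ToyStr,
ToyInternTable and the dict_lookups counter are hypothetical names; the
sinterned attribute stands in for ob_sinterned.

```python
class ToyStr:
    """Toy string object with a pointer to its interned version."""

    def __init__(self, text):
        self.text = text
        self.sinterned = None   # stands in for ob_sinterned


class ToyInternTable:
    def __init__(self):
        self.table = {}
        self.dict_lookups = 0   # how often the dictionary was consulted

    def intern(self, s):
        if s.sinterned is not None:
            # Shortcut: this very object has been interned before.
            return s.sinterned
        self.dict_lookups += 1
        canonical = self.table.setdefault(s.text, s)
        s.sinterned = canonical
        return canonical


t = ToyInternTable()
a = ToyStr("A completely new string that I haven't used before")
d = ToyStr("A completely new string that I haven't used before")

c = t.intern(a)
e = t.intern(d)
print(t.dict_lookups)   # 2

# Interning the same objects again skips the dictionary entirely.
t.intern(a)
t.intern(d)
print(t.dict_lookups)   # 2
```

In this model the extra field only saves a dictionary lookup when the
same object is passed to intern more than once.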


-- 
Bernhard Herzog   | Sketch, a drawing program for Unix
herzog at online.de  | http://sketch.sourceforge.net/
