String interning in Python 3 - missing or moved?

Chris Angelico rosuav at gmail.com
Mon Jan 23 23:47:56 EST 2012


On Tue, Jan 24, 2012 at 3:18 PM, Terry Reedy <tjreedy at udel.edu> wrote:
> I think that the devs decided that interning is a minor internal
> optimization that users generally should not fiddle with (especially now
> that so much is done automatically anyway*), while having it a builtin made
> it look like something they should pay attention to.
>
> *I am not sure, but I think hashes for strings either already are, or in
> 3.3 will be, always cached.

I'm of the opinion that hash() shouldn't be relied upon, but
apparently there's code "out there" that would be broken if hash()
changed (and, quite reasonably, the devs don't want to make a sudden
change in a bug-fix release). String interning basically turns every
string into a completely opaque hash; you can use 'is' to test for
equality of two interned strings. Having intern() as a builtin cannot
encourage any worse behavior than relying on hash(), imho - both make
no promises of constancy across runs.
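
To make the 'is' point concrete, here's a minimal sketch, assuming
Python 3, where the old intern() builtin lives on as sys.intern(). The
build_key() helper and the sample strings are made up for illustration;
the only thing relied on is the documented behaviour that sys.intern()
returns the one canonical copy of an equal string.

```python
import sys

def build_key(parts):
    # Hypothetical helper: join at run time so the result is a fresh
    # string object rather than a compile-time constant.
    return "_".join(parts)

a = build_key(["client", "key"])
b = build_key(["client", "key"])

# Equal by value. Whether 'a is b' holds here is an implementation
# detail, so nothing below depends on it.
values_equal = (a == b)

# After interning, both names refer to the single canonical object,
# so identity comparison is a valid equality test between them.
ia = sys.intern(a)
ib = sys.intern(b)
same_object = (ia is ib)

print(values_equal, same_object)
```

The identity test is only meaningful when both sides have been interned;
comparing an interned string to an arbitrary one still needs ==.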

Lua and Pike both quite happily solved hash collision attacks in their
interning of strings by randomizing the hash used, which is safe because
no code can rely on that hash's value. Presumably (based on the intern()
docs) Python can do the same, if you explicitly intern your strings first.
Is it worth
recommending that people do this with anything that is
client-provided, and then simply randomize the intern() hash? This
would allow hash() to be unchanged, intern() to still do exactly what
it's always done, and hash collision attacks to be eliminated.
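
As a rough illustration of what a randomized string hash looks like from
the outside, here's a sketch that assumes a CPython honouring the
PYTHONHASHSEED environment variable (3.2.3 or later). It is not the
intern()-only scheme proposed above, just a demonstration that a seeded
hash is stable for one seed and not something to rely on across runs.
hash_in_child() and the sample text are hypothetical names.

```python
import os
import subprocess
import sys

def hash_in_child(text, seed):
    """Return hash(text) as computed by a fresh interpreter run with
    PYTHONHASHSEED set to the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env,
    )
    return int(out.decode().strip())

first = hash_in_child("client-provided", 1)
again = hash_in_child("client-provided", 1)
other = hash_in_child("client-provided", 2)

# Same seed: the value is reproducible.
print(first == again)
# Different seed: expected to differ, though that is not a strict guarantee.
print(first == other)
```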

ChrisA


