Re: [Tutor] <var:data> assignment
Danny Yoo
dyoo at hkn.eecs.berkeley.edu
Tue May 11 17:24:28 EDT 2004
> If you know that you use a particular string often, or need to make it
> faster to i.e. speed up dictionary access with this string as a key, you
> can force Python to intern it. (It's only for strings you can do this.)
>
> >>> s3 = intern("Hello there, how is it out in the dark and cold world?")
> >>> s4 = intern("Hello there, how is it out in the dark and cold world?")
> >>> s3 is s4
> True
>
> 's3' \
> \ --------------------------------------------------------
> > | Hello there, how is it out in the dark and cold world? |
> / --------------------------------------------------------
> 's4' /
>
> I'm not sure exactly what algorithm Python uses to decide which objects
> to intern automagically.
Hi Magnus,
According to:
http://www.python.org/doc/lib/non-essential-built-in-funcs.html#l2h-84
the names that are used in Python programs are interned for performance
reasons.
Strictly speaking, "interning" only applies to strings; I don't think
intern() works on arbitrary objects. Let's check:
###
>>> intern(100)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: intern() argument 1 must be string, not int
###
Yup, just strings.
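For anyone trying this on Python 3: as far as I know, the built-in intern() moved to sys.intern() there, so the same check looks slightly different. Here's a minimal sketch under that assumption; the helper name try_intern is made up for illustration and is not part of any library.

```python
import sys

def try_intern(value):
    """Return the interned string, or a short error note if value is not a str."""
    try:
        return sys.intern(value)
    except TypeError as err:
        return "TypeError: %s" % err

# Build two equal strings at run time, then intern both.
s3 = sys.intern(" ".join(["Hello", "there"]))
s4 = sys.intern(" ".join(["Hello", "there"]))

print(s3 is s4)          # True: both names refer to the one interned string
print(try_intern(100))   # non-strings are rejected with a TypeError
```

The strings are assembled with join() so that the comparison really exercises intern(), and doesn't just pick up two copies of the same literal.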
As a related optimization hack, the integers in the half-open interval
[-5, 100)
are created in advance and are kept alive in the Python runtime, so that a
request for a small integer is quickly fulfilled by dipping into this
"small integer" pool.
[For the C programmers here: the relevant performance hack lives in the
Python source code under Objects/intobject.c. Here's a small snippet:
/******/
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS 100
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS 5
#endif
#if NSMALLNEGINTS + NSMALLPOSINTS > 0
/* References to small integers are saved in this array so that they
   can be shared.
   The integers that are saved are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
#endif
/******/
This is an optimization hack, and the code explicitly shows that Python
can easily do without it: if we set NSMALLPOSINTS and NSMALLNEGINTS both
to zero and recompile Python, we should see no difference in behavior
(although we'll probably see a drop in performance). At least, I think
the C code can handle this situation... *grin*]
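For those who'd rather read Python than C, here is a toy model of how a pool like small_ints can be consulted. It's only a sketch of the idea and not a translation of the real C logic: the Boxed class and the make_int() function are hypothetical names invented for this example, and only the two constants and the array name come from the snippet above.

```python
NSMALLPOSINTS = 100
NSMALLNEGINTS = 5

class Boxed:
    """Stand-in for a heap-allocated integer object (hypothetical)."""
    def __init__(self, value):
        self.value = value

# One preallocated object per value in [-NSMALLNEGINTS, NSMALLPOSINTS).
small_ints = [Boxed(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]

def make_int(value):
    """Return the shared object for small values, a fresh object otherwise."""
    if -NSMALLNEGINTS <= value < NSMALLPOSINTS:
        return small_ints[value + NSMALLNEGINTS]
    return Boxed(value)

print(make_int(7) is make_int(7))      # True: both come from the pool
print(make_int(100) is make_int(100))  # False: 100 is just outside the pool
print(make_int(-5).value, make_int(99).value)  # -5 99

# The real interpreter, for comparison. Sharing here is a CPython
# implementation detail, not something the language guarantees.
print(int("42") is int("42"))
```

Offsetting the index by NSMALLNEGINTS is what lets the negative values share one array with the positive ones.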
> Python never interns mutable objects. Why?
Aliasing reasons. If strings were mutable, then something like:
### Pseudocode
word1 = intern("hello")
word2 = intern("hello")
word2[1] = 'a'
###
would wreak havoc: what would we expect word1 to contain, "hello" or
"hallo"?
Interning is a caching technique, and caching objects like strings works
best when we treat them as immutable "value" objects. But as soon as we
try caching mutable objects, a lot of complex aliasing behavior can
creep in. So Python doesn't give us an automatic way to do it.
Hope this helps!