Re: [Tutor] <var:data> assignment

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Tue May 11 17:24:28 EDT 2004



> If you know that you use a particular string often, or need to make it
> faster, e.g. to speed up dictionary access with this string as a key, you
> can force Python to intern it. (You can only do this with strings.)
>
> >>> s3 = intern("Hello there, how is it out in the dark and cold world?")
> >>> s4 = intern("Hello there, how is it out in the dark and cold world?")
> >>> s3 is s4
> True
>
> 's3' \
>       \   --------------------------------------------------------
>        > | Hello there, how is it out in the dark and cold world? |
>       /   --------------------------------------------------------
> 's4' /
>
> I'm not sure exactly what algorithm Python uses to decide which objects
> to intern automagically.


Hi Magnus,



According to:

    http://www.python.org/doc/lib/non-essential-built-in-funcs.html#l2h-84

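the names used in Python programs are automatically interned, and the
dictionaries that hold module, class, and instance attributes have interned
keys.  This is for performance: when both a dictionary key and the lookup
key are interned, the comparison after hashing can be a quick pointer
comparison instead of a character-by-character string comparison.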
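Conceptually, interning is just a lookup table that hands back one canonical
object per distinct string value.  Here's a minimal sketch of that idea,
assuming a plain dict as the table.  `my_intern` is a hypothetical name for
illustration; CPython's real interning table lives in C and isn't exposed
like this:

```python
# Toy model of string interning: one canonical object per distinct value.
# _table and my_intern() are hypothetical illustrations, not CPython APIs.
_table = {}

def my_intern(s):
    """Return the canonical object for the string value s."""
    if not isinstance(s, str):
        raise TypeError("my_intern() argument must be str, not %s"
                        % type(s).__name__)
    # setdefault() stores s the first time this value is seen, and
    # returns the already-stored object on every later call.
    return _table.setdefault(s, s)

# Build two equal strings at runtime so the compiler can't merge them.
a = "".join(["Hello", " there"])
b = "".join(["Hello", " there"])
print(a == b)                          # True: equal values...
print(my_intern(a) is my_intern(b))    # True: ...and now one shared object
```

The dictionary does the real work here: equal strings hash to the same slot,
so the second lookup finds the first object and returns it.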


The word "intern" really only applies to strings; I don't think intern()
works on arbitrary objects.  Let's check:

###
>>> intern(100)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: intern() argument 1 must be string, not int
###

Yup, just strings.
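For anyone on a newer Python: in Python 3 the intern() builtin moved to
sys.intern(), and it is still strings-only.  Here's a small sketch of the
same experiment, assuming Python 3:

```python
import sys

# In Python 3, intern() lives in the sys module.  Build the strings at
# runtime so the two calls start from distinct string objects.
s3 = sys.intern("".join(["Hello there, ", "how is it out in the dark and cold world?"]))
s4 = sys.intern("".join(["Hello there, ", "how is it out in the dark and cold world?"]))
print(s3 is s4)   # True: both names refer to the one interned object

# Non-strings are still rejected with a TypeError.
try:
    sys.intern(100)
    rejected = False
except TypeError as err:
    rejected = True
    print("TypeError:", err)
```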


Integers can't be passed to intern(), but CPython does something similar
for small ones as an optimization hack: the integers in the half-open
interval

    [-5, 100)

are created in advance and kept alive in the Python runtime, so that a
request for a small integer is quickly fulfilled by dipping into this
"small integer" pool.
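To make the pool idea concrete, here's a toy Python model of it.  The
constants mirror NSMALLNEGINTS and NSMALLPOSINTS from CPython's
Objects/intobject.c.  The `Box` wrapper and `get_int()` are hypothetical
stand-ins for the C integer object and its allocation path, since plain
Python code can't portably observe which int objects get shared:

```python
# Toy model of a preallocated "small integer" pool.
NSMALLNEGINTS = 5
NSMALLPOSINTS = 100

class Box:
    """Hypothetical stand-in for an int object, so identity is observable."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

# Created once, up front, and kept alive for the life of the program.
# Like the C array, it has NSMALLNEGINTS + NSMALLPOSINTS slots.
small_ints = [Box(v) for v in range(-NSMALLNEGINTS, NSMALLPOSINTS)]

def get_int(v):
    """Hypothetical allocator: reuse pooled objects for small values."""
    if -NSMALLNEGINTS <= v < NSMALLPOSINTS:
        # Shift the index so that -NSMALLNEGINTS lands in slot 0.
        return small_ints[v + NSMALLNEGINTS]
    return Box(v)   # outside the pool: a fresh object every time

print(get_int(42) is get_int(42))    # True: served from the pool
print(get_int(100) is get_int(100))  # False: 100 is outside [-5, 100)
```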



[For the C programmers here: the relevant performance hack lives in the
Python source code under Objects/intobject.c.  Here's a small snippet:

/******/
#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           100
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif
#if NSMALLNEGINTS + NSMALLPOSINTS > 0
/* References to small integers are saved in this array so that they
   can be shared.
   The integers that are saved are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];
/******/

This is an optimization hack, and the code explicitly shows that Python
can easily do without it (note the "#if NSMALLNEGINTS + NSMALLPOSINTS > 0"
guard): if we set NSMALLPOSINTS and NSMALLNEGINTS both to zero and
recompile Python, we should see no difference in behavior (although we'd
probably see a drop in performance).  At least, I think the C code can
handle this situation... *grin*]
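We can sketch that thought experiment in Python with the same kind of toy
pool, this time with configurable sizes.  The `make_int_factory()` helper and
`Box` class are hypothetical illustrations, not CPython APIs.  Equality comes
out the same whether or not the pool exists; only object identity changes:

```python
class Box:
    """Hypothetical stand-in for an int object, compared by value."""
    __slots__ = ("value",)

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, Box) and self.value == other.value

    def __hash__(self):
        return hash(self.value)

def make_int_factory(nsmallneg, nsmallpos):
    """Return an allocator backed by a pool of the given sizes."""
    pool = [Box(v) for v in range(-nsmallneg, nsmallpos)]  # empty if both are 0

    def get_int(v):
        if -nsmallneg <= v < nsmallpos:
            return pool[v + nsmallneg]
        return Box(v)

    return get_int

with_pool = make_int_factory(5, 100)
without_pool = make_int_factory(0, 0)

# Equality, the behavior programs should rely on, is identical either way...
print(all(with_pool(v) == without_pool(v) for v in range(-10, 110)))  # True
# ...only identity, an implementation detail, differs.
print(with_pool(7) is with_pool(7), without_pool(7) is without_pool(7))  # True False
```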



> Python never interns mutable objects. Why?

Aliasing reasons.  If strings were mutable, then something like:

### Pseudocode
word1 = intern("hello")
word2 = intern("hello")
word2[1] = 'a'
###

would wreak havoc: word1 and word2 would be names for the very same object,
so what would we expect word1 to contain, "hello" or "hallo"?
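Lists are mutable, so we can run a real version of that pseudocode.  Here's a
minimal sketch, assuming a hypothetical `intern_list()` helper that caches
lists by their contents the way intern() caches strings:

```python
# What goes wrong if we "intern" mutable objects.
# intern_list() is a hypothetical helper, not part of Python.
_list_table = {}

def intern_list(items):
    key = tuple(items)   # lists aren't hashable, so key on a tuple snapshot
    return _list_table.setdefault(key, items)

word1 = intern_list(list("hello"))
word2 = intern_list(list("hello"))
word2[1] = "a"           # mutate through one alias...

print("".join(word1))    # hallo: ...and the other alias changes too

# Worse, the cache is now stale: asking for "hello" hands back "hallo".
print("".join(intern_list(list("hello"))))   # hallo
```

The tuple key still says "hello", but the cached object it points to no
longer does, which is exactly the aliasing mess that immutability rules out.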


Interning is a caching technique, and caching objects like strings works
best when we treat objects as immutable "value" objects.  But as soon as we
try caching mutable objects, every change made through one reference shows
up through all the others, and that kind of aliasing gets complicated fast.
So Python doesn't provide an automatic way to do it.


Hope this helps!



