[Tutor] <var:data> assignment [Python under the hood: optimizations at the C level]

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Tue May 11 21:59:12 EDT 2004



On Wed, 12 May 2004, Magnus Lycka wrote:

> I wrote:
> > > If you know that you use a particular string often, or need to make it
> > > faster to e.g. speed up dictionary access with this string as a key, you
> > > can force Python to intern it. (It's only for strings you can do this.)
> ..
> > > I'm not sure exactly what algorithm Python uses to decide which objects
> > > to intern automagically.
>
> Danny responded:
> > The word "intern" really should apply to strings; I don't think intern()
> > works on arbitrary objects.
>
> I just wrote that!


Hi Magnus,


My apologies! I read your message too quickly, and skipped over the part
where you mentioned that it worked on strings only.


I have a bad habit of tunnel vision --- good when debugging code, but not
so good when communicating with people.  *grin* I will try to be a better
listener next time.




> As I look further, it seems that it has nothing to do with size as I
> thought, but that all strings that are valid Python identifiers are
> interned.
>
> The manual says that "Normally, the names used in Python programs are
> automatically interned, and the dictionaries used to hold module, class
> or instance attributes have interned keys."
>
> It's obviously more extensive than that. All string *literals* that
> could possibly be names used in Python programs seem to get interned.


Yes, it's done at bytecode-compile time.  In Python/compile.c, there's a
step that interns all variable names, plus any string literals made up
entirely of "name"-like characters:


/******/
PyCodeObject *
PyCode_New(int argcount, int nlocals, int stacksize, int flags,
           PyObject *code, PyObject *consts, PyObject *names,
           PyObject *varnames, PyObject *freevars, PyObject *cellvars,
           PyObject *filename, PyObject *name, int firstlineno,
           PyObject *lnotab)
{

[some code cut]

        intern_strings(names);
        intern_strings(varnames);
        intern_strings(freevars);
        intern_strings(cellvars);
        /* Intern selected string constants */
        for (i = PyTuple_Size(consts); --i >= 0; ) {
                PyObject *v = PyTuple_GetItem(consts, i);
                if (!PyString_Check(v))
                        continue;
                if (!all_name_chars((unsigned char *)PyString_AS_STRING(v)))
                        continue;
                PyString_InternInPlace(&PyTuple_GET_ITEM(consts, i));
        }

/******/

(Code taken from Python 2.3.3 C source)
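
To make the filter in that loop a little more concrete, here's a rough
Python paraphrase of it.  This is only my own sketch, not the real
implementation: I'm assuming that "name"-like characters means ASCII
letters, digits and the underscore, and constants_to_intern() is a made-up
helper that just reports which constants would pass the test.

```python
import string

# Assumption: "name" characters are ASCII letters, digits and underscore.
NAME_CHARS = frozenset(string.ascii_letters + string.digits + "_")


def all_name_chars(s):
    """Return True if every character of s is a "name" character."""
    return all(ch in NAME_CHARS for ch in s)


def constants_to_intern(consts):
    """Hypothetical helper: list the string constants that pass the filter."""
    return [c for c in consts if isinstance(c, str) and all_name_chars(c)]


print(constants_to_intern(("hello", "hello world", 42, "spam_1", None)))
# prints ['hello', 'spam_1']
```

Note that "hello world" doesn't pass, because of the space, and that the
check looks at every character individually, so a string doesn't have to be
a legal identifier as a whole to qualify.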


Again, this is a C optimization hack that isn't documented, precisely
because we really shouldn't depend on this behavior!  *grin*


In fact, I have no idea what Jython does.  Let's check it:

###
[dyoo at tesuque dyoo]$ jython
Jython 2.1 on java1.4.1_01 (JIT: null)
Type "copyright", "credits" or "license" for more information.
>>> id("hello world")
17064560
>>> id("hello world")
22629283
>>> id("hello world")
11354272
###


Ah.  Yup, it does something different in Jython.  Hence, it's an
implementation detail that we really shouldn't be looking at.  But I get the
feeling we've completely strayed off the original topic anyway.  *grin*
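
If we do want identity that we can count on, the safer route is to compare
strings with == and to intern explicitly when we need to.  Here's a small
sketch of that.  It assumes a Python 3 interpreter, where the intern()
builtin has moved to sys.intern(); the strings are built at runtime on
purpose, so the compiler has no chance to share them behind our backs.

```python
import sys

# Build two equal strings at runtime so they start out as separate objects
# as far as the compiler is concerned.
a = "".join(["hello", " ", "world"])
b = "".join(["hello", " ", "world"])

# Equality compares contents, so it is safe on any implementation.
print(a == b)

# Explicit interning hands back one shared object for equal strings,
# so an identity check on the results is something we can rely on.
ia = sys.intern(a)
ib = sys.intern(b)
print(ia is ib)
```

Whether "a is b" holds before the explicit interning is exactly the part
that's up to the implementation, so I've left it out of the example.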



