[Python-Dev] Interning filenames of imported modules

Guido van Rossum guido@python.org
Thu, 11 Jan 2001 09:44:58 -0500


> I have a question about the following code in compile.c:jcompile (line 3678)
> 
> 		filename = PyString_InternFromString(sc.c_filename); 
> 		name = PyString_InternFromString(sc.c_name);
> 
> In the case of a long-running server which constantly imports modules,
> this causes the interned string dict to grow without bound.  Is there
> a strong reason that the filename needs to be interned?  How about the
> module name?

It's probably not *necessary* for the filename, but I know why I am
interning it: since a module typically contains a bunch of functions,
and each function has its own code object with a reference to the
filename, I'm trying to save memory (the filename is a C string
pointer in the "sc" structure, so it has to be turned into a Python
string when creating the code object).

The module name is used as an identifier elsewhere so will become
interned anyway.

> How about some way to enforce a limit on the size of the interned
> strings dictionary?

I've never thought of this -- but I suppose that a weak dictionary
could be used.  Fred's working on a PEP for weak references, so
there's a chance that we might use this eventually.

In the meantime, a possibility would be to provide a service function
that goes through the "interned" dictionary and looks for values with
a reference count of 1, and deletes them.  You could then explicitly
call this service function occasionally in your program.  I would let
it return a tuple: (number of values kept, number of values deleted).
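
The service function could look roughly like the following sketch. It models reference counts explicitly with a toy `InternedEntry` class instead of touching the real interned dictionary; `intern_string`, `release` and `sweep_interned` are hypothetical names chosen for illustration.

```python
class InternedEntry:
    """Toy stand-in for an interned string with an explicit reference count."""
    def __init__(self, text):
        self.text = text
        self.refcount = 1  # the reference held by the table itself

def intern_string(table, text):
    entry = table.get(text)
    if entry is None:
        entry = table[text] = InternedEntry(text)
    entry.refcount += 1  # the caller's reference
    return entry

def release(entry):
    # The caller drops its reference.
    entry.refcount -= 1

def sweep_interned(table):
    """Delete entries that only the table still references.

    Returns (number of values kept, number of values deleted).
    """
    deleted = 0
    for text in list(table):  # copy the keys; we delete while iterating
        if table[text].refcount == 1:
            del table[text]
            deleted += 1
    return len(table), deleted
```

A server that imports modules constantly could call `sweep_interned` every so often and log the returned tuple to see how much was reclaimed.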

--Guido van Rossum (home page: http://www.python.org/~guido/)