[Python-Dev] Interning filenames of imported modules
Guido van Rossum
guido@python.org
Thu, 11 Jan 2001 09:44:58 -0500

> I have a question about the following code in compile.c:jcompile (line 3678)
>
> filename = PyString_InternFromString(sc.c_filename);
> name = PyString_InternFromString(sc.c_name);
>
> In the case of a long-running server which constantly imports modules,
> this causes the interned string dict to grow without bound. Is there
> a strong reason that the filename needs to be interned? How about the
> module name?

It's probably not *necessary* for the filename, but I know why I am
interning it: since a module typically contains a bunch of functions,
and each function has its own code object with a reference to the
filename, I'm trying to save memory (the filename is a C string
pointer in the "sc" structure, so it has to be turned into a Python
string when creating the code object).
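
The saving described above can be illustrated with a minimal sketch. It assumes Python 3's sys.intern as a stand-in for the C-level PyString_InternFromString call quoted above, and the path is hypothetical. Two equal strings built separately are distinct objects; interning maps both to one shared object, so every code object can point at the same filename string.

```python
import sys

# Hypothetical filename, built twice at run time so that the two
# strings start out as separate objects with equal contents.
parts = ["/srv/app/", "handlers.py"]
first = "".join(parts)
second = "".join(parts)

# Before interning: equal values, but two objects in memory.
distinct_before = first is not second

# After interning: both calls return the same shared object.
shared_a = sys.intern(first)
shared_b = sys.intern(second)
shared_after = shared_a is shared_b

print(distinct_before, shared_after)
```

The flip side is the one raised in the question: the shared object is kept alive by the interning table itself.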

The module name is used as an identifier elsewhere so will become
interned anyway.

> How about some way to enforce a limit on the size of the interned
> strings dictionary?

I've never thought of this -- but I suppose that a weak dictionary
could be used. Fred's working on a PEP for weak references, so
there's a chance that we might use this eventually.
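
As a rough sketch of the weak-dictionary idea, the following uses weakref.WeakValueDictionary from today's standard library. WeakInternTable and WeakName are hypothetical names introduced only for this illustration; this is not how the interpreter's interned dictionary is implemented. Plain str objects cannot be weakly referenced in CPython, so the sketch wraps values in a str subclass.

```python
import gc
import weakref


class WeakName(str):
    """Hypothetical str subclass; its instances accept weak references."""


class WeakInternTable:
    """Toy interning table whose entries vanish once nobody uses them."""

    def __init__(self):
        # Values are held weakly: the table alone does not keep them alive.
        self._table = weakref.WeakValueDictionary()

    def intern(self, text):
        value = self._table.get(text)
        if value is None:
            value = WeakName(text)
            self._table[text] = value
        return value

    def __len__(self):
        return len(self._table)


table = WeakInternTable()
a = table.intern("spam.py")
b = table.intern("spam.py")
same_object = a is b          # both callers share one object
size_while_used = len(table)

del a, b                      # drop the last strong references
gc.collect()                  # not needed on CPython, harmless elsewhere
size_after_release = len(table)
```

With this design no explicit cleanup call is needed; the cost is an extra weak-reference slot per entry.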

In the meantime, a possibility would be to provide a service function
that goes through the "interned" dictionary and looks for values with
a reference count of 1, and deletes them. You could then explicitly
call this service function occasionally in your program. I would let
it return a tuple: (number of values kept, number of values deleted).
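
The service function described above might look roughly like the sketch below. The real interned dictionary lives in C and is not exposed to Python code, so a toy table stands in for it; InternTable, Name, and sweep are hypothetical names. A str subclass is used for the values because the reference counts reported for plain strings may be pinned on recent CPython versions. The sketch relies on sys.getrefcount and is therefore CPython-specific.

```python
import sys


class Name(str):
    """Hypothetical str subclass with an ordinary, observable refcount."""


# Count that sys.getrefcount reports for an object nobody else holds.
# Measured at run time rather than hard-coded, since it is an
# implementation detail.
_OVERHEAD = sys.getrefcount(object())


class InternTable:
    """Toy stand-in for the interpreter's interned dictionary."""

    def __init__(self):
        self._table = {}

    def intern(self, text):
        try:
            return self._table[text]
        except KeyError:
            value = self._table[text] = Name(text)
            return value

    def sweep(self):
        """Delete entries referenced only by the table itself.

        Returns (number of values kept, number of values deleted).
        """
        kept = deleted = 0
        for key in list(self._table):
            # One reference beyond the overhead means only the table
            # holds the value.
            if sys.getrefcount(self._table[key]) - _OVERHEAD == 1:
                del self._table[key]
                deleted += 1
            else:
                kept += 1
        return kept, deleted

    def __len__(self):
        return len(self._table)


table = InternTable()
still_used = table.intern("spam.py")   # caller keeps a reference
table.intern("eggs.py")                # result discarded immediately
result = table.sweep()
```

A long-running server could call sweep() after each batch of imports and log the returned tuple to watch how the table grows.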

--Guido van Rossum (home page: http://www.python.org/~guido/)