[Python-Dev] Python startup time: String objects

Guido van Rossum guido at python.org
Wed Mar 24 22:29:22 EST 2004


> At PyCon, I have been looking into Python startup time.
> 
> I found that CVS-Python allocates roughly 12,000 string objects on
> startup, whereas Python 2.2 only allocates 8,000 string objects. In
> either case, most strings come from unmarshalling string objects,
> and the increase is (probably) due to the increased number of
> modules loaded at startup (up from 26 to 34).

But is this really where the time goes?  On my home box (~11K
pystones/second) I can allocate 12K strings in 17 msec.
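For anyone who wants to repeat that kind of measurement, here is a
minimal sketch. It assumes a modern Python with time.perf_counter;
the helper name time_string_allocations and the "name%d" pattern are
made up for illustration. It times string formatting as well as
allocation, so treat the result as a rough upper bound rather than a
precise figure.

```python
import time


def time_string_allocations(count=12000):
    """Build `count` distinct str objects and return (seconds, number built)."""
    start = time.perf_counter()
    # Each formatted name is a fresh string object; keeping them in a
    # list stops them from being freed while the clock is running.
    strings = ["name%d" % i for i in range(count)]
    elapsed = time.perf_counter() - start
    return elapsed, len(strings)


elapsed, n = time_string_allocations()
print("%d strings in %.3f ms" % (n, elapsed * 1000.0))
```

The number printed depends entirely on the machine and interpreter, so
it is only useful for comparing against total startup time on the same
box.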

> The string objects allocated during unmarshalling are often quickly 
> discarded after being allocated, as they are identifiers, and get
> interned - so only the interned version of the string survives, and
> the second copy is deallocated.
> 
> I'd like to change the marshal format to perform sharing of equal
> strings, instead of marshalling the same identifiers multiple times.
> To do so, a dictionary of strings is created on marshalling and a
> list is created on unmarshalling, and a new marshal code for
> string-backreference would be added.
> 
> What do you think?

Feels like a rather dicey incompatible change to marshal, and rather a
lot of work unless you know it is going to make a significant change.
It seems that marshalling would have to become a two-pass thing,
unless you want to limit that dict/list to function scope, in which
case I'm not sure it'll make much of a difference.
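To make the dict-on-write / list-on-read idea from the quoted proposal
concrete, here is a toy sketch. It is not the marshal format: the
token codes "s" and "R" and the function names dump_strings and
load_strings are hypothetical, and it only handles a flat sequence of
strings. It assumes the first occurrence of a string is written in
full and later occurrences refer back to it by index, which is one way
to keep the writer single-pass.

```python
def dump_strings(strings):
    """Encode strings as ("s", text) on first sight, ("R", index) afterwards."""
    seen = {}    # string -> index; the dictionary kept while marshalling
    tokens = []
    for s in strings:
        if s in seen:
            # Already written once: emit a backreference instead of the text.
            tokens.append(("R", seen[s]))
        else:
            seen[s] = len(seen)
            tokens.append(("s", s))
    return tokens


def load_strings(tokens):
    """Decode tokens produced by dump_strings, sharing repeated strings."""
    table = []   # index -> string; the list kept while unmarshalling
    out = []
    for code, payload in tokens:
        if code == "s":
            table.append(payload)
            out.append(payload)
        elif code == "R":
            # Reuse the object already created; no second allocation.
            out.append(table[payload])
        else:
            raise ValueError("unknown code %r" % (code,))
    return out


names = ["self", "x", "self", "y", "x", "self"]
tokens = dump_strings(names)
restored = load_strings(tokens)
print(tokens)
```

With this input only three of the six entries are stored as full
strings; the other three become backreferences. Whether the saving
matters for real .pyc files is exactly the open question above.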

--Guido van Rossum (home page: http://www.python.org/~guido/)


