[Python-Dev] Python startup time: String objects

Martin v. Löwis martin at v.loewis.de
Wed Mar 24 15:19:22 EST 2004


At PyCon, I have been looking into Python startup time.

I found that CVS-Python allocates roughly 12,000 string objects on startup,
whereas Python 2.2 only allocates 8,000 string objects. In either case,
most strings come from unmarshalling string objects, and the increase is
(probably) due to the increased number of modules loaded at startup
(up from 26 to 34).

The string objects allocated during unmarshalling are often discarded
right after allocation: they are identifiers and get interned, so only
the interned version of the string survives, and the freshly
unmarshalled copy is deallocated.
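
The effect of interning on equal strings can be illustrated with a
small sketch. It assumes a current Python 3, where the function is
sys.intern (at the time of this post it was the builtin intern()); the
helper name intern_demo is made up for illustration.

```python
import sys


def intern_demo():
    # Build two equal strings at run time so they are distinct objects,
    # much like two unmarshalled copies of the same identifier.
    first = "".join(["start", "up"])
    second = "".join(["start", "up"])
    distinct_before = first is not second

    # Interning maps every equal string to one canonical object.
    canonical_first = sys.intern(first)
    canonical_second = sys.intern(second)
    shared_after = canonical_first is canonical_second

    # Once 'second' goes out of scope, the duplicate can be deallocated;
    # only the canonical (interned) object stays referenced.
    return distinct_before, shared_after


if __name__ == "__main__":
    print(intern_demo())
```

Every duplicate allocated this way costs an allocation and a
deallocation, even though only one object is kept.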

I'd like to change the marshal format to perform sharing of equal
strings, instead of marshalling the same identifiers multiple times.
To do so, the marshaller would keep a dictionary of the strings it has
already written, the unmarshaller would keep a list of the strings it
has already read, and a new marshal code for a string back-reference
would be added.
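
A minimal sketch of such a back-reference scheme, not the actual
marshal format: the type codes TYPE_STRING and TYPE_STRINGREF, the
4-byte little-endian length/index field, and the function names
dump_strings and load_strings are all assumptions chosen for
illustration.

```python
import io
import struct

# Hypothetical type codes for this sketch; not the real marshal codes.
TYPE_STRING = b"s"
TYPE_STRINGREF = b"R"


def dump_strings(strings):
    """Serialize byte strings, writing a back-reference for each repeat."""
    out = io.BytesIO()
    seen = {}  # writer side: string -> index of its first occurrence
    for s in strings:
        index = seen.get(s)
        if index is None:
            seen[s] = len(seen)
            out.write(TYPE_STRING + struct.pack("<I", len(s)) + s)
        else:
            out.write(TYPE_STRINGREF + struct.pack("<I", index))
    return out.getvalue()


def load_strings(data):
    """Inverse of dump_strings; repeated strings share one object."""
    inp = io.BytesIO(data)
    table = []  # reader side: strings in order of first occurrence
    result = []
    while True:
        code = inp.read(1)
        if not code:
            break
        (n,) = struct.unpack("<I", inp.read(4))
        if code == TYPE_STRING:
            s = inp.read(n)
            table.append(s)
            result.append(s)
        elif code == TYPE_STRINGREF:
            result.append(table[n])  # reuse the earlier object, no new allocation
        else:
            raise ValueError("unknown type code %r" % code)
    return result


if __name__ == "__main__":
    names = [b"self", b"x", b"self", b"y", b"x"]
    data = dump_strings(names)
    print(len(data), load_strings(data) == names)
```

In this sketch the reader allocates each distinct string only once, and
the writer's dictionary indices line up with the reader's list
positions because both sides record strings in order of first
occurrence.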

What do you think?

Regards,
Martin


