What to intern (e.g. func_code.co_filename)?

Has anyone come up with rules of thumb for what to intern and what the performance implications of interning are?

I'm working on profiling App Engine again, and since they don't allow marshal I have to modify pstats to save the profile via pickle. While trying to get profiles under 1MB, I noticed that each function has its own copy of the filename in which it is defined, and sometimes these strings can be rather long.

Creating a code object already interns a bunch of stuff: argument names, variable names, etc. Interning the filename will add some CPU overhead during function creation, should save a decent amount of memory, and ought to have minimal overall performance impact.

I have a local patch, but wanted to see if anyone had ideas or experience weighing these tradeoffs.

-jake
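
To illustrate why the duplicated filenames matter for pickle size: pickle's memo deduplicates by object identity, not by value, so equal strings that are separate objects are each written out in full. The following is a minimal sketch, not the patch mentioned above; the path is a made-up placeholder and the list of strings only stands in for the filenames found in a profile.

```python
import pickle
import sys

# Hypothetical long path, standing in for a real co_filename.
path = "/srv/app/lib/some/long/package/path/module.py"

# Build 100 equal strings that are distinct objects, mimicking functions
# that each carry their own copy of the filename.
distinct = [(path + "x")[:-1] for _ in range(100)]

# Map every copy to one canonical object.
shared = [sys.intern(s) for s in distinct]

# pickle writes a shared object once and then emits short memo references.
size_distinct = len(pickle.dumps(distinct))
size_shared = len(pickle.dumps(shared))
print(size_distinct, size_shared)
```

Any mapping that returns one canonical object per filename would have the same effect on the pickle; sys.intern is used here only because it is the shortest way to get one.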

2010/2/13 Jake McGuire <mcguire@google.com>:
> I have a local patch, but wanted to see if anyone had ideas or experience weighing these tradeoffs.

Interning is really only useful because it speeds up dictionary lookups for identifiers. A better idea would be to just attach the same filename object in compiling and unmarshaling.

--
Regards,
Benjamin
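
The idea of attaching one filename object without interning can be approximated from pure Python. This is only a sketch under assumptions: it post-processes code objects with code.replace() (Python 3.8+) instead of changing the compiler or unmarshaling code, and share_filenames, iter_code and the example path are hypothetical names made up for the illustration.

```python
import types

def share_filenames(code, cache):
    """Return a copy of `code` in which it and every nested code object
    use the single cached string for their filename."""
    # First filename seen for a given value becomes the canonical object.
    shared = cache.setdefault(code.co_filename, code.co_filename)
    consts = tuple(
        share_filenames(c, cache) if isinstance(c, types.CodeType) else c
        for c in code.co_consts
    )
    return code.replace(co_filename=shared, co_consts=consts)

def iter_code(code):
    """Yield `code` and all code objects nested in its constants."""
    yield code
    for c in code.co_consts:
        if isinstance(c, types.CodeType):
            yield from iter_code(c)

source = "def outer():\n    def inner():\n        return 42\n    return inner()\n"
name = "/srv/app/lib/example_module.py"  # hypothetical path

# Two compilations given equal but distinct filename strings.
cache = {}
first = share_filenames(compile(source, (name + "x")[:-1], "exec"), cache)
second = share_filenames(compile(source, (name + "y")[:-1], "exec"), cache)
```

A plain dict is enough for the cache here, since nothing depends on the strings being in the interpreter's intern table.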

Benjamin Peterson wrote:
> 2010/2/13 Jake McGuire <mcguire@google.com>:
>> I have a local patch, but wanted to see if anyone had ideas or experience weighing these tradeoffs.
> Interning is really only useful because it speeds up dictionary lookups for identifiers. A better idea would be to just attach the same filename object in compiling and unmarshaling.

I would try to do the sharing during marshaling already. I agree that the file names shouldn't be interned, though, so I propose to create a new type code, TYPE_SHAREDSTRING, similar to TYPE_INTERNED. It would use the same numbering as TYPE_INTERNED, so backreferences could continue to use TYPE_STRINGREF.

Alternatively, a general sharing feature could be added to marshal, sharing all hashable objects. However, before that gets added, I'd like to see statistics on how many objects get considered for sharing, and how many back-references then actually get generated.

Regards,
Martin
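
The back-reference scheme and the statistics asked for can be mocked up with a toy encoder. This is a sketch only and does not reproduce marshal's actual byte format: the "STR" and "REF" tokens, the function names and the sample filenames are all hypothetical. The first occurrence of a string is emitted in full, later equal strings become an index into the table of strings already emitted, and the encoder counts both.

```python
def encode_shared(strings):
    """Encode strings as ("STR", value) on first sight and
    ("REF", index) for every later equal string."""
    index_of = {}  # value -> position in the table of emitted strings
    tokens = []
    stats = {"considered": 0, "backrefs": 0}
    for s in strings:
        stats["considered"] += 1
        if s in index_of:
            tokens.append(("REF", index_of[s]))
            stats["backrefs"] += 1
        else:
            index_of[s] = len(index_of)
            tokens.append(("STR", s))
    return tokens, stats

def decode_shared(tokens):
    """Rebuild the list; every REF yields the same object as its STR."""
    table = []
    out = []
    for kind, payload in tokens:
        if kind == "STR":
            table.append(payload)
            out.append(payload)
        else:
            out.append(table[payload])
    return out

filenames = ["a.py", "b.py", "a.py", "a.py", "b.py"]
tokens, stats = encode_shared(filenames)
decoded = decode_shared(tokens)
print(stats)
```

Because the decoder hands back the table entry itself, the loaded strings are shared objects without going through the intern table, which is the property the proposal is after.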

participants (3)
- "Martin v. Löwis"
- Benjamin Peterson
- Jake McGuire