On Jan 27, 2009, at 11:40 AM, Martin v. Löwis wrote:
Hm. This would change the pickling format though. Wouldn't just interning (short) strings on unpickling be simpler?
Sure - that's what Jake had proposed. However, it is always difficult to select which strings to intern - his heuristic (IIUC) is to intern all strings that appear as dictionary keys. Whether this is good enough, I don't know. In particular, it might intern very large strings that aren't identifiers at all.
I may have misunderstood how unpickling works, but I believe that my patch only interns strings that are keys in a dictionary used to populate an instance. This is very similar to how instance creation and modification work in Python now. The only difference is that if you set an attribute via "inst.__dict__['attribute_name'] = value", then 'attribute_name' will not be automatically interned, but if you pickle the instance, 'attribute_name' will be interned on unpickling. There may be cases where users specifically go through __dict__ to avoid interning attribute names, but I would be surprised to hear about it and very interested in talking to the person who did that. Creating a new pickle protocol to handle this case seems excessive...

-jake
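
To make the idea concrete, here is a rough pure-Python sketch of interning only the keys of a state dictionary that populates an instance. It is not the patch itself, which changes the unpickler; `intern_state_keys`, `restore`, and `Point` are hypothetical names made up for illustration. The keys are built at runtime so that they do not start out as interned literals.

```python
import sys


def intern_state_keys(state):
    """Return a copy of `state` in which every exact-str key is interned.

    Hypothetical helper: only keys are touched, values are left alone.
    """
    return {
        (sys.intern(key) if type(key) is str else key): value
        for key, value in state.items()
    }


class Point:
    """Placeholder class standing in for any pickled instance type."""


def restore(cls, state):
    """Build an instance from a state dict, roughly as unpickling would."""
    inst = cls.__new__(cls)
    inst.__dict__.update(intern_state_keys(state))
    return inst


# Two equal key strings created separately at runtime.
key_a = "".join(["attribute", "_", "name"])
key_b = "".join(["attribute", "_", "name"])

first = restore(Point, {key_a: 1})
second = restore(Point, {key_b: 2})

# After restoring, both instances share one key object.
shared = next(iter(first.__dict__)) is next(iter(second.__dict__))
print(shared)
```

Because only the keys of the instance state are interned, large strings stored as attribute values are left untouched under this scheme.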