Mon Nov 8 00:09:02 CET 2004
[snip very useful explanation]
> By the way, why would you want to mess with these implementation details?
> Use the == operator to compare strings and be happy ever after :-)
'==' won't help me, I'm afraid.
I need to improve the speed and memory footprint of an application which
reads in a very large XML document.
Some elements in the incoming documents can be filtered out, so I've
written my own SAX handler to extract just what I want. All the same,
the content being read in is substantial.
So, to further reduce memory footprint, my SAX handler tries to manually
intern (using dicts of strings) a lot of the duplicated content and
attributes coming from the XML documents. Also, I use the SAX feature
'feature_string_interning' to hopefully intern the strings used for
attribute names etc.
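
For illustration, the approach described above might be sketched roughly
as follows. This is not my actual handler: the names InterningHandler,
_intern, wanted and parse_filtered are made up for the example, and it
assumes a current Python 3 with the default expat-based SAX parser.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_string_interning


class InterningHandler(ContentHandler):
    """Hypothetical handler: keeps only wanted elements, sharing equal strings."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted      # element names to keep; the rest are skipped
        self._cache = {}          # maps each string to its first-seen instance
        self.rows = []

    def _intern(self, s):
        # Return the cached copy if an equal string was seen before.
        return self._cache.setdefault(s, s)

    def startElement(self, name, attrs):
        if name in self.wanted:
            self.rows.append(
                {self._intern(k): self._intern(v) for k, v in attrs.items()}
            )


def parse_filtered(xml_bytes, wanted):
    parser = xml.sax.make_parser()
    try:
        # Ask the parser to intern element and attribute names, if it can.
        parser.setFeature(feature_string_interning, True)
    except xml.sax.SAXException:
        pass  # feature not recognized or not supported; carry on without it
    handler = InterningHandler(wanted)
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(xml_bytes))
    return handler.rows
```

Duplicated attribute values then share a single string object for as
long as the cache dict is alive.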
That is all working fine, but as a final step I'd like to understand
interning a bit better.
From your explanation there seem to be no language rules, just
implementation accidents, and none of those will be particularly
helpful in my case.
However, I still think I'm going to try using the builtin 'intern'
rather than my own dict cache. That may provide an advantage, even if it
doesn't work with unicode.
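
A minimal sketch of the two options side by side, for comparison. It
assumes a current Python 3, where the builtin 'intern' lives at
sys.intern and accepts ordinary str objects; make_dict_interner and
build are hypothetical helper names, not anything from my application.

```python
import sys


def make_dict_interner():
    """Return a function that maps equal strings to one shared object."""
    cache = {}
    return lambda s: cache.setdefault(s, s)


def build(parts):
    # Build the string at run time, as a parser would, rather than
    # writing it as a literal in the source.
    return "".join(parts)


a = build(["attr", "-", "value"])
b = build(["attr", "-", "value"])

# Option 1: my own dict cache.
dict_intern = make_dict_interner()
assert dict_intern(a) is dict_intern(b)

# Option 2: the interpreter's own intern table.
assert sys.intern(a) is sys.intern(b)
```

Either way, equal strings end up as one object. The dict cache keeps
every string alive until the dict itself is dropped, which is one
reason the builtin may be worth trying.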
More information about the Python-list