interning strings
Jean Brouwers
mrjean1ATcomcastDOTnet at no.spam.net
Sun Nov 7 21:01:05 EST 2004
A while ago we faced a similar issue: reducing the total memory usage
and runtime of one of our Python applications, which parses very large
log files (100+ MB).
One particular class is instantiated many times, and changing just that
class to use __slots__ helped quite a bit. More details are here:
<http://mail.python.org/pipermail/python-list/2004-May/220513.html>
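For illustration only, here is a minimal sketch of the kind of change described above. The class and field names (LogRecordPlain, LogRecordSlots, timestamp, level, message) are made up for this example and are not taken from our application. The point is that a class with __slots__ has no per-instance __dict__, which is where the memory saving comes from when many instances exist.

```python
class LogRecordPlain(object):
    """Hypothetical record class: each instance carries its own __dict__."""

    def __init__(self, timestamp, level, message):
        self.timestamp = timestamp
        self.level = level
        self.message = message


class LogRecordSlots(object):
    """Same fields, but __slots__ replaces the per-instance __dict__
    with fixed storage for exactly these three attributes."""

    __slots__ = ("timestamp", "level", "message")

    def __init__(self, timestamp, level, message):
        self.timestamp = timestamp
        self.level = level
        self.message = message


plain = LogRecordPlain("21:01:05", "INFO", "started")
slotted = LogRecordSlots("21:01:05", "INFO", "started")

plain_has_dict = hasattr(plain, "__dict__")      # instance dict present
slotted_has_dict = hasattr(slotted, "__dict__")  # no instance dict

# The trade-off: attributes not listed in __slots__ cannot be added later.
try:
    slotted.extra = 1
    extra_rejected = False
except AttributeError:
    extra_rejected = True
```

How much this saves depends on the interpreter version and on how many instances are alive at once, so it is worth measuring on real data before and after the change.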
/Jean Brouwers
ProphICy Semiconductor, Inc.
In article <418eab10$0$13356$afc38c87 at news.optusnet.com.au>, Mike
Thompson wrote:
> [snip very useful explanation]
>
> >
> > By the way, why would you want to mess with these implementation details?
> > Use the == operator to compare strings and be happy ever after :-)
> >
>
> '==' won't help me, I'm afraid.
>
> I need to improve the speed and memory footprint of an application which
> reads in a very large XML document.
>
> Some elements in the incoming documents can be filtered out, so I've
> written my own SAX handler to extract just what I want. All the same,
> the content being read in is substantial.
>
> So, to further reduce memory footprint, my SAX handler tries to manually
> intern (using dicts of strings) a lot of the duplicated content and
> attributes coming from the XML documents. Also, I use the SAX feature
> 'feature_string_interning' to hopefully intern the strings used for
> attribute names etc.
>
> Which is all working fine, except that now, as a final process, I'd like
> to understand interning a bit more.
>
> From your explanation there seem to be no language rules, just
> implementation accidents. And none of those will be particularly
> helpful in my case.
>
> However, I still think I'm going to try using the builtin 'intern'
> rather than my own dict cache. That may provide an advantage, even if it
> doesn't work with unicode.
>
> --
> Mike
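The two approaches mentioned in the quoted message, a hand-rolled dict cache and the builtin intern, can be sketched roughly as follows. This is an assumption-laden illustration, not Mike's actual handler: the StringCache class and the sample strings are hypothetical, and the code uses sys.intern, which is where the builtin intern lives in Python 3.

```python
import sys


class StringCache(object):
    """Hypothetical dict-based cache: returns one shared object
    for every string with the same value."""

    def __init__(self):
        self._seen = {}

    def intern(self, s):
        # setdefault stores s on first sight and returns the stored
        # object on every later call with an equal string.
        return self._seen.setdefault(s, s)


def make_string():
    # Build the string at runtime so each call yields a fresh object,
    # much as strings arriving from a parser would be.
    return "".join(["host", "name"])


cache = StringCache()
a = cache.intern(make_string())
b = cache.intern(make_string())
dict_shared = a is b          # both names refer to the cached object

c = sys.intern(make_string())
d = sys.intern(make_string())
builtin_shared = c is d       # both names refer to the interned object
```

One difference in design: the dict cache keeps every string alive for as long as the cache itself exists, and it works for any hashable value, whereas the builtin only accepts the interpreter's native string type.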