Regular expressions vs find?

Matthew Schinckel matt at null.net
Thu Jun 22 18:33:51 EDT 2000


Matthew Schinckel <matt at null.net> wrote:
> I have come across one example where re was much faster than using
> string.find: I was stripping out HTML tags, and a one-line
> re.sub('<.*?>', '', text) was much faster than using string.find() and slice
> notation.
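A minimal sketch of that one-line approach in modern Python 3. The pattern `<.*?>` is non-greedy, so each match stops at the first `>` after a `<`. `re.DOTALL` is an added assumption so that tags spanning a line break are also removed. This is naive tag stripping, not an HTML parser: a `>` inside an attribute value, for example, will end the match early.

```python
import re

# '.*?' is non-greedy: each match runs from a '<' to the first '>' after it.
# DOTALL lets '.' match newlines, so tags split across lines are removed too.
TAG_RE = re.compile(r'<.*?>', re.DOTALL)

def strip_tags(html: str) -> str:
    """Remove anything that looks like <...>; a naive sketch, not a parser."""
    return TAG_RE.sub('', html)

print(strip_tags('<p>Hello, <b>world</b>!</p>'))  # prints "Hello, world!"
```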

aahz at netcom.com (Aahz Maruch) wrote:
> Sure! But then you're not really doing just a string.find(). The point
> is that for any operation where string.find() alone can be used, it
> will be faster than an re. Any time you add complexity, such as
> find/replace with a pattern rather than a fixed string, chances are
> good that the re will be faster.
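One rough way to check the fixed-string half of this claim on your own machine is sketched below. It uses Python 3's `str.find` method, since the old `string.find` function is gone from Python 3's `string` module. The test string and iteration count are arbitrary choices, and timings depend on the interpreter build and the input, so no result is asserted here.

```python
import re
import timeit

# Arbitrary test data: a fixed substring buried in the middle of a long string.
text = 'x' * 10_000 + 'needle' + 'x' * 10_000
pattern = re.compile(re.escape('needle'))  # the same fixed string, as a regex

# Both approaches locate the same position.
assert text.find('needle') == pattern.search(text).start()

t_find = timeit.timeit(lambda: text.find('needle'), number=10_000)
t_re = timeit.timeit(lambda: pattern.search(text), number=10_000)
print(f'str.find: {t_find:.4f}s  re.search: {t_re:.4f}s')
```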

Yeah, I would have been better off using an re (in fact I did, once I had
figured it out), but what I was doing was:

1. Find the next occurrence of '<'

2. Find the next occurrence of '>'

3. Keep everything before 1 and after 2.

4. Repeat until there are no more '<' or '>'.
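A sketch of steps 1 to 4 done with `str.find` and slicing, under two assumptions that differ from the loop described above. First, the kept pieces are collected in a list and joined once, instead of rebuilding the whole string on every pass. Second, the loop stops when a '<' has no '>' after it, so unmatched signs are simply left in the output rather than being searched for forever. `strip_tags_find` is a hypothetical name.

```python
def strip_tags_find(text: str) -> str:
    """Remove <...> spans using str.find and slicing, in a single pass."""
    out = []
    pos = 0
    while True:
        start = text.find('<', pos)        # step 1: next '<'
        if start == -1:
            break
        end = text.find('>', start + 1)    # step 2: next '>' after it
        if end == -1:
            break                          # unmatched '<': stop, keep the rest
        out.append(text[pos:start])        # step 3: keep text before the tag
        pos = end + 1                      # ...and continue after the tag
    out.append(text[pos:])                 # keep whatever follows the last tag
    return ''.join(out)

print(strip_tags_find('<p>Hello, <b>world</b>!</p>'))  # prints "Hello, world!"
```

Each character is scanned a bounded number of times, so the work grows roughly linearly with the input instead of with the number of tags times the string length.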

Obviously, this was stupidly slow, and if there were unmatched signs, it
caused a huge memory leak, in one case using up 100+ MB of RAM in several
seconds :-)

Matt.






More information about the Python-list mailing list