ptmcg at austin.rr._bogus_.com
Thu Mar 30 16:21:08 CEST 2006
<vvikram at gmail.com> wrote in message
news:1143719899.018571.41330 at u72g2000cwu.googlegroups.com...
> We process a lot of messages in a file based on some regex pattern(s)
> we have in a db.
> If I compile the regex using re.I, the processing time is substantially
> more than if I don't, i.e. using re.I is slow.
> However, more surprisingly, if we do something along the lines of:
> s = <regex string>
> s = s.lower()
> t = dict([(k, '[%s%s]' % (k, k.upper())) for k in string.ascii_lowercase])
> for k in t: s = s.replace(k, t[k])
> it's much better than using plain re.I.
> So the questions are:
> a) Why is re.I so slow in general?
> b) What is the underlying implementation used, and what is wrong, if
> anything, with the above method? Why is it not used instead?
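The poster's snippet above is truncated; a minimal, runnable sketch of the rewrite being described (function name and details are illustrative) looks like this:

```python
import string

def expand_case(pattern):
    """Rewrite each letter in `pattern` as a two-character class,
    e.g. 'f' -> '[fF]', approximating re.IGNORECASE without the flag."""
    pattern = pattern.lower()
    # Map every lowercase letter to its upper/lower character class.
    table = {k: "[%s%s]" % (k, k.upper()) for k in string.ascii_lowercase}
    for k, v in table.items():
        pattern = pattern.replace(k, v)
    return pattern

print(expand_case("foo"))  # [fF][oO][oO]
```

Note that this rewrites *every* letter, including ones that are part of escapes or classes, which is exactly the hazard raised later in the thread.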
Can't tell you why re.I is slow, but perhaps this expression will make your
RE transform a little plainer (no need to create that dictionary of uppers
and lowers):
s = <regex string>
makeReAlphaCharLowerOrUpper = lambda c: c.isalpha() and "[%s%s]" % (c.lower(), c.upper()) or c
s_optimized = "".join(makeReAlphaCharLowerOrUpper(k) for k in s)
or, equivalently, with map:
s_optimized = "".join(map(makeReAlphaCharLowerOrUpper, s))
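In current Python the same one-pass rewrite reads a little cleaner with a conditional expression in place of the and/or trick (a sketch; the behavior is unchanged):

```python
def upper_or_lower(c):
    # Wrap alphabetic characters in a two-character class; pass
    # everything else (metacharacters, digits, ...) through untouched.
    return "[%s%s]" % (c.lower(), c.upper()) if c.isalpha() else c

def optimize(pattern):
    """Expand each letter of `pattern` into an explicit [xX] class."""
    return "".join(upper_or_lower(c) for c in pattern)

print(optimize("ab+1"))  # [aA][bB]+1
```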
Just curious, but what happens if your RE already contains a character
class, like a spelling-check error finder
(looking for violations of "i before e except after c")?
Can []'s nest in an RE?
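They don't, and that is the trap: inside a class, '[' is just a literal, so the per-character rewrite silently changes what an existing class means rather than raising an error. A small demonstration with an assumed pattern:

```python
import re

def upper_or_lower(c):
    # Same per-character rewrite as above.
    return "[%s%s]" % (c.lower(), c.upper()) if c.isalpha() else c

original = "[ab]c"  # "a or b, then c"
rewritten = "".join(upper_or_lower(c) for c in original)
print(rewritten)  # [[aA][bB]][cC]

# The rewritten pattern now means: one of {'[', 'a', 'A'}, then one of
# {'b', 'B'}, then a literal ']', then one of {'c', 'C'} -- four chars,
# not two -- so it no longer matches what the original matched.
print(bool(re.fullmatch(original, "ac")))   # True
print(bool(re.fullmatch(rewritten, "ac")))  # False
```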