[Python-Dev] Regular expressions, Unicode etc.

James Y Knight foom at fuhm.net
Fri Aug 10 07:02:16 CEST 2007


On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote:
> Firstly, things like backreferences are an absolute no-no.  They
> are not regular, and REs with them in cannot be converted to DFAs.
> That could be 'solved' by a parser that kicked out such constructions,
> but it would get screams from many users.

People keep saying things like this as if the regular expression
matchers in GNU grep and Tcl didn't exist.
See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example.

time python -c 'import re; print re.match("("+"a?"*26+"a"*26+")b\\1",  
"a"*26+"b"+"a"*26).group(0)'
aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa

real    0m5.913s
user    0m5.905s
sys     0m0.006s

time echo 'aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa' |  
grep -E '(a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a? 
aaaaaaaaaaaaaaaaaaaaaaaaaa)b\1'
aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa

real    0m0.002s
user    0m0.002s
sys     0m0.000s
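
For anyone who wants to watch the Python timing grow rather than wait on
the n=26 case, here is a rough sketch that builds the same pattern and
subject for smaller n. The helper names build_case and time_match are made
up for illustration, and it is written for Python 3 (time.perf_counter),
not the interpreter used for the timings above. Absolute numbers will
depend on the machine and the Python version.

```python
import re
import time


def build_case(n):
    """Return (pattern, subject) for the test above, scaled to n instead of 26."""
    pattern = "(" + "a?" * n + "a" * n + ")b\\1"
    subject = "a" * n + "b" + "a" * n
    return pattern, subject


def time_match(n):
    """Run one match and return (matched_text, elapsed_seconds)."""
    pattern, subject = build_case(n)
    compiled = re.compile(pattern)  # keep compilation out of the timed region
    start = time.perf_counter()
    m = compiled.match(subject)
    elapsed = time.perf_counter() - start
    return m.group(0), elapsed


if __name__ == "__main__":
    # Small n only; the n=26 case is the one timed in the transcript above.
    for n in (8, 12, 16, 20):
        text, elapsed = time_match(n)
        ok = text == build_case(n)[1]
        print("n=%2d  full match=%s  %.4fs" % (n, ok, elapsed))
```

The match always succeeds and covers the whole subject; only the time
taken to get there changes with n.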

James

