[Python-Dev] Regular expressions, Unicode etc.
James Y Knight
foom at fuhm.net
Fri Aug 10 07:02:16 CEST 2007
On Aug 8, 2007, at 3:47 PM, Nick Maclaren wrote:
> Firstly, things like backreferences are an absolute no-no. They
> are not regular, and REs with them in cannot be converted to DFAs.
> That could be 'solved' by a parser that kicked out such constructions,
> but it would get screams from many users.
People keep saying things like this as if the regular expression
matchers in GNU grep and Tcl didn't exist.
See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example.
time python -c 'import re; print re.match("("+"a?"*26+"a"*26+")b\\1", "a"*26+"b"+"a"*26).group(0)'
aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa
real 0m5.913s
user 0m5.905s
sys 0m0.006s
time echo 'aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa' |
grep -E '(a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?a?aaaaaaaaaaaaaaaaaaaaaaaaaa)b\1'
aaaaaaaaaaaaaaaaaaaaaaaaaabaaaaaaaaaaaaaaaaaaaaaaaaaa
real 0m0.002s
user 0m0.002s
sys 0m0.000s
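For anyone who wants to see how the Python time grows rather than a single data point, here is a minimal sketch that times the same pattern shape at a few smaller sizes. It is written for Python 3 and is not the command measured above. The helper names build_pattern, build_subject, and time_match are made up for illustration, and the absolute numbers will depend on the machine and interpreter version.

```python
import re
import time


def build_pattern(n):
    # Same shape as the pattern above: n optional a's followed by n
    # required a's, captured as group 1, then "b", then a backreference.
    return "(" + "a?" * n + "a" * n + ")b\\1"


def build_subject(n):
    # n a's, a single b, then n more a's.
    return "a" * n + "b" + "a" * n


def time_match(n):
    # Compile first so only the matching itself is timed.
    pattern = re.compile(build_pattern(n))
    subject = build_subject(n)
    start = time.perf_counter()
    match = pattern.match(subject)
    elapsed = time.perf_counter() - start
    return match.group(0), elapsed


if __name__ == "__main__":
    # Sizes are kept well below 26 so the loop finishes quickly.
    for n in (10, 14, 18):
        matched, elapsed = time_match(n)
        print("n=%d matched %d chars in %.4fs" % (n, len(matched), elapsed))
```

If the matcher is backtracking as it appears to be, the time should rise sharply with each step in n, while the matched text is always the whole subject string.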
James