[Python-Dev] Regular expressions, Unicode etc.
Nick Maclaren
nmm1 at cus.cam.ac.uk
Fri Aug 10 10:28:58 CEST 2007
James Y Knight <foom at fuhm.net> wrote:
>
> > Firstly, things like backreferences are an absolute no-no. They
> > are not regular, and REs with them in cannot be converted to DFAs.
> > That could be 'solved' by a parser that kicked out such constructions,
> > but it would get screams from many users.
>
> People keep saying things like this as if GNU grep and tcl's regular
> expression matchers didn't exist.
> See http://www.tcl.tk/man/tcl8.5/TclCmd/re_syntax.htm for example.
PCRE also has a breadth-first engine, but it does not convert the
NFA to a DFA (its author is a close colleague of mine). Those
engines won't do the conversion, either, and I am prepared to bet
that I could produce a pattern that would either run very slowly
or expose the semantics differences in most of them.
I did NOT say that there were not, alternative, approaches. What
I said was correct - you cannot convert such extended expressions
to DFAs. You can convert them to things that are sort of NFA/DFA
hybrids, which might or might not be a good way to proceed.
Regards,
Nick Maclaren,
University of Cambridge Computing Service,
New Museums Site, Pembroke Street, Cambridge CB2 3QH, England.
Email: nmm1 at cam.ac.uk
Tel.: +44 1223 334761 Fax: +44 1223 334679
More information about the Python-Dev
mailing list