[Tutor] Python RE uses DFA or NFA for string check?
Tim Peters
tim.peters at gmail.com
Tue Jan 10 17:54:57 CET 2006
[Intercodes]
> This question is just out of curiosity. I am working with this dragon book.
> From what I have learnt so far, RE uses either NFA or DFA to check whether
> the string is accepted or not. (Correct?)
In the world of "computer science" regular expressions, yes. But the
things _called_ "regular expressions" in programming languages are
generally richer than those. For example, almost all regexp
implementations support backreferences, and backreferences allow
recognizing languages that computer-science regexps cannot. For
example,
^(a*)b+\1$
recognizes strings that begin and end with the same number of a's,
separated by one or more b's. It's "the same number" part that's
beyond a pure regexp's abilities.
> So what does the Python's RE module use to check the correctness of the
> string, NFA or DFA?
Neither, but it's much closer to NFA than to DFA. Most regexp
implementations in most languages supporting such a thing are
implemented via backtracking search. Jeffrey Friedl's "Mastering
Regular Expressions" is more useful than the dragon book if you want
insight into how most programming-language regexp implementations
actually work:
http://www.oreilly.com/catalog/regex/
To increase confusion ;-), Friedl calls backtracking search "NFA" in that book.
More information about the Tutor
mailing list