[Tutor] Python RE uses DFA or NFA for string check?

Tue Jan 10 17:54:57 CET 2006

[Intercodes]
> This question is just out of curiosity. I am working with this dragon book.
> From what I have learnt so far, RE uses either NFA or DFA to check whether
> the string is accepted or not. (Correct?)

In the world of "computer science" regular expressions, yes.  But the
things _called_ "regular expressions" in programming languages are
generally richer than those.  For example, almost all regexp
implementations support backreferences, and backreferences allow
recognizing languages that computer-science regexps cannot.  For
example,

    ^(a*)b+\1$

recognizes strings that begin and end with the same number of a's,
separated by one or more b's.  It's "the same number" part that's
beyond a pure regexp's abilities.

> So what does the Python's RE module use to check the correctness of the
> string, NFA or DFA?

Neither, but it's much closer to NFA than to DFA.  Most regexp
implementations in most languages supporting such a thing are
implemented via backtracking search.  Jeffrey Friedl's "Mastering
Regular Expressions" is more useful than the dragon book if you want
insight into how most programming-language regexp implementations
actually work:

    http://www.oreilly.com/catalog/regex/

To increase confusion ;-), Friedl calls backtracking search "NFA" in that book.