Regex Speed
John Machin
sjmachin at lexicon.net
Fri Feb 23 19:54:33 EST 2007
On Feb 24, 10:15 am, garri... at gmail.com wrote:
> On Feb 21, 10:34 am, garri... at gmail.com wrote:
>
> > On Feb 20, 6:14 pm, Pop User <popu... at christest2.dc.k12us.com> wrote:
> > >http://swtch.com/~rsc/regexp/regexp1.html
>
> Going back a bit on a tangent, the author of this citation states that
> any regex can be expressed as a DFA machine. However, while
> investigating this more I appear to have found one example of a regex
> which breaks this assumption.
>
> "ab+c|abd"
>
> Am I correct?
No.
> Can you think of a deterministic method of computing
> this expression?
Firstly rewrite a bit:
ab+c|abd
a(b+c|bd)
a(bb*c|bd)
ab(b*c|d)
ab(b+c|c|d)
Here's a DFA that recognises that:
State 0:
a -> 1
State 1:
b -> 2
State 2:
b -> 3 # start of the b+c branch
c -> 4 # the c branch
d -> 4 # the d branch
State 3:
b -> 3
c -> 4
State 4:
accepting state
> It would be easier with a NFA machine,
What is "It"?
> but given that
> the Python method of computing RE's involves pre-compiling a re
> object,
AFAIK all serious regex engines precompile a regex into an internal
representation which can be saved for reuse.
> optimizing the matching engine
What does that mean?
> would make the most sense to
> me.
Compared to what alternatives?
>
> Here's what I have so far:
[big snip]
> >>> a = compile("ab+c|abd")
[snip]
> >>> a.match("abbd")
> Match!
Bzzzt. Neither "ab+c" nor "abd" match a prefix of "abbd".
HTH,
John
More information about the Python-list
mailing list