re documentation error

Heiko Wundram heikowu at ceosg.de
Mon Sep 17 14:11:48 EDT 2001


On Monday 17 September 2001 18:30, you wrote:
> looks like a bug in the new (2.0) engine:

Actually, to me it looks like 1.5.2's engine had a bug! ;))

> [snip 1.5.2 output]

> >>> import sre # 2.0's regular expression engine
> >>> p = sre.compile("x*")
> >>> p.sub("-", "abxd")

Look at what it does (in my opinion that is the correct behaviour).

It starts by trying to match x* at pos 0:
nothing matches x* -> so insert - in output

get next char from input. We now have "-a"

Now match x* against pos 1:
nothing matches x* -> so insert -

get next char from input. We now have "-a-b"

Now match x* against pos 2:
matches x -> so replace with -

get no char from input, as there was a match. We now have "-a-b-"

Now comes the crucial point:

Match x* against pos 3:
nothing matches x* -> so insert -

get next char from input. We now have "-a-b--d"

etc.

And that way we arrive at the output that was specified. What the above 
pseudocode does is move ahead by one character if nothing matched at the 
current position, and otherwise move ahead by the length of the match. And 
I guess you've implemented something quite similar...
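
The walk-through above can be sketched in Python roughly as follows. This is 
only my reading of the steps, not the engine's actual code: sub_trace is a 
made-up helper, and I use the re module's match(string, pos) to try the 
pattern at one position at a time.

```python
import re

def sub_trace(pattern, repl, text):
    """Hypothetical helper that replays the walk-through step by step.

    Not the engine's real implementation, just the pseudocode made runnable.
    """
    compiled = re.compile(pattern)
    out = []
    pos = 0
    # Note "<=": the position just past the last character is tried as well.
    while pos <= len(text):
        m = compiled.match(text, pos)
        if m is not None and m.end() > pos:
            # One or more characters matched: replace them, skip past them.
            out.append(repl)
            pos = m.end()
            continue
        if m is not None:
            # The pattern matched the empty string here: insert repl.
            out.append(repl)
        # Empty match or no match: copy the next input character, if any.
        if pos < len(text):
            out.append(text[pos])
        pos += 1
    return "".join(out)

print(sub_trace("x*", "-", "abxd"))  # -a-b--d-
```

The second "-" before the "d" comes from the empty match at pos 3, right 
after the "x" was replaced, which is the crucial point mentioned above.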

I don't think it always makes sense to have a different behaviour, because 
sre.sub used in this fashion is actually quite an interesting way to split 
apart letters in a string and insert letters between them. Just use one 
letter that doesn't appear in the string, and you're off (might be slow 
though...)
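
A small sketch of that trick, assuming the re module and a made-up helper 
name (spread); the "absent" letter only has to be one that never occurs in 
the input, so that every match is an empty one:

```python
import re

def spread(text, sep, absent="x"):
    """Hypothetical helper: put sep before, between and after all characters."""
    if absent in text:
        raise ValueError("the pattern letter must not appear in the text")
    # absent* can only match the empty string here, once at every position.
    # A function is used as the replacement so sep is inserted literally.
    return re.sub(re.escape(absent) + "*", lambda m: sep, text)

print(spread("abd", "-"))            # -a-b-d-
print("-" + "-".join("abd") + "-")   # same result without a regex
```

The plain join version is probably the faster of the two, which is the 
"might be slow" caveat above.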

Well, I actually think sre's behaviour is useful. Why not keep it at 
that? And anyway, people are discouraged from using * that way, and are 
told to use + instead (which doesn't produce this kind of "strange 
behaviour"...)
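
For comparison, a minimal example with + (again using the re module as an 
assumption on my part). Since x+ can never match the empty string, only the 
real runs of "x" get replaced:

```python
import re

# x+ needs at least one "x", so there are no empty matches to replace.
single = re.sub("x+", "-", "abxd")
run = re.sub("x+", "-", "abxxxd")

print(single)  # ab-d
print(run)     # ab-d
```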

Just my two cents on this topic.

-- 
Yours sincerely,

	Heiko Wundram



