Regexp syntax change in 1.6?

Gareth McCaughan Gareth.McCaughan at pobox.com
Fri Sep 8 18:03:56 EDT 2000


Adam Sampson wrote:

> Under Python 1.5.2, I had a script containing the following line:
> 
> m = re.match(r"[a-z0-9]*://[^/]+/.*\.([^.#\?/]*)([#\?]?.*)?", url)
> 
> (Bonus points for guessing what it does; answer down the bottom.)
> Under 1.6, this fails with:
..
> sre_constants.error: nothing to repeat 
> 
> I can narrow it down to:
> 
> >>> import re
> >>> m = re.match(r"(x?)?", url)
> sre_constants.error: nothing to repeat 
> 
> whereas:
> 
> >>> m = re.match(r"(x?.)?", url)
> 
> works fine. Is this correct behaviour for SRE, or am I just being stupid?
> "(x?)?" looks like a perfectly reasonable Perl-style regexp to me (and Perl
> too)...

Well, (x?)? should match exactly the same text as (x)? or
(x?), so perhaps it's reasonable to issue a warning. An
outright error seems rather harsh.
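
A quick way to check that claim, as a rough sketch only: it assumes a
current Python 3, where the re module accepts all three patterns, and
the helper name matched_text is made up for this illustration.

```python
import re

# The nested form from the report and the two simpler spellings.
PATTERNS = [r"(x?)?", r"(x)?", r"(x?)"]

def matched_text(pattern, s):
    """Return the text matched at the start of s (group 0).

    Every pattern here can match the empty string, so re.match
    never returns None for them.
    """
    return re.match(pattern, s).group(0)

for s in ["", "x", "xx", "abc", "xabc"]:
    results = [matched_text(p, s) for p in PATTERNS]
    # All three patterns consume the same prefix of s.
    assert results[0] == results[1] == results[2], (s, results)
```

The two simpler spellings do differ in what group 1 reports when no x
is present: (x)? leaves it as None, while (x?) gives ''.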

For your actual case, the closing

    ([xyz]?.*)?

(contents of charset changed for clarity) could be replaced
with

    ([xyz]?.*)

without any loss. (If ([xyz]?.*)? matches then either
([xyz]?.*) matches or an empty string does; but an
empty string also matches ([xyz]?.*). The only scope
for a difference is in whether the corresponding
match group gets '' or None; but it turns out that
in Python 1.5.2 it gets '' anyway, just as it does
with the "simplified" RE that I suggest.)
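
To compare the two forms on the full pattern, something like the
following could be used. This is a sketch under assumptions: it targets
a current Python 3 rather than 1.5.2 or 1.6, the example URLs and the
helper split_url are invented here, and "or ''" is used so that a
group reported as None and one reported as '' count as the same.

```python
import re

# The pattern from the original script, and the same pattern with the
# trailing "?" after the last group dropped.
ORIGINAL = r"[a-z0-9]*://[^/]+/.*\.([^.#\?/]*)([#\?]?.*)?"
SIMPLIFIED = r"[a-z0-9]*://[^/]+/.*\.([^.#\?/]*)([#\?]?.*)"

def split_url(pattern, url):
    """Return (group 1, group 2) for url, or None if there is no match.

    A group 2 of None is normalised to '' so the two patterns can be
    compared directly.
    """
    m = re.match(pattern, url)
    if m is None:
        return None
    return m.group(1), m.group(2) or ""

# Hypothetical test inputs, not taken from the original script.
URLS = [
    "http://example.com/docs/index.html#top",
    "http://example.com/cgi/search.py?q=1",
    "http://example.com/files/archive.tar.gz",
    "not a url",
]

for url in URLS:
    # Both patterns should agree on every input.
    assert split_url(ORIGINAL, url) == split_url(SIMPLIFIED, url), url
```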

-- 
Gareth McCaughan  Gareth.McCaughan at pobox.com
sig under construction


