Regexp syntax change in 1.6?
Gareth McCaughan
Gareth.McCaughan at pobox.com
Fri Sep 8 18:03:56 EDT 2000
Adam Sampson wrote:
> Under Python 1.5.2, I had a script containing the following line:
>
> m = re.match(r"[a-z0-9]*://[^/]+/.*\.([^.#\?/]*)([#\?]?.*)?", url)
>
> (Bonus points for guessing what it does; answer down the bottom.)
> Under 1.6, this fails with:
..
> sre_constants.error: nothing to repeat
>
> I can narrow it down to:
>
> >>> import re
> >>> m = re.match(r"(x?)?", url)
> sre_constants.error: nothing to repeat
>
> whereas:
>
> >>> m = re.match(r"(x?.)?", url)
>
> works fine. Is this correct behaviour for SRE, or am I just being stupid?
> "(x?)?" looks like a perfectly reasonable Perl-style regexp to me (and Perl
> too)...
Well, (x?)? should match exactly what (x)? or (x?) matches, so
perhaps issuing a warning would be reasonable. An
outright error seems rather harsh.
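To make the equivalence concrete, here is a minimal sketch. It assumes a current Python 3 interpreter, where all three patterns compile (so it does not reproduce the 1.6 error quoted above). The helper name matched_text and the sample subjects are mine, purely for illustration. It compares only the text consumed by the whole pattern, not the contents of group 1.

```python
import re

# Assumption: a current Python 3, where all three patterns compile.
patterns = [r"(x?)?", r"(x)?", r"(x?)"]
subjects = ["", "x", "xx", "abc"]  # illustrative inputs, not from the thread


def matched_text(pattern, subject):
    # re.match anchors at the start of the subject; group(0) is the text
    # consumed by the whole pattern. All three patterns can match the
    # empty string, so re.match never returns None here.
    return re.match(pattern, subject).group(0)


results = {s: [matched_text(p, s) for p in patterns] for s in subjects}
for subject, texts in results.items():
    print(repr(subject), texts)
```

For every subject the three patterns consume the same text, which is the sense in which the doubled "?" is redundant.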
For your actual case, the closing
([xyz]?.*)?
(contents of charset changed for clarity) could be replaced
with
([xyz]?.*)
without any loss. (If ([xyz]?.*)? matches then either
([xyz]?.*) matches or an empty string does; but an
empty string also matches ([xyz]?.*). The only scope
for a difference is in whether the corresponding
match group gets '' or None; but it turns out that
in Python 1.5.2 it gets '' anyway, just as it does
with the "simplified" RE that I suggest.)
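As a rough check of the simplified form, here is a sketch assuming a current Python 3. The example.com URLs and the helper name split_url are hypothetical. It uses the pattern from the original script with the trailing "?" dropped, and shows that the last group is '' rather than None when the URL has no "#" or "?" part.

```python
import re

# The pattern from the thread with the final "?" removed, as suggested above.
simplified = re.compile(r"[a-z0-9]*://[^/]+/.*\.([^.#\?/]*)([#\?]?.*)")


def split_url(url):
    # Hypothetical helper: returns (extension, trailing part), or None
    # if the URL does not match at all.
    m = simplified.match(url)
    return m.groups() if m else None


# Illustrative URLs, not taken from the original post.
plain = split_url("http://example.com/dir/file.html")
with_fragment = split_url("http://example.com/dir/file.html#top")

print(plain)
print(with_fragment)
```

Because the last group can match the empty string, it always takes part in the match, so code that reads it never has to handle None.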
--
Gareth McCaughan Gareth.McCaughan at pobox.com
sig under construction