Regex for Python 2.7
Gabriel Genellina
gagsl-py2 at yahoo.com.ar
Sun Jan 25 05:24:43 EST 2009
On Sat, 24 Jan 2009 21:51:31 -0200, MRAB <google at mrabarnett.plus.com>
wrote:
> Gabriel Genellina wrote:
> > On Sat, 24 Jan 2009 18:23:51 -0200, MRAB <google at mrabarnett.plus.com>
> > wrote:
> >
> >> Some time ago I discovered this difference between regular expressions
> >> in Python and Perl:
> >>
> >> Python
> >>
> >> \A matches at start of string
> >> \Z matches at end of string
> >>
> >> Perl
> >>
> >> \A matches at start of string
> >> \Z matches before terminal newline or at end of string
> >> \z matches at end of string
> >>
> >> In Perl \A == ^ and \Z == $ in single-string mode, but in Python \A == ^
> >> and \Z != $ in single-string mode.
> >
> > Why do you want the two to be equivalent? Isn't it a good thing that you
> > have both alternatives (\Z and $)? Use whichever is adequate in each case.
> >
> Python's \Z is equivalent to Perl's \z, but there's no equivalent to
> Perl's \Z in multi-line mode.
I tested both:
<code>
import re

# Three inputs: embedded newline, trailing newline, no newline.
texts = ["abc\ndef", "abc\n", "abc"]
# $ and \Z, each without and with re.MULTILINE.
exprs = [
    re.compile(r"c$"),
    re.compile(r"c\Z"),
    re.compile(r"c$", re.MULTILINE),
    re.compile(r"c\Z", re.MULTILINE),
]
for text in texts:
    for expr in exprs:
        m = expr.search(text)
        print repr(text), expr.pattern, "match" if m else "no match"
</code>
c:\temp>python test_re.py
'abc\ndef' c$ no match
'abc\ndef' c\Z no match
'abc\ndef' c$ match
'abc\ndef' c\Z no match
'abc\n' c$ match
'abc\n' c\Z no match
'abc\n' c$ match
'abc\n' c\Z no match
'abc' c$ match
'abc' c\Z match
'abc' c$ match
'abc' c\Z match
<code>
@texts = ("abc\ndef", "abc\n", "abc");
@exprs = (qr/c$/,
          qr/c\Z/,
          qr/c$/m,
          qr/c\Z/m,
          # qr/c\z/,
          # qr/c\z/m
         );
foreach $text (@texts) {
    ($repr = $text) =~ s/\n/\\n/g;
    foreach $expr (@exprs) {
        print $repr, " ", $expr, " ";
        if ($text =~ $expr) {
            print "match\n";
        } else {
            print "no match\n";
        }
    }
}
</code>
c:\temp>perl test_re.pl
abc\ndef (?-xism:c$) no match
abc\ndef (?-xism:c\Z) no match
abc\ndef (?m-xis:c$) match
abc\ndef (?m-xis:c\Z) no match
abc\n (?-xism:c$) match
abc\n (?-xism:c\Z) match
abc\n (?m-xis:c$) match
abc\n (?m-xis:c\Z) match
abc (?-xism:c$) match
abc (?-xism:c\Z) match
abc (?m-xis:c$) match
abc (?m-xis:c\Z) match
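For comparison, Perl's \Z behaviour in the results above (match at the very end, or just before one final newline, with or without /m) can probably be approximated in Python with a lookahead. This is only a sketch: the name PERL_Z and the helper perl_Z_results are made up for illustration, and the lookahead is my assumption about an equivalent, not something the re module provides under that name.

```python
import re

# Assumed emulation of Perl's \Z: succeed at the very end of the string,
# or just before a single trailing newline. PERL_Z is a hypothetical name.
PERL_Z = r"(?=\n?\Z)"

def perl_Z_results(texts, flags=0):
    """Return, for each text, whether c + PERL_Z matches somewhere in it."""
    expr = re.compile(r"c" + PERL_Z, flags)
    return [bool(expr.search(t)) for t in texts]

texts = ["abc\ndef", "abc\n", "abc"]
print(perl_Z_results(texts))                # without re.MULTILINE
print(perl_Z_results(texts, re.MULTILINE))  # with re.MULTILINE
```

Python's \Z is not affected by re.MULTILINE, so both calls should agree with each other and with the Perl c\Z lines in the output above.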
So, in Python: if one wants to match end-of-line or end-of-string, use $ in
multiline mode. If one wants to match end-of-string only, use \Z. If one
wants to match end-of-line only, use \n [not shown].
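A small sketch of those three cases, listing the positions where each pattern matches. The sample string and the variable names are made up for illustration.

```python
import re

text = "one\ntwo\n"  # two lines, each terminated by a newline

# $ with re.MULTILINE: before every newline and at the end of the string.
eol_or_eos = [m.start() for m in re.finditer(r"$", text, re.MULTILINE)]

# \Z: only at the very end of the string.
eos_only = [m.start() for m in re.finditer(r"\Z", text)]

# A literal newline: only the line terminators themselves.
eol_only = [m.start() for m in re.finditer(r"\n", text)]

print(eol_or_eos)
print(eos_only)
print(eol_only)
```

Here the newlines sit at offsets 3 and 7 and the string ends at offset 8, so the first list should contain all three positions, the second only 8, and the third only 3 and 7.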
--
Gabriel Genellina