Nothing to repeat
Martin Gregorie
martin at address-in-sig.invalid
Sun Jan 9 13:05:46 EST 2011
On Sun, 09 Jan 2011 16:49:35 +0000, Tom Anderson wrote:
>
> Any thoughts on what i should do? Do i have to bite the bullet and apply
> some cleverness in my pattern generation to avoid situations like this?
>
This sort of works:
import re
f = open("test.txt")
p = re.compile("(spam*)*")
for line in f:
    print "input line: %s" % (line.strip())
    for m in p.findall(line):
        if m != "":
            print "==> %s" % (m)
when I feed it
=======================test.txt===========================
a line with no match
spa should match
spam should match
so should all of spaspamspammspammm
and so should all of spa spam spamm spammm
no match again.
=======================test.txt===========================
it produces:
input line: a line with no match
input line: spa should match
==> spa
input line: spam should match
==> spam
input line: so should all of spaspamspammspammm
==> spammm
input line: and so should all of spa spam spamm spammm
==> spa
==> spam
==> spamm
==> spammm
input line: no match again.
So obviously there's a problem with greedy matching where there are no
separators between adjacent matching strings: only the last one of the
run is reported. I tried non-greedy matching, e.g. r'(spam*?)*', but this
was worse, so I'll be interested to see how the real regex mavens do it.
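
For what it's worth, here is a minimal sketch of what seems to be going on
with the unseparated line. It assumes the cause is that a repeated capturing
group only remembers its last repetition, rather than greediness as such.
The variable names (line, repeated, run, single) are just mine for
illustration, and dropping the outer repetition may well not suit the
generated patterns in the original question.

```python
import re

line = "so should all of spaspamspammspammm"

# A repeated capturing group keeps only its final repetition, so
# findall() reports just the last item of an unbroken run.
# Empty matches are filtered out, as in the script above.
repeated = [m for m in re.findall(r"(spam*)*", line) if m]
print(repeated)

# The whole run is still matched as one piece; it is only group 1
# that gets overwritten on each repetition.
run = [m.group(0) for m in re.finditer(r"(spam*)*", line) if m.group(0)]
print(run)

# Assumption: without the outer repetition, each occurrence becomes
# its own match, so adjacent items are reported separately.
single = re.findall(r"spam*", line)
print(single)
```

If that reading is right, the first list holds only 'spammm', the second
holds the entire 'spaspamspammspammm' run as a single match, and the third
holds all four items.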
--
martin@ | Martin Gregorie
gregorie. | Essex, UK
org |