Making regex suck less

jepler at unpythonic.net
Tue Sep 3 02:37:40 CEST 2002


On Mon, Sep 02, 2002 at 09:23:18PM +1000, John La Rooy wrote:
> Carl Banks wrote:
> 
> >>It would be more likely to look like this (I haven't put too much 
> >>thought into this)
> >
> >
> >No kidding.
> >
> >
> >
> >>"anything,anything,anything,same_as_3rd,same_as_2nd,same_as_1st"
> >>or would you like to suggest something else?
> >
> >
> >How about:
> >
> >pattern = Group(Any()) + Group(Any()) + Group(Any()) \
> >          + GroupRef(3) + GroupRef(2) + GroupRef(1)
> >
> Err, semantically that's exactly the same as the re and my suggestion
> only the syntax is different. It's still nothing like saying
> 
> pattern = "6 character palindrome"

Do you mean something like this?

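    def palindrome_re(n):
        # One group per character of the first half (plus the middle
        # character when n is odd), then backreferences in reverse order.
        pat = ["(.)" * ((n+1)//2)]
        for i in range(n//2, 0, -1):
            pat.append("\\%d" % i)
        return "".join(pat)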
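
As a quick check of what this generates, here is a small sketch. It assumes Python 3 (so integer division is spelled `//`), restates the function so the snippet runs on its own, and adds a trailing `$` to the test pattern, which is my addition and not part of the function:

```python
import re

def palindrome_re(n):
    # Capture the first half (and the middle character if n is odd),
    # then require the captured characters again in reverse order.
    pat = ["(.)" * ((n + 1) // 2)]
    for i in range(n // 2, 0, -1):
        pat.append("\\%d" % i)
    return "".join(pat)

print(palindrome_re(6))  # (.)(.)(.)\3\2\1
print(palindrome_re(7))  # (.)(.)(.)(.)\3\2\1

# The "$" anchor is an assumption for this test only.
six = re.compile(palindrome_re(6) + "$")
print(bool(six.match("abccba")))  # True
print(bool(six.match("abccbb")))  # False
```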

With a little work, you can extend this to use named groups and named
backrefs as well, so that you can use it as a building block for larger
patterns:

    def Any(): return "."
    def Group(s, g): return "(?P<%s>%s)" % (g, s)
    def Backref(g): return "(?P=%s)" % g
    def Or(*args): return "|".join(args)

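    def palindrome_re(n, p):
        # p is a group-name prefix, so that several palindromes can
        # coexist in one pattern without their group names clashing.
        pat = [Group(Any(), "%s%d" % (p, i+1)) for i in range((n+1)//2)]
        for i in range(n//2, 0, -1):
            pat.append(Backref("%s%d" % (p, i)))
        return "".join(pat)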
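
One caveat when using these as building blocks: `|` binds more loosely than concatenation, so `Or` as written is only safe at the top level of a pattern. Below is a minimal sketch of a variant that wraps the alternatives in a non-capturing group. The name `OrGroup` is hypothetical (mine, not from the code above), the helpers are restated so the snippet is self-contained, and Python 3 is assumed:

```python
import re

def Any(): return "."
def Group(s, g): return "(?P<%s>%s)" % (g, s)
def Backref(g): return "(?P=%s)" % g

def OrGroup(*args):
    # Hypothetical helper: like Or, but parenthesized with (?:...) so that
    # text concatenated after it applies to every alternative.
    return "(?:%s)" % "|".join(args)

def palindrome_re(n, p):
    pat = [Group(Any(), "%s%d" % (p, i + 1)) for i in range((n + 1) // 2)]
    for i in range(n // 2, 0, -1):
        pat.append(Backref("%s%d" % (p, i)))
    return "".join(pat)

three = palindrome_re(3, "a")
four = palindrome_re(4, "b")

# A 3- or 4-character palindrome followed by a literal "!".
q = re.compile(OrGroup(three, four) + "!$")
print(bool(q.match("aba!")))   # True
print(bool(q.match("abba!")))  # True
print(bool(q.match("aba")))    # False

# Without the grouping, "!$" attaches only to the second alternative,
# so a bare 3-character palindrome matches on its own.
loose = re.compile("|".join([three, four]) + "!$")
print(bool(loose.match("aba")))  # True
```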

I think that building REs in functions is a great approach for more
complex REs.

>>> q = re.compile(palindrome_re(7, "a") + palindrome_re(6, "b"))
>>> q.match("abcdcbaxyzzyx")
<_sre.SRE_Match object at 0x401c4f00>
>>> _.groupdict()
{'a4': 'd', 'a2': 'b', 'a3': 'c', 'a1': 'a', 'b1': 'x', 'b3': 'z', 'b2': 'y'}
>>> q = re.compile(Or(palindrome_re(7, "a"), palindrome_re(6, "b")))
>>> q.match('abccbb')
>>> q.match("abcdcba")
<_sre.SRE_Match object at 0x401c4f00>
>>> q.match("abccba")
<_sre.SRE_Match object at 0x402e5020>

Jeff



