[Tutor] regex help

Paul McGuire ptmcg at austin.rr.com
Mon Feb 23 12:07:18 CET 2009


I second Alan G's appreciation for a well-thought-through and well-conveyed
description of your text processing task.  (Is "Alan G" his gangsta name, I
wonder?)

This pyparsing snippet may point you to some easier-to-follow code,
especially once you go beyond the immediate task and do more exhaustive
parsing of your syllable syntax.


from pyparsing import *

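# "<" and ">" are matched but dropped from the results
LT,GT = map(Suppress,"<>")
# any single lowercase letter
lower = oneOf(list(alphas.lower()))
# the "H" marker, also dropped from the results
H = Suppress("H")

# have to look ahead (~H) to only accept lowers into "body" if NOT followed
# by H; the last lower before the closing H is matched separately and so is
# left out of "body"
patt = LT + H + ZeroOrMore(lower + ~H)("body")  + lower + H + GT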

tests = """\
a b c<H d e f gH> h<H i j kH>
a b c<H dH>
a b c<H d eH>""".splitlines()

for t in tests:
    print(t)
    # flatten the non-empty "body" results from every match on this line
    print(sum((list(p.body)
                for p in patt.searchString(t) if p.body), []))
    print("")

Prints:

a b c<H d e f gH> h<H i j kH>
['d', 'e', 'f', 'i', 'j']

a b c<H dH>
[]

a b c<H d eH>
['d']
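
For comparison, here is a rough sketch of the same "not followed by H"
lookahead using only the standard re module. The names GROUP, SYLLABLE and
inner_syllables are made up for this illustration, and the sketch is looser
than the pyparsing grammar above: it does not check that everything between
the markers is a lowercase letter, it only picks out the letters that are.

```python
import re

# One <H ... H> group. The capture keeps the closing "H" so that the
# lookahead below can still see it after the last letter.
GROUP = re.compile(r"<H(.*?H)>")

# A single lowercase letter that is NOT followed (after optional
# whitespace) by "H". This mirrors "lower + ~H" in the pyparsing version.
SYLLABLE = re.compile(r"[a-z](?!\s*H)")

def inner_syllables(line):
    """Collect the letters inside each <H ... H> group except the last one."""
    found = []
    for group in GROUP.finditer(line):
        found.extend(SYLLABLE.findall(group.group(1)))
    return found
```

Calling inner_syllables on each of the three test lines above should give the
same lists as the pyparsing version.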

There is more info on pyparsing at http://pyparsing.wikispaces.com.

-- Paul




