Regex help needed

Paul McGuire ptmcg at austin.rr._bogus_.com
Tue Jan 10 11:40:10 EST 2006


"rh0dium" <sklass at pointcircle.com> wrote in message
news:1136908778.686816.169140 at g49g2000cwa.googlegroups.com...
> Hi all,
>
> I am using python to drive another tool using pexpect.  The values
> which I get back I would like to automatically put into a list if there
> is more than one return value. They provide me a way to see that the
> data is in set by parenthesising it.
>
<snip>

Well, you asked for regex help, but a pyparsing rendition may be easier to
read and maintain.

-- Paul
(Download pyparsing at http://pyparsing.sourceforge.net.)


# test data strings
test1 = """somefunction()
"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"
"""

test2 = """somefunction()
("." "~"
"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"
"foo")
"""

test3 = """somefunctionWithNestedlist()
("." "~"
"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"
    ("Hey!"
    "this is a nested"
    "list")
"foo")
"""

"""
So if you're still reading this I want to parse out data.  Here are the
rules...
- Line 1 ALWAYS is the calling function whatever is there (except
"\r\n") should be kept as "original"
- Anything may occur inside the quotations - I don't care what's in
there per se but it must be maintained.
- Parenthesed items I want to be pushed into a list.  I haven't run
into a case where you have nested paren's but that not to say it won't
happen...
"""

from pyparsing import Literal, Word, alphas, alphanums, \
        dblQuotedString, OneOrMore, Group, Forward

LPAR = Literal("(")
RPAR = Literal(")")

# assume function identifiers must start with alphas, followed by zero or
more
# alphas, numbers, or '_' - expand this defn as needed
ident = Word(alphas,alphanums+"_")

# define a list as one or more quoted strings, inside ()'s - we'll tackle
nesting
# in a minute
quoteList = Group( LPAR.suppress() +
                   OneOrMore(dblQuotedString) +
                   RPAR.suppress() )

# define format of a line of data - don't bother with \n's or \r's,
# pyparsing just skips 'em
dataFormat = ident + LPAR + RPAR + ( dblQuotedString | quoteList )

def test(t):
    print dataFormat.parseString(t)

print "Parse flat lists"
test(test1)
test(test2)

# modifications for nested lists
quoteList = Forward()
quoteList << Group( LPAR.suppress() +
                   OneOrMore(dblQuotedString | quoteList) +
                   RPAR.suppress() )
dataFormat = ident + LPAR + RPAR + ( dblQuotedString | quoteList )

print
print "Parse using nested lists"
test(test1)
test(test2)
test(test3)

Parsing results:
Parse flat lists
['somefunction', '(', ')', '"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005
23:36 (cicln01) $"']
['somefunction', '(', ')', ['"."', '"~"',
'"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"', '"foo"']]

Parse using nested lists
['somefunction', '(', ')', '"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005
23:36 (cicln01) $"']
['somefunction', '(', ')', ['"."', '"~"',
'"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"', '"foo"']]
['somefunctionWithNestedlist', '(', ')', ['"."', '"~"',
'"/eda/ic_5.10.41.500.1.18/tools.lnx86/dfII/samples/techfile"', ['"Hey!"',
'"this is a nested"', '"list"'], '"foo"']]






More information about the Python-list mailing list