Regex help needed

Paul McGuire ptmcg at austin.rr._bogus_.com
Tue Jan 10 14:14:00 EST 2006


"rh0dium" <sklass at pointcircle.com> wrote in message
news:1136916515.520659.287850 at z14g2000cwz.googlegroups.com...
>
> Paul McGuire wrote:
> > -- Paul
> > (Download pyparsing at http://pyparsing.sourceforge.net.)
>
> Done.
>
>
> Hey this is pretty cool!  I have one small problem that I don't know
> how to resolve.  I want the entire contents (whatever it is) of line 1
> to be the ident.  Now digging into the code showed a method line,
> lineno and LineStart LineEnd.  I tried to use all three but it didn't
> work for a few reasons ( line = type issues, lineno - I needed the data
> and couldn't get it to work, LineStart/End - I think it matches every
> line and I need the scope to line 1 )
>
> So here is my rendition of the code - But this is REALLY slick..
>
> I think the problem is the parens on line one....
>
> def main(data=None):
>
>     LPAR = Literal("(")
>     RPAR = Literal(")")
>
>     # assume function identifiers must start with alphas, followed by
>     # zero or more alphas, numbers, or '_' - expand this defn as needed
>     ident = LineStart + LineEnd
>
>     # define a list as one or more quoted strings, inside ()'s - we'll
>     # tackle nesting in a minute
>     quoteList = Group( LPAR.suppress() + OneOrMore(dblQuotedString) +
>                        RPAR.suppress())
>
>     # define format of a line of data - don't bother with \n's or \r's,
>     # pyparsing just skips 'em
>     dataFormat = ident + ( dblQuotedString | quoteList )
>
>     return dataFormat.parseString(data)
>
>
> # General run..
> if __name__ == '__main__':
>
>
> #     data = 'someFunction\r\n "test" "foo"\r\n'
> #     data = 'someFunction\r\n "test  foo"\r\n'
>     data = ('getVersion()\r\n"@(#)$CDS: icfb.exe version 5.1.0 '
>             '05/22/2005 23:36 (cicln01) $"\r\n')
> #     data = ('someFunction\r\n ("test" "test1" "foo aasdfasdf"\r\n '
> #             '"newline" "test2")\r\n')
>
>     foo = main(data)
>
>     print foo
>

LineStart() + LineEnd() will only match an empty line.
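
To make that concrete, here is a small sketch (not part of your program; the helper name matches is made up for illustration, and I am assuming a reasonably current pyparsing). Since nothing is allowed between the start and the end of the line, a line with text on it fails to parse:

```python
from pyparsing import LineEnd, LineStart, ParseException

# Same shape as the ident in the quoted code, with the classes instantiated.
emptyLine = LineStart() + LineEnd()


def matches(text):
    """Hypothetical helper: True if emptyLine parses at the start of text."""
    try:
        emptyLine.parseString(text)
        return True
    except ParseException:
        return False


# A line with content on it, then a bare newline for comparison.
print(matches('getVersion()\n'))
print(matches('\n'))
```

The first call is the interesting one: your first line has text on it, so this expression cannot match it.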


If you describe in words what you want ident to be, it may be more natural
to translate to pyparsing.

"A word starting with an alpha, followed by zero or more alphas, numbers, or
'_'s, with a trailing pair of parens"

ident = Word(alphas,alphanums+"_") + LPAR + RPAR


If you want the ident all combined into a single token, use:

ident = Combine( Word(alphas,alphanums+"_") + LPAR + RPAR )
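
Here is a quick side-by-side of the two forms, as a sketch only (the names separate and combined are just for this illustration). Without Combine you get one token per sub-expression; with Combine the pieces are joined into a single string:

```python
from pyparsing import Combine, Literal, Word, alphanums, alphas

LPAR = Literal("(")
RPAR = Literal(")")

# Without Combine: the name and each paren come back as separate tokens.
separate = Word(alphas, alphanums + "_") + LPAR + RPAR

# With Combine: the adjacent pieces are merged into one token.
combined = Combine(Word(alphas, alphanums + "_") + LPAR + RPAR)

separateTokens = separate.parseString("getVersion()").asList()
combinedTokens = combined.parseString("getVersion()").asList()

print(separateTokens)  # ['getVersion', '(', ')']
print(combinedTokens)  # ['getVersion()']
```

Note that Combine also requires the pieces to be adjacent, so "getVersion ( )" would match the first form but not the second.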


LineStart and LineEnd are geared more for line-oriented or
whitespace-sensitive grammars.  Your example doesn't really need them, I
don't think.

If you *really* want everything on the first line to be the ident, try this:

ident = Word(alphas,alphanums+"_") + restOfLine
or
ident = Combine( Word(alphas,alphanums+"_") + restOfLine )
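
A minimal sketch of the restOfLine version against your getVersion() sample. One assumption to flag: I have simplified the line endings to plain "\n" here. With "\r\n" data the restOfLine token may pick up a trailing "\r", depending on the pyparsing version, so you might want to strip it in a parse action.

```python
from pyparsing import Combine, Word, alphanums, alphas, dblQuotedString, restOfLine

# Everything from the leading word to the end of line 1, as one token.
ident = Combine(Word(alphas, alphanums + "_") + restOfLine)

# Sample data from the original post, with "\n" line endings (an assumption).
data = ('getVersion()\n'
        '"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\n')

result = (ident + dblQuotedString).parseString(data)

print(result[0])
print(result[1])
```

The parens on line 1 are no longer a problem, because restOfLine takes whatever follows the leading word without trying to interpret it.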


Now the next step is to assign field names to the results:

dataFormat = ident.setResultsName("ident") + ( dblQuotedString |
quoteList ).setResultsName("contents")

test = "blah blah test string"

results = dataFormat.parseString(test)
print results.ident, results.contents
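
Putting the pieces together, here is a self-contained sketch with the field names attached. It reuses your getVersion() sample with "\n" line endings (my simplification) and uses print() as a function. Treat it as an illustration rather than the finished grammar:

```python
from pyparsing import (Combine, Group, Literal, OneOrMore, Word,
                       alphanums, alphas, dblQuotedString, restOfLine)

LPAR = Literal("(")
RPAR = Literal(")")

# ident is the whole first line, merged into a single token.
ident = Combine(Word(alphas, alphanums + "_") + restOfLine)

# One or more quoted strings inside ()'s.
quoteList = Group(LPAR.suppress() + OneOrMore(dblQuotedString) + RPAR.suppress())

dataFormat = (ident.setResultsName("ident")
              + (dblQuotedString | quoteList).setResultsName("contents"))

test = ('getVersion()\n'
        '"@(#)$CDS: icfb.exe version 5.1.0 05/22/2005 23:36 (cicln01) $"\n')

results = dataFormat.parseString(test)

# Named fields are available as attributes on the results object.
print(results.ident)
print(results.contents)
```

The exact shape of results.contents (a plain string or a nested list) can differ between the two alternatives and between pyparsing versions, so it is worth printing it for both kinds of input before relying on it.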

I'm glad pyparsing is working out for you!  There should be a number of
examples that ship with pyparsing that may give you some more ideas on how
to proceed from here.

-- Paul




