Parsing: request for pointers

Paul McGuire ptmcg at austin.rr.com
Thu Nov 13 10:47:34 EST 2008


On Nov 11, 1:59 pm, André <andre.robe... at gmail.com> wrote:
> Hi everyone,
>
> I would like to implement a parser for a mini-language
> and would appreciate some pointers.  The type of
> text I would like to parse is an extension of:
>
> http://www.websequencediagrams.com/examples.html
>
> For those that don't want to go to the link, consider
> the following, *very* simplified, example:
> =======
>
> programmer Guido
> programmer "Fredrik Lundh" as effbot
> programmer "Alex Martelli" as martellibot
> programmer "Tim Peters" as timbot
> note left of effbot: cutting sense of humor
> note over martellibot:
>     Offers detailed note, explaining a problem,
>     accompanied by culinary diversion
>     to the delight of the reader
> note over timbot: programmer "clever" as fox
> timbot -> Guido: I give you doctest
> Guido --> timbot: Have you checked my time machine?
>
> =======
> From this, I would like to be able to extract
> ("programmer", "Guido")
> ("programmer as", "Fredrik Lundh", "effbot")
> ...
> ("note left of", "effbot", "cutting sense of humor")
> ("note over", "martellibot", "Offers...")
> ("note over", "timbot", 'programmer "clever" as fox')
>

Even if you choose not to use pyparsing, a pyparsing example might
give you some insights into your problem.  See how the grammar is
built up from separate pieces.  Parse actions in pyparsing implement
callbacks to do parse-time conversion - in this case, the multiline
note body is converted from the parsed list of separate strings into a
single newline-separated string.
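As a minimal sketch of a parse action on its own (the `words` grammar and the sample string are made up for illustration and are not part of the grammar in this post), the callback receives the matched tokens and returns whatever should replace them:

```python
from pyparsing import OneOrMore, Word, alphas

# Hypothetical mini-grammar for illustration: one or more alphabetic words.
words = OneOrMore(Word(alphas))

# The parse action is called with the matched tokens and returns their
# replacement; here the separate words become one newline-separated string.
words.setParseAction(lambda tokens: "\n".join(tokens))

result = words.parseString("to the delight of the reader")
joined = result[0]
print(joined)
```

The same idea is applied to the indented note body in the full example, where the nested token lists are joined into one string.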

Here is the pyparsing example:

from pyparsing import (Suppress, Combine, LineEnd, Word, alphas, alphanums,
    quotedString, Keyword, Optional, oneOf, restOfLine, indentedBlock,
    removeQuotes, empty, OneOrMore, Group)

# used to manage indentation levels when parsing indented blocks
indentstack = [1]

# define some basic punctuation and terminal words
COLON = Suppress(":")
ARROW = Combine(Word('-')+'>')
NL = LineEnd().suppress()
ident = Word(alphas,alphanums+"-_")
quotedString.setParseAction(removeQuotes)

# programmer definition
progDefn = Keyword("programmer") + Optional(quotedString("alias") + \
                Optional("as")) + ident("name")

# new pyparsing idiom - embed simple asserts to verify bits of the
# overall grammar in isolation
assert "programmer Guido" == progDefn
assert 'programmer "Tim Peters" as timbot' == progDefn

# note specification - the only complicated part is the indented block
# form of the note; we use a pyparsing parse action to convert the
# nested token lists into a multiline string
OF = Optional("of")
notelocn = oneOf("over under") | "left" + OF | "right" + OF
notetext = restOfLine.setName("notetext")
noteblock = indentedBlock(notetext, indentstack).setName("noteblock")
noteblock.setParseAction(lambda t:'\n'.join(tt[0] for tt in t[0]))
note = Keyword("note") + notelocn("location") + ident("subject") + COLON + \
    (~NL + empty + notetext("note") | noteblock("note") )
assert 'note over timbot: programmer "clever" as fox ' == note

# message definition
msg = ident("from") + ARROW + ident("to") + COLON + empty + \
    notetext("note")
assert 'Guido --> timbot: Have you checked my time machine?' == msg

# a seqstatement is one of these 3 types of statements
seqStatement = progDefn | note | msg

# parse the sample text (seqtext is assumed to hold the sample from the
# original question as a single multiline string)
parsedStatements = OneOrMore(Group(seqStatement)).parseString(seqtext)

# print out token/field dumps for each statement
for s in parsedStatements:
    print(s.dump())

Prints:

['programmer', 'Guido']
- name: Guido
['programmer', 'Fredrik Lundh', 'as', 'effbot']
- alias: Fredrik Lundh
- name: effbot
['programmer', 'Alex Martelli', 'as', 'martellibot']
- alias: Alex Martelli
- name: martellibot
['programmer', 'Tim Peters', 'as', 'timbot']
- alias: Tim Peters
- name: timbot
['note', 'left', 'of', 'effbot', 'cutting sense of humor ']
- location: left
- note: cutting sense of humor
- subject: effbot
['note', 'over', 'martellibot', 'Offers ...']
- location: over
- note: Offers detailed note, explaining a problem,
accompanied by culinary diversion
to the delight of the reader
- subject: martellibot
['note', 'over', 'timbot', 'programmer "clever" as fox ']
- location: over
- note: programmer "clever" as fox
- subject: timbot
['timbot', '->', 'Guido', 'I give you doctest ']
- from: timbot
- note: I give you doctest
- to: Guido
['Guido', '-->', 'timbot', 'Have you checked my time machine?']
- from: Guido
- note: Have you checked my time machine?
- to: timbot
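To get from these dumps to the tuples asked for in the question, something along these lines might work. This is only a sketch: `to_tuple` is a hypothetical helper, it takes a plain list of tokens and a plain dict of named fields (which I would expect `s.asList()` and `s.asDict()` to supply for each parsed statement), and since the question does not show a tuple shape for messages, the fallback of returning the raw tokens is purely an assumption.

```python
def to_tuple(tokens, fields):
    """Hypothetical helper: build the tuple form requested in the question.

    tokens -- list of matched strings for one statement
    fields -- dict of the named results for the same statement
    """
    if tokens[0] == "programmer":
        if "alias" in fields:
            return ("programmer as", fields["alias"], fields["name"])
        return ("programmer", fields["name"])
    if tokens[0] == "note":
        # everything before the subject and the note text is the
        # keyword phrase, e.g. "note left of" or "note over"
        head = " ".join(tokens[:-2])
        return (head, fields["subject"], fields["note"].strip())
    # assumption: no tuple shape was given for messages, so return tokens as-is
    return tuple(tokens)


# token lists and fields copied from the printed output above
print(to_tuple(["programmer", "Guido"], {"name": "Guido"}))
print(to_tuple(["note", "left", "of", "effbot", "cutting sense of humor "],
               {"location": "left", "subject": "effbot",
                "note": "cutting sense of humor "}))
```

The `strip()` call removes the trailing space that restOfLine leaves on some of the note texts.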

Best of luck in your project,
-- Paul


