Tag parsing in python

Paul McGuire ptmcg at austin.rr.com
Sun Aug 29 14:43:49 CEST 2010

On Aug 28, 11:23 pm, Paul McGuire <pt... at austin.rr.com> wrote:
> On Aug 28, 11:14 am, agnibhu <dee... at gmail.com> wrote:
> > Hi all,
> > I'm a newbie in python. I'm trying to create a library for parsing
> > certain keywords.
> > For example say I've key words like abc: bcd: cde: like that... So the
> > user may use like
> > abc: How are you bcd: I'm fine cde: ok
> > So I've to extract the "How are you" and "I'm fine" and "ok"..and
> > assign them to abc:, bcd: and cde: respectively.. There may be
> > combination of keywords introduced in future. like abc: xy: How are
> > you
> > So new keywords qualifying the other keywords so on..

I got to thinking more about your keywords-qualifying-keywords
example, and I thought this would be a good way to support
locale-specific tags.  I also thought about how one might want to have
tags within tags, to be substituted later.  That requires an escaped
form of the tag, "abc::" in place of "abc:", so that the reference is
replaced with the value of tag "abc:" as a late binding.
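
To make the escaping rule concrete, here is a rough stdlib-only sketch
of it, using a regex in place of pyparsing.  The names TAG_RE,
find_tags and unescape are made up for this illustration and are not
part of the script; the regex is only an approximation of the
pyparsing tag expression.

```python
import re

# Approximate regex counterpart of the pyparsing tag expression:
# a run of letters followed by ':' but not by '::'.
# (TAG_RE, find_tags and unescape are illustrative names only.)
TAG_RE = re.compile(r"[A-Za-z]+(?!::):")


def find_tags(text):
    """Return the unescaped tags found in text, in order."""
    return TAG_RE.findall(text)


def unescape(text):
    """Turn escaped 'abc::' references into plain 'abc:' tags."""
    return text.replace("::", ":")


line = "sorry: en: I'm sorry, nm::, I'm afraid I can't do that."

# The escaped 'nm::' is not seen as a tag while the line is being defined...
print(find_tags(line))            # ['sorry:', 'en:']

# ...but after unescaping it becomes an ordinary tag, ready to substitute.
print(find_tags(unescape(line)))  # ['sorry:', 'en:', 'nm:']
```

So the escaped form survives the scan for the end of a definition body,
and only turns into a real tag reference once the body is unescaped.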

Wasn't too hard to modify what I posted yesterday, and now I rather
like it.

-- Paul

# tag_substitute.py

from pyparsing import (Combine, Word, alphas, FollowedBy, Group,
    OneOrMore, empty, SkipTo, LineEnd, Optional, Forward, MatchFirst,
    Literal, And, replaceWith)

tag = Combine(Word(alphas) + ~FollowedBy("::") + ":")
tag_defn = (Group(OneOrMore(tag))("tag") + empty +
            SkipTo(tag | LineEnd())("body") +
            Optional(LineEnd().suppress()))

# now combine macro detection with substitution
macros = {}
macro_substitution = Forward()
def make_macro_sub(tokens):
    # unescape '::' and substitute any embedded tags
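    tag_value = macro_substitution.transformString(
        tokens.body.replace("::", ":"))

    # save this tag and value (or overwrite previous)
    macros[tuple(tokens.tag)] = tag_value

    # define overall macro substitution expression
    macro_substitution << (MatchFirst(
            [(Literal(k[0]) if len(k)==1
                else And([Literal(kk) for kk in k])
                ).setParseAction(replaceWith(v))
                    for k,v in macros.items()] ) + ~FollowedBy(tag))

    # return empty string, so macro definitions don't show up in final
    # expanded text
    return ""

# run make_macro_sub whenever a tag definition is parsed
tag_defn.setParseAction(make_macro_sub)
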
# define pattern for macro scanning
scan_pattern = macro_substitution | tag_defn

sorry = """\
nm: Dave
sorry: en: I'm sorry, nm::, I'm afraid I can't do that.
sorry: es: Lo siento nm::, me temo que no puedo hacer eso.
Hal said, "sorry: en:"
Hal dijo, "sorry: es:" """
print(scan_pattern.transformString(sorry))

Prints:

Hal said, "I'm sorry, Dave, I'm afraid I can't do that."
Hal dijo, "Lo siento Dave, me temo que no puedo hacer eso."
