Pyparsing: Grammar Suggestion

Paul McGuire ptmcg at
Wed May 17 18:08:27 CEST 2006

"Khoa Nguyen" < at> wrote in message
news:mailman.5814.1147879481.27775.python-list at
I am trying to come up with a grammar that describes the following:

record = f1,f2,...,fn END_RECORD
All the f(i) have to be in that order.
Any f(i) can be absent (e.g. f1,,f3,f4,,f6 END_RECORD)
The number of f(i)'s can vary. For example, the following are allowed:
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD

Any suggestions?


pyparsing includes a built-in expression, commaSeparatedList, for just such
a case.  Here is a simple pyparsing program to crack your input text:

data = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD"""

from pyparsing import commaSeparatedList

# scanString yields (tokens, start, end) for every match found in data
for tokens,start,end in commaSeparatedList.scanString(data):
    print(tokens)

This returns:
['f1', 'f2', 'f3', 'f4', 'f5', 'f6 END_RECORD']
['f1', 'f2 END_RECORD']
['f1', 'f2', '', 'f4', '', 'f6 END_RECORD']

Note that consecutive commas in the input return empty strings at the
corresponding places in the results.

Unfortunately, commaSeparatedList embeds its own definition of what is
allowed between commas, and that definition permits embedded spaces, so
END_RECORD always ends up attached to the last field.  We could copy the
definition of commaSeparatedList and exclude END_RECORD from it, but it is
simpler just to add a parse action to commaSeparatedList that removes
END_RECORD from the last element of the list:

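def stripEND_RECORD(s,l,t):
    # s = input string, l = match location, t = matched tokens
    last = t[-1]
    if last.endswith("END_RECORD"):
        # return a copy of t with last element trimmed of "END_RECORD"
        return t[:-1] + [last[:-(len("END_RECORD"))].rstrip()]
    # returning None leaves the tokens unchanged

# attach the parse action so it runs on every match
commaSeparatedList.setParseAction(stripEND_RECORD)

for tokens,start,end in commaSeparatedList.scanString(data):
    print(tokens)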

This returns:

['f1', 'f2', 'f3', 'f4', 'f5', 'f6']
['f1', 'f2']
['f1', 'f2', '', 'f4', '', 'f6']

As one of my wife's 3rd graders concluded on a science report - "wah-lah!"
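The other option, writing a grammar whose fields cannot swallow END_RECORD instead of patching commaSeparatedList's output afterwards, could be sketched roughly as follows. This is a minimal sketch, not commaSeparatedList's actual definition: it assumes pyparsing 3.x (for the Opt, search_string and as_list names; in 3.x commaSeparatedList has been superseded by pyparsing_common.comma_separated_list), and it assumes fields never contain embedded spaces. The names field, maybe_field and record are illustrative only.

```python
from pyparsing import Group, Keyword, Opt, Suppress, Word, ZeroOrMore, printables

data = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD"""

END_RECORD = Keyword("END_RECORD")

# A field is a run of printable, non-comma characters that is not the
# END_RECORD keyword itself (~ is pyparsing's negative lookahead).  The
# lookahead matters for a trailing empty field, e.g. "f1,f2, END_RECORD".
field = ~END_RECORD + Word(printables.replace(",", ""))

# An absent field yields "" so every field keeps its position.
maybe_field = Opt(field, default="")

# record ::= field ("," field)* END_RECORD
# Commas and the END_RECORD marker are matched but suppressed from the output.
record = Group(maybe_field + ZeroOrMore(Suppress(",") + maybe_field)) + Suppress(END_RECORD)

results = [match[0].as_list() for match in record.search_string(data)]
for fields in results:
    print(fields)
```

With the sample data this prints the same three lists as the parse-action version, with no END_RECORD to strip.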

Python also includes a csv module if this example doesn't work for you, but
you asked for a pyparsing solution, so there it is.
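For comparison, the csv route might look something like this minimal, standard-library-only sketch. The helper parse_records and the MARKER constant are hypothetical names for illustration, and the code assumes END_RECORD only ever appears at the end of the last field on each line.

```python
import csv

data = """f1,f2,f3,f4,f5,f6 END_RECORD
f1,f2 END_RECORD
f1,f2,,f4,,f6 END_RECORD"""

MARKER = "END_RECORD"

def parse_records(text):
    """Split each line on commas and trim the trailing END_RECORD marker."""
    records = []
    # csv.reader accepts any iterable of lines, so splitlines() is enough here.
    for row in csv.reader(text.splitlines()):
        if not row:
            continue  # skip blank lines
        last = row[-1].rstrip()
        if last.endswith(MARKER):
            row[-1] = last[:-len(MARKER)].rstrip()
        records.append(row)
    return records

for rec in parse_records(data):
    print(rec)
```

As with commaSeparatedList, consecutive commas come back as empty strings, so each field stays in its position.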

-- Paul

More information about the Python-list mailing list