split a string with quoted parts into list

Paul McGuire ptmcg at austin.rr.com
Fri Mar 11 00:31:34 EST 2005


Oliver -

Here is a simpler and, I hope, more readable approach using pyparsing
(at http://pyparsing.sourceforge.net).  I also added another test word
to your sample input line: a lone pair of double quotes, signifying an
empty string.  (Be sure to remove the leading '.'s from the Python
text below; they are there only to retain program indentation, which
Google Groups otherwise collapses.)

-- Paul


.data = r"""
.(\HasNoChildren) "." "INBOX.Sent Items" ""
."""
.
.from pyparsing import printables,Word,dblQuotedString,OneOrMore
.
.nonQuoteChars = "".join( [ c for c in printables if c not in '"'] )
.word = Word(nonQuoteChars) | dblQuotedString
.
.words = OneOrMore(word)
.
.for s in words.parseString(data):
.    print(">%s<" % s)
.
Gives:

>(\HasNoChildren)<
>"."<
>"INBOX.Sent Items"<
>""<

But really, I'm guessing that you'd rather not have the quote
characters in there either.  It's simple enough to have pyparsing
remove them when a dblQuotedString is found:

.# Add a parse action to remove the double quote characters.
.# One of the beauties of parse actions is that there is no need to
.# verify that the first and last characters are double quotes: this
.# function is never called unless the tokens in tokenslist match the
.# required expression.
.def removeDblQuotes(st,loc,tokenslist):
.    return tokenslist[0][1:-1]
.dblQuotedString.setParseAction( removeDblQuotes )
.
.for s in words.parseString(data):
.    print(">%s<" % s)
.
Gives:

>(\HasNoChildren)<
>.<
>INBOX.Sent Items<
><
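
For comparison, here is a rough sketch of the same two steps (split into
bare words and double-quoted strings, then drop the quotes) using only
the standard library's re module instead of pyparsing.  This is a
swapped-in technique, not the pyparsing approach above.  The names TOKEN
and split_quoted are made up for illustration, and the pattern assumes
quoted strings never contain escaped quote characters.

```python
import re

# Sample line from the post, with the extra "" test word.
data = r'(\HasNoChildren) "." "INBOX.Sent Items" ""'

# A token is either a double-quoted run (possibly empty) or a run of
# characters that are neither whitespace nor a double quote.
# Assumption: no escaped quotes inside a quoted string.
TOKEN = re.compile(r'"([^"]*)"|([^\s"]+)')

def split_quoted(text):
    """Return the tokens of text with surrounding double quotes removed."""
    tokens = []
    for match in TOKEN.finditer(text):
        quoted, bare = match.groups()
        # Group 1 is set (possibly to '') only when the quoted branch matched.
        tokens.append(quoted if quoted is not None else bare)
    return tokens

for s in split_quoted(data):
    print(">%s<" % s)
```

One difference worth keeping in mind: the regular expression simply skips
over anything it cannot match, such as a stray unbalanced quote, whereas a
pyparsing grammar stops parsing at that point.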



