Breaking up Strings correctly:

Paul McGuire ptmcg at austin.rr.com
Mon Apr 9 11:39:44 EDT 2007


On Apr 9, 7:19 am, "Michael Yanowitz" <m.yanow... at kearfott.com> wrote:
> Hello:
>
>    I have been searching for an easy solution, and hopefully one
> has already been written, so I don't want to reinvent the wheel:
>
>    Suppose I have a string of expressions such as:
> "((($IP = "127.1.2.3") AND ($AX < 15)) OR (($IP = "127.1.2.4") AND ($AY !=
> 0)))
>   I would like to split up into something like:
> [ "OR",
>   "(($IP = "127.1.2.3") AND ($AX < 15))",
>   "(($IP = "127.1.2.4") AND ($AY != 0))" ]
>
>      which I may then decide to or not to further split into:
> [ "OR",
>   ["AND", "($IP = "127.1.2.3")", "($AX < 15)"],
>   ["AND", "(($IP = "127.1.2.4")", ($AY != 0))"] ]
>
>   Is there an easy way to do this?
> I tried using regular expressions, re, but I don't think it is
> recursive enough. I really want to break it up from:
> (E1 AND_or_OR E2) and make that int [AND_or_OR, E1, E2]
>   and apply the same to E1 and E2 recursively until E1[0] != '('
>
>    But the main problem I am running to is, how do I split this up
> by outer parentheseis. So that I get the proper '(' and ')' to split
> this upper correctly?
>
> Thanks in advance:
> Michael Yanowitz

This problem is right down the pyparsing fairway!  Pyparsing is a
module for defining recursive-descent parsers, and it has some
built-in help just for applications such as this.

You start by defining the basic elements of the text to be parsed.  In
your sample text, you are combining a number of relational
comparisons, made up of variable names and literal integers and quoted
strings.  Using pyparsing classes, we define these:

varName = Word("$",alphas, min=2)
integer = Word("0123456789").setParseAction( lambda t : int(t[0]) )
varVal = dblQuotedString | integer

varName is a "word" starting with a $, followed by 1 or more alphas.
integer is a "word" made up of 1 or more digits, and we add a parsing
action to convert these to Python ints.  varVal shows that a value can
be an integer or a dblQuotedString (a common expression included with
pyparsing).
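
As a quick sanity check, here is a minimal sketch that exercises these
three expressions on their own.  The sample inputs are made up for
illustration, and it assumes a pyparsing release that still accepts the
camelCase names used in this post (parseString, setParseAction).

```python
from pyparsing import ParseException, Word, alphas, dblQuotedString, nums

varName = Word("$", alphas, min=2)
integer = Word(nums).setParseAction(lambda t: int(t[0]))
varVal = dblQuotedString | integer

name = varName.parseString("$IP")[0]            # the "$" stays part of the token
number = integer.parseString("15")[0]           # converted to int by the parse action
quoted = varVal.parseString('"127.1.2.3"')[0]   # surrounding quotes are kept by default

# min=2 means a bare "$" with no letters after it is rejected
try:
    varName.parseString("$")
    bare_dollar_ok = True
except ParseException:
    bare_dollar_ok = False

print(name, number, quoted, bare_dollar_ok)
```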

Next we define the set of relational operators, and the comparison
expression:

relationalOp = oneOf("= < > >= <= !=")
comparison = Group(varName + relationalOp + varVal)

The comparison expression is grouped so as to keep tokens separate
from surrounding expressions.
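
To see what the grouping buys you, this small sketch parses the same
made-up input with and without Group.  The name "ungrouped" is only
for illustration and is not part of the program in this post.

```python
from pyparsing import Group, Word, alphas, dblQuotedString, nums, oneOf

varName = Word("$", alphas, min=2)
integer = Word(nums).setParseAction(lambda t: int(t[0]))
varVal = dblQuotedString | integer
relationalOp = oneOf("= < > >= <= !=")

ungrouped = varName + relationalOp + varVal          # tokens land in one flat list
comparison = Group(varName + relationalOp + varVal)  # tokens are wrapped in a sublist

flat = ungrouped.parseString("$AX <= 15").asList()
nested = comparison.parseString("$AX <= 15").asList()

print(flat)    # ['$AX', '<=', 15]
print(nested)  # [['$AX', '<=', 15]]
```

Once several comparisons are joined by AND and OR, the sublists are
what keep each comparison's three tokens together.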

Now for the most complicated part: using pyparsing's operatorPrecedence
helper.  It is possible to create the recursive grammar explicitly, but
this kind of application is very common, so pyparsing includes a helper
for it too.  Here is your set of operations defined using
operatorPrecedence:

boolExpr = operatorPrecedence( comparison,
    [
    ( "AND", 2, opAssoc.LEFT ),
    ( "OR", 2, opAssoc.LEFT ),
    ])

operatorPrecedence takes 2 arguments: the base-level or atom
expression (in your case, the comparison expression), and a list of
tuples listing the operators in descending priority.  Each tuple gives
the operator, the number of operands (1 or 2), and whether it is right
or left associative.
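
For comparison, the explicit recursive grammar mentioned above might
look roughly like the sketch below.  This is my own hand-rolled
approximation using Forward, not the code the helper generates; the
names atom, andTerm and orTerm are invented for illustration.

```python
from pyparsing import (Forward, Group, Keyword, OneOrMore, Suppress, Word,
                       alphas, dblQuotedString, nums, oneOf)

varName = Word("$", alphas, min=2)
integer = Word(nums).setParseAction(lambda t: int(t[0]))
varVal = dblQuotedString | integer
comparison = Group(varName + oneOf("= < > >= <= !=") + varVal)

# Forward is a placeholder so the grammar can refer to itself inside parentheses
boolExpr = Forward()
LPAR, RPAR = Suppress("("), Suppress(")")

atom = comparison | LPAR + boolExpr + RPAR
# AND binds tighter than OR, so it is defined closer to the atom
andTerm = Group(atom + OneOrMore(Keyword("AND") + atom)) | atom
orTerm = Group(andTerm + OneOrMore(Keyword("OR") + andTerm)) | andTerm
boolExpr <<= orTerm

test = ('((($IP = "127.1.2.3") AND ($AX < 15)) '
        'OR (($IP = "127.1.2.4") AND ($AY != 0)))')
explicit = boolExpr.parseString(test).asList()
print(explicit)
```

Each extra operator adds another level like andTerm/orTerm, which is
the bookkeeping the helper takes off your hands.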

Now the only thing left to do is use boolExpr to parse your test
string:

results = boolExpr.parseString(
    '((($IP = "127.1.2.3") AND ($AX < 15)) '
    'OR (($IP = "127.1.2.4") AND ($AY != 0)))')

pyparsing returns parsed tokens as a rich object of type
ParseResults.  This object can be accessed as a list, dict, or object
instance with named attributes.  For this example, we'll actually
create a nested list using ParseResults' asList method.  Passing this
list to the pprint module we get:

pprint.pprint( results.asList() )

prints

[[[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]],
  'OR',
  [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]]
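
The nested list keeps the operators in infix position.  If you want
the operator-first layout from your question, a small recursive walk
over the list can rearrange it.  The function below is a plain-Python
sketch of my own (to_prefix is a made-up name, not part of pyparsing),
and it assumes the only boolean operators are AND and OR.

```python
def to_prefix(node):
    """Turn [lhs, op, rhs, ...] nesting into [op, lhs, rhs, ...]."""
    if not isinstance(node, list):
        return node
    # Boolean groups have AND/OR in the odd positions; a comparison such as
    # ['$AX', '<', 15] has a relational operator there and is left untouched.
    if len(node) >= 3 and node[1] in ("AND", "OR"):
        return [node[1]] + [to_prefix(operand) for operand in node[::2]]
    return node

# The list printed above, as produced by results.asList()
parsed = [[[['$IP', '=', '"127.1.2.3"'], 'AND', ['$AX', '<', 15]],
           'OR',
           [['$IP', '=', '"127.1.2.4"'], 'AND', ['$AY', '!=', 0]]]]

prefix = to_prefix(parsed[0])
print(prefix)
```

The outer list always has one element here, hence the parsed[0].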


Here is the whole program in one chunk (I also added support for NOT -
higher priority than AND, and right-associative):

test = ('((($IP = "127.1.2.3") AND ($AX < 15)) '
        'OR (($IP = "127.1.2.4") AND ($AY != 0)))')

from pyparsing import oneOf, Word, alphas, dblQuotedString, nums, \
    Group, operatorPrecedence, opAssoc

varName = Word("$",alphas, min=2)
integer = Word(nums).setParseAction( lambda t : int(t[0]) )
varVal = dblQuotedString | integer

relationalOp = oneOf("= < > >= <= !=")
comparison = Group(varName + relationalOp + varVal)

boolExpr = operatorPrecedence( comparison,
    [
    ( "NOT", 1, opAssoc.RIGHT ),
    ( "AND", 2, opAssoc.LEFT ),
    ( "OR", 2, opAssoc.LEFT ),
    ])

import pprint
pprint.pprint( boolExpr.parseString(test).asList() )
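
The test string above never uses NOT, so here is a short sketch that
does.  The input string is made up for illustration.  Newer pyparsing
releases expose this helper under the name infixNotation, so the
import below tries that first and falls back to operatorPrecedence;
treat that fallback as an assumption about your installed version.

```python
try:
    from pyparsing import infixNotation   # name used by newer pyparsing releases
except ImportError:
    from pyparsing import operatorPrecedence as infixNotation
from pyparsing import Group, Word, alphas, dblQuotedString, nums, oneOf, opAssoc

varName = Word("$", alphas, min=2)
integer = Word(nums).setParseAction(lambda t: int(t[0]))
varVal = dblQuotedString | integer
comparison = Group(varName + oneOf("= < > >= <= !=") + varVal)

boolExpr = infixNotation(comparison, [
    ("NOT", 1, opAssoc.RIGHT),   # unary, binds tightest
    ("AND", 2, opAssoc.LEFT),
    ("OR", 2, opAssoc.LEFT),
])

# NOT applies only to the first comparison because it outranks AND
negated = boolExpr.parseString('NOT ($AX < 15) AND ($AY != 0)').asList()
print(negated)
```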


The pyparsing wiki includes some related examples, SimpleBool.py and
SimpleArith.py - go to http://pyparsing.wikispaces.com/Examples.

-- Paul



