[Baypiggies] Mini languages

Sun May 7 21:15:36 CEST 2006

At 11:38 AM 5/7/2006, Ken Seehart wrote:
>This means I have to know when I reach the closing brace (which I can't do 
>with regular expressions).  However, I'm sure I could do a prototype this 
>way, using the assumption that a closing brace on a class matches 
>"^};", but that would be just plain sloppy :-)

I don't know your syntax, but it sounds like you (1) know when to expect 
braces.  I am further guessing that you have (2) only a single level of 
braces, with no nesting.  The routine below works under these assumptions.  
Before working with Python, I implemented a self-compiler which met those 
assumptions.  If the assumptions could not be met, I would be inclined to 
use an LALR parser, but I have no direct experience with one.  Rather, I 
designed the syntax so that it did not require LALR.  I have had some 
success parsing in Python using regular expressions with the following code:

import re

# Separate a string (e.g. HTML) into 5 components based on two regexes,
# matched case-insensitively and with DOTALL.
#   Input regexes should use (?:...) grouping, if any.
#   Regexes must be pattern strings, not compiled objects, since we
#   compile them here.
def regex_sep(text, regex1, regex2):
    left = lm = mid = rm = right = ""  # lm and rm hold the matched text
    flags = re.IGNORECASE | re.DOTALL
    match1 = re.compile("(%s)" % regex1, flags).search(text)
    if match1:
        lm = match1.group(1)
        left, rest = split2(text, lm)
        match2 = re.compile("(%s)" % regex2, flags).search(rest)
        if match2:
            rm = match2.group(1)
            mid, right = split2(rest, rm)
        else:
            mid = rest  # no second match: everything after lm is mid
    else:
        left = text     # no first match: everything is left
    return left, lm, mid, rm, right

# Split text at the first occurrence of the literal string pattern.
# Returns (text, "") if pattern is empty or does not occur.
def split2(text, pattern):
    if not pattern:
        return (text, "")
    left, _, right = text.partition(pattern)
    return (left, right)

The call

         x1,x2,x3,x4,x5 = regex_sep(input_str, "{", "}")

would separate input_str into 5 components

         x1 = text prior to the first regex match
            = all of input_str if there is no match (the others are then "")
         x2 = text which matched the first regex (trivially "{" here)
         x3 = text between the two matches
            = everything after x2 if the second regex does not match
         x4 = text which matched the second regex (trivially "}" here)
         x5 = text following the second regex match
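
If assumption (2) does not hold and braces can nest, one possible fallback short of a full parser is to count brace depth by hand. The following is only a minimal sketch under that assumption: brace_sep is a hypothetical name (it is not part of the code above), it returns the same five-part shape as regex_sep, and it does not try to handle braces inside string literals or comments.

```python
def brace_sep(text, open_ch="{", close_ch="}"):
    """Split text into (left, open, mid, close, right) around the first
    balanced brace pair, counting depth so nested braces stay in mid."""
    start = text.find(open_ch)
    if start == -1:
        return text, "", "", "", ""      # no opening brace at all
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:               # this brace closes the first one
                return (text[:start], open_ch,
                        text[start + 1:i], close_ch, text[i + 1:])
    # Opening brace never closed: like regex_sep, the rest goes into mid.
    return text[:start], open_ch, text[start + 1:], "", ""


parts = brace_sep("class A { void f() { g(); } };")
print(parts)
# ('class A ', '{', ' void f() { g(); } ', '}', ';')
```

Here the inner "{ g(); }" stays inside the middle component, whereas regex_sep with "{" and "}" would stop at the first "}" it finds.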

Regards, Dennis

----------------------------------
| Dennis    | DennisR at dair.com   |
| Reinhardt | Powerful Anti-Spam |
----------------------------------