[Baypiggies] Mini languages

Dennis Allison allison at shasta.stanford.edu
Sun May 7 21:29:11 CEST 2006



IMHO using regular expressions to parse anything complicated is a bad 
idea.  Splitting off the lexical processing (always ugly) from the 
semantics (can be clean) is always a win.
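
To make that split concrete, here is a minimal sketch (not code from this thread) in which a small tokenizer does the lexical work and a separate function finds the matching closing brace by counting depth, so nesting is handled. The names tokenize and find_block_end, the token set, and the sample input are all hypothetical.

```python
import re

# Hypothetical token set for a brace-delimited mini language.
TOKEN_RE = re.compile(r"""
    (?P<LBRACE>\{)
  | (?P<RBRACE>\})
  | (?P<SEMI>;)
  | (?P<WORD>[A-Za-z_]\w*)
  | (?P<SKIP>\s+)
  | (?P<OTHER>.)
""", re.VERBOSE | re.DOTALL)


def tokenize(text):
    """Lexical step: yield (kind, value) pairs, dropping whitespace."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "SKIP":
            yield m.lastgroup, m.group()


def find_block_end(tokens, start):
    """Semantic step: given the index of an LBRACE token, return the index
    of the RBRACE that closes it, counting nesting depth."""
    depth = 0
    for i in range(start, len(tokens)):
        kind = tokens[i][0]
        if kind == "LBRACE":
            depth += 1
        elif kind == "RBRACE":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced braces")


# Sample input with one nested block (made up for illustration).
tokens = list(tokenize("class A { def f { x; } y; };"))
opening = [kind for kind, _ in tokens].index("LBRACE")
closing = find_block_end(tokens, opening)
```

The regular expression here only classifies characters into tokens; the decision about which brace closes the class is made on the token list, where it is a simple counter.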


On Sun, 7 May 2006, Dennis Reinhardt wrote:

> At 11:38 AM 5/7/2006, Ken Seehart wrote:
> >This means I have to know when I reach the closing brace (which I can't do 
> >with regular expressions).  However, I'm sure I could do a prototype this 
> >way, using the assumption that a closing brace on a class matches 
> >"^};", but that would be just plain sloppy :-)
> 
> I don't know your syntax but it sounds like you (1) know when to expect 
> braces.  I am further guessing that you have (2) a single level of 
> braces.  The routine below would work under these assumptions.  I have 
> implemented a self-compiler prior to working with Python which met those 
> assumptions.  If the assumptions could not be met, I would be inclined to 
> use an LALR parser, but I have no direct experience with one.  Rather, I designed the syntax 
> to not require LALR.  I have had some success in parsing under Python using 
> RE with the following code:
> 
> import re
> # Separate html into 5 components based on regex; case insensitive, dotall.
> #   Input regex should use (?:...) grouping, if any.
> #   Pass each regex as an uncompiled pattern string, since we compile it here.
> def regex_sep(text, regex1, regex2):
>      left = lm = mid = rm = right = ""  # unmatched components stay ""
>      flags = re.IGNORECASE | re.DOTALL
>      re1 = re.compile("(%s)" % regex1, flags)
>      match1 = re1.search(text)
>      if match1:
>          lm = match1.group(1)
>          left, rest = split2(text, lm)
>          re2 = re.compile("(%s)" % regex2, flags)
>          match2 = re2.search(rest)
>          if match2:
>              rm = match2.group(1)
>              mid, right = split2(rest, rm)
>          else:
>              mid = rest
>      else:
>          left = text
>      return left, lm, mid, rm, right
> 
> # Split text at the first occurrence of the literal string pattern;
> # if pattern does not occur, return (text, "").
> def split2(text, pattern):
>      left, _, right = text.partition(pattern)
>      return (left, right)
> 
> 
> The call
> 
>          x1,x2,x3,x4,x5 = regex_sep(input_str, "{", "}")
> 
> would separate input_str into 5 components
> 
>          x1 = text prior to the first regex match
>               (= input_str, with the others "", if there is no match)
>          x2 = text which matched the first regex (trivially "{" here)
>          x3 = text between the two matches
>               (= everything after x2 if the second regex does not match)
>          x4 = text which matched the second regex (trivially "}" here)
>          x5 = text following the second regex match
> 
> Regards, Dennis
> 
> 
> ----------------------------------
> | Dennis    | DennisR at dair.com   |
> | Reinhardt | Powerful Anti-Spam |
> ----------------------------------
> 
> 
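
For readers trying this on a current Python, the five-way split described in the quoted message can be sketched roughly as follows. This is an illustration under stated assumptions, not the quoted author's code: the name five_way_split and the sample inputs are hypothetical, and it slices on match offsets instead of re-splitting on the matched text. The second call shows what happens when the single-level-of-braces assumption does not hold.

```python
import re


def five_way_split(text, open_re, close_re):
    """Return (left, open_match, middle, close_match, right).

    Components that cannot be filled are returned as "".
    """
    flags = re.IGNORECASE | re.DOTALL
    m1 = re.search(open_re, text, flags)
    if m1 is None:
        # No opening match: everything is "left".
        return text, "", "", "", ""
    rest = text[m1.end():]
    m2 = re.search(close_re, rest, flags)
    if m2 is None:
        # Opening match only: the remainder is "middle".
        return text[:m1.start()], m1.group(), rest, "", ""
    return (text[:m1.start()], m1.group(),
            rest[:m2.start()], m2.group(), rest[m2.end():])


# One level of braces: the split is what you would expect.
flat = five_way_split("class A { x; } tail", r"\{", r"\}")

# Nested braces: the first "}" found closes the inner block, so the
# middle component is cut short.
nested = five_way_split("class A { f { x; } y; } tail", r"\{", r"\}")
```

With the flat input the middle component is " x; ". With the nested input it is " f { x; ", because a search for the first closing brace has no notion of depth.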
