[Baypiggies] Mini languages
Dennis Allison
allison at shasta.stanford.edu
Sun May 7 21:29:11 CEST 2006
IMHO using regular expressions to parse anything complicated is a bad
idea. Splitting off the lexical processing (always ugly) from the
semantics (can be clean) is always a win.
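
As a rough illustration of that split (a sketch only, not code from this
thread): the names TOKEN_RE, tokenize, parse_block and parse below are
made up for the example, and the "syntax" is assumed to be nothing more
than text with nested braces. The regular expression is confined to the
lexer, and the parser tracks brace nesting by recursion.

```python
import re

# Lexical layer: the only place a regular expression is used.
TOKEN_RE = re.compile(r"(?P<LBRACE>\{)|(?P<RBRACE>\})|(?P<TEXT>[^{}]+)")

def tokenize(source):
    """Yield (kind, text) pairs; kind is 'LBRACE', 'RBRACE' or 'TEXT'."""
    for match in TOKEN_RE.finditer(source):
        yield match.lastgroup, match.group()

def parse_block(tokens, depth=0):
    """Build a nested list from the tokens; each {...} becomes a sublist."""
    items = []
    for kind, text in tokens:
        if kind == "LBRACE":
            # The recursive call consumes tokens up to the matching brace.
            items.append(parse_block(tokens, depth + 1))
        elif kind == "RBRACE":
            if depth == 0:
                raise ValueError("unmatched closing brace")
            return items
        else:
            stripped = text.strip()
            if stripped:
                items.append(stripped)
    if depth:
        raise ValueError("missing closing brace")
    return items

def parse(source):
    # tokenize() returns a generator, so every recursion level shares it.
    return parse_block(tokenize(source))

tree = parse("class A { int x; void f() { return; } };")
print(tree)
```

The closing brace of the class is found by the parser's recursion depth,
not by a pattern such as "^};", so nested braces inside the class body do
not confuse it.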
On Sun, 7 May 2006, Dennis Reinhardt wrote:
> At 11:38 AM 5/7/2006, Ken Seehart wrote:
> >This means I have to know when I reach the closing brace (which I can't do
> >with regular expressions). However, I'm sure I could do a prototype this
> >way, using the assumption that a closing brace on a class matches
> >"^};", but that would be just plain sloppy :-)
>
> I don't know your syntax but it sounds like you (1) know when to expect
> braces. I am further guessing that you have (2) a single level of
> braces. The routine below would work under these assumptions. I have
> implemented a self-compiler prior to working with Python which met those
> assumptions. If the assumptions could not be met, I would be inclined to
> use LALR but I have no direct experience with it. Rather, I designed the syntax
> to not require LALR. I have had some success in parsing under Python using
> RE with the following code:
>
> import re
>
> # Separate text (e.g. HTML) into 5 components around two regex matches.
> # Matching is case-insensitive and DOTALL.
> # Input regexes should use (?:...) grouping, if any.
> # Pass the regexes as strings, not compiled patterns; they are compiled here.
> def regex_sep(text, regex1, regex2):
>     left = lm = mid = rm = right = ""   # lm, rm: text matched by each regex
>     flags = re.IGNORECASE | re.DOTALL
>     match1 = re.search("(%s)" % regex1, text, flags)
>     if match1:
>         lm = match1.group(1)
>         left, rest = split2(text, lm)
>         match2 = re.search("(%s)" % regex2, rest, flags)
>         if match2:
>             rm = match2.group(1)
>             mid, right = split2(rest, rm)
>         else:
>             mid = rest
>     else:
>         left = text
>     return left, lm, mid, rm, right
>
> # Split text at the first occurrence of the literal string pattern.
> def split2(text, pattern):
>     if not pattern:          # an empty match cannot be used as a separator
>         return (text, "")
>     left, _sep, right = text.partition(pattern)
>     return (left, right)
>
>
> The call
>
> x1,x2,x3,x4,x5 = regex_sep(input_str, "{", "}")
>
> would separate input_str into 5 components
>
> x1 = text prior to first regex match
> = input_str if no match and others ""
> x2 = text which matched the first regex (trivially "{" here)
> x3 = text between matched regex
> x4 = text which matched second regex (trivially "}" here)
> x5 = text following second regex match
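
To make the five components concrete, and to show where the single-level
assumption matters, here is a small stand-in (a sketch only; the name
first_brace_split is hypothetical, and it uses plain str.partition rather
than the quoted routine, since the two delimiters here are literal
strings):

```python
def first_brace_split(text):
    """Five-way split around the first "{" and the first "}" after it."""
    left, lm, rest = text.partition("{")
    mid, rm, right = rest.partition("}")
    return left, lm, mid, rm, right

# Single level of braces: the split is what one would expect.
print(first_brace_split("class A { int x; };"))

# Nested braces: the first "}" is paired with the outer "{", so the
# middle component ends early and still contains an unmatched "{".
print(first_brace_split("class A { void f() { } };"))
```

With nested input the middle component comes back as " void f() { " and
the real closing brace of the class is left in the final component, which
is why the approach depends on having only one level of braces.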
>
> Regards, Dennis
>
>
> ----------------------------------
> | Dennis | DennisR at dair.com |
> | Reinhardt | Powerful Anti-Spam |
> ----------------------------------
>
> _______________________________________________
> Baypiggies mailing list
> Baypiggies at python.org
> http://mail.python.org/mailman/listinfo/baypiggies
>