Two RE proposals

David LeBlanc whisper at oz.net
Sat Jul 27 13:28:14 EDT 2002


>
> David LeBlanc wrote:
> > 1. Add a substitution operator - in the example below it's "!<..>"
> >
> > word = r"\w*"
> > punct = r"[,.;?]"
> > wordpunct = re.compile(r"!<word>!<punct>")
> >
> > The re compiler sees r"\w*[,.;?]"
> > Trivial example, but for fancier patterns it would be great IMO.
> > A substitution pass should be done over the substituted text for
> > nesting:
>
> python already has string substitution.  if it needs better string
> substitution, that should be solved outside the RE engine.

Why? I'm not suggesting a general solution. I thought about suggesting one,
but I figured there were probably more ! characters in general strings than
in re strings. And there is ample precedent for characters that have no
special meaning outside of re strings. Oh yeah - and I'm not suggesting a
modification to Python, I'm suggesting a modification to the re language.
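
For what it's worth, the proposed "!<name>" operator can be approximated as a preprocessing step outside the re engine. The following is only a minimal sketch: the helper `expand` and the explicit `names` dict are hypothetical (nothing like them exists in the re module), and passing the names explicitly means nothing has to look into the caller's namespace.

```python
import re

# Matches a reference of the form !<name>
_REF = re.compile(r"!<(\w+)>")

def expand(pattern, names, max_depth=10):
    """Replace each !<name> with names[name], repeating so that
    references inside substituted text are expanded too.
    Unknown names raise KeyError."""
    for _ in range(max_depth):
        # A function replacement is inserted literally, so backslashes
        # in the fragments are not reinterpreted by re.sub.
        expanded = _REF.sub(lambda m: names[m.group(1)], pattern)
        if expanded == pattern:
            return expanded  # nothing changed on this pass, so we are done
        pattern = expanded
    raise ValueError("references nested too deeply")

names = {
    "word": r"\w*",
    "punct": r"[,.;?]",
    "wordpunct": r"!<word>!<punct>",  # nested references
}
wordpunct = re.compile(expand(r"!<wordpunct>", names))
```

With these definitions the re compiler sees r"\w*[,.;?]", as in the proposal.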

> besides, having library modules peek in your local namespace is
> really bad style.

Damn - there goes inspect! I wonder what else displays bad style?
Introspection/reflection considered harmful?

> and your proposal will break existing code.

Unsubstantiated. How can you make that assertion? Aside from which, has no
Python enhancement ever broken existing code?

> :::
>
> the following approach works in all existing versions of Python,
> gives you syntax highlighting in all existing Python editors, etc:
>
>     def i(*args):
>         # join with an empty separator; string.join() alone inserts spaces
>         return "".join(map(str, args))
>
>     word = r"\w*"
>     punct = r"[,.;?]"
>     wordpunct = re.compile(i(word, punct))
>
>     if_kw = r"if"  # "if" is a reserved word and cannot be used as a name
>     term = r"something"
>     num = r"\d*"
>     op = r"[-+*/]"
>     factor = i(num, r"\s*", op, r"\s*", num)
>     expr = i(term, factor)
>     if_stmt = re.compile(i(if_kw, r"\s*\(?\s*", expr, r"\s*\)?\s*:"))

Wow, you make Skip's example seem positively eloquent in contrast.
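
For comparison, Python versions much later than this thread (3.6 and up) have formatted string literals, which make the "string substitution outside the RE engine" approach read closer to the proposed syntax. This is a sketch under that assumption; the name `if_kw` is a stand-in, since `if` is a reserved word.

```python
import re

word = r"\w*"
punct = r"[,.;?]"
# An rf-string keeps backslashes raw and substitutes {name} from local scope.
wordpunct = re.compile(rf"{word}{punct}")

if_kw = r"if"
term = r"something"
num = r"\d*"
op = r"[-+*/]"
factor = rf"{num}\s*{op}\s*{num}"
expr = rf"{term}{factor}"
if_stmt = re.compile(rf"{if_kw}\s*\(?\s*{expr}\s*\)?\s*:")
```

One caveat of this style: a literal brace in the pattern, such as the quantifier in a{2,3}, has to be doubled inside an f-string.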

> if you're doing lots of RE stuff, you can trivially extend this to
> support RE-oriented operations:
>
>     if_kw = literal("if")
>     op = set("-+*/")
>     factor = seq(num, ws, op, ws, num)
>
> (google for "rxb" for a complete implementation of that idea)

Looked - not impressed. Might be of interest to XSchema or Relax-NG people.
Not very re-like.
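
To make the quoted idea concrete, here is a rough sketch of what such RE-oriented helpers might look like. It is not rxb's actual API: `literal`, `charset` (used instead of `set` to avoid shadowing the modern builtin), `seq`, `ws` and `num` are all hypothetical names chosen for illustration.

```python
import re

def literal(text):
    """Pattern matching text exactly, with metacharacters escaped."""
    return re.escape(text)

def charset(chars):
    """Character class matching any one of the given characters."""
    return "[" + "".join(re.escape(c) for c in chars) + "]"

def seq(*parts):
    """Concatenate pattern fragments in order."""
    return "".join(parts)

ws = r"\s*"    # optional whitespace
num = r"\d+"   # one or more digits

op = charset("-+*/")
factor = seq(num, ws, op, ws, num)
```

The payoff is that escaping is handled once, inside the helpers, instead of by hand in every pattern.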

> > 2. Make r"(a|b)*" mean any number of a's or b's.
>
> it does mean any number of a's or b's.  but no more than a
> single a or b will end up in the group.

Huh? Actually, re rejects the pattern, or if you try hard enough, goes into
an infinite loop.
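
As a reference point, this is how the pattern behaves in a present-day CPython 3 re module; the engines shipped in 2002 may have behaved differently. The input string is just an illustrative choice.

```python
import re

m = re.match(r"(a|b)*", "abba")

whole = m.group(0)  # the entire run matched by the repeated group
last = m.group(1)   # the group keeps only its final repetition
```

Here the overall match covers all four characters, while group 1 holds only the last one matched.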

> > This doesn't work, at least in some situations with the current
> > re compiler - the "any" op "*" doesn't seem to span over a parened
> > group
>
>     for i in range(20):
>         s = file.read(1)
>
> doesn't give you a 20 character string either (nor a 20 item list)
>
> fixing the read statement is of course trivial.

How hard would it be to make this example make sense? ;-)
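
One way to read the quoted analogy: the loop rebinds s on every pass, just as a repeated group overwrites its capture on every repetition. A small sketch, assuming an in-memory stream in place of the `file` object in the quote:

```python
import io

stream = io.StringIO("abcdefghijklmnopqrstuvwxyz")

# Each pass overwrites s, so only the 20th character survives.
for _ in range(20):
    s = stream.read(1)

# The "trivial fix": ask for everything in one call...
stream.seek(0)
chunk = stream.read(20)

# ...or collect each piece instead of overwriting it.
stream.seek(0)
chars = [stream.read(1) for _ in range(20)]
```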

> fixing the RE is done in a similar fashion: make sure the group
> matches everything you want to put in the group:
>
>     r"((?:a|b)*)"
>
> if you want lists of matching things, use findall.

So hard to embed findall into an re pattern - what's your secret?
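
For the record, findall is applied as a second step rather than embedded in the pattern. A sketch of one reading of the quoted advice, with a non-capturing inner group repeated inside a single capturing group; the sample text is an assumption:

```python
import re

text = "abba!"

# The outer group captures the whole run of a's and b's.
m = re.match(r"((?:a|b)*)", text)
run = m.group(1)

# findall then lists the individual items within that run.
items = re.findall(r"a|b", run)
```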

> </F>
>

What do I do if I want a better Python? Do we wait for specific people to
make suggestions or can anyone join in?

David LeBlanc
Seattle, WA USA




