Making regex suck less

Gerhard Häring gerhard.haering at gmx.de
Sun Sep 1 15:13:56 EDT 2002


* Gerson Kurz <gerson.kurz at t-online.de> [2002-09-01 18:31 +0000]:
> [...] Anyway, that got me thinking about why we have to deal with
> regular expressions like r"((?:a|b)*)", when in most cases the code
> will look something like this:
> 
> r = re.compile("<some cryptic re-string here>")
> ...
> r.match(this) or r.search(that)

If you only use the RE once, you can use the module-level functions ;-)
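Here is a small sketch of the two styles, using only the standard re module. The pattern is the one from Gerson's post, and the sample strings are made up for illustration:

```python
import re

# Compile once when the pattern is reused many times:
AB_RE = re.compile(r"((?:a|b)*)")
compiled_result = AB_RE.match("abba!").group(1)

# For a one-off use, the module-level functions accept the pattern string
# directly; re also caches recently compiled patterns behind the scenes.
oneoff_result = re.match(r"((?:a|b)*)", "abba!").group(1)

# Pattern objects (and the module) offer match() and search(), not find().
search_result = re.search(r"b+", "aaabbbc").group()

print(compiled_result, oneoff_result, search_result)
```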

> which means the real time is not spent in the compile() function, but
> in the match or find function. So basically, couldn't one come up with
> a *human readable* syntax for re, and compile that instead?

A human-readable syntax that's equally powerful? Most probably not.

> Also, I think it would already be an improvement if the syntax
> provided for clear and easy-to-understand special cases, like
> 
> re.compile("anything that starts with 'abc'")

s.startswith("abc")
s.lower().startswith("abc")
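For comparison, here is a minimal sketch of the regex spelling next to the string methods above. The sample string is chosen for illustration. re.match() only anchors at the start of the string, which is what makes it comparable to startswith(). A prefix taken from user input would need re.escape() first:

```python
import re

s = "ABCdef"

# Regex versions of "anything that starts with 'abc'":
starts_re = re.match(r"abc", s) is not None                     # case-sensitive
starts_re_ci = re.match(r"abc", s, re.IGNORECASE) is not None   # case-insensitive

# The string-method versions, no regex needed:
starts_str = s.startswith("abc")
starts_str_ci = s.lower().startswith("abc")

print(starts_re, starts_str)
print(starts_re_ci, starts_str_ci)
```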

> and if you cannot find something in the special cases for you, you can
> always go back to 
> 
> re.compile("<some cryptic re-string here>")
> 
> After all, *everyone* starting with re thinks the syntax is cryptic
> and mind-boggling, and only once you get yourself into the "re mindset"
> do you understand things like r"\s*\w+\s*=\s*['\"].*?['\"]" instantly. If
> we had an easier syntax, more people would be using re ;)
> 
> Is the idea utterly foolish? 
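Python's re module already offers one step toward a more readable syntax: the re.VERBOSE flag. It ignores whitespace in the pattern (outside character classes) and allows # comments. Below is a sketch of the pattern quoted above, rewritten that way. The sample input line is an assumption for illustration:

```python
import re

# The compact pattern from the post:
COMPACT = re.compile(r"\s*\w+\s*=\s*['\"].*?['\"]")

# The same pattern written with re.VERBOSE:
VERBOSE = re.compile(r"""
    \s*        # optional leading whitespace
    \w+        # a name
    \s* = \s*  # an equals sign, optionally surrounded by whitespace
    ['"]       # an opening quote, single or double
    .*?        # the value, matched non-greedily
    ['"]       # a closing quote, single or double
""", re.VERBOSE)

line = '  name = "Gerhard"  '
print(repr(COMPACT.match(line).group()))
print(repr(VERBOSE.match(line).group()))
```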

I don't really know. IMO, if you only need very simple string searching,
you can probably get away with the string methods, and if you have very
complex stuff, you'll probably be better off with a parser generator
(like SimpleParse, which is very readable, IMO).

I don't find regular expressions that unreadable, especially when I
consider that I'd have to write many lines of error-prone Python code
instead. Stuff like this is just too convenient:

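# working around zxDateTime limitations:
# (JYTHON and DateTime are assumed to be defined elsewhere in the module)
if JYTHON:
    import re

    # captures year, month and day of an ISO date like 2002-09-01
    ISO_DATE_RE = re.compile(r"(\d\d\d\d)-(\d\d)-(\d\d)")

    def DateFrom(s):
        match = ISO_DATE_RE.match(s)
        if match is None:
            raise ValueError("not an ISO date: %r" % (s,))
        # pass the three captured groups as ints: year, month, day
        return DateTime(*map(int, match.groups()))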
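Here is a runnable sketch of the same idea outside Jython. It assumes the standard library's datetime.date as a stand-in for zxDateTime's DateTime. date_from is a hypothetical name, not part of any library:

```python
import re
from datetime import date

# Same pattern as above: year, month and day as three groups.
ISO_DATE_RE = re.compile(r"(\d\d\d\d)-(\d\d)-(\d\d)")

def date_from(s):
    """Parse 'YYYY-MM-DD' into a datetime.date; raise ValueError otherwise."""
    match = ISO_DATE_RE.match(s)
    if match is None:
        raise ValueError("not an ISO date: %r" % (s,))
    year, month, day = map(int, match.groups())
    # date() itself raises ValueError for out-of-range values such as month 13
    return date(year, month, day)

print(date_from("2002-09-01"))
```

Input such as "2002-13-01" gets past the regex but is rejected by date() itself, also with a ValueError.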

Gerhard
-- 
mail:   gerhard <at> bigfoot <dot> de       registered Linux user #64239
web:    http://www.cs.fhm.edu/~ifw00065/    OpenPGP public key id AD24C930
public key fingerprint: 3FCC 8700 3012 0A9E B0C9  3667 814B 9CAA AD24 C930
reduce(lambda x,y:x+y,map(lambda x:chr(ord(x)^42),tuple('zS^BED\nX_FOY\x0b')))




More information about the Python-list mailing list