Making regex suck less

Carl Banks imbosol at vt.edu
Mon Sep 2 15:08:47 EDT 2002


John La Rooy wrote:
> Carl Banks wrote:
>> Gerhard Häring wrote:
>> 
>>>>which means the real time is not spent in the compile() function, but
>>>>in the match or find function. So basically, couldn't one come up with
>>>>a *human readable* syntax for re, and compile that instead?
>>>
>>>That's equally powerful? Most probably not.
>> 
>> Why not?  It won't be as fast, but it should be able to do anything a
>> regexp can do, and would be much more versatile.
> 
> I think the main problem is that *human readable* doesn't map really 
> well onto regular expressions.

Ridiculous.  If you can map human readable code into machine language,
then you can map human readable code into regular expressions.

Cryptic as they are, regular expressions are still systematic; thus it
is possible to systematically convert the cryptic regexp syntax into
more readable and consistent syntax.


> What would the equivalent of r"(.)(.)(.)\3\2\1"
> This means a "palindrome of 6 characters"
> But it is unlikely that the human readable processor would understand 
> that (isn't it??)

Nope.  You don't appear to appreciate the power of computers to
translate human readable text into complicated internal data, and are
evidently forgetting that interpreters such as Python do a much
more difficult translation of readable text.


> It would be more likely to look like this (I haven't put too much 
> thought into this)

No kidding.


> "anything,anything,anything,same_as_3rd,same_as_2nd,same_as_1st"
> or would you like to suggest something else?

How about:

pattern = Group(Any()) + Group(Any()) + Group(Any()) \
          + GroupRef(3) + GroupRef(2) + GroupRef(1)

There's no reason it has to be re.compile with a string.
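
To make that concrete, here is a minimal sketch of how such classes
might be written.  The Pattern base class, its "source" attribute and
its compile() method are names made up for illustration; none of them
are part of the re module, and a real design would need a lot more
(escaping of literals, repetition, alternation, and so on).  Each
object simply carries the regexp source it stands for, and + glues
the pieces together.

```python
import re


class Pattern:
    """Hypothetical base class: a fragment of regexp source text."""

    def __init__(self, source):
        self.source = source

    def __add__(self, other):
        # Concatenating two patterns concatenates their regexp source.
        return Pattern(self.source + other.source)

    def compile(self, flags=0):
        # Hand the generated string to the ordinary re engine.
        return re.compile(self.source, flags)


class Any(Pattern):
    """Any single character (newline excluded unless re.DOTALL is set)."""

    def __init__(self):
        super().__init__(".")


class Group(Pattern):
    """A capturing group wrapped around another pattern."""

    def __init__(self, inner):
        super().__init__("(" + inner.source + ")")


class GroupRef(Pattern):
    """A backreference to a numbered capturing group."""

    def __init__(self, number):
        super().__init__("\\%d" % number)


pattern = Group(Any()) + Group(Any()) + Group(Any()) \
          + GroupRef(3) + GroupRef(2) + GroupRef(1)

compiled = pattern.compile()
```

In this sketch pattern.source comes out as the same string that was
written by hand above, so compiled.match("abccba") succeeds and
compiled.match("abcabc") does not.  The objects are only a different
front end to re.compile.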


[snip]
> 
> Sure there are some cases where the re is loaded with meta characters...

That's the idea, chief.  For a simple regexp like you gave above, it
would be overkill to use a human readable syntax.  And it would still
be overkill for many regexps more complicated than that.

But eventually, the regexps will become complicated enough that a more
human readable syntax is preferable.  Not to mention that a human
readable syntax will be more versatile, when that is needed.


> hmmm
> OK is this about writing maintainable code or people not wanting to 
> learn all the ins and outs of re's?

Nope.  For me, this is about understanding that complicated regexps
could benefit from a more readable and consistent syntax, and that the
more consistent syntax could add a lot of power and versatility to
regexps.


-- 
CARL BANKS
http://www.aerojockey.com


