On Tue, Mar 1, 2011 at 9:05 AM, Mike Meyer <mwm@mired.org> wrote:
On Tue, 1 Mar 2011 19:50:44 +1000 Nick Coghlan <ncoghlan@gmail.com> wrote:
On Tue, Mar 1, 2011 at 9:19 AM, Mike Meyer <mwm@mired.org> wrote:
I disagree. Fully general string pattern matching has a few fundamental operations: sequence, alternation, and repetition.
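The three operations named here (sequence, alternation, repetition) can be sketched as tiny pattern-building helpers on top of Python's re module. This is a minimal illustration under stated assumptions, not a proposed API: `lit`, `seq`, `alt`, and `rep` are hypothetical names. Each helper wraps its operands in a non-capturing group `(?:...)` so that combined pieces keep their meaning when nested.

```python
import re
from typing import Optional

# Hypothetical helpers (not part of the re module). Each returns a regex
# string and wraps operands in (?:...) so results nest safely.

def lit(text: str) -> str:
    """A literal string, with regex metacharacters escaped."""
    return re.escape(text)

def seq(*parts: str) -> str:
    """Sequence: match each part, one after another."""
    return "".join(f"(?:{p})" for p in parts)

def alt(*parts: str) -> str:
    """Alternation: match exactly one of the parts."""
    return "(?:" + "|".join(parts) + ")"

def rep(part: str, min_count: int = 0, max_count: Optional[int] = None) -> str:
    """Repetition: match part min_count..max_count times (greedy, unbounded if None)."""
    upper = "" if max_count is None else str(max_count)
    return f"(?:{part}){{{min_count},{upper}}}"

# Example: an optionally signed decimal integer, built only from the three operations.
digit = alt(*(lit(str(d)) for d in range(10)))
sign = alt(lit("+"), lit("-"))
integer = seq(rep(sign, 0, 1), rep(digit, 1))

for s in ["42", "-7", "+", "4-2"]:
    print(s, bool(re.fullmatch(integer, s)))
# prints: 42 True / -7 True / + False / 4-2 False (one per line)
```

The helpers only generate ordinary regex strings, so they inherit every limitation of the underlying notation; they just make the structure of the pattern explicit.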
I agree that the fundamental operations are simple in principle.
However, I still believe that the elaboration of those operations into fully general pattern matching is a complex combinatorial operation that is difficult to master. Regexes certainly make it harder than it needs to be, but anything with similar expressive power is still going to be tricky to completely wrap your head around.
True. But I think that the problem, if properly expressed, is like the game of Go: a few simple rules that combine to produce a complex system that is difficult to master. With regexp notation, what we've got is more like 3D chess: multiple complex (and only slightly different) sets of operations that do more to obscure the underlying simple rules than to help master the system.
I'm not sure those are the right analogies (though they may not be all that wrong either). If you ask me, there are two problems with regexps:

(a) The notation is cryptic and error-prone, its use of \ conflicts with Python strings (using r'...' helps but is yet another gotcha), and the parser is primitive. Until your brain has learned to parse regexps, it will have a hard time understanding examples, which are often the key to solving programming problems. Somehow the regexp syntax is not "natural" for the text parsers we have in our brains; contrast this with Python's syntax, which was explicitly designed to go with the flow. Perhaps another problem is composability: if you know how to solve two simple problems using regexps, that doesn't mean your solutions can be combined to solve a combination of those problems.

(b) There often isn't all that great a match between the high-level goals of the user (e.g. "extract a list of email addresses from a file") and the available primitive operations. It's like writing an operating system for a Turing machine: we have mathematical proof that it's possible, but that doesn't make it easy. The additional operations provided by modern, Perl-derived regexp notation (which includes Python's re module) are meant to help, but they just extend the basic premises of regexp notation rather than providing a new, higher-level abstraction layer that is better matched to the way the typical user thinks about the problem.

All in all, I think it would be a good use of somebody's time to try to come up with something better. But it won't be easy.

--
--Guido van Rossum (python.org/~guido)
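Two of the gotchas in point (a), the backslash clash with Python string literals and the lack of composability, can be made concrete in a few lines using Python's re module. This is a minimal sketch; the sample text and patterns are illustrative assumptions, not taken from the thread.

```python
import re

text = "a cat sat"

# In an ordinary string literal, "\b" is the backspace character (\x08),
# so this pattern looks for backspace + "cat" + backspace and finds nothing.
plain = re.search("\bcat\b", text)

# In a raw string, \b reaches the regex engine intact and means "word boundary".
raw = re.search(r"\bcat\b", text)

print(plain)        # None
print(raw.group())  # cat

# Composability: gluing two working patterns together as strings can change
# their meaning, because alternation (|) binds more loosely than sequence.
animals = "cat|dog"
naive = animals + "s"        # parses as "cat" OR "dogs"
grouped = f"(?:{animals})s"  # parses as ("cat" OR "dog") followed by "s"

print(re.fullmatch(naive, "cats"))            # None
print(re.fullmatch(grouped, "cats").group())  # cats
```

Wrapping each sub-pattern in a non-capturing group restores the intended precedence, but it is one more rule the user has to remember rather than something the notation enforces.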