[Python-ideas] New pattern-matching library (was: str.split with multiple individual split characters)

Guido van Rossum guido at python.org
Tue Mar 1 19:30:45 CET 2011


On Tue, Mar 1, 2011 at 9:05 AM, Mike Meyer <mwm at mired.org> wrote:
> On Tue, 1 Mar 2011 19:50:44 +1000
> Nick Coghlan <ncoghlan at gmail.com> wrote:
>
>> On Tue, Mar 1, 2011 at 9:19 AM, Mike Meyer <mwm at mired.org> wrote:
>> > I disagree. Fully general string pattern matching has a few
>> > fundamental operations: sequence, alternation, and repetition.
>>
>> I agree that the fundamental operations are simple in principle.
>>
>> However, I still believe that the elaboration of those operations into
>> fully general pattern matching is a complex combinatorial operation
>> that is difficult to master. regex's certainly make it harder than it
>> needs to be, but anything with similar expressive power is still going
>> to be tricky to completely wrap your head around.
>
> True. But I think that the problem - if properly expressed - is like
> the game of Go: a few simple rules that combine to produce a complex
> system that is difficult to master. With regexp notation, what we've
> got is more like 3d chess: multiple complex (just slightly different)
> sets of operations that do more to obscure the underlying simple rules
> than to help master the system.

I'm not sure those are the right analogies (though they may not be all
that wrong either). If you ask me there are two problems with regexps:

(a) The notation is cryptic and error-prone; its use of \ conflicts
with Python string literals (using r'...' helps but is yet another
gotcha); and the parser is primitive. Until your brain has learned to parse
regexps, it will have a hard time understanding examples, which are
often the key to solving programming problems. Somehow the regexp
syntax is not "natural" for the text parsers we have in our brain --
contrast this with Python's syntax, which was explicitly designed to
go with the flow. Perhaps another problem is with composability -- if
you know how to solve two simple problems using regexps, that doesn't
mean your solutions can be combined to solve a combination of those
problems.
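
To make the two gotchas in (a) concrete, here is a minimal sketch
using the standard re module. The sample strings and variable names
are made up for illustration. It shows "\b" being eaten by the string
literal, and two working patterns that stop working when glued
together because the group numbers shift.

```python
import re

text = "a cat sat"

# Without the r prefix, Python turns "\b" into a backspace character
# before the re module ever sees the pattern, so nothing matches.
plain = re.search("\bcat\b", text)
# With a raw string, re receives \b and treats it as a word boundary.
raw = re.search(r"\bcat\b", text)

# Composability: each of these patterns works on its own.
doubled_letter = r"(\w)\1"   # a word character followed by itself
doubled_digit = r"(\d)\1"    # a digit followed by itself
alone_letter = re.fullmatch(doubled_letter, "aa")
alone_digit = re.fullmatch(doubled_digit, "11")

# Naive concatenation shifts the group numbers: the second \1 still
# refers to group 1 (the letter), not to the digit group.
combined = doubled_letter + "-" + doubled_digit
naive = re.fullmatch(combined, "aa-11")

# One workaround is to use named groups, so that each backreference
# keeps pointing at its own group after the patterns are combined.
named = re.fullmatch(r"(?P<a>\w)(?P=a)-(?P<b>\d)(?P=b)", "aa-11")

print(plain, raw, naive, named, sep="\n")
```

Named groups avoid the renumbering, but they have to be written into
each pattern ahead of time, which is part of the composability
problem.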

(b) There often isn't a very good match between the high-level
goals of the user (e.g. "extract a list of email addresses from a
file") and the available primitive operations. It's like writing an
operating system for a Turing machine -- we have mathematical proof
that it's possible, but that doesn't make it easy. The additional
operations provided by modern, Perl-derived regexp notation (which
includes that of Python's re module) are meant to help, but they just
extend the basic premises of regexp notation, rather than providing a new,
higher-level abstraction layer that is better matched to the way the
typical user thinks about the problem.
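
As a rough illustration of the gap described in (b), here is what the
email example might look like with today's primitives. This is only a
sketch: the pattern is deliberately simplified and does not cover the
full address syntax, and EMAIL_RE, extract_emails and the sample text
are hypothetical names and data, not an existing API.

```python
import re

# Simplified assumption: a local part, an "@", and a dotted domain.
# Real address syntax is much richer than this pattern admits.
EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"
)

def extract_emails(text):
    """Return every substring of text that looks like an address."""
    return EMAIL_RE.findall(text)

sample = (
    "Contact alice@example.com or bob.smith@mail.example.org. "
    "Not an address: foo@bar"
)
found = extract_emails(sample)
print(found)
```

The user's goal fits in one sentence, but the pattern has to spell out
character classes, grouping and repetition by hand, and reading it
back tells you little about the intent.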

All in all I think it would be a good use of somebody's time to try
and come up with something better. But it won't be easy.

-- 
--Guido van Rossum (python.org/~guido)


