[Python-Dev] pre-PEP [corrected]: Complete, Structured Regular Expression Group Matching

Stephen J. Turnbull stephen at xemacs.org
Mon Aug 9 09:19:31 CEST 2004


>>>>> "Mike" == Mike Coleman <mkc at mathdogs.com> writes:

    Mike> Motivation
    Mike> ==========

    Mike> A notable limitation of the ``re.match`` method is that it
    Mike> fails to capture all group match information for repeatedly
    Mike> matched groups.  For example, in a call like this ::

    Mike>     m0 = re.match(r'([A-Z]+|[a-z]+)*', 'XxxxYzz')

    Mike> one would like to see that the group which matched four
    Mike> times matched the strings ``'X'``, ``'xxx'``, ``'Y'`` and
    Mike> ``'zz'``.
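
A minimal sketch of the limitation described above, using only the
standard ``re`` module. The ``findall`` workaround and the names ``m0``
and ``pieces`` are illustrative assumptions, not part of the proposal;
the workaround only applies here because the group is a simple
alternation with no surrounding structure.

```python
import re

# The repeated group matches four times, but the match object only
# remembers the last repetition.
m0 = re.match(r'([A-Z]+|[a-z]+)*', 'XxxxYzz')
print(m0.group(0))   # the whole match: 'XxxxYzz'
print(m0.group(1))   # only the final repetition: 'zz'

# For this simple case the individual pieces can be recovered by
# searching for the group's body on its own.
pieces = re.findall(r'[A-Z]+|[a-z]+', 'XxxxYzz')
print(pieces)        # ['X', 'xxx', 'Y', 'zz']
```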

Sure, but regexp syntax is a horrible way to express that.  This
feature would be an attractive nuisance, IMHO.  For example:

    Mike> Parsing ``/etc/group``
    Mike> ----------------------

    Mike> If ``/etc/group`` contains the following lines ::

    Mike>     root:x:0:
    Mike>     daemon:x:1:
    Mike>     bin:x:2:
    Mike>     sys:x:3:

    Mike> then it can be parsed as follows ::

    Mike>     p = r'((?:(?:^|:)([^:\n]*))*\n)*\Z'

This is an _easy_ one, but even it absolutely requires being written
with (?xm) and lots of comments, don't you think?  If you're going to
be writing a multiline, verbose regular expression, why not write a
grammar instead, which (assuming a modicum of library support) will be
shorter and self-documenting?
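
To make the comparison concrete, here is a rough sketch under stated
assumptions. It first spells out the quoted pattern in verbose form with
the multiline flag (assumed to be equivalent to the one-line original),
then parses the same text with a hand-written stand-in for a grammar.
The names ``GROUP_TEXT``, ``VERBOSE`` and ``parse_group_file`` are
hypothetical, and the plain-Python parser is a substitute for the grammar
library the text has in mind, not an existing one.

```python
import re

GROUP_TEXT = "root:x:0:\ndaemon:x:1:\nbin:x:2:\nsys:x:3:\n"

# The quoted pattern, written out with comments.
VERBOSE = re.compile(r"""
    (                   # group 1: one whole line, repeated
        (?:
            (?:^|:)     # a field starts at line start or after a colon
            ([^:\n]*)   # group 2: the field itself, possibly empty
        )*
        \n              # end of the line
    )*
    \Z                  # end of input
""", re.VERBOSE | re.MULTILINE)

m = VERBOSE.match(GROUP_TEXT)
print(m is not None)    # the pattern accepts the whole text

# A grammar-like alternative, with the grammar kept in the docstring:
def parse_group_file(text):
    """file  := line*
    line  := field (':' field)* NEWLINE
    field := any run of characters other than ':' and NEWLINE
    """
    return [line.split(":") for line in text.splitlines()]

records = parse_group_file(GROUP_TEXT)
print(records[0])       # ['root', 'x', '0', '']
```

Unlike the match object, the list of records keeps every line and every
field, which is the information the proposal wants to recover.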


-- 
Institute of Policy and Planning Sciences     http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
               Ask not how you can "do" free software business;
              ask what your business can "do for" free software.

