[Python-Dev] pre-PEP [corrected]: Complete, Structured Regular
Expression Group Matching
Stephen J. Turnbull
stephen at xemacs.org
Mon Aug 9 09:19:31 CEST 2004
>>>>> "Mike" == Mike Coleman <mkc at mathdogs.com> writes:
Mike> Motivation
Mike> ==========
Mike> A notable limitation of the ``re.match`` method is that it
Mike> fails to capture all group match information for repeatedly
Mike> matched groups. For example, in a call like this ::
Mike>     m0 = re.match(r'([A-Z]+|[a-z]+)*', 'XxxxYzz')
Mike> one would like to see that the group which matched four
Mike> times matched the strings ``'X'``, ``'xxx'``, ``'Y'`` and
Mike> ``'zz'``.
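The limitation Mike describes can be checked with the standard re
module alone. This is a minimal sketch, not part of the pre-PEP; the
name "pieces" and the finditer workaround are illustrative assumptions,
and the workaround only helps in this simple case where the repeated
alternatives tile the whole string.

```python
import re

# The call from the pre-PEP: group 1 is matched four times in a row.
m0 = re.match(r'([A-Z]+|[a-z]+)*', 'XxxxYzz')

whole = m0.group(0)   # the entire matched text
last = m0.group(1)    # only the final repetition of group 1 survives

# One common workaround for this simple case: scan for the alternatives
# themselves instead of asking a repeated group to remember every match.
pieces = [m.group(0) for m in re.finditer(r'[A-Z]+|[a-z]+', 'XxxxYzz')]

print(whole)
print(last)
print(pieces)
```

Here group(1) reports only 'zz', while the finditer scan recovers
'X', 'xxx', 'Y' and 'zz'.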
Sure, but regexp syntax is a horrible way to express that. This
feature would be an attractive nuisance, IMHO. For example:
Mike> Parsing ``/etc/group``
Mike> ----------------------
Mike> If ``/etc/group`` contains the following lines ::
Mike>     root:x:0:
Mike>     daemon:x:1:
Mike>     bin:x:2:
Mike>     sys:x:3:
Mike> then it can be parsed as follows ::
Mike>     p = r'((?:(?:^|:)([^:\n]*))*\n)*\Z'
This is an _easy_ one, but even it absolutely requires being written
with (?xm) and lots of comments, don't you think? If you're going to
be writing a multiline, verbose regular expression, why not write a
grammar instead, which (assuming a modicum of library support) will be
shorter and self-documenting?
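For concreteness, here is a rough sketch of what the commented form of
Mike's pattern might look like, using only the existing re module. The
flags are passed as re.VERBOSE | re.MULTILINE, which is equivalent to an
inline (?xm). The sample text and the names "text", "m" and "rows" are
assumptions for illustration. The split-based parse at the end is not
the grammar library alluded to above, just the plainest alternative for
this particular file format.

```python
import re

# Sample input taken from the /etc/group lines quoted above.
text = "root:x:0:\ndaemon:x:1:\nbin:x:2:\nsys:x:3:\n"

# Mike's pattern, spelled out in verbose, multiline form.
p = re.compile(r"""
    (                   # group 1: one whole line, newline included
      (?:
        (?: ^ | : )     # a field starts at the line start or after a colon
        ( [^:\n]* )     # group 2: the field text, possibly empty
      )*
      \n
    )*
    \Z                  # the lines must run to the end of the input
""", re.VERBOSE | re.MULTILINE)

# With today's re module this validates the input, but each group still
# keeps only its last repetition, so the fields are not recoverable here.
m = p.match(text)

# For comparison: the same structure recovered without any regex.
rows = [line.split(":") for line in text.splitlines()]

print(m is not None)
print(rows)
```

Each entry in rows is the list of fields for one line, including the
trailing empty field after the final colon.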
--
Institute of Policy and Planning Sciences http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
Ask not how you can "do" free software business;
ask what your business can "do for" free software.