[Python-Dev] Zero-width matching in regexes
storchaka at gmail.com
Wed Dec 13 10:26:11 EST 2017
05.12.17 01:21, MRAB wrote:
> I've finally come to a conclusion as to what the "correct" behaviour of
> zero-width matches should be: """always return the first match, but
> never a zero-width match that is joined to a previous zero-width match""".
> If it's about to return a zero-width match that's joined to a previous
> zero-width match, then backtrack and keep on looking for a match.
> >>> print([m.span() for m in re.finditer(r'|.', 'a')])
> [(0, 0), (0, 1), (1, 1)]
> re.findall, re.split and re.sub should work accordingly.
> If re.finditer finds n matches, then re.split should return a list of
> n+1 strings and re.sub should make n replacements (excepting maxsplit,
We now have a good opportunity to change a long-standing behavior of
re.sub(). Currently, empty matches are prohibited if adjacent to a
previous match. For consistency with re.finditer() and re.findall(),
with regex.sub() with the VERSION1 flag, and with Perl, PCRE and other
engines, they should be prohibited only if adjacent to a previous *empty*
match. Currently re.sub('x*', '-', 'abxc') returns '-a-b-c-', but it will
return '-a-b--c-' if we change the behavior.
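To make the difference between the two rules concrete, here is a minimal sketch that rebuilds the substitution from the spans reported by re.finditer(). The helper sub_via_finditer() and its skip_empty_after_any_match parameter are hypothetical names used only for illustration; this is not how re.sub() is implemented, and the filter is only meant to mimic the current rule on simple patterns like this one. It assumes re.finditer() reports an empty match directly after a non-empty one, as described above.

```python
import re

def sub_via_finditer(pattern, repl, string, skip_empty_after_any_match):
    """Hypothetical helper: rebuild a substitution from re.finditer() spans.

    skip_empty_after_any_match=True mimics the current re.sub() rule
    (an empty match is dropped if it touches the previous match).
    skip_empty_after_any_match=False keeps every match finditer reports,
    which corresponds to the proposed rule.
    """
    out = []
    pos = 0          # end of the last match that was replaced
    prev_end = None  # end of the previous accepted match, if any
    for m in re.finditer(pattern, string):
        start, end = m.span()
        if skip_empty_after_any_match and start == end and start == prev_end:
            continue  # empty match adjacent to the previous match: drop it
        out.append(string[pos:start])
        out.append(repl)
        pos = prev_end = end
    out.append(string[pos:])
    return ''.join(out)

current = sub_via_finditer('x*', '-', 'abxc', skip_empty_after_any_match=True)
proposed = sub_via_finditer('x*', '-', 'abxc', skip_empty_after_any_match=False)
print(current)   # -a-b-c-
print(proposed)  # -a-b--c-
```

The only span that distinguishes the two results is the empty match at position 3, immediately after the non-empty match of 'x'.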
This behavior was already changed once, unintentionally and temporarily,
between 2.1 and 2.2, when the underlying implementation of re was changed
from PCRE to SRE. But the former behavior was quickly restored (see
https://bugs.python.org/issue462270). Ironically, the behavior of the
current PCRE is different.
Possible options:
1. Change the behavior right now.
2. Start emitting a FutureWarning and change the behavior in a future version.
3. Keep the status quo forever.
We need to make a decision right now, since in the first two cases we
should change the behavior of re.split() right now. Its behavior is
changing in 3.7 in any case, and it is better to change the behavior once
than to break it in two different releases.
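The relationship between re.finditer() and re.split() mentioned in the quoted message (n matches give n+1 strings) can be sketched as follows. The helper split_via_finditer() is a hypothetical name used only for illustration, and it ignores maxsplit and capturing groups. The comparison with re.split() is guarded by a version check, on the assumption that it runs on 3.7 or later, where re.split() splits on empty matches.

```python
import re
import sys

def split_via_finditer(pattern, string):
    """Hypothetical helper: cut the string at every span re.finditer() reports."""
    pieces = []
    pos = 0  # end of the previous match
    for m in re.finditer(pattern, string):
        pieces.append(string[pos:m.start()])
        pos = m.end()
    pieces.append(string[pos:])  # tail after the last match
    return pieces

text = 'abxc'
matches = list(re.finditer('x*', text))
pieces = split_via_finditer('x*', text)
print(len(matches), len(pieces))  # n matches, n + 1 pieces
print(pieces)

if sys.version_info >= (3, 7):
    # On 3.7+ re.split() is expected to agree with the finditer-based split.
    print(re.split('x*', text) == pieces)
```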
The changed detail is so subtle that no regular expressions in the
stdlib and tests are affected, except the special-purpose test added
to guard the current behavior.