[Python-Dev] Zero-width matching in regexes

MRAB python at mrabarnett.plus.com
Mon Dec 4 18:21:09 EST 2017


I've finally come to a conclusion as to what the "correct" behaviour of 
zero-width matches should be: """always return the first match, but 
never a zero-width match that is joined to a previous zero-width match""".

If it's about to return a zero-width match that's joined to a previous 
zero-width match (that is, one at the same position), then it should 
backtrack and keep on looking for a match.
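To make the rule concrete, here is a toy model. It rests on a simplifying assumption and is not how re (or the regex module) is implemented: the "pattern" is just an ordered list of literal strings, tried left to right the way a backtracking engine tries alternatives. The name toy_finditer_spans is hypothetical.

```python
def toy_finditer_spans(alternatives, text):
    """Toy model of the proposed zero-width rule (hypothetical helper).

    'alternatives' is an ordered list of literal strings, tried left to
    right as a crude stand-in for a backtracking regex engine.
    """
    spans = []
    pos = 0
    last_empty_at = None  # position of the previous match if it was zero-width
    while pos <= len(text):
        span = None
        for alt in alternatives:
            if not text.startswith(alt, pos):
                continue
            if alt == "" and last_empty_at == pos:
                # This zero-width match would be joined to the previous
                # zero-width match: backtrack and try the next alternative.
                continue
            span = (pos, pos + len(alt))
            break
        if span is None:
            pos += 1  # nothing acceptable at this position; move on
            continue
        spans.append(span)
        start, end = span
        last_empty_at = end if start == end else None
        pos = end
    return spans
```

With alternatives ["", "a"] on "a" (the pattern r'|.' with "." narrowed to the literal "a"), this yields the same spans as the example: an empty match at 0, then "a" after backtracking, then an empty match at 1.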

Example:

 >>> import re
 >>> print([m.span() for m in re.finditer(r'|.', 'a')])
[(0, 0), (0, 1), (1, 1)]

re.findall, re.split and re.sub should work accordingly.

If re.finditer finds n matches, then re.split should return a list of 
n+1 strings and re.sub should make n replacements (except where limited 
by maxsplit, count, etc.).
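One way to check the n / n+1 / n relationship, assuming Python 3.7 or later (where re's handling of empty matches was changed along these lines) and patterns with no capturing groups (groups would add their captures to re.split's result). The helper name split_sub_counts is hypothetical.

```python
import re

def split_sub_counts(pattern, text):
    """Return (matches, pieces, replacements) for one pattern and text.

    Hypothetical helper; assumes Python 3.7+ and a pattern without
    capturing groups, so re.split returns only the text between matches.
    """
    n_matches = sum(1 for _ in re.finditer(pattern, text))
    n_pieces = len(re.split(pattern, text))
    _, n_replacements = re.subn(pattern, "-", text)
    return n_matches, n_pieces, n_replacements

for pattern, text in [(r"|.", "a"), (r"x*", "abxd"), (r"\W*", "...words...")]:
    n, pieces, subs = split_sub_counts(pattern, text)
    # Under the proposed rule, expect pieces == n + 1 and subs == n.
    print(f"{pattern!r} on {text!r}: {n} matches, {pieces} pieces, {subs} replacements")
```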

