[docs] [issue29291] Misleading text in the documentation of re library for non-greedy match

ipolcak report at bugs.python.org
Tue Jan 17 03:04:31 EST 2017


New submission from ipolcak:

The text about non-greedy matching in the documentation of the re library is misleading.

The docs for py2.7 (https://docs.python.org/2.7/library/re.html) and 3.6 (https://docs.python.org/3.6/library/re.html) say: "The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against <a> b <c>, it will match the entire string, and not just <a>. Adding ? after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using the RE <.*?> will match only <a>."

The docs for py3.4 (https://docs.python.org/3.4/library/re.html) offer a slightly different example:
"The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'."
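Both documented examples can be checked directly. The following is a small illustrative sketch (not part of the original report) that runs the greedy RE <.*> and the non-greedy RE <.*?> against each quoted string using only `re.match`:

```python
import re

# Example strings taken from the quoted 2.7/3.6 and 3.4 documentation.
examples = ["<a> b <c>", "<H1>title</H1>"]

results = {}
for text in examples:
    # Greedy: '.*' consumes as much as possible, so the match ends at the last '>'.
    greedy = re.match(r"<.*>", text).group()
    # Non-greedy: nothing follows '>' in this pattern, so the first '>' suffices.
    lazy = re.match(r"<.*?>", text).group()
    results[text] = (greedy, lazy)
    print(f"{text!r}: greedy={greedy!r}, non-greedy={lazy!r}")
```

For these patterns the documented behaviour holds, because nothing after the closing '>' constrains the match.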

However, in reality, if the shortest possible match does not let the rest of the pattern succeed, the non-greedy match keeps extending and may end up as long as the greedy match would be; see:

>>> import re
>>> a = re.compile(r"<.*?><span>")
>>> a.match("<a> b <c><span>")
<_sre.SRE_Match object; span=(0, 15), match='<a> b <c><span>'>
>>> a.search("<a> b <c><span>")
<_sre.SRE_Match object; span=(0, 15), match='<a> b <c><span>'>
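The session above can be unpacked with a group around the non-greedy part. The sketch below is illustrative only: it shows that '.*?' is extended one character at a time until '<span>' can follow. The negated character class [^>]* used for contrast is a common alternative technique, not something proposed in this report.

```python
import re

text = "<a> b <c><span>"

# Non-greedy group: the engine first tries '<a>', finds that '<span>' does not
# follow, and keeps extending '.*?' until the remainder of the pattern matches.
lazy = re.match(r"(<.*?>)<span>", text)
print(lazy.group(1))

# A negated character class cannot cross a '>', so no match exists at
# position 0; search() instead finds the later '<c><span>'.
strict = re.compile(r"(<[^>]*>)<span>")
print(strict.match(text))
print(strict.search(text).group(1))
```

In other words, "as few characters as possible" is relative to a fixed starting position and to the requirement that the whole pattern still match.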

So the '<.*?>' part of the regex matches '<a> b <c>' in this example. I propose adding the following text to the documentation:

"However, note that even the non-greedy version can match additional text. For example, consider the RE '(<.*?>)<d>' matched against '<a> b <c><d>'. The match is successful and the unnamed group contains '<a> b <c>'."
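The proposed example can be verified with a short sketch (added for illustration, not part of the original submission) that runs both the non-greedy RE (<.*?>)<d> and the greedy RE (<.*>)<d> against the same string:

```python
import re

text = "<a> b <c><d>"

# Non-greedy: '.*?' must grow past '<a>' until '<d>' can follow the group.
m = re.match(r"(<.*?>)<d>", text)
print(m.group(1))

# Greedy: '.*' first takes everything, then backtracks until '<d>' fits.
g = re.match(r"(<.*>)<d>", text)
print(g.group(1))
```

For this particular input the two forms capture the same text, which is exactly the case the proposed wording is meant to warn about.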

----------
assignee: docs at python
components: Documentation
messages: 285619
nosy: docs at python, ipolcak
priority: normal
severity: normal
status: open
title: Misleading text in the documentation of re library for non-greedy match
type: behavior
versions: Python 2.7, Python 3.4, Python 3.6

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue29291>
_______________________________________

