Possible re bug when using ".*"

Roel Schroeven roel at roelschroeven.net
Wed Dec 28 14:03:04 EST 2022


Roel Schroeven wrote on 28/12/2022 at 19:59:
> Alexander Richert - NOAA Affiliate via Python-list wrote on 
> 28/12/2022 at 19:42:
>>   In a couple of recent versions of Python (including 3.8 and 3.10), the
>> following code:
>> import re
>> print(re.sub(".*", "replacement", "pattern"))
>> yields the output "replacementreplacement".
>>
>> This behavior does not occur in 3.6.
>>
>> Which behavior is the desired one? Perhaps relatedly, I noticed that even
>> in 3.6, the code
>> print(re.findall(".*","pattern"))
>> yields ['pattern',''] which is not what I was expecting.
> The documentation for re.sub() and re.findall() has these notes: 
> "Changed in version 3.7: Empty matches for the pattern are replaced 
> when adjacent to a previous non-empty match." and "Changed in version 
> 3.7: Non-empty matches can now start just after a previous empty match."
> That probably describes the behavior you're seeing. ".*" first 
> matches "pattern", which is a non-empty match; then it matches the 
> empty string at the end, which is an empty match but is replaced 
> because it is adjacent to a non-empty match.
>
> Seems somewhat counter-intuitive to me, but AFAICS it's the intended 
> behavior.
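A minimal sketch, assuming Python 3.7 or later, that makes the two matches described above visible by listing their spans with re.finditer() and then shows how re.sub() and re.findall() treat them (the variable names are only illustrative):

```python
import re

text = "pattern"

# List every match of ".*" as (start, end, matched text).
# On 3.7+ this is the non-empty match covering the whole string,
# followed by the empty match at the end of the string.
matches = [(m.start(), m.end(), m.group()) for m in re.finditer(".*", text)]
print(matches)

# re.sub() replaces both matches, so "replacement" appears twice.
subbed = re.sub(".*", "replacement", text)
print(subbed)

# re.findall() returns the text of both matches.
found = re.findall(".*", text)
print(found)
```

On 3.7+ this should print [(0, 7, 'pattern'), (7, 7, '')], then replacementreplacement, then ['pattern', '']. The second match is the empty string at position 7, directly after the non-empty match, which is exactly the case the "Changed in version 3.7" note for re.sub() covers.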
For what it's worth, there's some discussion about this in this GitHub 
issue: https://github.com/python/cpython/issues/76489
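If what you want is the single replacement that 3.6 produced, a few variations avoid the trailing empty match. This is a sketch under my own assumptions, not something taken from the issue discussion; which option fits depends on how you want an empty input string (and multi-line input) to be handled:

```python
import re

text = "pattern"

# Option 1: require at least one character, so no empty match is possible.
# Caveat: an empty input string is then left unchanged instead of replaced.
plus_result = re.sub(".+", "replacement", text)
print(plus_result)

# Option 2: anchor the pattern at the start of the string. Without
# re.MULTILINE, "^" only matches at position 0, so it cannot match
# again at the end of the string.
anchored_result = re.sub("^.*", "replacement", text)
print(anchored_result)

# Option 3: stop after the first substitution.
count_result = re.sub(".*", "replacement", text, count=1)
print(count_result)
```

All three should print replacement for this input. Passing count as a keyword argument, as above, also avoids the deprecation warning that newer Python versions emit for a positional count.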

-- 
"Je ne suis pas d’accord avec ce que vous dites, mais je me battrai jusqu’à
la mort pour que vous ayez le droit de le dire."
         -- Attribué à Voltaire
"I disapprove of what you say, but I will defend to the death your right to
say it."
         -- Attributed to Voltaire
"Ik ben het niet eens met wat je zegt, maar ik zal je recht om het te zeggen
tot de dood toe verdedigen"
         -- Toegeschreven aan Voltaire