[New-bugs-announce] [issue46945] Quantifier and Expanded Regex Expression Gives Different Results

Vivian D report at bugs.python.org
Mon Mar 7 08:19:26 EST 2022


New submission from Vivian D <vmd3.14 at gmail.com>:

Here are the steps that I went through to test my regular expressions in my command prompt (a file attachment shows this as well). I am using Windows 11, version 21H2:

>>> import re
>>> regex = r"(((\w)+\w*\3){2}|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"
>>> testString = "Alabama and Mississippi are next to each other"
>>> re.findall(regex,testString,re.IGNORECASE)
[('Mississipp', 'ipp', 'p', '', '')]
>>> testString = "alabama and Mississippi are next to each other"
>>> re.findall(regex,testString,re.IGNORECASE)
[('Mississipp', 'ipp', 'p', '', '')]
>>> regex = r"((\w)+\w*\2(\w)+\w*\3|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"
>>> re.findall(regex,testString,re.IGNORECASE)
[('alabama', 'a', 'a', '', ''), ('Mississipp', 's', 'p', '', '')]
>>> testString = "Alabama and Mississippi are next to each other"
>>> re.findall(regex,testString,re.IGNORECASE)
[('Alabama', 'A', 'a', '', ''), ('Mississipp', 's', 'p', '', '')]

I created a regular expression to match any words with two sets of the same vowel, including words with four of the same vowel, ignoring case. My first regular expression, "(((\w)+\w*\3){2}|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*", matched “Mississippi” but failed to match “Alabama” as it should have. To make sure that this was not caused by a case-sensitivity issue, I retested the regex with “alabama” instead of “Alabama”, but still got no match. Then I replaced the quantifier {2} with a second copy of the expression that it was supposed to repeat. This gave me the regex "((\w)+\w*\2(\w)+\w*\3|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*". This version matched both “alabama” and “Alabama”, as shown above, and continued to match “Mississippi” as expected. This result seems to contradict my understanding of regular expressions, because all I did to get different results was write out the expression that the quantifier was supposed to execute twice.
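
A smaller way to check the same expectation is sketched below. The helper name matched_text and the two short patterns are hypothetical and only for illustration; they are not part of the session above. The helper compares whole matches (group 0) rather than re.findall tuples, because the group numbers differ between the quantified and the written-out form. The last two calls rerun the patterns from this report; their output is not stated here, since it may depend on the interpreter version.

```python
import re

def matched_text(pattern, text, flags=0):
    """Return the full text (group 0) of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text, flags)]

# Illustrative patterns (an assumption, not from the report):
# a doubled character, occurring twice in a row.
QUANTIFIED = r"((\w)\2){2}"
# The same expression written out twice, with backreferences renumbered.
EXPANDED = r"(\w)\1(\w)\2"

sample = "aabb xxyy abab"
print(matched_text(QUANTIFIED, sample))
print(matched_text(EXPANDED, sample))

# The two patterns from the report, compared the same way.
REPORT_QUANTIFIED = r"(((\w)+\w*\3){2}|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"
REPORT_EXPANDED = r"((\w)+\w*\2(\w)+\w*\3|(\w)+(?=\w*\4)\w*(?!\4)(\w)\w*\5)\w*"
test_string = "Alabama and Mississippi are next to each other"
print(matched_text(REPORT_QUANTIFIED, test_string, re.IGNORECASE))
print(matched_text(REPORT_EXPANDED, test_string, re.IGNORECASE))
```

For the short illustrative patterns, both forms find "aabb" and "xxyy" and skip "abab", which is the equivalence I expected from the longer patterns as well.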

----------
components: Library (Lib)
files: ComandPrompt.pdf
messages: 414668
nosy: vmd3.14
priority: normal
severity: normal
status: open
title: Quantifier and Expanded Regex Expression Gives Different Results
type: behavior
versions: Python 3.8
Added file: https://bugs.python.org/file50661/ComandPrompt.pdf

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue46945>
_______________________________________

