The split() function of Python's built-in module has changed in a puzzling way - is this a bug?
Thomas Jollans
tjol at tjol.eu
Fri Apr 23 03:52:41 EDT 2021
On 23/04/2021 01:53, Andy AO wrote:
> Upgrading from Python 3.6.8 to Python 3.9.0 and executing unit tests
> revealed a significant change in the behavior of re.split().
>
> However, the relevant documentation, namely the Changelog
> <https://docs.python.org/3/whatsnew/changelog.html> and "re - Regular
> expression operations - Python 3.9.4 documentation"
> <https://docs.python.org/3/library/re.html?highlight=re%20search#re.split>,
> does not seem to mention any such change.
>
> number = '123'
>
> def test_Asterisk_quantifier_with_capture_group(self):
>     resultList = re.split(r'(\d*)', self.number)
>     if platform.python_version() == '3.6.8':
>         self.assertEqual(resultList, ['', '123', ''])
>     else:
>         self.assertEqual(resultList, ['', '123', '', '', ''])
Hi Andy,
That's interesting. The old result is less surprising, but of course
both are technically correct, as the 4th element in the new result is an
empty string, which your regexp also matches.
The oldest version of Python I had lying around to test is 3.7; that has
the same behaviour as 3.9.
I suspect that this behaviour is related to the following note in the
docs for re.split:
Changed in version 3.7: Added support of splitting on a pattern that
could match an empty string.
(your pattern can match an empty string, so I suppose it wasn't
technically supported in 3.6?)
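
To make that concrete, here is a minimal sketch, assuming Python 3.7 or
later (the variable names are only for illustration). It lists the spans
that \d* matches in '123' and then compares splitting on (\d*) with
splitting on (\d+). The second, empty match at the very end of the string
appears to be what contributes the extra elements:

```python
import re

text = '123'

# Every place where \d* matches: the digits themselves, then an empty
# match at the end of the string (position 3).
spans = [m.span() for m in re.finditer(r'\d*', text)]
print(spans)        # [(0, 3), (3, 3)]

# With the capture group, each match adds the text before it plus the
# captured group, so the empty match adds two more empty strings.
star_result = re.split(r'(\d*)', text)
print(star_result)  # ['', '123', '', '', ''] on 3.7+

# \d+ cannot match an empty string, so there is only one match to split on.
plus_result = re.split(r'(\d+)', text)
print(plus_result)  # ['', '123', '']
```

If the 3.6-style result is what the test should check, using + instead of
* looks like the simplest way to get the same answer on both versions.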
-- Thomas
>
> I feel that this is clearly not in line with the description of the
> function in the split documentation, and it is also strange that after
> replacing * with +, the behavior is still the same as in 3.6.8.
>
> 1. Why is this change not in the documentation? Is it because I didn’t
> find it?
> 2. Why did the behavior change this way? Was a bug introduced, or was it
> a bug fix?