[New-bugs-announce] [issue36397] re.split() incorrectly splitting on zero-width pattern

Elias Tarhini report at bugs.python.org
Thu Mar 21 22:48:42 EDT 2019


New submission from Elias Tarhini <eltrhn at gmail.com>:

I believe I've found a bug in the `re` module -- specifically, in the 3.7+ support for splitting on zero-width patterns. Compare Java's behavior...

    jshell> "1211".split("(?<=(\\d))(?!\\1)(?=\\d)");
    $1 ==> String[3] { "1", "2", "11" }

...with Python's:

    >>> re.split(r'(?<=(\d))(?!\1)(?=\d)', '1211')
    ['1', '1', '2', '2', '11']

(The pattern itself is pretty straightforward in design, but regex syntax can cloud things, so to be totally clear: it finds any point that follows a digit and precedes a *different* digit.)
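
For reference, a minimal sketch of how the match points can be inspected with `re.finditer`, assuming Python 3.7+ (the names `PATTERN` and `TEXT` are only illustrative, not from any library). It also prints the `re.split()` result next to them. Note that `re.split()` is documented to include the text of any capturing groups in the returned list, so the extra elements line up with what group 1 captured at each match point.

```python
import re

# Illustrative names; the pattern and input are the ones from the report.
PATTERN = r'(?<=(\d))(?!\1)(?=\d)'
TEXT = '1211'

# Position of each zero-width match and what group 1 captured there.
matches = [(m.start(), m.group(1)) for m in re.finditer(PATTERN, TEXT)]
print(matches)   # [(1, '1'), (2, '2')]

# re.split() interleaves the captured group text with the split pieces.
parts = re.split(PATTERN, TEXT)
print(parts)     # ['1', '1', '2', '2', '11']

# With exactly one capturing group, the even indices hold the pieces.
pieces = parts[::2]
print(pieces)    # ['1', '2', '11']
```

Taking every other element only works here because the pattern has exactly one capturing group; with more groups the stride would have to change accordingly.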

* Tested on Python 3.7.1 (Windows 10) and Python 3.7.0 (Linux).

----------
components: Regular Expressions
messages: 338581
nosy: Elias Tarhini, ezio.melotti, mrabarnett
priority: normal
severity: normal
status: open
title: re.split() incorrectly splitting on zero-width pattern
type: behavior
versions: Python 3.7

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue36397>
_______________________________________

