[New-bugs-announce] [issue17668] re.split loses characters matching ungrouped parts of a pattern
Tomasz J. Kotarba
report at bugs.python.org
Mon Apr 8 20:18:58 CEST 2013
New submission from Tomasz J. Kotarba:
Tested in 2.7, but other versions may be affected as well.
A real-life example (note that the leading '>' is lost):
>>> import re
>>> re.split(r'^>(.*)$', '>Homo sapiens catenin (cadherin-associated)')
produces:
['', 'Homo sapiens catenin (cadherin-associated)', '']
Expected (and IMHO most useful) behaviour would be for it to return:
['', '>Homo sapiens catenin (cadherin-associated)', '']
or (IMHO much less useful, as one can already get this result just by adding outer grouping parentheses):
['', '>Homo sapiens catenin (cadherin-associated)', 'Homo sapiens catenin (cadherin-associated)', '']
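For reference, re.split() is documented to return the text of every capturing group in the pattern, not the whole match, which is why the ungrouped '>' disappears. The sketch below (written for Python 3; 2.7 should behave the same for these patterns) reproduces the current output and shows that both of the outputs listed above can already be obtained by moving the parentheses:

```python
import re

line = '>Homo sapiens catenin (cadherin-associated)'

# Current behaviour: only the text of group 1 is returned, so '>' is lost.
current = re.split(r'^>(.*)$', line)

# One capturing group around the whole pattern keeps the full match,
# giving the first of the two requested results.
whole_match = re.split(r'^(>.*)$', line)

# Outer parentheses around the original pattern return both groups,
# giving the second requested result.
both_groups = re.split(r'^(>(.*))$', line)

print(current)
print(whole_match)
print(both_groups)
```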
I am not sure whether this can be changed in such a mature and widely used module without breaking compatibility, but simply adding a new optional parameter that controls how re.split() handles patterns containing grouping parentheses, defaulting to the current behaviour, would be very helpful.
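Until such an option exists, the "keep the whole match" mode can be approximated in user code. A minimal sketch, assuming a hypothetical helper named split_keep_match() (not part of the re module; the name is made up for illustration) built on re.finditer(). It ignores re.split()'s maxsplit argument and does not try to replicate its exact handling of empty matches:

```python
import re

def split_keep_match(pattern, string, flags=0):
    """Hypothetical helper: like re.split(), but inserts the whole match
    (group 0) between the pieces instead of the text of each group."""
    regex = re.compile(pattern, flags)
    result = []
    last_end = 0
    for m in regex.finditer(string):
        result.append(string[last_end:m.start()])  # text before this match
        result.append(m.group(0))                  # the entire matched text
        last_end = m.end()
    result.append(string[last_end:])               # text after the last match
    return result

line = '>Homo sapiens catenin (cadherin-associated)'
kept = split_keep_match(r'^>(.*)$', line)
multi = split_keep_match(r'(a)b', 'xabyabz')
```

Here kept is ['', '>Homo sapiens catenin (cadherin-associated)', ''], while re.split(r'(a)b', 'xabyabz') would return ['x', 'a', 'y', 'a', 'z'] instead of the ['x', 'ab', 'y', 'ab', 'z'] held in multi.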
Best Regards
----------
components: Regular Expressions
messages: 186324
nosy: ezio.melotti, mrabarnett, triquetra011
priority: normal
severity: normal
status: open
title: re.split loses characters matching ungrouped parts of a pattern
type: behavior
versions: Python 2.7
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17668>
_______________________________________