[docs] [issue17668] re.split loses characters matching ungrouped parts of a pattern

Tue Apr 9 06:59:52 CEST 2013

Tomasz J. Kotarba added the comment:

Hi,
I think one piece of the functionality I mentioned is still missing. Using my first example, even with the pattern '^(>(.*))$' one cannot get ['', '>Homo sapiens catenin (cadherin-associated)', '']. Instead one gets a four-element list and has to deal with its third element, which is the match for the inner group. The parameter I described earlier would make re.split return the match for the whole pattern (and only that), in the same way it currently returns the matches for groups. That would be very convenient in some scenarios. One example is a procedure that processes text using regex patterns that are not known when the procedure is written: those patterns may contain any number of groups, but the procedure uses each pattern as a whole, including for the split operation.
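For illustration, here is a minimal sketch. It assumes the input is the single header line '>Homo sapiens catenin (cadherin-associated)' (the full first example is not reproduced here). It shows what re.split currently returns, and a hypothetical helper `split_keep_whole_match` (not part of the re module) that emulates the proposed behaviour with re.finditer by emitting only the whole match (group 0) as the separator.

```python
import re

text = ">Homo sapiens catenin (cadherin-associated)"
pattern = re.compile(r"^(>(.*))$")

# Current behaviour: re.split inserts every capturing group after each
# piece, so the inner group (.*) adds an extra, third element.
print(pattern.split(text))
# → ['', '>Homo sapiens catenin (cadherin-associated)', 'Homo sapiens catenin (cadherin-associated)', '']


def split_keep_whole_match(pattern, string):
    """Hypothetical helper (not in the re module): split *string* on
    *pattern*, keeping only the whole match (group 0) as the separator
    and ignoring any capturing groups the pattern contains."""
    pattern = re.compile(pattern)  # accepts a str or an already compiled pattern
    result = []
    last_end = 0
    for m in pattern.finditer(string):
        result.append(string[last_end:m.start()])  # text before the match
        result.append(m.group(0))                  # the whole match only
        last_end = m.end()
    result.append(string[last_end:])               # text after the last match
    return result


print(split_keep_whole_match(pattern, text))
# → ['', '>Homo sapiens catenin (cadherin-associated)', '']
```

Because the helper never looks at individual groups, it works the same way however many groups an unknown pattern happens to contain.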
Of course this can be worked around in many different ways, but, as I said at the start, I believe such a parameter would be useful (and would not break compatibility). Another possible solution, different from the one I originally suggested, would be a parameter telling re.split to ignore the groups (or, going even further, to select which groups to ignore). Anyway, I am not a developer of this module, so if you feel it would be too much bother to add such a parameter just for the sake of convenience, please feel free to disregard my comments and close this report.
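The group-selection alternative could be emulated along the following lines. This is only a sketch: `split_selected_groups` and its `groups` parameter are hypothetical names, not an existing re API. Group 0 stands for the whole match, so `groups=(0,)` gives the whole-pattern behaviour and `groups=()` ignores all groups.

```python
import re


def split_selected_groups(pattern, string, groups=()):
    """Hypothetical helper (not in the re module): split *string* on
    *pattern* and, after each piece, insert only the groups whose numbers
    appear in *groups*. Group 0 is the whole match; an empty tuple drops
    all groups from the result."""
    pattern = re.compile(pattern)
    result = []
    last_end = 0
    for m in pattern.finditer(string):
        result.append(string[last_end:m.start()])
        # m.group(n) returns None for a group that did not take part in
        # the match, which is also what re.split puts in its result.
        result.extend(m.group(n) for n in groups)
        last_end = m.end()
    result.append(string[last_end:])
    return result


text = ">Homo sapiens catenin (cadherin-associated)"
print(split_selected_groups(r"^(>(.*))$", text, groups=(0,)))
# → ['', '>Homo sapiens catenin (cadherin-associated)', '']
```

Selecting `groups=(1, 2)` reproduces what re.split currently returns for the same pattern, including None for groups that did not participate in a match.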
Cheers,
T
P.S. It is very late, so I can only hope I have been sane enough to express my thoughts clearly. Apologies if not.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue17668>
_______________________________________