[Python-ideas] str.split with multiple individual split characters

Bruce Leban bruce at leapyear.org
Mon Feb 28 07:51:51 CET 2011


On Sun, Feb 27, 2011 at 10:19 PM, Stephen J. Turnbull <stephen at xemacs.org> wrote:

> def multisplit (source, char1, char2):
> ...  return re.split("".join(["[",char1,char2,"]"]),source)
>

Actually, you need re.escape there in case one of the characters is \ or ]
(^ and - are also special inside a character class). And if remembering
[...] is hard, using | makes this a bit more general (it also accepts
multi-character separators).
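
For the single-character case, here is a minimal sketch of the [...] version
with escaping added. The name multisplit_chars is made up for illustration,
and it assumes at least one character is passed (an empty [] class is not a
valid pattern).

```python
import re

def multisplit_chars(source, *chars):
    # Hypothetical helper, not from the quoted message.
    # re.escape makes characters such as \, ], ^ and - literal, so they
    # cannot close or alter the [...] character class.
    char_class = "[" + "".join(re.escape(c) for c in chars) + "]"
    return re.split(char_class, source)

print(multisplit_chars("a]b\\c", "]", "\\"))
```

Without the re.escape call, the same two separators would produce a
malformed character class.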

def multisplit(source, *separators):
    return re.split('|'.join([re.escape(t) for t in separators]), source)

multisplit(s, '\r\n', '\r', '\n')


Bonus points if you see the problem with the above. Correct code is below
the spoiler space.
.
.
.
.
.
.
.
.
.
.
.
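The problem is that a |-separated regex tries its alternatives in order,
left to right, so if a longer separator is listed after a shorter one that
is a prefix of it, the shorter one takes precedence and the longer one never
gets a chance to match. Sorting the separators longest first fixes this: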

def multisplit(source, *separators):
    return re.split('|'.join([re.escape(t) for t in
        sorted(separators, key=len, reverse=True)]), source)
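
To see the difference, here is a small side-by-side sketch. The name
multisplit_unsorted is made up for the first version, and the test string is
just an illustration. Note that the call shown earlier lists '\r\n' first, so
it happens to work; passing the same separators in a different order is what
triggers the problem.

```python
import re

def multisplit_unsorted(source, *separators):
    # First version: alternatives are tried in the order the caller gave them.
    return re.split('|'.join(re.escape(t) for t in separators), source)

def multisplit(source, *separators):
    # Corrected version: longest separators first, so '\r\n' is tried
    # before '\r' and '\n'.
    ordered = sorted(separators, key=len, reverse=True)
    return re.split('|'.join(re.escape(t) for t in ordered), source)

s = "one\r\ntwo\nthree"
print(multisplit_unsorted(s, '\r', '\n', '\r\n'))  # ['one', '', 'two', 'three']
print(multisplit(s, '\r', '\n', '\r\n'))           # ['one', 'two', 'three']
```

In the unsorted call, '\r' matches first and the following '\n' is then
treated as a second separator, which is where the empty string comes from.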