[Python-ideas] str.split with multiple individual split characters

Steven D'Aprano steve at pearwood.info
Mon Feb 28 12:15:21 CET 2011


Carl M. Johnson wrote:

> Anyway, you'll get no argument from me: Regexes are easy once you know
> regexes. For whatever reason though, I've never been able to
> successfully, permanently learn regexes. I'm just trying to make the
> case that it's tough for some users to have to learn a whole separate
> language in order to do a certain kind of string split more simply.

I would say, *easy* regexes are easy once you know regexes. But in 
general, not so much... even Larry Wall is rethinking a lot of regex 
culture and syntax:

http://dev.perl.org/perl6/doc/design/apo/A05.html

But this case is relatively easy, although there is at least one obvious 
trap for the unwary: forgetting to escape the split chars.
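
To make the trap concrete, here is a minimal sketch. The helper name split_on_chars is made up for illustration and is not part of the re module; it assumes seps is a non-empty string of single-character separators.

```python
import re

def split_on_chars(text, seps):
    # Hypothetical helper: escape each separator so that characters such
    # as "." or "|" are treated literally, then join them as alternatives.
    pattern = "|".join(re.escape(c) for c in seps)
    return re.split(pattern, text)

# The trap: an unescaped "." matches *any* character, so every
# character of the string is treated as a separator.
naive = re.split(".", "a.b")

# Escaped, the same separator behaves as intended.
safe = split_on_chars("a.b;c", ".;")
```

With the unescaped pattern, naive is a list of four empty strings, while safe is ['a', 'b', 'c'].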


> Then again that's not to say that there needs to be such
> functionality. After all, love them or hate them, there are a lot of
> tasks for which regexes are just the simplest way to get the job done.
> It's just that users like me (if there are any) who find regexes hard
> to get to stick would appreciate being able to avoid learning them for
> a little longer.

I can sympathise with that. Regexes are essentially another programming
language (albeit not Turing-complete), and they are the opposite of
everything we love about Python. They're as far from executable
pseudo-code as it's possible to get without becoming one of those
esoteric languages that have three commands and one data type... *wink*

Anyway, for what it's worth, when I think about the times I've needed 
something like a multi-split, it has been for mini-parsers. I think a 
cross between split and partition would be more useful:

multisplit(source, seps, maxsplit=None)
=> [(substring, sep), ...]


Here's a pure-Python implementation, written as a generator and limited
to single-character separators:


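def multisplit(source, seps, maxsplit=None):
    """Yield (substring, sep) pairs; the final pair has sep == ''."""
    def find_first():
        # Index of the first separator in the remaining source, or -1.
        for i, c in enumerate(source):
            if c in seps:
                return i
        return -1
    count = 0
    while True:
        # Split limit reached: emit the remainder unsplit and stop.
        if maxsplit is not None and count >= maxsplit:
            yield (source, '')
            break
        p = find_first()
        if p >= 0:
            yield (source[:p], source[p])
            count += 1
            source = source[p+1:]
        else:
            # No separator left: emit the tail with an empty separator.
            yield (source, '')
            break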
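
For comparison, the same (substring, sep) pairs can be produced with a regex, for anyone who does not mind one. This is only a sketch: multisplit_re is a made-up name, it returns a list rather than a generator, and it assumes seps is a non-empty string of single-character separators.

```python
import re

def multisplit_re(source, seps, maxsplit=None):
    # Hypothetical regex-based counterpart to the generator above.
    # A limit of zero (or less) means no splitting at all.
    if maxsplit is not None and maxsplit <= 0:
        return [(source, '')]
    # The capturing group makes re.split keep the separators:
    # [text, sep, text, sep, ..., text]
    pattern = "(" + "|".join(re.escape(c) for c in seps) + ")"
    parts = re.split(pattern, source, maxsplit=maxsplit or 0)
    texts = parts[0::2]
    found = parts[1::2] + ['']  # the last piece has no separator
    return list(zip(texts, found))

pairs = multisplit_re("a=1;b=2", "=;")
limited = multisplit_re("a=1;b=2", "=;", maxsplit=1)
```

Here pairs is [('a', '='), ('1', ';'), ('b', '='), ('2', '')], and limited is [('a', '='), ('1;b=2', '')], which is the kind of output a mini-parser can walk through pair by pair.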




-- 
Steven


