[Python-ideas] str.split with multiple individual split characters
Steven D'Aprano
steve at pearwood.info
Mon Feb 28 12:15:21 CET 2011
Carl M. Johnson wrote:
> Anyway, you'll get no argument from me: Regexes are easy once you know
> regexes. For whatever reason though, I've never been able to
> successfully, permanently learn regexes. I'm just trying to make the
> case that it's tough for some users to have to learn a whole separate
> language in order to do a certain kind of string split more simply.
I would say, *easy* regexes are easy once you know regexes. But in
general, not so much... even Larry Wall is rethinking a lot of regex
culture and syntax:
http://dev.perl.org/perl6/doc/design/apo/A05.html
But this case is relatively easy, although there is at least one obvious
trap for the unwary: forgetting to escape the split chars.
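To make the trap concrete, here is a minimal sketch of the regex approach. The helper name split_on_chars is made up for illustration and is not from the thread. It assumes seps is a non-empty string of single-character separators and relies on re.escape to neutralise characters such as '.', '|', '^', '-' and ']'.

```python
import re

def split_on_chars(text, seps):
    """Split text on any single character in seps (hypothetical helper)."""
    # Build a character class; re.escape keeps '.', '|', '^', '-', ']' and
    # friends from being read as regex syntax.
    pattern = "[" + re.escape(seps) + "]"
    return re.split(pattern, text)

# The trap: an unescaped '.' matches every character, not just a literal dot.
print(re.split(".", "a.b"))             # ['', '', '', '']

# With escaping, the separators are treated literally.
print(split_on_chars("a.b|c", ".|"))    # ['a', 'b', 'c']
```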
> Then again that's not to say that there needs to be such
> functionality. After all, love them or hate them, there are a lot of
> tasks for which regexes are just the simplest way to get the job done.
> It's just that users like me (if there are any) who find regexes hard
> to get to stick would appreciate being able to avoid learning them for
> a little longer.
I can sympathise with that. Regexes are essentially another programming
language (albeit not Turing-complete), and regexes are the opposite of
everything we love about Python. They're as far from executable
pseudo-code as it's possible to get without becoming one of those
esoteric languages that have three commands and one data type... *wink*
Anyway, for what it's worth, when I think about the times I've needed
something like a multi-split, it has been for mini-parsers. I think a
cross between split and partition would be more useful:
    multisplit(source, seps, maxsplit=None)
    => [(substring, sep), ...]
Here's a pure-Python implementation, limited to single-character separators:
def multisplit(source, seps, maxsplit=None):
    def find_first():
        for i, c in enumerate(source):
            if c in seps:
                return i
        return -1
    count = 0
    while True:
        if maxsplit is not None and count >= maxsplit:
            yield (source, '')
            break
        p = find_first()
        if p >= 0:
            yield (source[:p], source[p])
            count += 1
            source = source[p+1:]
        else:
            yield (source, '')
            break
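
For comparison, a rough regex-based sketch that should yield the same (substring, sep) pairs for single-character separators. This is an illustration and not the code from this post: the name multisplit_re is hypothetical, and it assumes seps is a non-empty string (an empty seps would produce an invalid character class, whereas the pure-Python version simply yields the whole source).

```python
import re

def multisplit_re(source, seps, maxsplit=None):
    """Yield (substring, sep) pairs; the final pair has an empty sep."""
    # Character class of the escaped separator characters.
    pattern = re.compile("[" + re.escape(seps) + "]")
    pos = 0      # start of the piece not yet yielded
    count = 0    # number of splits performed so far
    for m in pattern.finditer(source):
        if maxsplit is not None and count >= maxsplit:
            break
        yield (source[pos:m.start()], m.group())
        pos = m.end()
        count += 1
    # Whatever is left over has no trailing separator.
    yield (source[pos:], "")

# Mini-parser style usage: each piece is paired with the separator after it.
print(list(multisplit_re("a=1;b=2", "=;")))
# [('a', '='), ('1', ';'), ('b', '='), ('2', '')]
print(list(multisplit_re("a=1;b=2", "=;", maxsplit=1)))
# [('a', '='), ('1;b=2', '')]
```

Keeping the separator next to each piece is what makes the result handy for small parsers: the caller can tell whether a token was ended by '=' or by ';' without rescanning the input.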
--
Steven