[Python-ideas] str.split with multiple individual split characters
Steven D'Aprano
steve at pearwood.info
Mon Feb 28 11:23:38 CET 2011
Guido van Rossum wrote:
> It's so easy to do this using re.split() that it's not worth the added
> complexity in str.split().
Easy, but slow. If performance is important, it looks to me like
re.split is the wrong solution. Using Python 3.1:
>>> from re import split
>>> def split_str(s, *args): # quick, dirty and inefficient multi-split
...     for a in args[1:]:
...         s = s.replace(a, args[0])
...     return s.split(args[0])
...
>>> text = "abc.d-ef_g:h;ijklmn+opqrstu|vw-x_y.z"*1000
>>> assert split(r'[.\-_:;+|]', text) == split_str(text, *'.-_:;+|')
>>>
>>> from timeit import Timer
>>> t1 = Timer(r"split(r'[.\-_:;+|]', text)",
... "from re import split; from __main__ import text")
>>> t2 = Timer("split_str(text, *'.-_:;+|')",
... "from __main__ import split_str, text")
>>>
>>> min(t1.repeat(number=10000, repeat=5))
72.31230521202087
>>> min(t2.repeat(number=10000, repeat=5))
17.375113010406494
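
On those figures the replace-based version is roughly 4.2 times faster (72.3 s against 17.4 s for 10000 calls). For anyone who wants to rerun the comparison as a script rather than at the prompt, here is a minimal sketch. The names SEPS, PATTERN and best are made up for illustration, the character class is built with re.escape instead of being typed by hand, and the iteration counts are cut down so it finishes quickly. Absolute timings and the ratio will differ by machine and Python version, so no expected output is shown.

```python
import re
from timeit import repeat


def split_str(s, *seps):
    """Quick and dirty multi-split: fold every separator into the first, then split."""
    first = seps[0]
    for sep in seps[1:]:
        s = s.replace(sep, first)
    return s.split(first)


# Hypothetical names, not from the original session.
SEPS = '.-_:;+|'
PATTERN = '[' + re.escape(SEPS) + ']'   # character class matching any one separator
text = "abc.d-ef_g:h;ijklmn+opqrstu|vw-x_y.z" * 1000

# Both approaches must agree before the timings mean anything.
assert re.split(PATTERN, text) == split_str(text, *SEPS)


def best(func, number=20, rounds=3):
    """Return the fastest of `rounds` runs, each calling func `number` times."""
    return min(repeat(func, number=number, repeat=rounds))


t_re = best(lambda: re.split(PATTERN, text))
t_str = best(lambda: split_str(text, *SEPS))
print("re.split: %.4fs  split_str: %.4fs  ratio: %.2f" % (t_re, t_str, t_re / t_str))
```

Taking the minimum of several repeats, as in the session above, is the usual way to reduce noise from other processes.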
--
Steven