[Python-ideas] str.split with multiple individual split characters

Steven D'Aprano steve at pearwood.info
Mon Feb 28 11:23:38 CET 2011


Guido van Rossum wrote:
> It's so easy to do this using re.split() that it's not worth the added
> complexity in str.split().

Easy, but slow. If performance matters, re.split looks like the wrong
tool: in the timings below, a naive replace-then-split helper runs
about four times faster. Using Python 3.1:


 >>> from re import split
 >>> def split_str(s, *args): # quick, dirty and inefficient multi-split
...     for a in args[1:]:
...         s = s.replace(a, args[0])
...     return s.split(args[0])
...
 >>> text = "abc.d-ef_g:h;ijklmn+opqrstu|vw-x_y.z"*1000
 >>> assert split(r'[.\-_:;+|]', text) == split_str(text, *'.-_:;+|')
 >>>
 >>> from timeit import Timer
 >>> t1 = Timer(r"split(r'[.\-_:;+|]', text)",
... "from re import split; from __main__ import text")
 >>> t2 = Timer("split_str(text, *'.-_:;+|')",
... "from __main__ import split_str, text")
 >>>
 >>> min(t1.repeat(number=10000, repeat=5))
72.31230521202087
 >>> min(t2.repeat(number=10000, repeat=5))
17.375113010406494
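
For anyone who wants to rerun the comparison outside the interactive
interpreter, here is a minimal sketch of the same benchmark as a script.
The names split_re, compare and SEPARATORS are illustrative and not part
of the session above, the pattern is built with re.escape rather than
escaped by hand, and the default repetition counts are far smaller than
the 10000 used above, so the absolute numbers will not be comparable.

```python
import re
import timeit

SEPARATORS = ".-_:;+|"  # same separator characters as in the session above

# Character class matching any one separator; re.escape handles '-' and '|'.
PATTERN = "[" + re.escape(SEPARATORS) + "]"


def split_str(s, *seps):
    """Split s on any of seps by first rewriting each one to seps[0]."""
    if not seps:
        raise ValueError("need at least one separator")
    first = seps[0]
    for sep in seps[1:]:
        s = s.replace(sep, first)
    return s.split(first)


def split_re(s):
    """Split s on any separator using re.split."""
    return re.split(PATTERN, s)


def compare(text, number=100, rounds=3):
    """Return the best total time in seconds for each approach."""
    return {
        "re": min(timeit.repeat(lambda: split_re(text),
                                number=number, repeat=rounds)),
        "replace": min(timeit.repeat(lambda: split_str(text, *SEPARATORS),
                                     number=number, repeat=rounds)),
    }


text = "abc.d-ef_g:h;ijklmn+opqrstu|vw-x_y.z" * 1000
# Both approaches must agree before their timings are worth comparing.
assert split_re(text) == split_str(text, *SEPARATORS)
```

Calling compare(text) returns the best total time for each approach;
the ratio between the two will depend on the Python version and the
machine.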



-- 
Steven



