[Python-ideas] str.split with multiple individual split characters
Stefan Behnel
stefan_ml at behnel.de
Mon Feb 28 11:57:36 CET 2011
Steven D'Aprano, 28.02.2011 11:23:
> Guido van Rossum wrote:
>> It's so easy to do this using re.split() that it's not worth the added
>> complexity in str.split().
>
> Easy, but slow. If performance is important, it looks to me like re.split
> is the wrong solution. Using Python 3.1:
>
>
> >>> from re import split
> >>> def split_str(s, *args): # quick, dirty and inefficient multi-split
> ...     for a in args[1:]:
> ...         s = s.replace(a, args[0])
> ...     return s.split(args[0])
> ...
> >>> text = "abc.d-ef_g:h;ijklmn+opqrstu|vw-x_y.z"*1000
> >>> assert split(r'[.\-_:;+|]', text) == split_str(text, *'.-_:;+|')
> >>>
> >>> from timeit import Timer
> >>> t1 = Timer("split(r'[.\-_:;+|]', text)",
> ... "from re import split; from __main__ import text")
> >>> t2 = Timer("split_str(text, *'.-_:;+|')",
> ... "from __main__ import split_str, text")
> >>>
> >>> min(t1.repeat(number=10000, repeat=5))
> 72.31230521202087
> >>> min(t2.repeat(number=10000, repeat=5))
> 17.375113010406494
You forgot to do the precompilation. Here's what I get:
>>> t1 = Timer("split(text)", "import re; from __main__ import text; \
... split=re.compile(r'[.\-_:;+|]').split")
>>> min(t1.repeat(number=1000, repeat=3))
3.9842870235443115
>>> min(t2.repeat(number=1000, repeat=3))
0.9261999130249023
That's still a factor of 4, using Py3.2. Does anyone want to try it with the
alternative regex packages?
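
For anyone who wants to repeat the comparison outside the interactive
prompt, here is a minimal sketch of it as a standalone script. The pattern,
the test string and split_str() are taken from the session above; the
compare() helper, its name and its default repeat counts are only
assumptions made for illustration, and the timings will of course differ
by machine and Python version.

```python
import re
from timeit import Timer

# Precompile once, so that only the split itself is timed.
split_re = re.compile(r'[.\-_:;+|]').split


def split_str(s, *seps):
    """Quick and dirty multi-split: map every separator to the first, then split."""
    first = seps[0]
    for sep in seps[1:]:
        s = s.replace(sep, first)
    return s.split(first)


SEPS = '.-_:;+|'
text = "abc.d-ef_g:h;ijklmn+opqrstu|vw-x_y.z" * 1000

# Both approaches must agree before their timings mean anything.
assert split_re(text) == split_str(text, *SEPS)


def compare(number=100, repeat=3):
    """Hypothetical helper: best-of-`repeat` time in seconds for each approach."""
    t_re = Timer(lambda: split_re(text))
    t_replace = Timer(lambda: split_str(text, *SEPS))
    return {
        "re": min(t_re.repeat(number=number, repeat=repeat)),
        "replace": min(t_replace.repeat(number=number, repeat=repeat)),
    }


for name, seconds in compare().items():
    print(name, seconds)
```

Passing callables to Timer avoids the "from __main__ import ..." setup
strings, at the cost of a small, constant call overhead on both sides.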
Stefan