How to split() by multiple characters?

Alex Martelli aleax at aleax.it
Thu May 8 06:57:35 EDT 2003


Nick Forest wrote:

> Given a piece of text in a long string, how to split() it by ',' or '.'
> or ';' ...
> 
> i.e. text.split ( '[,.;]' )
> 
> Of course, the code doesn't work. :-(
> Is there any good way to do this?

The standard library module re is one way (the split method of a compiled
pattern accepts a character class such as [,.;]).  If you want to avoid re,
you can do it with string operations too, e.g.:


>>> text = 'fee,fie;foo.fum'
>>> import string
>>> tt = string.maketrans(',;', '..')
>>> text.translate(tt).split('.')
['fee', 'fie', 'foo', 'fum']

i.e.: translate all characters you want to use as splitters into just one
of them, then split by that one.
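
For readers on Python 3, where string.maketrans no longer exists, here is
a minimal sketch of both alternatives side by side.  The helper names
split_with_re and split_with_translate and the seps parameter are made up
for illustration (they are not from the original post), and seps is
assumed to be a non-empty string of single-character separators:

```python
import re

# Hypothetical helper names; neither appears in the original post.
def split_with_re(text, seps=",.;"):
    # Build a character class such as [,.;]; re.escape guards
    # characters like ']' or '^' that are special inside a class.
    pattern = "[" + re.escape(seps) + "]"
    return re.split(pattern, text)

def split_with_translate(text, seps=",.;"):
    # Map every separator to the first one, then split on that one.
    # str.maketrans is the Python 3 spelling of string.maketrans.
    first = seps[0]
    table = str.maketrans(seps, first * len(seps))
    return text.translate(table).split(first)

text = "fee,fie;foo.fum"
print(split_with_re(text))
print(split_with_translate(text))
```

Both calls should print the same four-item list as the session above.
Note that adjacent separators yield empty strings in either version, just
as with plain split() on a single separator.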

When you have TWO good alternatives, you may sometimes want to choose
between them on the basis of performance.  The timeit.py module in the
Python 2.3 standard library is great for that, e.g.:

[alex@lancelot Lib]$ python timeit.py -s'''
text="fee,fie;foo.fum"
import string
tt = string.maketrans(";,","..")
''' 'text.translate(tt).split(".")'
100000 loops, best of 3: 3.65 usec per loop

[alex@lancelot Lib]$ python timeit.py -s'''
text="fee,fie;foo.fum"
import re
punct = re.compile("[.;,]")
''' 'punct.split(text)'
100000 loops, best of 3: 5.83 usec per loop

Of course, you'll want to use a text value that is more representative
of the kinds of texts you DO often have to split, as well as run this
on the machines that matter to you and the Python versions you'll be
using "in production" (fortunately, although it's in the standard library
for 2.3, module timeit.py seems to run just fine with 2.2.2 as well!).
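
The same comparison can also be driven from inside Python rather than
from the shell.  What follows is only a sketch, written for Python 3: the
sample text (the original string repeated 100 times) is an arbitrary
stand-in for "more representative" input, and the function names
via_translate and via_re are invented for this example:

```python
import re
import timeit

# Assumed sample text; substitute something representative of your data.
text = "fee,fie;foo.fum" * 100
table = str.maketrans(";,", "..")
punct = re.compile("[.;,]")

def via_translate():
    # Turn every separator into '.', then split on '.'.
    return text.translate(table).split(".")

def via_re():
    # Split directly on any one of the three separators.
    return punct.split(text)

# Both approaches must agree before timing them is meaningful.
assert via_translate() == via_re()

# repeat() returns one total time per run; the minimum is the least noisy.
loops = 1000
for func in (via_translate, via_re):
    best = min(timeit.repeat(func, number=loops, repeat=3))
    print("%s: %.2f usec per loop" % (func.__name__, best / loops * 1e6))
```

The numbers it prints will differ from the ones quoted above, since they
depend on the text, the machine and the Python version.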


Personally, I nominate timeit.py as THE outstanding enhancement of
Python 2.3 -- I love many of the others (the performance improvement
in particular), but timeit.py is the one I find myself using all of
the time -- it seems to provide an innocuous way for programmers' typical
obsession with performance and micro-optimization to discharge itself
harmlessly, cleansing their souls to choose idioms on the basis of
clarity, maintainability and readability, as of course SHOULD be done:-).


Alex




