string.split

Wed Sep 5 06:04:33 EDT 2001

"Fredrik Lundh" <fredrik at pythonware.com> wrote in message
news:6kll7.895$sn6.205073 at newsc.telia.net...
> Tom Harris wrote:
> > How do I split on any or all occurrences of (for example)
> > whitespace and a comma, without using regexes.
>
> are you interested in results, or are you just trying to make
> python fit your mental model?
>
> if the former, this does what you want:
>
>     L = re.split("[\s,]+", S)
>
> or faster, in the current version:
>
>     L = re.findall("[^\s,]+", S)

Note that there's quite a difference between these two
approaches, of course:

>>> import re
>>> S="eenie,,meenie,,moe,"
>>> re.split("[\s,]+", S)
['eenie', 'meenie', 'moe', '']
>>> re.findall("[^\s,]+", S)
['eenie', 'meenie', 'moe']
>>>
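Here is one way to see that, for these two particular calls, the difference is confined to the ends of the string: dropping the empty strings from the re.split result gives exactly the findall result. This is a small sketch in current Python. The helper names split_keep_ends and find_tokens are made up for illustration, and the patterns are written as raw strings because recent Python 3 versions warn about "\s" inside an ordinary string literal.

```python
import re

def split_keep_ends(s):
    # Hypothetical helper: collapses runs of whitespace/commas, but a
    # leading or trailing separator still yields an empty string at that end.
    return re.split(r"[\s,]+", s)

def find_tokens(s):
    # Hypothetical helper: returns only the non-empty runs of
    # characters that are neither whitespace nor commas.
    return re.findall(r"[^\s,]+", s)

for s in ["eenie,,meenie,,moe,", ", eenie meenie", "eenie, meenie,moe", ""]:
    pieces = split_keep_ends(s)
    # Filtering out the empty strings recovers the findall result.
    assert [t for t in pieces if t] == find_tokens(s)
    print(repr(s), pieces, find_tokens(s))
```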

Both hide the ",," occurrences between eenie and meenie
(and between meenie and moe), which string.split(S,',')
wouldn't, and the findall approach also hides the trailing
comma; this may or may not be what you want in a given
case, so choose advisedly.  The findall approach has
particularly interesting behavior when you change the + to
a *:

>>> re.findall("[^\s,]*", S)
['eenie', '', '', 'meenie', '', '', 'moe', '', '']
>>> re.findall("[^\s,]*", "eenie,meenie,moe")
['eenie', '', 'meenie', '', 'moe', '']

while the split approach may be usable without either
the + or the *, again depending on your needs:

>>> re.split("[\s,]", S)
['eenie', '', 'meenie', '', 'moe', '']
>>> re.split("[\s,]", "eenie,meenie,moe")
['eenie', 'meenie', 'moe']

This best approximates string.split(S,","), although
of course it IS quite different from what string.split(S)
would do, as the latter DOES suppress "splits" resulting
from leading and trailing whitespace, and also treats
runs of whitespace just the same as a single whitespace
character.
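To make the comparison with the string-module calls concrete, here is a sketch in current Python, where the string module's split function is gone and string.split(S, ',') and string.split(S) are spelled S.split(',') and S.split(). It also shows a regex-free route to the findall result, which is what the original question asked for: turn every comma into a space and let the no-argument split() do the rest. The sample strings and the helper name split_ws_and_commas are made up for illustration.

```python
import re

S = "eenie,,meenie,,moe,"

# One-character separator pattern: when commas are the only separators,
# this matches str.split(",") exactly, empty strings included.
assert re.split(r"[\s,]", S) == S.split(",") == ["eenie", "", "meenie", "", "moe", ""]

# With whitespace the two differ: no-argument str.split() ignores leading
# and trailing whitespace and treats a run of whitespace as one separator...
W = "  eenie   meenie moe "
assert W.split() == ["eenie", "meenie", "moe"]
# ...while splitting on each single whitespace character keeps every empty piece.
assert re.split(r"\s", W) == ["", "", "eenie", "", "", "meenie", "moe", ""]

def split_ws_and_commas(s):
    # Hypothetical helper, regex-free: map commas to spaces, then let
    # str.split() collapse the runs and drop the empty ends.
    return s.replace(",", " ").split()

# Same result as re.findall(r"[^\s,]+", s) on these samples.
for s in [S, " eenie , meenie,\tmoe ", ",,,", ""]:
    assert split_ws_and_commas(s) == re.findall(r"[^\s,]+", s)
```

The replace-then-split version only works because the comma is a single character that can be mapped onto whitespace; with several distinct non-whitespace separators the regex is the simpler tool.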

You pays your money an' you takes your choice...!-)

Alex