a splitting headache

Mensanator mensanator at aol.com
Fri Oct 16 12:42:05 EDT 2009


On Oct 16, 10:30 am, John Posner <jjpos... at optimum.net> wrote:
> Mensanator said:
>
> >>>> c = '0010000110'
> >>>> c.split('0')
>
> > ['', '', '1', '', '', '', '11', '']
>
> > Ok, the consecutive delimiters appear as empty strings for
> > reasons unknown (except for the first one). Except when they
> > start or end the string in which case the first one is included.
>
> > Maybe there's a reason for this inconsistent behaviour but you
> > won't find it in the documentation.
>
> The "reason unknown" is that split() is designed to handle *substrings
> separated by delimiters*, not *consecutive character runs*. For
> example, TAB-separated (or if you prefer, COMMA-separated) strings.
>
> In English:
>
>   one<TAB>two<TAB><TAB>four
>
> If you split the above string on the <TAB> character, you really do want
> to get an empty string among the result substrings, indicating that
> "column 3" is empty.
>
> In Python:
>
> >>> line = "one\ttwo\t\tfour"
> >>> line.split('\t')
> ['one', 'two', '', 'four']
>
> A result of ['one', 'two', 'four'] would be misleading, no?

I see, split with an explicit delimiter is intended
to behave like the csv module.

Except when splitting on whitespace with no separator
given, which collapses runs of whitespace and is more
useful when the source is fixed length.
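
A minimal sketch of that difference, using a made-up sample
string (the variable names are only for illustration):

```python
# Hypothetical sample line with leading, trailing, and repeated spaces.
line = '  one two   four '

# Explicit separator: every single space is a delimiter, so empty
# strings show up for the ends and for each repeated space.
explicit = line.split(' ')

# No separator: runs of whitespace are collapsed and the ends are
# trimmed, so no empty strings appear.
collapsed = line.split()

print(explicit)   # ['', '', 'one', 'two', '', '', 'four', '']
print(collapsed)  # ['one', 'two', 'four']
```

Note that passing ' ' explicitly does not give the collapsing
behavior; only omitting the separator (or passing None) does.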

This isn't explained well in the manual. The manual
mentions that str.split() doesn't leave null strings
at the beginning or end of the list, yet null strings
at the ends were never mentioned before that point.
Because they used a simple example like yours, rather
than a more comprehensive example or two, the reader
is left to read between the lines to figure this out.

And I hadn't considered putting it back together:

>>> c = '0010000110'
>>> cc = c.split('0')
>>> cc
['', '', '1', '', '', '', '11', '']
>>> '0'.join(cc)
'0010000110'

That's because rejoining wasn't applicable to what I
wanted, which was just ['1', '11'].

And judging from the responses posted in this thread
(thanks to everyone who replied), it's not that hard
to get whitespace behavior from non-whitespace
delimiters.
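
As a sketch of two possible ways to do it (not necessarily the
ones posted in the thread): filter out the empty strings after
splitting, or match runs of non-delimiter characters with the re
module. The helper name split_collapse is hypothetical.

```python
import re

def split_collapse(s, sep):
    """Split s on sep and drop the empty strings (hypothetical helper)."""
    return [part for part in s.split(sep) if part]

c = '0010000110'

# Approach 1: split, then discard the empty strings.
print(split_collapse(c, '0'))   # ['1', '11']

# Approach 2: find every run of characters that are not '0'.
print(re.findall('[^0]+', c))   # ['1', '11']
```

The filtering version works for any literal separator; the regex
version hard-codes the '0' delimiter in the character class.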

>
> -John



