a splitting headache

John Posner jjposner at optimum.net
Fri Oct 16 17:30:10 CEST 2009

Mensenator said:
> >>> c = '0010000110'
> >>> c.split('0')
> ['', '', '1', '', '', '', '11', '']
> Ok, the consecutive delimiters appear as empty strings for
> reasons unknown (except for the first one). Except when they
> start or end the string in which case the first one is included.
> Maybe there's a reason for this inconsistent behaviour but you
> won't find it in the documentation.

The "reasons unknown" come down to this: split() is designed to handle
*substrings separated by delimiters*, not *consecutive character runs*.
Think of TAB-separated (or, if you prefer, COMMA-separated) strings.

In English:

   one<TAB>two<TAB><TAB>four

If you split the above string on the <TAB> character, you really do want 
to get an empty string among the result substrings, indicating that 
"column 3" is empty.

In Python:

 >>> line = "one\ttwo\t\tfour"
 >>> line.split('\t')
['one', 'two', '', 'four']

A result of ['one', 'two', 'four'] would be misleading, no?
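
To tie this back to the original '0010000110' example, here is a minimal
sketch. The variable names (fields, runs) are only for illustration. It
shows that splitting on an explicit delimiter always yields one more field
than there are delimiters, that joining on the delimiter restores the input,
and two ways to get the "consecutive runs" behaviour if that is what you are
really after.

```python
import re

c = '0010000110'

# Splitting on an explicit delimiter keeps every field, empty or not,
# so the result has (number of delimiters + 1) items.
fields = c.split('0')
print(fields)                           # ['', '', '1', '', '', '', '11', '']
print(len(fields) == c.count('0') + 1)  # True

# Because no field is dropped, joining on the delimiter restores the input.
print('0'.join(fields) == c)            # True

# If you want the runs of non-delimiter characters, drop the empties...
runs = [f for f in fields if f]
print(runs)                             # ['1', '11']

# ...or match the runs directly with a regular expression.
print(re.findall('1+', c))              # ['1', '11']

# With no argument, split() treats any run of whitespace as one separator
# and discards empty strings, so the empty "column 3" disappears.
line = "one\ttwo\t\tfour"
print(line.split())                     # ['one', 'two', 'four']
```

The last call is the one that loses information: use it for free-form text,
not for delimited records.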

