a splitting headache

John Posner jjposner at optimum.net
Fri Oct 16 11:30:10 EDT 2009


Mensenator said:
>>>> c = '0010000110'
>>>> c.split('0')
> ['', '', '1', '', '', '', '11', '']
>
> Ok, the consecutive delimiters appear as empty strings for
> reasons unknown (except for the first one). Except when they
> start or end the string in which case the first one is included.
>
> Maybe there's a reason for this inconsistent behaviour but you
> won't find it in the documentation.
>   

As for the "reasons unknown": split() is designed to handle *substrings
separated by delimiters*, not *consecutive character runs*. Think of
TAB-separated (or, if you prefer, COMMA-separated) strings.
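A quick sketch of why the empty strings are not arbitrary. With an explicit separator, split() keeps every field between delimiters, including empty ones. That means n delimiters always give n + 1 fields, and joining with the same separator rebuilds the original string. In the quoted string, the leading "00" yields two empty fields: one before the first '0' and one between the two zeros. The run of four zeros yields three empty fields between them.

```python
c = '0010000110'
parts = c.split('0')

# Every '0' is a boundary between two fields: n delimiters -> n + 1 fields.
assert len(parts) == c.count('0') + 1

# Joining with the same separator reverses the split exactly.
assert '0'.join(parts) == c

print(parts)  # ['', '', '1', '', '', '', '11', '']
```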

In English:

  one<TAB>two<TAB><TAB>four

If you split the above string on the <TAB> character, you really do want
an empty string among the resulting substrings, indicating that
"column 3" is empty.

In Python:

>>> line = "one\ttwo\t\tfour"
>>> line.split('\t')
['one', 'two', '', 'four']

A result of ['one', 'two', 'four'] would be misleading, no?
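If what you really want is the character runs, as in the '0010000110' example, the standard library offers a few options. This is a sketch, not the only approach. You can drop the empty fields, or you can split on a regular expression that matches one or more delimiters. With the regex, a leading or trailing delimiter still produces an empty field at that end. For whitespace, split() with no argument collapses runs. Applied to the TAB line, that gives exactly the "misleading" column result above.

```python
import re

c = '0010000110'

# Keep only the non-empty fields, i.e. the runs of '1's.
runs = [s for s in c.split('0') if s]
print(runs)  # ['1', '11']

# '0+' treats each run of zeros as one delimiter, but the leading and
# trailing runs still leave an empty field at each end.
regex_parts = re.split('0+', c)
print(regex_parts)  # ['', '1', '11', '']

# split() with no argument collapses whitespace runs, which loses the
# information that "column 3" was empty.
line = "one\ttwo\t\tfour"
print(line.split())  # ['one', 'two', 'four']
```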

-John




More information about the Python-list mailing list