a splitting headache
John Posner
jjposner at optimum.net
Fri Oct 16 11:30:10 EDT 2009
Mensenator said:
> >>> c = '0010000110'
> >>> c.split('0')
> ['', '', '1', '', '', '', '11', '']
>
> Ok, the consecutive delimiters appear as empty strings for
> reasons unknown (except for the first one). Except when they
> start or end the string in which case the first one is included.
>
> Maybe there's a reason for this inconsistent behaviour but you
> won't find it in the documentation.
>
The "reason unknown" is that split() is designed to handle *substrings
separated by delimiters*, not *runs of consecutive characters*. Think of
TAB-separated (or, if you prefer, COMMA-separated) strings.
In English:
one<TAB>two<TAB><TAB>four
If you split the above string on the <TAB> character, you really do want
to get an empty string among the result substrings, indicating that
"column 3" is empty.
In Python:
>>> line = "one\ttwo\t\tfour"
>>> line.split('\t')
['one', 'two', '', 'four']
A result of ['one', 'two', 'four'] would be misleading, no?
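
To make the contrast concrete, here is a small sketch (my own illustration, not
anything from the documentation) that compares splitting on an explicit
delimiter with splitting on whitespace, and then revisits the '0010000110'
string from the quoted post. The variable names are only for this example.

```python
import re

line = "one\ttwo\t\tfour"

# With an explicit delimiter, every field keeps its column position.
print(line.split('\t'))     # ['one', 'two', '', 'four']

# With no argument, runs of whitespace are collapsed and the empty
# column disappears, so 'four' looks as if it were column 3.
print(line.split())         # ['one', 'two', 'four']

c = '0010000110'
fields = c.split('0')

# N delimiters always produce N + 1 fields, so join() undoes split().
print(len(fields) == c.count('0') + 1)   # True
print('0'.join(fields) == c)             # True

# If runs of consecutive '1' characters are what you want, ask for them directly.
print(re.findall(r'1+', c))              # ['1', '11']
```

The round trip through join() only works because no empty field is dropped,
which is one way to see why the empty strings are there.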
-John