a splitting headache
David C Ullrich
dullrich at sprynet.com
Wed Oct 21 15:46:08 EDT 2009
On Tue, 20 Oct 2009 15:22:55 -0700, Mensanator wrote:
> On Oct 20, 1:51 pm, David C Ullrich <dullr... at sprynet.com> wrote:
>> On Thu, 15 Oct 2009 18:18:09 -0700, Mensanator wrote:
>> > All I wanted to do is split a binary number into two lists, a list of
>> > blocks of consecutive ones and another list of blocks of consecutive
>> > zeroes.
>>
>> > But no, you can't do that.
>>
>> >>>> c = '0010000110'
>> >>>> c.split('0')
>> > ['', '', '1', '', '', '', '11', '']
>>
>> > Ok, the consecutive delimiters appear as empty strings for reasons
>> > unknown (except for the first one). Except when they start or end the
>> > string in which case the first one is included.
>>
>> > Maybe there's a reason for this inconsistent behaviour but you won't
>> > find it in the documentation.
>>
>> Wanna bet? I'm not sure whether you're claiming that the docs fail to
>> specify the behavior or fail to give the reason for it. The behavior certainly
>> is specified. I conjecture you think the behavior itself is not
>> specified,
>
> The problem is that the docs give a single example
>
>>>> '1,,2'.split(',')
> ['1','','2']
>
> ignoring the special case of leading/trailing delimiters. Yes, if you
> think it through, ',1,,2,'.split(',') should return ['','1','','2','']
> for exactly the reasons you give.
>
> Trouble is, we often find ourselves doing ' 1 2 '.split() which
> returns ['1','2'].
>
> I'm not saying either behaviour is wrong, it's just not obvious that the
> one behaviour doesn't follow from the other and the documentation could
> be a little clearer on this matter. It might make a bit more sense to
> actually mention the split(sep) behavior that split() doesn't do.
Have you _read_ the docs? They're quite clear on the difference
between no sep (or sep=None) and sep=something:

"If sep is given, consecutive delimiters are not grouped together and are
deemed to delimit empty strings (for example, '1,,2'.split(',') returns
['1', '', '2']). The sep argument may consist of multiple characters (for
example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an
empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is
applied: runs of consecutive whitespace are regarded as a single
separator, and the result will contain no empty strings at the start or
end if the string has leading or trailing whitespace. Consequently,
splitting an empty string or a string consisting of just whitespace with
a None separator returns []."
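
For anyone who wants to see the two documented behaviours side by side, here is a minimal sketch. The variable names are only illustrative, and the inputs are the examples already mentioned in this thread plus the two empty-string cases the docs describe.

```python
# With an explicit separator every delimiter counts, so leading, trailing
# and consecutive delimiters all produce empty strings.
with_sep = ',1,,2,'.split(',')

# With no separator (or sep=None) runs of whitespace act as one delimiter
# and leading/trailing whitespace produces no empty strings.
no_sep = ' 1  2 '.split()

# The two empty-input cases described in the quoted documentation.
empty_with_sep = ''.split(',')
blank_no_sep = '   '.split()

print(with_sep)        # ['', '1', '', '2', '']
print(no_sep)          # ['1', '2']
print(empty_with_sep)  # ['']
print(blank_no_sep)    # []
```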
>
>> because your description of what's happening,
>>
>> "consecutive delimiters appear as empty strings for reasons
>>
>> > unknown (except for the first one). Except when they start or end the
>> > string in which case the first one is included"
>>
>> is at best an awkward way to look at it. The delimiters are not
>> appearing as empty strings.
>>
>> You're asking to split '0010000110' on '0'. So you're asking for
>> strings a, b, c, etc such that
>>
>> (*) '0010000110' = a + '0' + b + '0' + c + '0' + etc
>>
>> The sequence of strings you're getting as output satisfies (*) exactly;
>> the first '' is what appears before the first delimiter, the second ''
>> is what's between the first and second delimiters, etc.
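
The identity (*) can be checked directly: joining the pieces back together with the delimiter should reproduce the original string. The sketch below also shows one possible way (an assumption on my part, not something proposed earlier in the thread) to get the blocks of ones and zeroes from the original question, by discarding the empty strings.

```python
c = '0010000110'

# Split on '0'; the empty strings are what sits between adjacent delimiters.
pieces = c.split('0')

# (*) says the pieces, glued back together with '0', give the original string.
rebuilt = '0'.join(pieces)

# Dropping the empty strings leaves the blocks of consecutive ones;
# splitting on '1' instead and doing the same leaves the blocks of zeroes.
ones = [p for p in pieces if p]
zeroes = [p for p in c.split('1') if p]

print(pieces)        # ['', '', '1', '', '', '', '11', '']
print(rebuilt == c)  # True
print(ones)          # ['1', '11']
print(zeroes)        # ['00', '0000', '0']
```

Filtering only works here because the empty strings carry no information the original poster wanted; if the positions of the blocks matter, keep the unfiltered list.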