a splitting headache

David C Ullrich dullrich at sprynet.com
Wed Oct 21 15:46:08 EDT 2009


On Tue, 20 Oct 2009 15:22:55 -0700, Mensanator wrote:

> On Oct 20, 1:51 pm, David C Ullrich <dullr... at sprynet.com> wrote:
>> On Thu, 15 Oct 2009 18:18:09 -0700, Mensanator wrote:
>> > All I wanted to do is split a binary number into two lists, a list of
>> > blocks of consecutive ones and another list of blocks of consecutive
>> > zeroes.
>>
>> > But no, you can't do that.
>>
>> > >>> c = '0010000110'
>> > >>> c.split('0')
>> > ['', '', '1', '', '', '', '11', '']
>>
>> > Ok, the consecutive delimiters appear as empty strings for reasons
>> > unknown (except for the first one). Except when they start or end the
>> > string in which case the first one is included.
>>
>> > Maybe there's a reason for this inconsistent behaviour but you won't
>> > find it in the documentation.
>>
>> Wanna bet? I'm not sure whether you're claiming that the behavior is
>> not specified in the docs or the reason for it. The behavior certainly
>> is specified. I conjecture you think the behavior itself is not
>> specified,
> 
> The problem is that the docs give a single example
> 
> >>> '1,,2'.split(',')
> ['1','','2']
> 
> ignoring the special case of leading/trailing delimiters. Yes, if you
> think it through, ',1,,2,'.split(',') should return ['','1','','2','']
> for exactly the reasons you give.
> 
> Trouble is, we often find ourselves doing ' 1  2  '.split() which
> returns ['1','2'].
> 
> I'm not saying either behaviour is wrong, it's just not obvious that the
> one behaviour doesn't follow from the other and the documentation could
> be a little clearer on this matter. It might make a bit more sense to
> actually mention the split(sep) behavior that split() doesn't do.

Have you _read_ the docs? They're quite clear on the difference
between no sep (or sep=None) and sep=something:

"If sep is given, consecutive delimiters are not grouped together and are 
deemed to delimit empty strings (for example, '1,,2'.split(',') returns 
['1', '', '2']). The sep argument may consist of multiple characters (for 
example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an 
empty string with a specified separator returns [''].

If sep is not specified or is None, a different splitting algorithm is 
applied: runs of consecutive whitespace are regarded as a single 
separator, and the result will contain no empty strings at the start or 
end if the string has leading or trailing whitespace. Consequently, 
splitting an empty string or a string consisting of just whitespace with 
a None separator returns []." 
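
To make the difference concrete, here is a small sketch of the two
algorithms side by side. The variable names are just for illustration,
and the comments show what I'd expect a current interpreter to print:

```python
# With an explicit separator, every delimiter counts, including
# leading, trailing and consecutive ones.
with_sep = ',1,,2,'.split(',')
print(with_sep)            # ['', '1', '', '2', '']

# The same rule applies when the explicit separator is a space.
space_sep = ' 1  2  '.split(' ')
print(space_sep)           # ['', '1', '', '2', '', '']

# With no separator (or sep=None), runs of whitespace are one
# separator and no empty strings appear at the ends.
no_sep = ' 1  2  '.split()
print(no_sep)              # ['1', '2']

# The two edge cases the docs mention explicitly.
empty_with_sep = ''.split(',')
blank_no_sep = '   '.split()
print(empty_with_sep)      # ['']
print(blank_no_sep)        # []
```

So ' 1  2  '.split() and ' 1  2  '.split(' ') really are two different
operations, which is the distinction the quoted paragraphs draw.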

> 
>> because your description of what's happening,
>>
>> "consecutive delimiters appear as empty strings for reasons
>> unknown (except for the first one). Except when they start or end the
>> string in which case the first one is included"
>>
>> is at best an awkward way to look at it. The delimiters are not
>> appearing as empty strings.
>>
>> You're asking to split  '0010000110' on '0'. So you're asking for
>> strings a, b, c, etc such that
>>
>> (*) '0010000110' = a + '0' + b + '0' + c + '0' + etc
>>
>> The sequence of strings you're getting as output satisfies (*) exactly;
>> the first '' is what appears before the first delimiter, the second ''
>> is what's between the first and second delimiters, etc.
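
The identity (*) is easy to check directly. The following is only a
sketch (the names parts, rebuilt, ones and zeroes are mine, not anything
from the docs). It also shows one possible way to get the two lists of
blocks originally asked for, simply by discarding the empty pieces:

```python
c = '0010000110'
parts = c.split('0')           # the pieces a, b, c, ... in (*)

# Gluing the pieces back together with the delimiter reproduces the
# input, so there is always one more piece than there are delimiters.
rebuilt = '0'.join(parts)
print(rebuilt == c)                      # True
print(len(parts) == c.count('0') + 1)    # True

# Dropping the empty pieces leaves the blocks of consecutive ones;
# splitting on '1' gives the blocks of consecutive zeroes the same way.
ones = [p for p in parts if p]
zeroes = [p for p in c.split('1') if p]
print(ones)      # ['1', '11']
print(zeroes)    # ['00', '0000', '0']
```

Filtering out the empty strings is a choice made for this particular
problem; split() itself has to keep them or (*) would no longer hold.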



