a splitting headache

Mensanator mensanator at aol.com
Thu Oct 22 09:07:17 EDT 2009


On Oct 22, 7:47 am, David C. Ullrich <dullr... at sprynet.com> wrote:
> On Wed, 21 Oct 2009 14:43:48 -0700 (PDT), Mensanator
> <mensana... at aol.com> wrote:
> >On Oct 21, 2:46 pm, David C Ullrich <dullr... at sprynet.com> wrote:
> >> On Tue, 20 Oct 2009 15:22:55 -0700, Mensanator wrote:
> >> > On Oct 20, 1:51 pm, David C Ullrich <dullr... at sprynet.com> wrote:
> >> >> On Thu, 15 Oct 2009 18:18:09 -0700, Mensanator wrote:
> >> >> > All I wanted to do is split a binary number into two lists, a list of
> >> >> > blocks of consecutive ones and another list of blocks of consecutive
> >> >> > zeroes.
>
> >> >> > But no, you can't do that.
>
> >> >> >>>> c = '0010000110'
> >> >> >>>> c.split('0')
> >> >> > ['', '', '1', '', '', '', '11', '']
>
> >> >> > Ok, the consecutive delimiters appear as empty strings for reasons
> >> >> > unknown (except for the first one). Except when they start or end the
> >> >> > string in which case the first one is included.
>
> >> >> > Maybe there's a reason for this inconsistent behaviour but you won't
> >> >> > find it in the documentation.
>
> >> >> Wanna bet? I'm not sure whether you're claiming that the behavior is
> >> >> not specified in the docs or the reason for it. The behavior certainly
> >> >> is specified. I conjecture you think the behavior itself is not
> >> >> specified,
>
> >> > The problem is that the docs give a single example
>
> >> >>>> '1,,2'.split(',')
> >> > ['1','','2']
>
> >> > ignoring the special case of leading/trailing delimiters. Yes, if you
> >> > think it through, ',1,,2,'.split(',') should return ['','1','','2','']
> >> > for exactly the reasons you give.
>
> >> > Trouble is, we often find ourselves doing ' 1  2  '.split() which
> >> > returns ['1','2'].
>
> >> > I'm not saying either behaviour is wrong, it's just not obvious that the
> >> > one behaviour doesn't follow from the other and the documentation could
> >> > be a little clearer on this matter. It might make a bit more sense to
> >> > actually mention the split(sep) behavior that split() doesn't do.
>
> >> Have you _read_ the docs?
>
> >Yes.
>
> >> They're quite clear on the difference
> >> between no sep (or sep=None) and sep=something:
>
> >I disagree that they are "quite clear". The first paragraph makes no
> >mention of leading or trailing delimiters and they show no example
> >of such usage. An example would at least force me to think about it
> >if it isn't specifically mentioned in the paragraph.
>
> >One could infer from the second paragraph that, as it doesn't return
> >empty strings from leading and trailing whitespace, split(sep) does
> >for leading/trailing delimiters. Of course, why would I even be
> >reading this paragraph when I'm trying to understand split(sep)?
>
> Now there you have an excellent point.
>
> At the start of the documentation for every function and method
> they should include the following:
>
> Note: If you want to understand completely how this
> function works you may need to read the entire documentation.

When I took Calculus, I wasn't required to read the
entire book before doing the chapter 1 homework.
Has teaching changed since I was in school?

>
> And of course they should precede that in every instance with
>
> Note: Read the next sentence.

And don't forget to add:

We can't be bothered to show any examples of how
this actually works, work out all the special
cases for yourself.
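
For the record, here is a minimal sketch of the sort of example I mean. It is
my own illustration, not text from the docs, and the function name
split_cases is made up. It only collects the cases argued about in this
thread: delimiters inside the string, delimiters at the ends, the whitespace
version, the empty string, and the join round trip.

```python
def split_cases():
    """Collect the str.split() cases discussed in this thread."""
    return {
        # Consecutive delimiters delimit empty strings.
        "sep_inner": '1,,2'.split(','),           # ['1', '', '2']
        # Leading/trailing delimiters also produce empty strings.
        "sep_edges": ',1,,2,'.split(','),         # ['', '1', '', '2', '']
        # No sep: runs of whitespace collapse, no empty strings at the ends.
        "ws_edges": ' 1  2  '.split(),            # ['1', '2']
        # Empty string: [''] with a separator, [] without one.
        "empty_sep": ''.split(','),               # ['']
        "empty_ws": ''.split(),                   # []
        # split(sep) followed by sep.join() gives back the original string.
        "rejoin_sep": '0'.join('010000110'.split('0')),   # '010000110'
        # The whitespace version does not round-trip.
        "rejoin_ws": ' '.join(' 1    11 '.split()),       # '1 11'
    }


for name, result in split_cases().items():
    print(name, repr(result))
```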

>
> >The splitting of real strings is just as important, if not more so,
> >than the behaviour of splitting empty strings. Especially when the
> >behaviour is radically different.
>
> >>>> '010000110'.split('0')
> >['', '1', '', '', '', '11', '']
>
> >is a perfect example. It shows the empty strings generated from the
> >leading and trailing delimiters, and also that you get 3 empty strings
> >between the '1's, not 4. When creating documentation, it is always a
> >good idea to document such cases.
>
> >And you'll then want to compare this to the equivalent whitespace
> >case:
> >>>> ' 1    11 '.split()
> >['1', '11']
>
> >And it wouldn't hurt to point this out:
> >>>> c = '010000110'.split('0')
> >>>> '0'.join(c)
> >'010000110'
>
> >and note that it won't work with the whitespace version.
>
> >No, I have not submitted a request to change the documentation, I was
> >looking for some feedback here. And it seems that no one else
> >considers the documentation wanting.
>
> >> "If sep is given, consecutive delimiters are not grouped together and are
> >> deemed to delimit empty strings (for example, '1,,2'.split(',') returns
> >> ['1', '', '2']). The sep argument may consist of multiple characters (for
> >> example, '1<>2<>3'.split('<>') returns ['1', '2', '3']). Splitting an
> >> empty string with a specified separator returns [''].
>
> >> If sep is not specified or is None, a different splitting algorithm is
> >> applied: runs of consecutive whitespace are regarded as a single
> >> separator, and the result will contain no empty strings at the start or
> >> end if the string has leading or trailing whitespace. Consequently,
> >> splitting an empty string or a string consisting of just whitespace with
> >> a None separator returns []."
>
> >> >> because your description of what's happening,
>
> >> >> "consecutive delimiters appear as empty strings for reasons
>
> >> >> > unknown (except for the first one). Except when they start or end the
> >> >> > string in which case the first one is included"
>
> >> >> is at best an awkward way to look at it. The delimiters are not
> >> >> appearing as empty strings.
>
> >> >> You're asking to split '0010000110' on '0'. So you're asking for
> >> >> strings a, b, c, etc such that
>
> >> >> (*) '0010000110' = a + '0' + b + '0' + c + '0' + etc
>
> >> >> The sequence of strings you're getting as output satisfies (*) exactly;
> >> >> the first '' is what appears before the first delimiter, the second ''
> >> >> is what's between the first and second delimiters, etc.
>
> David C. Ullrich
>
> "Understanding Godel isn't about following his formal proof.
> That would make a mockery of everything Godel was up to."
> (John Jones, "My talk about Godel to the post-grads."
> in sci.logic.)
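
As for the original goal (two lists, one of blocks of consecutive ones and
one of blocks of consecutive zeroes), it can be reached without split() at
all. What follows is only a sketch, assuming itertools.groupby is an
acceptable tool for the job; the function name split_blocks is made up for
illustration.

```python
from itertools import groupby


def split_blocks(bits):
    """Split a string of '0'/'1' characters into runs of ones and runs of zeros."""
    ones, zeros = [], []
    # groupby yields one (character, run) pair per block of equal characters.
    for digit, run in groupby(bits):
        block = ''.join(run)
        if digit == '1':
            ones.append(block)
        else:
            zeros.append(block)
    return ones, zeros


ones, zeros = split_blocks('0010000110')
print(ones)   # ['1', '11']
print(zeros)  # ['00', '0000', '0']
```

Unlike split('0'), this keeps the zero blocks themselves instead of the empty
strings between delimiters, so nothing has to be reconstructed afterwards.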



