[Python-Dev] split('') revisited

Tim Peters tim.one@comcast.net
Sat, 03 Aug 2002 23:32:30 -0400


> It's the last line in the loop body that makes empty matches
> a wart if allowed:  they wouldn't advance the position at all, and an
> infinite loop would result.  In order to make them do what you think you
> want, we'd have to add, at the end of the loop body
>    ah, and if the match was empty, advance the position again, by, oh,
>    i don't know, how about 1?  That's close to 0 <wink>.

[Andrew Koenig]
> Indeed, that's an arbitrary rule -- just about as arbitrary as the one
> that you abbreviated above, which should really be
> 	    find the next match, but if the match is empty, disregard it;
> 	    instead, find the next match with a length of at least,
> 	    oh, I don't know, how about 1?  That's close to 0 <wink>.

You really think so?  I expect almost all programmers would understand what
"find next non-empty match" means at first glance -- and especially
regexp-slingers, who are often burned in their matching lives by the
consequences of having large pieces of their patterns unexpectedly match an
empty string.  That makes "non-empty match" seem a natural concept to me.
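
To make the two candidate rules concrete, here is a minimal sketch of a
regexp-based split loop written both ways.  The function names
split_nonempty and split_advance are hypothetical, invented for this
illustration; neither is the re module's actual implementation, and the
first one simply retries one character later whenever it sees an empty
match.

```python
import re

def split_nonempty(pattern, text):
    """Rule A: only non-empty matches count as separators."""
    regex = re.compile(pattern)
    pieces, start, pos = [], 0, 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        if m.end() == m.start():
            # Empty match: disregard it and look again one character on.
            pos = m.start() + 1
            continue
        pieces.append(text[start:m.start()])
        start = pos = m.end()
    pieces.append(text[start:])
    return pieces

def split_advance(pattern, text):
    """Rule B: empty matches count, then the position is bumped by 1."""
    regex = re.compile(pattern)
    pieces, start, pos = [], 0, 0
    while pos <= len(text):
        m = regex.search(text, pos)
        if m is None:
            break
        pieces.append(text[start:m.start()])
        start = m.end()
        # Without the extra +1 an empty match would never advance pos.
        pos = m.end() + 1 if m.end() == m.start() else m.end()
    pieces.append(text[start:])
    return pieces

print(split_nonempty("x*", "axbxxc"))  # ['a', 'b', 'c']
print(split_nonempty("x*", "abc"))     # ['abc']
print(split_advance("x*", "abc"))      # ['', 'a', 'b', 'c', '']
```

Under rule A a pattern that can match the empty string behaves like its
non-empty part; under rule B every position in the string becomes a
split point.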

> What I'm trying to do is come up with a useful example to convince
> myself that one is better than the other.

Have you found one yet?  I confess that re.findall() implements an "if the
match was empty, advance the position by 1" rule, as in

>>> re.findall("x?", "abc")
['', '', '', '']

But I don't think we're doing anyone a favor with stuff like that.  I think
it's a dubious idea that

>>> "abc".find('')
0

"works" too.  If a program does s1.find(s2) and s2 is an empty string, I
expect the chances are good it's a logic error in the program.  Analogies
to, e.g., i+j when j happens to be 0 leave me cold, since I can think of a
thousand reasons for why j might naturally be 0.  But I've had a hard time
thinking of a reasonable algorithm where the expression s1.find(s2) could be
expected to have s2=="" in normal operation (and am sure it would have been
a logic error elsewhere in any uses of string.find() I've made; ditto
searching for, or splitting on, empty strings via regexps).
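
If an empty search string is indeed most likely a logic error, one way
to surface it early is a small guard around str.find().  This is only a
sketch of that idea; find_nonempty is a hypothetical helper, not part of
the string API, and raising ValueError is an assumption about what the
caller would want.

```python
def find_nonempty(haystack, needle):
    """Like str.find(), but treat an empty needle as a caller bug."""
    if not needle:
        # str.find() would report a match at offset 0 here.
        raise ValueError("empty search string")
    return haystack.find(needle)

print(find_nonempty("abc", "c"))   # 2
print(find_nonempty("abc", "z"))   # -1

try:
    find_nonempty("abc", "")
except ValueError as exc:
    print("rejected:", exc)        # rejected: empty search string
```

A program that can legitimately see an empty s2 would keep calling
str.find() directly; the guard is for code where that case is never
expected.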