[Python-Dev] split('') revisited
Tim Peters
tim.one@comcast.net
Sat, 03 Aug 2002 23:32:30 -0400
...
[Tim]
> It's the last line in the loop body that makes empty matches
> a wart if allowed: they wouldn't advance the position at all, and an
> infinite loop would result. In order to make them do what you think you
> want, we'd have to add, at the end of the loop body
>
> ah, and if the match was empty, advance the position again, by, oh,
> I don't know, how about 1? That's close to 0 <wink>.
[Andrew Koenig]
> Indeed, that's an arbitrary rule -- just about as arbitrary as the one
> that you abbreviated above, which should really be
>
> find the next match, but if the match is empty, disregard it;
> instead, find the next match with a length of at least,
> oh, I don't know, how about 1? That's close to 0 <wink>.
You really think so? I expect almost all programmers would understand what
"find next non-empty match" means at first glance -- and especially
regexp-slingers, who are often burned in their matching lives by the
consequences of having large pieces of their patterns unexpectedly match an
empty string. That makes "non-empty match" seem a natural concept to me.
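
For concreteness, here is a rough sketch of the two rules being argued over,
written as plain scanning loops on top of re.search(). The function names
scan_advance_by_one() and scan_nonempty_only() are made up for illustration,
and neither is claimed to be how the re module is implemented. The second one
also only approximates "find the next non-empty match": it skips positions
where the engine's preferred match is empty, rather than asking the engine for
a non-empty alternative at that position.

```python
import re

def scan_advance_by_one(pattern, text):
    """Hypothetical helper: keep empty matches, then bump the position by 1."""
    pat = re.compile(pattern)
    pos, out = 0, []
    while pos <= len(text):
        m = pat.search(text, pos)
        if m is None:
            break
        out.append(m.group())
        # An empty match has m.end() == m.start(); without the +1 the
        # loop would never advance.
        pos = m.end() if m.end() > m.start() else m.end() + 1
    return out

def scan_nonempty_only(pattern, text):
    """Hypothetical helper: disregard empty matches entirely."""
    pat = re.compile(pattern)
    pos, out = 0, []
    while pos <= len(text):
        m = pat.search(text, pos)
        if m is None:
            break
        if m.end() > m.start():
            out.append(m.group())
            pos = m.end()
        else:
            # Empty match: report nothing and retry one character later.
            pos = m.start() + 1
    return out

print(scan_advance_by_one("x?", "abc"))
print(scan_nonempty_only("x?", "abc"))
print(scan_nonempty_only("x*", "abxxc"))
```

Under the first rule a pattern that can match nothing reports an empty string
at every position; under the second it reports only the places where it
actually consumed something.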
> What I'm trying to do is come up with a useful example to convince
> myself that one is better than the other.
Have you found one yet? I confess that re.findall() implements an "if the
match was empty, advance the position by 1" rule, as in
>>> re.findall("x?", "abc")
['', '', '', '']
>>>
But I don't think we're doing anyone a favor with stuff like that. I think
it's a dubious idea that
>>> "abc".find('')
0
>>>
"works" too. If a program does s1.find(s2) and s2 is an empty string, I
expect the chances are good it's a logic error in the program. Analogies
to, e.g., i+j when j happens to be 0 leave me cold, since I can think of a
thousand reasons for why j might naturally be 0. But I've had a hard time
thinking of a reasonable algorithm where the expression s1.find(s2) could be
expected to have s2=="" in normal operation (and am sure it would have been
a logic error elsewhere in any uses of string.find() I've made; ditto
searching for, or splitting on, empty strings via regexps).
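
To make the point about s1.find(s2) concrete, here is a small sketch. The
first part shows what str.find() does with an empty needle; the wrapper
find_nonempty() is a hypothetical helper, not anything in the standard
library, and it simply treats an empty needle as the logic error it most
likely is.

```python
# An empty needle "matches" at whatever position the search starts from,
# as long as that position is not past the end of the string.
print("abc".find(""))      # 0
print("abc".find("", 2))   # 2
print("abc".find("", 3))   # 3
print("abc".find("", 4))   # -1

def find_nonempty(haystack, needle):
    """Hypothetical wrapper: like str.find(), but reject an empty needle."""
    if needle == "":
        raise ValueError("empty search string (probably a logic error)")
    return haystack.find(needle)

print(find_nonempty("abc", "c"))   # 2
```

Whether raising is the right response depends on the program; the sketch only
shows that the check is cheap to make at the call site.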