Inconsistent behaviour of str.find/str.index when providing optional parameters
hansmu at xs4all.nl
Wed Nov 21 20:25:09 CET 2012
On 21/11/12 17:59:05, Alister wrote:
> On Wed, 21 Nov 2012 04:43:57 -0800, Giacomo Alzetta wrote:
>> I just came across this:
>> >>> 'spam'.find('', 5)
>> -1
>> Now, reading find's documentation:
>> S.find(sub [,start [,end]]) -> int
>> Return the lowest index in S where substring sub is found,
>> such that sub is contained within S[start:end]. Optional arguments
>> start and end are interpreted as in slice notation.
>> Return -1 on failure.
>> Now, the empty string is a substring of every string, so how can find
>> fail? From the doc, find should generally be equivalent to
>> S[start:end].find(substring) + start, except when the substring is not
>> found; but since the empty string is a substring of the empty string, it
>> should never fail.
>> Looking at the source code for find (in stringlib/find.h):
>> stringlib_find(const STRINGLIB_CHAR* str, Py_ssize_t str_len,
>> const STRINGLIB_CHAR* sub, Py_ssize_t sub_len,
>> Py_ssize_t offset)
>> Py_ssize_t pos;
>> if (str_len < 0)
>> return -1;
>> I believe it should be:
>> if (str_len < 0)
>> return (sub_len == 0 ? 0 : -1);
>> Is there any reason for this unexpected behaviour, or was this
>> simply overlooked?
> Why would you be searching for an empty string?
> What result would you expect to get from such a search?
In general, if
    needle in haystack[start:]
returns True, then you'd expect
    haystack.find(needle, start)
to return the smallest i >= start such that
    haystack[i:i+len(needle)] == needle
also returns True.
>>> "" in "spam"[5:]
True
>>> "spam"[5:5+len("")] == ""
True
So, you'd expect that "spam".find("", 5) would return 5.
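
To make that expectation concrete, here is a rough sketch of the rule spelled out as plain Python. The name expected_find is made up for illustration and is not part of any library; it also assumes start >= 0 and ignores the optional end argument. It only restates the "smallest i >= start" rule so it can be compared with what str.find actually does.

```python
def expected_find(haystack, needle, start=0):
    """Hypothetical reference: smallest i >= start such that
    haystack[i:i+len(needle)] == needle, or -1 if there is none.
    Assumes start >= 0 (an assumption of this sketch)."""
    # Going past len(haystack) only matters for i == start: an empty
    # needle matches the empty slice there, a non-empty one never does.
    for i in range(start, max(start, len(haystack)) + 1):
        if haystack[i:i + len(needle)] == needle:
            return i
    return -1


# The two agree for an ordinary search and at the end of the string ...
print(expected_find("spam", "am", 1), "spam".find("am", 1))
print(expected_find("spam", "", 4), "spam".find("", 4))
# ... but disagree once start is past the end of the string.
print(expected_find("spam", "", 5), "spam".find("", 5))
```

The last line is the case under discussion: the sketch follows the slicing rule, while str.find reports failure.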
The only other consistent position would be that "spam"[5:]
should raise an IndexError, because 5 is an invalid index.
For that matter, I wouldn't mind if "spam".find(s, 5) were
to raise an IndexError. But if slicing at position 5
produces an empty string, then .find should be able to
find that empty string.
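
For reference, a small sketch of how the operations mentioned above behave side by side in CPython. The variable names are only illustrative. Note that str.index signals failure with ValueError rather than IndexError.

```python
s = "spam"

# Slicing clamps out-of-range bounds instead of raising.
sliced = s[5:]

# find with the same start position reports failure.
found = s.find("", 5)

# Plain indexing at the same position does raise IndexError.
try:
    s[5]
    indexing_raises = False
except IndexError:
    indexing_raises = True

# str.index mirrors find, but raises ValueError where find returns -1.
try:
    s.index("", 5)
    index_error = None
except ValueError as exc:
    index_error = type(exc).__name__

print(repr(sliced), found, indexing_raises, index_error)
```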