[Python-ideas] str.startswith taking any iterator instead of just tuple

Fri Jan 3 00:24:00 CET 2014

The current behavior is intentional, and the ambiguity of strings
themselves being iterables is the main reason. Since startswith() is
almost always called with a literal or tuple of literals anyway, I see
little need to extend the semantics. (I notice that you don't actually
give any examples where the iterator would be useful -- have you
encountered any, or are you just arguing for consistency's sake?)
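
To see why strings being iterable makes this ambiguous, consider a naive extension that converts any iterable argument to a tuple before calling startswith(). The function name below is hypothetical and only illustrates the pitfall: a plain str argument silently changes meaning.

```python
def naive_startswith(s: str, prefixes) -> bool:
    # Hypothetical naive extension: turn *any* iterable into a tuple first.
    # A str is iterable too, so 'sz' silently becomes ('s', 'z').
    return s.startswith(tuple(prefixes))


print('spam'.startswith('sz'))         # False: 'spam' does not start with 'sz'
print(naive_startswith('spam', 'sz'))  # True: treated as 's' or 'z'
```

Any extension therefore has to special-case str before it treats the argument as an iterable.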

On Thu, Jan 2, 2014 at 10:29 AM, James Powell <james at dontusethiscode.com> wrote:
> Some functions and methods accept a tuple of arguments, which they
> loop over internally, e.g.:
>
>     'spam'.startswith(('s', 'z')) # 'spam' starts with 's' or with 'z'
>     isinstance(42, (float, int))
>
> In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to
> perform this internal iteration.
>
> As a result, the following are considered invalid:
>
>     'spam'.startswith(['s', 'z'])
>     'spam'.startswith({'s', 'z'})
>     'spam'.startswith(x for x in 'sz')
>
>     Traceback (most recent call last):
>       File "<stdin>", line 1, in <module>
>     TypeError: startswith first arg must be str, unicode, or tuple
>
> There are two common workarounds:
>
>     'spam'.startswith(tuple({'s', 'z'}))
>     any('spam'.startswith(c) for c in {'s', 'z'})
>
> Of course, the following construction already has a clear, separate meaning:
>
>     'spam'.startswith('sz') # 'spam' starts with 'sz'
>
> In these cases, could we replace the PyTuple_Check with one that would
> accept any iterable? Alternatively, could we add this as an additional branch?
>
> The code would look something like:
>
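>     it = PyObject_GetIter(subobj);
>     if (it == NULL)
>         return NULL;
>
>     /* PyIter_Next returns NULL both on exhaustion and on error. */
>     while ((item = PyIter_Next(it)) != NULL) {
>         substring = PyUnicode_FromObject(item);
>         Py_DECREF(item);
>         if (substring == NULL)
>             break;
>         result = tailmatch(self, substring, start, end, -1);
>         Py_DECREF(substring);
>         if (result == -1)
>             break;
>         if (result) {
>             Py_DECREF(it);
>             Py_RETURN_TRUE;
>         }
>     }
>     Py_DECREF(it);
>     if (PyErr_Occurred())   /* an item or the iterator raised */
>         return NULL;
>     Py_RETURN_FALSE;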
>
> Of course, for methods like .startswith, this would need to leave the
> following behaviour unchanged. This should always check whether 'spam'
> starts with 'sz', not whether it starts with 's' or with 'z':
>
>     'spam'.startswith('sz')
>
> I searched bugs.python.org and python-ideas for any previous discussion
> of this topic. If this seems reasonable, I can submit an enhancement request to
> bugs.python.org with a patch for unicodeobject.c:unicode_startswith
>
> Cheers,
> James Powell
>
> follow: @dontusethiscode + @nycpython
> attend: nycpython.org + flask-nyc.org
> read: seriously.dontusethiscode.com
>
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/

-- 
--Guido van Rossum (python.org/~guido)
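
The semantics James proposes, including the requirement that a lone str keep its current meaning, can be modelled in pure Python. This is a minimal sketch, not CPython's implementation: the function name startswith_any is hypothetical, and rejecting non-str items with TypeError is an assumption modelled on how the existing tuple form behaves.

```python
from typing import Optional


def startswith_any(s: str, prefix, start: Optional[int] = None,
                   end: Optional[int] = None) -> bool:
    """Hypothetical model of startswith() accepting any iterable of str."""
    # Check str first: it is itself iterable, but must keep meaning
    # "starts with this whole string", not "with any of its characters".
    if isinstance(prefix, str):
        return s.startswith(prefix, start, end)
    try:
        items = iter(prefix)
    except TypeError:
        raise TypeError("prefix must be str or an iterable of str") from None
    for item in items:
        if not isinstance(item, str):
            raise TypeError("prefix items must be str, not "
                            + type(item).__name__)
        # Stop at the first match, as the proposed C loop does.
        if s.startswith(item, start, end):
            return True
    return False
```

For example, startswith_any('spam', 'sz') is False, while startswith_any('spam', ['s', 'z']) and startswith_any('spam', (x for x in 'sz')) are both True.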