[Python-ideas] str.startswith taking any iterator instead of just tuple
Guido van Rossum
guido at python.org
Fri Jan 3 00:24:00 CET 2014
The current behavior is intentional, and the ambiguity of strings
themselves being iterables is the main reason. Since startswith() is
almost always called with a literal or tuple of literals anyway, I see
little need to extend the semantics. (I notice that you don't actually
give any examples where the iterator would be useful -- have you
encountered any, or are you just arguing for consistency's sake?)
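To make the ambiguity concrete, here is a minimal sketch in plain Python. The variable names are illustrative only. A str is itself an iterable of one-character strings, so a method that accepted "any iterable of prefixes" would have two plausible readings of the same argument:

```python
# A str is itself an iterable of one-character strings.
word = "spam"
prefix = "sz"

# Reading 1 (current behavior): treat "sz" as a single prefix.
as_single_prefix = word.startswith(prefix)

# Reading 2: treat "sz" as an iterable of prefixes, i.e. "s" or "z".
as_iterable_of_prefixes = word.startswith(tuple(prefix))

print(as_single_prefix)          # False
print(as_iterable_of_prefixes)   # True
```

The two readings disagree for the same input, which is why restricting the multi-prefix form to tuples keeps the call unambiguous.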
On Thu, Jan 2, 2014 at 10:29 AM, James Powell <james at dontusethiscode.com> wrote:
> Some functions and methods allow the provision of a tuple of arguments
> which will be looped over internally. e.g.,
>
> 'spam'.startswith(('s', 'z')) # 'spam' starts with 's' or with 'z'
> isinstance(42, (float, int))
>
> In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to
> perform this internal iteration.
>
> As a result, the following are considered invalid:
>
> 'spam'.startswith(['s', 'z'])
> 'spam'.startswith({'s', 'z'})
> 'spam'.startswith(x for x in 'sz')
>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: startswith first arg must be str, unicode, or tuple
>
> There are two common workarounds:
>
> 'spam'.startswith(tuple({'s', 'z'}))
> any('spam'.startswith(c) for c in {'s', 'z'})
>
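For reference, the two workarounds can be wrapped in a small helper. This is only a sketch: `startswith_any` is a hypothetical name, not part of str or the standard library, and it assumes a str argument should keep its existing single-prefix meaning.

```python
def startswith_any(s, prefixes):
    """Hypothetical helper: True if s starts with any of the given prefixes.

    A str argument keeps its existing meaning (one prefix); any other
    iterable is converted to a tuple, which str.startswith already accepts.
    """
    if isinstance(prefixes, str):
        return s.startswith(prefixes)
    return s.startswith(tuple(prefixes))


print(startswith_any('spam', ['s', 'z']))          # True
print(startswith_any('spam', (c for c in 'sz')))   # True
print(startswith_any('spam', 'sz'))                # False
```

Converting to a tuple consumes the iterable up front, so unlike the any(...) form it does not stop at the first match.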
> Of course, the following construction already has a clear, separate meaning:
>
> 'spam'.startswith('sz') # 'spam' starts with 'sz'
>
> In these cases, could we supplant the PyTuple_Check with one that would
> allow any iterator? Alternatively, could we add this as an additional branch?
>
> The code would look something like:
>
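>     it = PyObject_GetIter(subobj);
>     if (it == NULL)
>         return NULL;
>
>     iternext = *Py_TYPE(it)->tp_iternext;
>     for (;;) {
>         substring = iternext(it);
>         if (substring == NULL) {
>             /* iterator exhausted (error checks omitted in this sketch) */
>             Py_DECREF(it);
>             Py_RETURN_FALSE;
>         }
>         result = tailmatch(self, substring, start, end, -1);
>         Py_DECREF(substring);
>         if (result) {
>             /* release the iterator before returning */
>             Py_DECREF(it);
>             Py_RETURN_TRUE;
>         }
>     }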
>
> Of course, in the case of methods like .startswith, this would need to
> ensure the following behaviour remains unchanged. The call below should
> always check whether 'spam' starts with 'sz', not whether it starts with
> 's' or with 'z':
>
> 'spam'.startswith('sz')
>
> I searched bugs.python.org and python-ideas for any previous discussion
> of this topic. If this seems reasonable, I can submit an enhancement to
> bugs.python.org with a patch for unicodeobject.c:unicode_startswith
>
> Cheers,
> James Powell
>
> follow: @dontusethiscode + @nycpython
> attend: nycpython.org + flask-nyc.org
> read: seriously.dontusethiscode.com
--
--Guido van Rossum (python.org/~guido)