[Python-ideas] str.startswith taking any iterator instead of just tuple

Fri Jan 3 00:33:59 CET 2014

I could see it being useful to accept lists and sets as well as tuples,
e.g. for passing dynamically generated prefix lists without creating
additional tuple objects, but I don't see a need to accept arbitrary
iterables.
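
For illustration, here is a minimal sketch of that narrower idea. The
helper name startswith_any is hypothetical and not part of any proposal.
In pure Python the tuple still has to be built, so this only shows which
argument types would be accepted, not the saving in object creation.

```python
def startswith_any(text, prefixes):
    """Hypothetical helper: accept a str, tuple, list, set or frozenset."""
    if isinstance(prefixes, (list, set, frozenset)):
        # str.startswith itself only accepts a str or a tuple of str.
        prefixes = tuple(prefixes)
    # Anything else (e.g. a generator) still raises TypeError here.
    return text.startswith(prefixes)


dynamic_prefixes = [p for p in ("s", "z", "") if p]  # built at run time
print(startswith_any("spam", dynamic_prefixes))  # True
print(startswith_any("spam", {"e", "z"}))        # False
print(startswith_any("spam", "sz"))              # False
```

A plain string is passed through untouched, so it keeps its meaning as a
single prefix.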

On Thu Jan 02 2014 at 3:25:20 PM, Guido van Rossum <guido at python.org> wrote:

> The current behavior is intentional, and the ambiguity of strings
> themselves being iterables is the main reason. Since startswith() is
> almost always called with a literal or tuple of literals anyway, I see
> little need to extend the semantics. (I notice that you don't actually
> give any examples where the iterator would be useful -- have you
> encountered any, or are you just arguing for consistency's sake?)
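
To make the ambiguity concrete, a small sketch using only built-ins: the
same string argument gives different answers depending on whether it is
read as one prefix or iterated character by character. The variable names
are illustrative only.

```python
text = "spam"

# Current behaviour: a str argument is a single prefix.
as_single_prefix = text.startswith("sz")

# Naive "iterate over the argument" semantics: 's' and 'z' are tried in turn.
as_iterable = any(text.startswith(p) for p in "sz")

print(as_single_prefix)  # False
print(as_iterable)       # True
```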
>
> On Thu, Jan 2, 2014 at 10:29 AM, James Powell <james at dontusethiscode.com>
> wrote:
> > Some functions and methods allow the provision of a tuple of arguments
> > which will be looped over internally. e.g.,
> >
> >     'spam'.startswith(('s', 'z')) # 'spam' starts with 's' or with 'z'
> >     isinstance(42, (float, int))
> >
> > In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to
> > perform this internal iteration.
> >
> > As a result, the following are considered invalid:
> >
> >     'spam'.startswith(['s', 'z'])
> >     'spam'.startswith({'s', 'z'})
> >     'spam'.startswith(x for x in 'sz')
> >
> >     Traceback (most recent call last):
> >       File "<stdin>", line 1, in <module>
> >     TypeError: startswith first arg must be str, unicode, or tuple
> >
> > There are two common workarounds:
> >
> >     'spam'.startswith(tuple({'s', 'z'}))
> >     any('spam'.startswith(c) for c in {'s', 'z'})
> >
> > Of course, the following construction already has a clear, separate
> > meaning:
> >
> >    'spam'.startswith('sz') # 'spam' starts with 'sz'
> >
> > In these cases, could we supplant the PyTuple_Check with one that would
> > allow any iterator? Alternatively, could we add this as an additional
> > branch?
> >
> > The code would look something like:
> >
> >     it = PyObject_GetIter(subobj);
> >     if (it == NULL)
> >         return NULL;
> >
> >     /* PyIter_Next returns NULL on exhaustion and on error */
> >     while ((substring = PyIter_Next(it)) != NULL) {
> >         result = tailmatch(self, substring, start, end, -1);
> >         Py_DECREF(substring);
> >         if (result) {
> >             Py_DECREF(it);
> >             if (result == -1)  /* tailmatch signals an error */
> >                 return NULL;
> >             Py_RETURN_TRUE;
> >         }
> >     }
> >     Py_DECREF(it);
> >     if (PyErr_Occurred())
> >         return NULL;
> >     Py_RETURN_FALSE;
> >
> > Of course, in the case of methods like .startswith, this would need to
> > leave the following behaviour unchanged: it should always check whether
> > 'spam' starts with 'sz', not whether it starts with 's' or with 'z':
> >
> >     'spam'.startswith('sz')
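
A rough pure-Python model of the semantics being proposed, assuming a
hypothetical function startswith_iter (the name and the explicit
start/end handling are illustrative, not taken from the patch). It keeps
the existing meaning for str and tuple arguments and only iterates over
everything else:

```python
def startswith_iter(text, prefixes, start=0, end=None):
    """Hypothetical pure-Python model of the proposed semantics."""
    if end is None:
        end = len(text)
    # Keep the existing meaning for str and tuple arguments.
    if isinstance(prefixes, (str, tuple)):
        return text.startswith(prefixes, start, end)
    # Otherwise consume the iterable lazily, stopping at the first match.
    for prefix in prefixes:
        if text.startswith(prefix, start, end):
            return True
    return False


gen = (c for c in "sz")
print(startswith_iter("spam", gen))    # True
print(next(gen))                       # z
print(startswith_iter("spam", "sz"))   # False
```

Note one consequence of short-circuiting: a generator argument is left
partially consumed after a match, which is visible to the caller.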
> >
> > I searched bugs.python.org and python-ideas for any previous discussion
> > of this topic. If this seems reasonable, I can submit an enhancement to
> > bugs.python.org with a patch for unicodeobject.c:unicode_startswith.
> >
> > Cheers,
> > James Powell
> >
> > follow: @dontusethiscode + @nycpython
> > attend: nycpython.org + flask-nyc.org
> > read: seriously.dontusethiscode.com
> >
> > _______________________________________________
> > Python-ideas mailing list
> > Python-ideas at python.org
> > https://mail.python.org/mailman/listinfo/python-ideas
> > Code of Conduct: http://python.org/psf/codeofconduct/
>
>
>
> --
> --Guido van Rossum (python.org/~guido)