str.startswith taking any iterator instead of just tuple - Python-ideas

James Powell

Jan. 2, 2014

3:29 p.m.

Some functions and methods accept a tuple of arguments, which they loop over internally. For example:

    'spam'.startswith(('s', 'z'))  # 'spam' starts with 's' or with 'z'
    isinstance(42, (float, int))

In these cases, CPython uses PyTuple_Check and PyTuple_GET_ITEM to perform this internal iteration. As a result, the following are considered invalid:

    'spam'.startswith(['s', 'z'])
    'spam'.startswith({'s', 'z'})
    'spam'.startswith(x for x in 'sz')

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: startswith first arg must be str, unicode, or tuple

There are two common workarounds:

    'spam'.startswith(tuple({'s', 'z'}))
    any('spam'.startswith(c) for c in {'s', 'z'})

Of course, the following construction already has a clear, separate meaning:

    'spam'.startswith('sz')  # 'spam' starts with 'sz'

In these cases, could we supplant the PyTuple_Check with one that would allow any iterator? Alternatively, could we add this as an additional branch? The code would look something like:

    /* runs after the existing str check, so a str argument keeps its meaning */
    it = PyObject_GetIter(subobj);
    if (it == NULL)
        return NULL;

    iternext = *Py_TYPE(it)->tp_iternext;
    for (;;) {
        substring = iternext(it);
        if (substring == NULL) {
            Py_DECREF(it);
            /* NULL means exhausted or failed; only real errors propagate */
            if (PyErr_Occurred()) {
                if (!PyErr_ExceptionMatches(PyExc_StopIteration))
                    return NULL;
                PyErr_Clear();
            }
            Py_RETURN_FALSE;
        }
        result = tailmatch(self, substring, start, end, -1);
        Py_DECREF(substring);
        if (result != 0) {
            Py_DECREF(it);
            if (result == -1)  /* tailmatch() reported an error */
                return NULL;
            Py_RETURN_TRUE;
        }
    }

Of course, in the case of methods like .startswith, we would need to ensure the following behaviour remains unchanged. The following should always check whether 'spam' starts with 'sz', not whether it starts with 's' or with 'z':

    'spam'.startswith('sz')

I searched bugs.python.org and python-ideas for any previous discussion of this topic. If this seems reasonable, I can submit an enhancement to bugs.python.org with a patch for unicodeobject.c:unicode_startswith

Cheers,
James Powell

follow: @dontusethiscode + @nycpython
attend: nycpython.org + flask-nyc.org
read: seriously.dontusethiscode.com
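The proposed semantics can be modelled in a few lines of pure Python. This is a minimal sketch rather than CPython's C implementation, and `startswith_any` is a hypothetical helper name: a `str` argument keeps its meaning as a single prefix, and any other iterable is consumed lazily until the first match.

```python
from typing import Iterable, Union


def startswith_any(text: str, prefixes: Union[str, Iterable[str]]) -> bool:
    """Hypothetical model of the proposed str.startswith semantics.

    A str argument is ONE prefix, never an iterable of 1-character prefixes.
    Any other iterable is consumed lazily; the first match short-circuits.
    """
    if isinstance(prefixes, str):
        return text.startswith(prefixes)
    return any(text.startswith(p) for p in prefixes)


# Current behaviour: a tuple is accepted, a list is rejected with TypeError.
assert 'spam'.startswith(('s', 'z'))
try:
    'spam'.startswith(['s', 'z'])
except TypeError:
    pass
else:
    raise AssertionError('expected TypeError for a list argument')

# Both existing workarounds agree with the sketch.
prefixes = {'s', 'z'}
assert 'spam'.startswith(tuple(prefixes))
assert any('spam'.startswith(c) for c in prefixes)
```

The `isinstance(prefixes, str)` test has to run before any iteration; it is exactly the string/iterable special case that the replies in this thread argue about.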

Guido van Rossum

January 2014

6:24 p.m.

New subject: str.startswith taking any iterator instead of just tuple

The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics. (I notice that you don't actually give any examples where the iterator would be useful -- have you encountered any, or are you just arguing for consistency's sake?)

-- --Guido van Rossum (python.org/~guido)

Amber Yust

6:33 p.m.

I could see expanding to allow lists/sets as well as tuples being useful, e.g. for using dynamically generated prefix lists without creating additional tuple objects, but I don't see arbitrary iteration being necessary.

James Powell

6:37 p.m.

On 01/02/2014 06:24 PM, Guido van Rossum wrote:

...

The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics. (I notice that you don't actually give any examples where the iterator would be useful -- have you encountered any, or are you just arguing for consistency's sake?)

This is driven by a real-world example wherein a large number of prefixes are stored in a set, necessitating:

    any('spam'.startswith(c) for c in prefixes)
    # or
    'spam'.startswith(tuple(prefixes))

However, .startswith doesn't seem to be the only example of this, and the other examples are free of the string/iterable ambiguity:

    isinstance(x, {int, float})

I do agree that it's definitely important to retain the behaviour of:

    'spam'.startswith('sz')

At the same time, I think the non-string iterable problem is already fairly well-known and not a source of great confusion. How often has one typed:

    isinstance(x, Iterable) and not isinstance(x, str)

Cheers,
James Powell

Guido van Rossum

6:59 p.m.

On Thu, Jan 2, 2014 at 1:37 PM, James Powell <james@dontusethiscode.com> wrote:

...

On 01/02/2014 06:24 PM, Guido van Rossum wrote:

...
The current behavior is intentional, and the ambiguity of strings themselves being iterables is the main reason. Since startswith() is almost always called with a literal or tuple of literals anyway, I see little need to extend the semantics. (I notice that you don't actually give any examples where the iterator would be useful -- have you encountered any, or are you just arguing for consistency's sake?)

This is driven by a real-world example wherein a large number of prefixes stored in a set, necessitating:

    any('spam'.startswith(c) for c in prefixes)
    # or
    'spam'.startswith(tuple(prefixes))

Neither of these strikes me as bad. Also, depending on whether the set of prefixes itself changes dynamically, it may be best to lift the tuple() call out of the startswith() call. Note that for performance, I suspect that the any() version will be slower if you can avoid calling tuple() every time -- I recall once finding that x.startswith('ab') benchmarked slower than x[:2] == 'ab' because the name lookup for 'startswith' dominated the overall time.
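Guido's suggestion of lifting `tuple()` out of the call can be sketched as follows, assuming the prefix set is fixed after startup. The names `PREFIXES`, `PREFIX_TUPLE`, `has_known_prefix`, `has_known_prefix_any` and `compare_timings` are hypothetical. The timing harness also includes the `x[:2] == ...` slice comparison Guido mentions, but it only reports raw numbers and asserts nothing about which spelling wins, since that depends on the interpreter and machine.

```python
import timeit

# Hypothetical prefix set, built once (for example, loaded from configuration).
PREFIXES = {'sp', 'eg', 'ha'}

# Lift tuple() out of the hot path: convert once, then reuse the tuple.
PREFIX_TUPLE = tuple(PREFIXES)


def has_known_prefix(word: str) -> bool:
    """One startswith() call with a precomputed tuple."""
    return word.startswith(PREFIX_TUPLE)


def has_known_prefix_any(word: str) -> bool:
    """Generator version: one startswith() lookup and call per prefix."""
    return any(word.startswith(p) for p in PREFIXES)


def compare_timings(word: str = 'spam', number: int = 100_000) -> dict:
    """Return raw timeit results in seconds for each spelling."""
    env = {'word': word, 'PREFIXES': PREFIXES, 'PREFIX_TUPLE': PREFIX_TUPLE}
    statements = {
        'startswith(tuple)': 'word.startswith(PREFIX_TUPLE)',
        'any(startswith)': 'any(word.startswith(p) for p in PREFIXES)',
        'startswith(str)': "word.startswith('sp')",
        'slice ==': "word[:2] == 'sp'",
    }
    return {
        name: timeit.timeit(stmt, globals=env, number=number)
        for name, stmt in statements.items()
    }
```

If the prefix set changes at runtime, `PREFIX_TUPLE` would have to be rebuilt after each change, which is the caveat Guido raises.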

...

However, .startswith doesn't seem to be the only example of this, and the other examples are free of the string/iterable ambiguity:

isinstance(x, {int, float})

But this is even less likely to have a dynamically generated argument. And there could still be another ambiguity here: a metaclass could conceivably make its instances (i.e. classes) iterable.
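A minimal sketch of the metaclass ambiguity Guido describes; `IterableMeta` and `Color` are hypothetical names. The standard library's `enum` module is a real case of this pattern: iterating an `Enum` class yields its members. If `isinstance()` accepted any iterable of types, it would be unclear whether `isinstance(x, Color)` means "is x a Color" or "is x an instance of one of the things Color yields".

```python
class IterableMeta(type):
    """Hypothetical metaclass: every class created with it is iterable."""

    def __iter__(cls):
        # Toy behaviour: yield the class's public attribute names, sorted.
        return iter(sorted(name for name in vars(cls) if not name.startswith('_')))


class Color(metaclass=IterableMeta):
    RED = 1
    GREEN = 2


# The class object itself is iterable...
assert list(Color) == ['GREEN', 'RED']
# ...yet it is still an ordinary class for isinstance().
assert isinstance(Color(), Color)
```

Under an "any iterable" rule, `isinstance()` would need a "check for a type first" special case, just as `startswith()` would need one for `str`.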

...

I do agree that it's definitely important to retain the behaviour of:

'spam'.startswith('sz')

Duh. :-)

...

At same time, I think the non-string iterable problem is already fairly well-known and not a source of great confusion. How often has one typed:

isinstance(x, Iterable) and not isinstance(x, str)

If you find yourself typing that a lot I think you have a bigger problem though. All in all I hope you will give up your push for this feature. It just doesn't seem all that important, and you really just move the inconsistency to a different place (special-casing strings instead of tuples). -- --Guido van Rossum (python.org/~guido)

Alexander Heger

7:19 p.m.

...

...
isinstance(x, Iterable) and not isinstance(x, str)

If you find yourself typing that a lot I think you have a bigger problem though.

How do you replace this?

Guido van Rossum

7:49 p.m.

By designing an API that doesn't require such overloading.

-- --Guido van Rossum (on iPad)
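The thread does not spell out what such an API looks like; the following is one possible reading, with hypothetical function names. The overloaded function needs the "iterable but not str" test because a single parameter carries two meanings; the non-overloaded pair gives each meaning its own name, so no type sniffing is required.

```python
from collections.abc import Iterable


def tag_overloaded(record: dict, tags) -> dict:
    """Overloaded style: `tags` may be a single tag or a collection of tags."""
    # A str is itself iterable, so it must be excluded explicitly.
    if isinstance(tags, Iterable) and not isinstance(tags, str):
        new_tags = list(tags)
    else:
        new_tags = [tags]
    return {**record, 'tags': record.get('tags', []) + new_tags}


def add_tags(record: dict, tags: Iterable) -> dict:
    """Non-overloaded style: this function always takes a collection."""
    return {**record, 'tags': record.get('tags', []) + list(tags)}


def add_tag(record: dict, tag: str) -> dict:
    """...and this one always takes exactly one tag."""
    return add_tags(record, [tag])
```

Calling `add_tags(record, 'spam')` would still add four one-character tags; the explicit names turn that into a visible caller mistake instead of something the function has to guess about.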

David Townshend

6:09 a.m.

Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones. Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex. Also, the only times I can recall using a string as a sequence is in doctests (because it reads better than a list of characters) or in the interpreter when I'm trying something out. I'm not suggesting changing it - there's too much history for that, but I am interested to know if there is some fundamental reason that strings are sequences. If a new string object was being implemented now, would it be a sequence?

Andrew Barnert

12:48 p.m.

On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:

...

Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.

You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.

...

Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex

People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s". And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.
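Andrew's rot13 example is a string algorithm that is naturally written as a loop over characters. A minimal sketch, handling ASCII letters only; `rot13` is a hypothetical name:

```python
def rot13(text: str) -> str:
    """Rotate ASCII letters by 13 places; leave every other character alone."""
    out = []
    for ch in text:  # iterating a str yields 1-character strs
        if 'a' <= ch <= 'z':
            out.append(chr((ord(ch) - ord('a') + 13) % 26 + ord('a')))
        elif 'A' <= ch <= 'Z':
            out.append(chr((ord(ch) - ord('A') + 13) % 26 + ord('A')))
        else:
            out.append(ch)
    return ''.join(out)
```

Applying it twice returns the original text. The standard library also provides this as a text transform, `codecs.encode(text, 'rot_13')`, and `str.translate()` with a `str.maketrans()` table avoids the explicit Python-level loop.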

Alexander Heger

2:02 p.m.

...

People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s". And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.

Although the issue now seems settled, you could use explicit functions like str.iter(), str.codepoints(), str.substr(), ...

Andrew Barnert

7:38 p.m.

On Jan 5, 2014, at 11:02, Alexander Heger <python@2sn.net> wrote:

...

whereas the issue seems now settled, you could use explicit functions like str.iter(), str.codepoints(), str.substr(), ...

Sure, and we could add list.iter(), list.slice(), etc. and get rid of iterables, indexing and slicing, entirely. If we add separate map and similar methods to every iterable type, we can even get rid of iterators. If it's good enough for ObjC, why should Python try to be more readable or concise?

Chris Angelico

7:27 p.m.

On Mon, Jan 6, 2014 at 4:48 AM, Andrew Barnert <abarnert@yahoo.com> wrote:

...

And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.)

You could simply "for ch in s.split('')". A number of languages define that to mean fracturing a string into one-character strings. Python currently raises ValueError, so it won't break existing code. But yes, it's easier to be able to iterate over a string. ChrisA
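ChrisA's observation about `split('')` is easy to check: Python rejects an empty separator with `ValueError`, so that spelling is currently unused. A small sketch contrasting it with `list(s)`, which already fractures a string into one-character strings; both helper names are hypothetical.

```python
def split_empty_sep_raises(text: str) -> bool:
    """Return True if text.split('') raises ValueError."""
    try:
        text.split('')
    except ValueError:
        return True
    return False


def fracture(text: str) -> list:
    """Split text into 1-character strings; list() iterates the str."""
    return list(text)
```

`fracture()` works only because strings are iterable, which is the point ChrisA concedes at the end of his message.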

Terry Reedy

12:08 a.m.

On 1/5/2014 12:48 PM, Andrew Barnert wrote:

...

On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:

...
Reading this thread made me start to think about why a string is a sequence,

Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:

    class Symbol:
        def __init__(self, name):
            self._name = name  # optionally check that name is a string
        def __eq__(self, other):
            if not isinstance(other, Symbol):
                return NotImplemented
            return self._name == other._name
        def __hash__(self):
            return hash(self._name)
        def __repr__(self):
            return 'Symbol({!r})'.format(self._name)
        __str__ = __repr__  # or define to taste

Now Symbols are hashable and equality-comparable, but not iterable.

In other words, I believe the desire for a non-iterable 'string' is a desire for something that is not really a string, but is perhaps being represented as a string merely for convenience. Using duples as linked-list nodes (which I have done), because one does not bother to define a node class, is similar. Tuple iteration is as meaningless in that context as string iteration is in the symbol context.

...

You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.

And as already explained, indexable means iterable.

...

...
Every use case I can think of for iterating over a string either involves first splitting the string, or would be better done with a regex

Splitting involves forward iteration. Regex matching adds backtracking on top of forward iteration. Please tell me a *string* algorithm that does *not* involve character iteration somewhere.
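To make Terry's point concrete, here is a minimal sketch of splitting on a single-character separator as one forward pass over the characters. `split_on` is a hypothetical helper that handles only the one-character case; it is not how CPython implements `str.split`.

```python
def split_on(text: str, sep: str) -> list:
    """Split text on a 1-character separator using a single forward iteration."""
    if len(sep) != 1:
        raise ValueError('this sketch only supports a 1-character separator')
    parts = []
    current = []
    for ch in text:  # forward iteration, one character at a time
        if ch == sep:
            parts.append(''.join(current))
            current = []
        else:
            current.append(ch)
    parts.append(''.join(current))  # the final field, possibly empty
    return parts
```

For example, `split_on('a,,b', ',')` returns `['a', '', 'b']`, the same result as `'a,,b'.split(',')`.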

...

People have mentioned use cases for iterating strings in this thread. And it's easy to think of more. There are all kinds of algorithms that treat strings as sequences of characters. Sure, many of these functions are already methods on str or otherwise built into the stdlib, but that just means they're implemented by iterating the string storage in C with a loop around "*++s".

I was going to make the same point. Strings have the following methods: 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill'. Written in Python (as in classes and PyPy!), nearly all start with 'for c in s:' (or 'for c in reversed(s):'). The ones that do not generally use len(s), and len(s) is calculated in str.__new__ with an internal iteration: 'for char added to string, increment len counter'. Comparing strings also involves iteration, and hence so does sorting lists of strings by comparison.

...

And if you want to extend that set of builtins with similar functions, how else would you do it but with a "for ch in s" loop? (Well, you could "for ch in list(s)", but that's still treating strings as iterables.) For example, many people are asked to write a rot13 function in one of their first classes. How would you write that if strings weren't iterables? There's no way a regex is going to help you here, unless you wanted to do something like using re.sub('.') as a convoluted and slow way of writing map.

AFAIK, all the codecs iterate character by character. -- Terry Jan Reedy

Devin Jeanpierre

3:09 a.m.

On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...

On 1/5/2014 12:48 PM, Andrew Barnert wrote:

...
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:

...
Reading this thread made me start to think about why a string is a sequence,

Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:

And sequences in math / CS are functions from the natural numbers to elements of the sequence. Since isinstance(str, types.FunctionType) isn't True, it must mean that Python strings aren't strings.

But seriously, Python functions aren't functions, the set of Python complex numbers is not the set of complex numbers, Python types aren't types, and Python addition is not addition; mathematical terminology in programming is evocative and not actually literally true. Arguments based on trying to literally copy math to the letter are flawed, probably irretrievably so.

The important feature of strings in math is not that they are literally a sequence of characters, but that they correspond to a sequence of characters isomorphically. You can represent them any way you like, as long as you maintain that isomorphism, and the operations with the right names do the right thing, etc. As evidence, observe that not every programming language has its string type obey the equivalent of Python's sequence interface or math's notion of "sequence" per se (mapping naturals to elements). For example, Haskell strings are linked lists; Rust strings are arrays behind the scenes but don't expose it within the str type; etc.

It's not just strings, either. There are a multitude of ways of defining the natural numbers -- maybe a natural number is a set of a given structure (and which structure?), maybe it is a pair of integers where the second integer is 1, maybe it is an infinite sequence of rationals whose limit is a rational with denominator 1, maybe it is a bitstring of arbitrary finite length. The usual construction in math is the first, but Python uses the last one. To say Python doesn't actually have natural numbers but does have strings, is absurd, but it is what your logic points towards. If two things are equivalent, everything said about one can be said about the other, and math is about saying things about stuff, not about precise definitions of structure -- those are chosen for convenience.

-- Devin

Devin Jeanpierre

3:19 a.m.

On Mon, Jan 6, 2014 at 12:09 AM, Devin Jeanpierre <jeanpierreda@gmail.com> wrote:

...

On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...
On 1/5/2014 12:48 PM, Andrew Barnert wrote:

...
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:

...
Reading this thread made me start to think about why a string is a sequence,

Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:

And sequences in math / CS are functions from the natural numbers to elements of the sequence. Since isinstance(str, types.FunctionType) isn't True, it must mean that Python strings aren't strings.

But seriously, Python functions aren't functions, the set of Python complex numbers is not the set of complex numbers, Python types aren't types, and Python addition is not addition; mathematical terminology in programming is evocative and not actually literally true. Arguments based on trying to literally copy math to the letter are flawed, probably irretrievably so.

The important feature of strings in math is not that they are literally a sequence of characters, but that they correspond to a sequence of characters isomorphically. You can represent them any way you like, as long as you maintain that isomorphism, and the operations with the right names do the right thing, etc. As evidence, observe that not every programming language has its string type obey the equivalent of Python's sequence interface or math's notion of "sequence" per se (mapping naturals to elements). For example, Haskell strings are linked lists; Rust strings are arrays behind the scenes but don't expose it within the str type; etc.

It's not just strings, either, There are a multitude of ways of defining the natural numbers -- maybe a natural number is a set of a given structure (and which structure?), maybe it is a pair of integers where the second integer is 1, maybe it is an infinite sequence of rationals whose limit is a rational with denominator 1, maybe it is a bitstring of arbitrary finite length. The usual construction in math is the first, but Python uses the last one. To say Python doesn't actually have natural numbers but does have strings, is absurd, but it is what your logic points towards. If two things are equivalent, everything said about one can be said about the other, and math is about saying things about stuff, not about precise definitions of structure -- those are chosen for convenience.

Apologies, I wasn't thinking much and bungled that last argument (should've talked about integers instead of naturals; and even did, for half of it...). Fixed:

[...] There are a multitude of ways of defining the integers -- maybe an integer is an equivalence class over the pairs of naturals, maybe it is a rational number with denominator 1, maybe it is an infinite sequence of rationals whose limit is a rational with denominator 1, maybe it is a two's complement bitstring of arbitrary length. The usual construction in math is the first (or the second to last), but Python uses the last one. To say that Python doesn't actually have integers, but does have strings, is absurd, but [...]

-- Devin

Terry Reedy

6:39 p.m.

On 1/6/2014 3:09 AM, Devin Jeanpierre wrote:

...

On Sun, Jan 5, 2014 at 9:08 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...

...
...
On Jan 5, 2014, at 3:09, David Townshend <aquavitae69@gmail.com> wrote:

...
Reading this thread made me start to think about why a string is a sequence,

Because a string is defined in math/language theory as a sequence of symbols from an alphabet. If you want to invent or define something else, such as an atomic symbol type, please use a different term. For example:

And sequences in math / CS are functions from the natural numbers to elements of the sequence.

And functions (mappings) in math are defined either by a rule for calculating the output from the input or by a table (set of pairs) giving the output for each input. If the input domain is the finite sequence of counts from 0 to k, the table can be condensed to a sequence of k+1 output values.

...

Since isinstance(str, types.FunctionType) isn't True,

Python has multiple builtin callable types, and users can define more, so you need to expand that test. Anyway, since a string is not a function defined by rule, it must be a function defined by a table. Since the input domain is a finite sequence of counts, we can and do condense the table to a sequence of output values. Which is an expansion of what I said.

...

[snip] -- Terry Jan Reedy

Devin Jeanpierre

7:20 p.m.

On Mon, Jan 6, 2014 at 3:39 PM, Terry Reedy <tjreedy@udel.edu> wrote:

...

...
Since isinstance(str, types.FunctionType) isn't True,

Python has multiple builtin callable types, and users can define more, so you need to expand that test. Anyway, since a string is not a function defined by rule, it must be a function defined by a table. Since the input domain is a finite sequence of counts, we can and do condense the table to a sequence of output values. Which is an expansion of what I said.

No, I don't need to expand the test -- the limitation of the test was the entire point. I was making fun of your argument that because the mathematical terms are the same, therefore they must be the same in Python. "strings are sequences in math, therefore they are in python" is a superficial and fundamentally wrong argument. Here's another argument of that form: "the nth element of a string is not a string in math, therefore the nth element of a string is not a string in Python". That's a lie, of course. There are too many ways that type of argument falls flat. -- Devin

Bruce Leban

2:06 a.m.

On Sun, Jan 5, 2014 at 9:48 AM, Andrew Barnert <abarnert@yahoo.com> wrote:

...

...
Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.

You've seriously never indexed or sliced a string? Those are the two core operations in sequences, and they're obviously useful on strings.

I am doing most of my coding in two languages right now: Python and Javascript. I have never wished that Python had string.charAt(i) but I have often wished that Javascript had string[i]. When I've iterated over the characters in a string in Javascript, it has never occurred to me to write it using str.split(''). By irrelevant analogy, I have never used complex numbers in Python or Javascript and I can't see any obvious reason to support them. It just confuses people who inadvertently write cmath.sqrt instead of math.sqrt. For the few people that use complex numbers, they would be better served by a tuple of real and imaginary parts. As someone who doesn't use them, my opinion is clearly more important than that of those that use them.

--- Bruce
Learn how hackers think: http://j.mp/gruyere-security

(Not serious about removing complex numbers from Python. If you didn't see the sarcasm, sorry.)

Steven D'Aprano

5:57 a.m.


On Sun, Jan 05, 2014 at 11:06:10PM -0800, Bruce Leban wrote:

...

As someone who doesn't use them [complex numbers], my opinion is clearly more important than that of those that use them.

:-) +1 QOTW -- Steven

Eric Snow

12:49 p.m.


On Jan 5, 2014 4:10 AM, "David Townshend" <aquavitae69@gmail.com> wrote:

...

Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.

Sometimes I think it would be clearer if strings weren't sequences but had various attributes that exposed sequence "views", e.g. codepoints, etc. Making strings non-sequences isn't realistic at this point, but adding the sequence view attributes may still be nice. That said, at present it's not something I personally have any use case for.

There was an article floating around the web recently where the deficiencies of unicode implementations were discussed, and I recall something there or in related discussions about use cases for having different views into a string. Wow that was vague. :) The different views into unicode strings certainly come up from time to time on our lists.

-eric
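A rough sketch of what Eric's "views" idea might look like, written as a wrapper class. Python's str has no such attributes; the names StrViews, codepoints and utf8 are hypothetical and invented here only to illustrate the idea:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrViews:
    """Hypothetical wrapper exposing different sequence 'views' of one string."""
    text: str

    @property
    def codepoints(self) -> tuple[int, ...]:
        # One integer per Unicode code point (what str indexing walks over today).
        return tuple(ord(ch) for ch in self.text)

    @property
    def utf8(self) -> bytes:
        # The UTF-8 encoding; a bytes object is itself a sequence of ints.
        return self.text.encode("utf-8")


v = StrViews("caf\u00e9")   # "café" with a precomposed e-acute
print(len(v.codepoints))    # 4
print(len(v.utf8))          # 5: U+00E9 takes two bytes in UTF-8
```

The two views disagree on length for the same text, which is the kind of distinction the "different views into a string" discussions are about.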

spir

7:34 a.m.


On 01/05/2014 06:49 PM, Eric Snow wrote:

...

On Jan 5, 2014 4:10 AM, "David Townshend" <aquavitae69@gmail.com> wrote:

...
Reading this thread made me start to think about why a string is a sequence, and I can't actually see any obvious reason, other than historical ones.

Sometimes I think it would be more clear if strings weren't sequences but had various attributes that exposed sequence "views", e.g. codepoints, etc. Making strings non-sequences isn't realistic at this point, but adding the sequence view attributes may still be nice.

That said, at present it's not something I personally have any use case for. There was an article floating around the web recently where the deficiencies of unicode implementations were discussed, and I recall something there or in related discussions about use cases for having different views into a string. Wow that was vague. :) The different views into unicode strings certainly come up from time to time on our lists.

This does not fit the picture as long as strings are indexable and sliceable, in my opinion. But most importantly, from the user practice & experience perspective (while from a theoretical one it may be debatable), I consider it a great feature of Python that everyday "mundane" string processing can be done using simple and easy Python string routines (I include here indexing & slicing). The alternatives would be regexes (read: Perl) and/or matching/parsing/searching libs (e.g. pyparsing) everywhere in Python code; both are difficult, error-prone, and hard to debug. The former are plain esoteric (but terribly practical ;-), and I'm happy to rarely have to decipher *others'* regexes when reading Python code (my own are far easier, indeed ;-).

Denis
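A small illustration of Denis's point, using an invented "key=value" input: the same split written with plain string routines (index and slicing) and with a regex.

```python
import re

line = "key=value"

# Plain string routines: find the separator, then slice around it.
sep = line.index("=")
key, value = line[:sep], line[sep + 1:]

# The regex equivalent: more general, but harder to read at a glance.
m = re.fullmatch(r"([^=]*)=(.*)", line)

print(key, value)              # key value
print(m.group(1), m.group(2))  # key value
```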

James Powell

7:39 p.m.


On 01/02/2014 06:59 PM, Guido van Rossum wrote:

...

...
This is driven by a real-world example wherein a large number of prefixes are stored in a set, necessitating:

    any('spam'.startswith(c) for c in prefixes)
    # or
    'spam'.startswith(tuple(prefixes))

Neither of these strikes me as bad. Also, depending on whether the set of prefixes itself changes dynamically, it may be best to lift the tuple() call out of the startswith() call.
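A sketch of Guido's suggestion, assuming the prefix set is built once and does not change afterwards (the prefixes and words below are hypothetical). The set is converted to a tuple a single time and reused in every startswith() call:

```python
prefixes = {"sp", "eg", "ha"}   # hypothetical, fixed set of prefixes
prefix_tuple = tuple(prefixes)  # converted once, outside the loop

words = ["spam", "eggs", "toast", "ham"]

# Each call reuses the same tuple instead of rebuilding it.
matches = [w for w in words if w.startswith(prefix_tuple)]

# The any() formulation gives the same answer without building a tuple.
assert matches == [w for w in words if any(w.startswith(p) for p in prefixes)]
print(matches)  # ['spam', 'eggs', 'ham']
```

If the set can change between calls, the cached tuple has to be rebuilt (or the any() form used instead), which is the trade-off James mentions in his reply.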

I agree. The any() formulation proves good enough in practice. Creating a tuple can be a bit tricky, since the list of prefixes could be large and could change.

...

...
However, .startswith doesn't seem to be the only example of this, and the other examples are free of the string/iterable ambiguity:

    isinstance(x, {int, float})

And there could still be another ambiguity here: a metaclass could conceivably make its instances (i.e. classes) iterable.

It's an interesting point that there's a fundamental ambiguity between providing an iterable of arguments and providing a single argument that is itself an iterable (e.g., a type that is itself iterable, like an Enum). In fact, I've actually warmed up to the any() formulation, because it makes explicit which behaviour you want.
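To make the Enum case concrete: Enum classes are iterable through their metaclass, so a class like Color below is both a single type and an iterable of members. If isinstance() accepted any iterable of types, isinstance(x, Color) would become ambiguous. Today isinstance() unpacks tuples (and, since Python 3.10, X | Y unions) but rejects a set. Color is an invented example:

```python
from enum import Enum


class Color(Enum):
    RED = 1
    GREEN = 2


# The class itself is iterable: iterating yields its members.
print(list(Color))  # [<Color.RED: 1>, <Color.GREEN: 2>]

# Passed as a single type, it means "is x a Color member?"
print(isinstance(Color.RED, Color))   # True

# A tuple is unpacked into alternative types...
print(isinstance(3.0, (int, float)))  # True

# ...but a set of types is rejected with a TypeError.
try:
    isinstance(3.0, {int, float})
    set_accepted = True
except TypeError as exc:
    set_accepted = False
    print("TypeError:", exc)
```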

...

...
I do agree that it's definitely important to retain the behaviour of:

    'spam'.startswith('sz')

Duh. :-)
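For reference, the distinction both sides want to keep: a str argument is one prefix, while a tuple lists alternative prefixes. Other iterables are currently rejected rather than reinterpreted:

```python
word = "spam"

print(word.startswith("sz"))        # False: the single prefix "sz"
print(word.startswith(("s", "z")))  # True: "s" or "z"
print(word.startswith(("sz",)))     # False: a 1-tuple still holds one prefix

# A list is not accepted as a collection of prefixes.
try:
    word.startswith(["s", "z"])
    list_accepted = True
except TypeError as exc:
    list_accepted = False
    print("TypeError:", exc)
```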

You never know...

...

All in all I hope you will give up your push for this feature. It just doesn't seem all that important, and you really just move the inconsistency to a different place (special-casing strings instead of tuples).
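A minimal sketch of what the proposed behaviour amounts to as a user-level helper. startswith_any is a hypothetical name, not a library function. The helper has to special-case str explicitly, which illustrates Guido's point that the inconsistency moves rather than disappears:

```python
from typing import Iterable, Union


def startswith_any(s: str, prefixes: Union[str, Iterable[str]]) -> bool:
    """Hypothetical helper: accept one str prefix or any iterable of prefixes."""
    if isinstance(prefixes, str):
        # Special case: a bare string is one prefix, never an iterable of chars.
        return s.startswith(prefixes)
    # Otherwise materialise the iterable into the tuple str.startswith expects.
    return s.startswith(tuple(prefixes))


print(startswith_any("spam", "sz"))               # False
print(startswith_any("spam", {"s", "z"}))         # True
print(startswith_any("spam", (c for c in "sz")))  # True
```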

For these functions and methods, being able to provide a tuple of arguments instead of a single argument seems mostly a convenience. It covers the most common case of wanting to internalise the iteration, with a minimum of ambiguity. The any() and tuple() formulations are available where needed. In the end, I'm happy to drop the push for this feature. (In general, I agree that there isn't a need to stamp out all inconsistencies or to belabour the use of abstract types.)

Cheers,
James Powell

follow: @dontusethiscode + @nycpython
attend: nycpython.org + flask-nyc.org
read: seriously.dontusethiscode.com


21 comments

13 participants

participants (13)

Alexander Heger
Amber Yust
Andrew Barnert
Bruce Leban
Chris Angelico
David Townshend
Devin Jeanpierre
Eric Snow
Guido van Rossum
James Powell
spir
Steven D'Aprano
Terry Reedy
