[Python-ideas] Re: Incremental step on road to improving situation around iterable strings

23 Feb 2020

Steven D'Aprano wrote:
...
On Sun, Feb 23, 2020 at 08:51:55PM -0000, Steve Jorgensen wrote:
Python has not just the "iterator protocol" using __iter__ and 
__next__, but also has an older sequence protocol used in Python 1.x 
which still exists to this day. This sequence protocol falls back on 
repeated indexing.
Conceptually, we should be able to reason that every object that 
supports indexing should be iterable, without adding a special case 
exception "...except for str".
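For anyone who hasn't run into the older protocol, here is a minimal sketch of it. `Squares` is a made-up class for illustration only: it defines `__getitem__` and nothing else, yet iteration still works because `iter()` falls back on indexing with 0, 1, 2, ... until `IndexError` is raised.

```python
class Squares:
    """Hypothetical class: defines __getitem__ only, no __iter__ or __len__."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        # Raising IndexError is what tells the fallback iteration to stop.
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


squares = Squares(4)
as_list = list(squares)                  # built by indexing 0, 1, 2, 3
has_iter = hasattr(squares, "__iter__")  # False: only the old protocol is used
```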
Strings already have an exception in this area. Usually `x in y` means `any(x == elem for elem in y)`. That makes the two uses of `in` (the membership test and the `for` loop) match, and to me (I don't know if this is true) it's the reason that iterating over dictionaries yields the keys, although personally I'd find it more convenient if it yielded the items. But `in` means something else for strings: it is a substring test. It's not as strong a rule as the link between iteration and indexing, but it is a break in tradition.
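A rough sketch of what I mean. `generic_contains` is a helper name I made up to spell out the iteration-based meaning of `in`; the sample data is invented too.

```python
def generic_contains(x, y):
    """Membership as defined by iteration: is x equal to some element of y?"""
    return any(x == elem for elem in y)


numbers = [1, 2, 3]
ages = {"ann": 31, "bob": 27}
word = "abc"

# For lists and dicts the builtin `in` agrees with the iteration-based version.
list_agrees = (2 in numbers) == generic_contains(2, numbers)
dict_agrees = ("ann" in ages) == generic_contains("ann", ages)  # dicts: keys

# For strings it does not: the builtin `in` is a substring test.
str_builtin = "ab" in word                  # substring match
str_generic = generic_contains("ab", word)  # no single character equals "ab"
```

So for `str`, `"ab" in word` is true even though iterating over `word` never yields `"ab"`.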

Another somewhat related example: we usually accept that basically every object can be treated as a boolean, even more so if it has a `__len__`. But numpy and pandas break this 'rule' by raising an exception if you try to treat an array as a boolean, e.g.:

ValueError: The truth value of an array with more than one element is ambiguous. 
Use a.any() or a.all()
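The mechanism behind that is just `__bool__` raising. Below is a pure-Python stand-in to show the idea; `StrictArray` is a hypothetical class of my own and is not how numpy or pandas are actually implemented.

```python
class StrictArray:
    """Hypothetical stand-in for an array type; not numpy's implementation."""

    def __init__(self, values):
        self.values = list(values)

    def __len__(self):
        return len(self.values)

    def __bool__(self):
        # __bool__ takes priority over __len__ when truthiness is tested.
        if len(self.values) > 1:
            raise ValueError(
                "The truth value of an array with more than one element "
                "is ambiguous. Use a.any() or a.all()"
            )
        return any(self.values)

    def any(self):
        return any(self.values)

    def all(self):
        return all(self.values)


arr = StrictArray([0, 1])
try:
    bool(arr)
    raised = False
except ValueError:
    raised = True
```

The object has a perfectly good `__len__`, but it refuses to guess, and the user has to say `arr.any()` or `arr.all()`.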

In a sense these libraries decided that while unambiguous behaviour could be defined, the intention of the user would always be ambiguous. The same could be said for strings. Needing to iterate over a string is simply not common unless you're writing something like a parser. So even though the behaviour is well defined and documented, when someone tries to iterate over a string, statistically we can say there's a good chance that's not what they actually want. And in the face of ambiguity, refuse the temptation to guess.

I do think it would be a pity if strings broke the tradition of indexable implies iterable, but "A Foolish Consistency is the Hobgoblin of Little Minds". The benefits in helping users when debugging would outweigh the inconsistency and the minor inconvenience of adding a few characters. Users who are expecting iteration to work because indexing works will quickly get a helpful error message and fix their problem. At the risk of overusing classic Python sayings, Explicit is better than implicit.

However, we could get the benefit of making debugging easier without having to actually *break* any existing code if we just raised a warning whenever someone iterates over a string. It doesn't have to be a deprecation warning and we don't need to ever actually make strings non-iterable.
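To show roughly what that could feel like, here is a sketch using a `str` subclass. Everything in it is hypothetical: `WarnStr`, `StringIterationWarning` and the message text are names I made up, and `chars` is only the spelling floated in this thread. A real change would live in `str` itself.

```python
import warnings


class StringIterationWarning(UserWarning):
    """Hypothetical warning category; not part of Python."""


class WarnStr(str):
    """Hypothetical str subclass that warns on implicit iteration."""

    def __iter__(self):
        warnings.warn(
            "iterating over a string; did you mean a list of strings?",
            StringIterationWarning,
            stacklevel=2,
        )
        return super().__iter__()

    @property
    def chars(self):
        # Explicit spelling: iterate over the characters without a warning.
        return super().__iter__()


def total_length(names):
    """Expects an iterable of names."""
    return sum(len(name) for name in names)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = total_length(WarnStr("alice"))  # bug: one name, not a list
    explicit = list(WarnStr("ab").chars)     # intentional, so no warning
```

The buggy call still runs and returns a number, so no existing code breaks, but the warning points at the likely mistake.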

I'm out of time, so I'll just quickly say that I prefer `.chars` as a property without the `()`. And jdveiga, you asked what the advantage of all this would be after I made my previous post about it biting beginners; I'm not sure if you missed that or were just typing yours when I posted mine.

Alex Hall