[Python-ideas] discontinue iterable strings

Chris Angelico rosuav at gmail.com
Sun Aug 21 00:10:41 EDT 2016


On Sun, Aug 21, 2016 at 12:52 PM, Steven D'Aprano <steve at pearwood.info> wrote:
>> > The fixes overall will be a lot easier and obvious than introduction of
>> > unicode as default string type in Python 3.0.
>>
>> That's a bold claim. Have you considered what's at stake if that's not true?
>
> Saying that these so-called "fixes" (we haven't established yet that
> Python's string behaviour is a bug that needs fixing) will be easier and
> more obvious than the change to Unicode is not that bold a claim. Pretty
> much everything is easier and more obvious than changing to Unicode. :-)
> (Possibly not bringing peace to the Middle East.)

And yet it's so simple. We can teach novice programmers about two's
complement [1] representations of integers, and they have no trouble
comprehending that the abstract concept of "integer" is different from
the concrete representation in memory. We can teach intermediate
programmers how hash tables work, and how to improve their performance
on CPUs with 64-byte cache lines - again, there's no comprehension
barrier between "mapping from key to value" and "puddle of bytes in
memory that represent that mapping". But so many programmers are
entrenched in the thinking that a byte IS a character.
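
To make the byte/character distinction concrete, here is a minimal
sketch, assuming Python 3 and UTF-8 as the encoding (the sample word is
just an illustration). The same text has one length as an abstract
string and another as its concrete byte representation:

```python
# "naïve" spelled with an explicit escape so the code point is unambiguous.
text = "na\u00efve"
encoded = text.encode("utf-8")

print(len(text))      # 5 code points
print(len(encoded))   # 6 bytes: U+00EF takes two bytes in UTF-8
print(encoded[2:4])   # b'\xc3\xaf', the two bytes that encode the one character
```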

> I think that while the suggestion does bring some benefit, the benefit
> isn't enough to make up for the code churn and disruption it would
> cause. But I encourage the OP to go through the standard library, pick a
> couple of modules, and re-write them to see how they would look using
> this proposal.

Python still has a rule that you can iterate over anything that has
__getitem__, and it'll be called with 0, 1, 2, 3... until it raises
IndexError. So you have two options: Remove that rule, and require
that all iterable objects actually define __iter__; or make strings
non-subscriptable, which means you need to do something like
"asdf".char_at(0) instead of "asdf"[0]. IMO the second option is a
total non-flyer - good luck convincing anyone that THAT is an
improvement. The first one is possible, but dramatically broadens the
backward-compatibility issue. You'd have to search for any class that
defines __getitem__ and not __iter__.
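
As a rough illustration of that rule, here is a small sketch. The
Squares class and the relies_on_getitem_iteration helper are made up
for this example; the helper is only a first approximation of the
search described above, not a complete audit tool:

```python
class Squares:
    """Defines only __getitem__; there is no __iter__ anywhere."""

    def __init__(self, n):
        self.n = n

    def __getitem__(self, index):
        if not 0 <= index < self.n:
            raise IndexError(index)
        return index * index


def relies_on_getitem_iteration(cls):
    """True if cls is iterable only through the __getitem__ fallback."""
    return hasattr(cls, "__getitem__") and not hasattr(cls, "__iter__")


# iter() falls back to calling __getitem__ with 0, 1, 2, ... until IndexError.
print(list(Squares(4)))                      # [0, 1, 4, 9]
print(relies_on_getitem_iteration(Squares))  # True
print(relies_on_getitem_iteration(str))      # False: str defines __iter__
```

Note that str would not be caught by such a search, since it defines
__iter__ itself; the classes at risk are the ones like Squares.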

If that *does* get considered, it wouldn't be too hard to provide a
compatibility function, maybe in itertools:

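# Proposed helper; itertools.subscript does not exist today.
def subscript(self):
    """Yield self[0], self[1], ... until IndexError is raised."""
    i = 0
    try:
        while "moar indexing":  # a non-empty string is truthy: loop forever
            yield self[i]
            i += 1
    except IndexError:
        pass

class Demo:
    def __getitem__(self, item):
        ...
    # Opt back in to index-based iteration explicitly.
    __iter__ = itertools.subscript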

But there'd have to be the full search of "what will this break", even
before getting as far as making strings non-iterable.
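
Since itertools has no such function, here is a self-contained sketch
of the compatibility idea that runs today, with the helper defined at
module level instead. The list-backed Demo body is an assumption added
purely so there is something to index; the try block is also narrowed
to the subscript call itself:

```python
import collections.abc


def subscript(self):
    """Yield self[0], self[1], ... stopping at the first IndexError."""
    i = 0
    while True:
        try:
            item = self[i]
        except IndexError:
            return
        yield item
        i += 1


class Demo:
    def __init__(self, items):
        self._items = list(items)  # hypothetical backing store

    def __getitem__(self, index):
        return self._items[index]

    # Explicit opt-in: iteration is defined in terms of indexing.
    __iter__ = subscript


demo = Demo("asdf")
print(list(demo))                                  # ['a', 's', 'd', 'f']
print(isinstance(demo, collections.abc.Iterable))  # True, __iter__ is defined
```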

ChrisA

[1] Not "two's compliment", although I'm told that Two can say some
very nice things.

