On Mar 3, 2020, at 01:09, M.-A. Lemburg <mal@egenix.com> wrote:
> The main reason for not having characters in addition to strings is reducing complexity. Why try to add this now for no apparent net benefit?
I don’t think the benefit is worth the (as far as I can tell insurmountable) backward compatibility cost, but you can’t argue that there is no benefit.

An object whose first element is itself is a valid idea, but it’s a pathological case; you have to write something like `lst = []; lst.append(lst)` to get one. So code like this is fine:

    from collections.abc import Iterable

    def flatten(xs):
        for x in xs:
            if isinstance(x, Iterable):
                yield from flatten(x)
            else:
                yield x

… in that it only infinitely recurses if you go out of your way to give it an infinitely recursive value.

… except that every string is an infinitely recursive value, so all you have to do is give it 'A'. Which is not just weird in theory; it breaks perfectly sensible code like flatten. And it’s why we have to have idioms like endswith taking a str|Tuple[str] rather than any Iterable: forcing people to write s.endswith(tuple(suffixes)) when suffixes is a set is the only reasonable way to avoid ambiguity, because if endswith accepted any iterable of strings, passing a single string would be ambiguous (one suffix, or one suffix per character?).

And, because it comes up all the time, and many other languages don’t have this problem, it has to be explained to new students and people coming from other languages, and painfully remembered or relearned by people who usually work in Java or whatever but occasionally have to do Python.

Of course regular Python developers have this drummed into their heads, and usually remember to check for str and handle it specially, and we’ve all learned to deal with the tuple-special idiom, and so on. But that doesn’t mean it’s an ideal design, just that we’ve all gotten used to it.
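To make that concrete, here is a quick interactive sketch of the problem and the two workarounds mentioned above (the explicit str check and the tuple conversion); nothing here is hypothetical, it's just today's Python:

>>> 'A'[0] == 'A'    # a one-character string's first element is the string itself
True
>>> from collections.abc import Iterable
>>> def flatten(xs):
...     for x in xs:
...         # the str check everyone has to remember to add
...         if isinstance(x, Iterable) and not isinstance(x, str):
...             yield from flatten(x)
...         else:
...             yield x
...
>>> list(flatten(['ab', ['cd', ['ef']]]))
['ab', 'cd', 'ef']
>>> suffixes = {'.txt', '.md'}
>>> 'notes.txt'.endswith(tuple(suffixes))    # the tuple idiom
True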
> I think the situation with bytes (iteration returning integers instead of bytes) has shown that this is not a very user friendly nor intuitive approach:
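(For context, this is the current behavior being referred to; a plain Python 3 session, nothing hypothetical:)

>>> data = b'\x01\x02\x03'
>>> data[0]        # indexing yields an int
1
>>> list(data)     # iteration yields ints too
[1, 2, 3]
>>> data[0:1]      # only slicing gives bytes back
b'\x01'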
Well, it shows that using integers is confusing. In fact, it’s even worse than C, where char is an integral type but at least not the same type as int. (A char only covers 0 to 255, or -128 to 127 if it’s signed; printf’s %c and the C++ stream operators read and write it as a character rather than as a number; there are a bunch of character-related functions that take a char but not an arbitrary int, although passing them an int is usually just a warning rather than an error; etc.) That doesn’t mean a new type would be confusing:
>>> b = bytes((1,2,3,4))
>>> b
b'\x01\x02\x03\x04'
>>> b[:2]
b'\x01\x02'
>>> b[:1]
b'\x01'
>>> b[0]
byte(b'\x01')
In fact, it would make bytes consistent with other sequences of byte:
>>> s = list(b)
>>> s[:1]
[byte(b'\x01')]
>>> s[0]
byte(b'\x01')
… without adding any new inconsistencies:
>>> assert tuple(b[:2]) == tuple(s[:2])
>>> assert b[0] == s[0]
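For anyone who wants to play with the idea, here’s a very rough sketch of what such a scalar type might look like as a plain Python class. It’s only meant to make the sessions above concrete, not a proposal for the actual interface (which, as I say below, invites plenty of bikeshedding), and of course real consistency would require bytes indexing and iteration to return byte, which is exactly the backward compatibility problem:

    class byte:
        """Hypothetical immutable single-byte scalar (illustration only)."""

        __slots__ = ('_value',)

        def __init__(self, value):
            if isinstance(value, (bytes, bytearray)) and len(value) == 1:
                self._value = value[0]
            elif isinstance(value, int) and 0 <= value <= 255:
                self._value = value
            else:
                raise TypeError('expected an int in range(256) or a length-1 bytes')

        def __index__(self):
            # usable anywhere an integer is required (indexing, bytes([...]), etc.)
            return self._value

        def __eq__(self, other):
            if isinstance(other, byte):
                return self._value == other._value
            if isinstance(other, int):
                return self._value == other
            if isinstance(other, (bytes, bytearray)):
                return len(other) == 1 and other[0] == self._value
            return NotImplemented

        def __hash__(self):
            return hash(self._value)

        def __repr__(self):
            return 'byte(%r)' % bytes([self._value])

With that, list(map(byte, b)) gives you something like the s in the examples above, and byte(b'\x01') == 1 stays true, so mixing it with existing int-returning code doesn’t immediately break.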
The downside, of course, is having one more builtin type. But that’s not an instant disqualifier; it’s a cost to trade off with the benefits. I think if it weren’t for backward compatibility, chr might turn out to be useful enough to qualify (byte I’m much less confident of—it comes up less often, and also once you start bikeshedding the interface there’s a lot more vagueness in the concept), or at least worth having a PEP to explain why it’s rejected. (But of course “if not for backward compatibility” isn’t realistic.)