[Python-ideas] Re: Incremental step on road to improving situation around iterable strings

2 Mar 2020


On Tue, Mar 3, 2020 at 10:13 AM Steven D'Aprano <steve@pearwood.info> wrote:
...
On Sun, Feb 23, 2020 at 01:46:53PM -0500, Richard Damon wrote:
...
I would agree with this. In my mind, fundamentally a 'string' is a
sequence of characters, not strings,
If people are going to seriously propose this Character type, I think
they need to be more concrete about the proposal and not just hand-wave
it as "strings are sequences of characters".
Presumably you would want `mystring[0]` to return a char, not a str, but
there are plenty of other unspecified details.
- Should `mystring[0:1]` return a char or a length 1 str?
I'm not seriously proposing it, and I am in fact against the proposal
quite strongly, but ISTM the only sane way to do things is to mirror
the Py3 bytes object. Just as mybytes[0] returns an int, not a bytes,
this should return a char. And that can then be the pattern for
anything else that's similar.
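
For reference, here is the bytes pattern being mirrored, next to what str
does today. This is a small sketch of current Python 3 behaviour only; the
variable names are made up for illustration, and no char type exists.

```python
data = b"Zebra"
text = "Zebra"

# bytes: indexing yields an int, slicing yields bytes
byte_item = data[0]
byte_slice = data[0:1]

# str today: indexing and slicing both yield a str
str_item = text[0]
str_slice = text[0:1]

print(type(byte_item).__name__, type(byte_slice).__name__)
print(type(str_item).__name__, type(str_slice).__name__)
```

Under the bytes-style pattern, `mystring[0]` would be a char while
`mystring[0:1]` would stay a length-1 str.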
...
- Presumably "Z" remains a length-1 str for backward compatibility,
  so how do you create a char directly?
There would probably need to be an alternative literal form. In C, "Z"
is a string, and 'Z' is a char; in Python, a more logical way to do it
would probably be a prefix like c"Z" - or perhaps just "Z"[0] and have
done with it.
...
- Does `chr(n)` continue to return a str?
Logically it should return a char, and in fact would probably want to
be the type, just as str/int/float etc are.
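
To illustrate the distinction (current behaviour only; the variable names
are illustrative): `chr` today is a plain built-in function that returns a
str, whereas `int`, `str` and `float` are types that double as constructors.

```python
letter = chr(90)

# chr is a built-in function, not a type
chr_is_type = isinstance(chr, type)

# int is a type that is also callable as a constructor
int_is_type = isinstance(int, type)

print(repr(letter), type(letter).__name__, chr_is_type, int_is_type)
```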
...
- Is the char type a subclass of str?
That way lies madness. I suggest not.
...
- Do we support mixed concatenation between str and char?
For the sake of backward compatibility, probably yes. But that's a
weak opinion and could easily be swayed.
...
- If so, does concatenating the empty string to a char give a char
  or a length-1 string?
A length 1 string (or, per above, TypeError).
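
For comparison, the bytes analogue already takes the TypeError route. This
sketch shows current Python 3 behaviour; whether a char type would follow
it is exactly the open question here.

```python
data = b"Z"
error = None

try:
    # data[0] is an int, so this mixes bytes and int
    mixed = data + data[0]
except TypeError as exc:
    mixed = None
    error = exc

# Slicing keeps the bytes type, so this concatenation works
same_type = data + data[0:1]

print(type(error).__name__, same_type)
```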
...
- Are chars indexable?
- Do they support len()?
No. A character is a single entity, just as an integer is. (NOTE: This
discussion has been talking about "characters", but I think logically
they have to be single Unicode codepoints. Thus the "length" of a
character is not a meaningful quantity.)
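
A quick illustration of why "character" and "codepoint" are not the same
thing, using only the standard `unicodedata` module. The same visible
letter can be one codepoint or two, depending on normalization form:

```python
import unicodedata

composed = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE, one codepoint
decomposed = unicodedata.normalize("NFD", composed)  # "e" plus U+0301

print(len(composed), len(decomposed))
print([hex(ord(cp)) for cp in decomposed])
```

Iterating over the decomposed form yields two items today, and would yield
two chars under a codepoint-based char type.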
...
If char is not a subclass of string, that's going to break code that
expects `all(isinstance(c, str) for c in obj)` to be true when
`obj` happens to be a string.
Backward compatibility WOULD be broken by this proposal (which is part
of why I'm so against it). This is one of those eggs that has to be
broken to make this omelette.
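
To make the breakage concrete, here is the invariant as it stands today,
with the bytes equivalent for contrast. This shows current behaviour only;
the variable names are illustrative.

```python
obj = "spam"
raw = b"spam"

# Holds today: iterating a str yields length-1 strs
str_invariant = all(isinstance(c, str) for c in obj)

# Does not hold for bytes: iterating bytes yields ints
bytes_invariant = all(isinstance(b, bytes) for b in raw)

print(str_invariant, bytes_invariant)
```

With a char type that is not a subclass of str, the first check would go
the way of the second.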
...
If char is a subclass, that means we can no longer deny that strings are
sequences of strings, since chars are strings. It also means that it
will break code that expects strings to be iterable,
And that's why I say this way lies madness.
...
I don't have a good intuition for how much code will break or simply
stop working correctly if we changed string iteration to yield a new
char type instead of length-1 strings.
Nor do I have a good intuition for whether this will *actually* help
much code. It seems to me that there's a good chance that this could end
up simply shifting isinstance tests for str in some contexts to
isinstance tests for char in different contexts.
Agreed.
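
A sketch of the sort of code in question, as I understand it: a recursive
flatten that has to special-case str, because a length-1 str iterates to
itself forever. The `flatten` helper is hypothetical and written for
illustration, not taken from the thread.

```python
from collections.abc import Iterable


def flatten(obj):
    """Yield the leaves of a nested iterable, treating str/bytes as leaves."""
    if isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable):
        yield obj
    else:
        for item in obj:
            yield from flatten(item)


flat = list(flatten(["ab", ["cd", [1, 2]], 3]))
print(flat)
```

With a char type, recursing into a str would terminate on its own, but
code that wants "ab" kept whole would still need an isinstance check
somewhere, which is the shift being described.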

ChrisA

Chris Angelico