On Tue, Mar 3, 2020 at 10:13 AM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Feb 23, 2020 at 01:46:53PM -0500, Richard Damon wrote:
I would agree with this. In my mind, fundamentally a 'string' is a sequence of characters, not strings,
If people are going to seriously propose this Character type, I think they need to be more concrete about the proposal and not just hand-wave it as "strings are sequences of characters".
Presumably you would want `mystring[0]` to return a char, not a str, but there are plenty of other unspecified details.
- Should `mystring[0:1]` return a char or a length 1 str?
I'm not seriously proposing it, and I am in fact against the proposal quite strongly, but ISTM the only sane way to do things is to mirror the Py3 bytes object. Just as mybytes[0] returns an int, not a bytes, mystring[0] should return a char. And that can then be the pattern for anything else that's similar.
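For reference, this is how the existing bytes/str split behaves today. It only shows current Python 3 behaviour, which is the pattern being pointed at; the variable names are just for illustration.

```python
data = b"Zebra"
text = "Zebra"

# Indexing bytes yields an int; slicing bytes yields bytes.
first_byte = data[0]
byte_slice = data[0:1]

# Indexing and slicing str both yield a str.
first_char = text[0]
char_slice = text[0:1]

print(type(first_byte).__name__, first_byte)        # int 90
print(type(byte_slice).__name__, byte_slice)        # bytes b'Z'
print(type(first_char).__name__, repr(first_char))  # str 'Z'
print(type(char_slice).__name__, repr(char_slice))  # str 'Z'
```

If a char type followed the same pattern, indexing would presumably give the new type while slicing kept giving a str, but that is an inference from the bytes analogy, not something the language defines.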
- Presumably "Z" remains a length-1 str for backward compatibility, so how do you create a char directly?
There would probably need to be an alternative literal form. In C, "Z" is a string, and 'Z' is a char; in Python, a more logical way to do it would probably be a prefix like c"Z" - or perhaps just "Z"[0] and have done with it.
- Does `chr(n)` continue to return a str?
Logically it should return a char, and in fact chr would probably want to be the type itself, just as str/int/float etc. are.
- Is the char type a subclass of str?
That way lies madness. I suggest not.
- Do we support mixed concatenation between str and char?
For the sake of backward compatibility, probably yes. But that's a weak opinion and could easily be swayed.
- If so, does concatenating the empty string to a char give a char or a length-1 string?
A length 1 string (or, per above, TypeError).
- Are chars indexable?
- Do they support len()?
No. A character is a single entity, just as an integer is. (NOTE: This discussion has been talking about "characters", but I think logically they have to be single Unicode codepoints. Thus the "length" of a character is not a meaningful quantity.)
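To make the answers so far concrete, here is a rough sketch of what such a type might look like. `Char` is entirely hypothetical: nothing like it exists in Python, and the choices below (one code point, not a str subclass, no `len()` or indexing, mixed concatenation giving a str) are just one reading of the replies above, not a worked-out design.

```python
class Char:
    """Hypothetical single-code-point type; deliberately not a str subclass."""

    __slots__ = ("_codepoint",)

    def __init__(self, codepoint):
        if not isinstance(codepoint, int) or not 0 <= codepoint <= 0x10FFFF:
            raise ValueError("codepoint must be an int in range(0x110000)")
        self._codepoint = codepoint

    def __int__(self):
        return self._codepoint

    def __str__(self):
        return chr(self._codepoint)

    def __repr__(self):
        return f"Char({chr(self._codepoint)!r})"

    def __eq__(self, other):
        if isinstance(other, Char):
            return self._codepoint == other._codepoint
        return NotImplemented

    def __hash__(self):
        return hash(("Char", self._codepoint))

    def __add__(self, other):
        # Mixed concatenation always produces a str, never a Char.
        if isinstance(other, (str, Char)):
            return str(self) + str(other)
        return NotImplemented

    def __radd__(self, other):
        if isinstance(other, str):
            return other + str(self)
        return NotImplemented

    # No __len__, __getitem__ or __iter__: a Char is a single entity.


z = Char(ord("Z"))
print(repr(z + ""))   # 'Z', a length-1 str
print(repr("" + z))   # 'Z', a length-1 str
```

With this sketch, `len(z)` and `z[0]` both raise TypeError, and `isinstance(z, str)` is False. The TypeError alternative for mixed concatenation would simply mean dropping `__add__` and `__radd__`.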
If char is not a subclass of string, that's going to break code that expects `all(isinstance(c, str) for c in obj)` to be true when `obj` happens to be a string.
Backward compatibility WOULD be broken by this proposal (which is part of why I'm so against it). This is one of those eggs that has to be broken to make this omelette.
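The kind of breakage in question can be seen today by comparing str with bytes, where iteration already yields a different type. The helper name `all_items_are_str` is made up for this example.

```python
def all_items_are_str(obj):
    """Return True if every item produced by iterating obj is a str."""
    return all(isinstance(c, str) for c in obj)


print(all_items_are_str("spam"))        # True: iterating str yields str
print(all_items_are_str(["sp", "am"]))  # True
print(all_items_are_str(b"spam"))       # False: iterating bytes yields int
```

If iterating a str yielded a non-str char type, the first call would presumably start returning False as well, just as the bytes case does now.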
If char is a subclass, that means we can no longer deny that strings are sequences of strings, since chars are strings. It also means that it will break code that expects strings to be iterable,
And that's why I say this way lies madness.
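For context, this is what "strings are sequences of strings" means in practice today, and why recursive code usually needs an explicit str check. The `flatten` helper is an illustrative sketch, not taken from any library.

```python
def flatten(obj):
    """Yield leaf items, treating str as a leaf to stop infinite descent."""
    if isinstance(obj, str):
        # Without this check, a length-1 str would yield itself forever.
        yield obj
        return
    try:
        children = iter(obj)
    except TypeError:
        # Not iterable, so it is a leaf.
        yield obj
        return
    for child in children:
        yield from flatten(child)


s = "spam"
first = s[0]
nested = first[0][0][0]  # indexing a length-1 str gives the same str back

print(repr(nested))                          # 's'
print(list(flatten(["ab", ["cd", 1]])))      # ['ab', 'cd', 1]
```

A char subclass of str would leave this property in place, since each char would still be a str.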
I don't have a good intuition for how much code will break or simply stop working correctly if we changed string iteration to yield a new char type instead of length-1 strings.
Nor do I have a good intuition for whether this will *actually* help much code. It seems to me that there's a good chance that this could end up simply shifting isinstance tests for str in some contexts to isinstance tests for char in different contexts.
Agreed.

ChrisA