
Whelp, this one bit me again today, so I finally decided to come forward with a concrete proposal for handling the fact that strings are themselves sequences of length-one strings. I know this has been discussed at some length, but most of the other proposals I've found are a bit too complicated for the relative simplicity of the problem.

My proposal is to add a public `Chr` type to the `typing` module, defined as `Chr = NewType("Chr", str)`. That's it. typeshed would then be modified to type `str` as a `Sequence[Chr]` rather than a `Sequence[str]`. Other parts of typeshed (`builtins.ord`, `builtins.chr`, `os.sep`, and `os.pathsep` are a few that come to mind) could be updated as well, using either the new `Chr` or something like `TypeVar("Str", Chr, str)`.

The beauty of this proposal is that type checkers will be able to check the `str`/`Chr` relationship properly without any additional special handling (beyond what is already required for `str` types). While they have the option of being stricter (for example, issuing warnings when a `Chr` is used as a `str`), the type system does not require it, because it mirrors the runtime reality: a `Chr` is just a special case of `str`.

I know it's not perfect, but it certainly seems better than the status quo. I just thought I would start here to gather feedback and see if anybody is interested in co-sponsoring a more formal proposal.

Brandt
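A minimal sketch of the core idea, assuming nothing beyond the proposed definition `Chr = NewType("Chr", str)`. User code can't change typeshed, so the hypothetical helper `chars()` stands in for what iterating a `str` typed as `Sequence[Chr]` would hand a checker. At runtime every `Chr` is just an ordinary `str`.

```python
from typing import List, NewType

# The proposed definition: a distinct static type that is a plain str at runtime.
Chr = NewType("Chr", str)


def chars(s: str) -> List[Chr]:
    """Hypothetical helper mimicking iteration over a str typed as Sequence[Chr]."""
    return [Chr(c) for c in s]


letters = chars("abc")
first = letters[0]  # a checker would infer Chr here

# Chr(...) performs no conversion or validation: it returns its argument unchanged,
# so Chr("ab") is accepted too, even though it is not a single character.
unchecked = Chr("ab")
```

Because `Chr` is a subtype of `str`, `first` can still be passed anywhere a `str` is expected.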

Seems worth trying to put together a set of patches. I expect that's the best way to find out whether there are flaws in the proposal and to explore how to address them.

One concern is that under the definition `Chr = NewType("Chr", str)`, a string literal of length 1 does not have type `Chr`, so you'd have to modify all type checkers to make length-1 string literals instances of `Chr` rather than `str`.

Another concern is that there's no way to introduce `Chr` at runtime in Python 3.8 and earlier, because typing.py is in the stdlib. We typically handle this by adding the new feature to typing_extensions.py.

You'd also probably have to arrange things so that `.lower()` on a `Chr` instance returns a `Chr` rather than a `str` (and so on for many string methods). And I guess you want `str.__getitem__(int)` (but not `str.__getitem__(slice)`) to return a `Chr` instance.

Finally, in my experience (having given this some thought in the past) the "relative simplicity of the problem" is a fallacy -- the problem is anything *but* simple. Alas.

On Thu, Nov 7, 2019 at 9:04 PM Brandt Bucher <brandtbucher@gmail.com> wrote:
-- 
--Guido van Rossum (python.org/~guido)
Pronouns: he/him
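Guido's point that `str.__getitem__(int)` should produce a `Chr` while `str.__getitem__(slice)` should still produce a `str` is the kind of distinction `typing.overload` expresses. In typeshed this would sit on `str.__getitem__` itself; the sketch below uses a hypothetical free function, `char_at`, so it can run as ordinary code.

```python
from typing import NewType, Union, overload

Chr = NewType("Chr", str)


@overload
def char_at(s: str, key: int) -> Chr: ...
@overload
def char_at(s: str, key: slice) -> str: ...


def char_at(s: str, key: Union[int, slice]) -> str:
    # Shared implementation; type checkers only use the overload signatures above.
    if isinstance(key, int):
        return Chr(s[key])  # integer indexing yields exactly one character
    return s[key]  # a slice can have any length, so it stays a plain str
```

With these overloads, `char_at("spam", 0)` would be checked as `Chr` and `char_at("spam", slice(1, 3))` as `str`.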

On 08.11.19 at 06:47, Guido van Rossum wrote:
Wouldn't that mean that now the following would fail?

    def foo(x: str): pass
    foo("a")

As well as:

    if ...:
        x = "a"
    else:
        x = "ab"

Personally I would prefer if Chr was a subtype of str, so that Chr can be used where str is expected, but not vice-versa. This would not need any "Union[str, Chr]" annotations and would be mostly compatible with current behavior. The main problem is that the second example above does not type-check, which is a blocker in my opinion.

- Sebastian

Thanks for the helpful thoughts. A few replies:

Guido van Rossum wrote:
One concern would seem that under the definition of `Chr = NewType("Chr", str)`, a string literal of length 1 does not have type `Chr`, so you'd have to modify all type checkers to make string literals of length 1 instances of `Chr` rather than `str`.
While this might be a good feature for type checkers, I don't consider it a requirement of the proposal. I *think* it would be sufficient for `Chr`s to be produced only by string iteration/indexing (or possibly by `builtins.chr`, `os.sep`, etc.), with an explicit `Chr(c)` cast when needed for other strings (for example, when passing to `builtins.ord` or similar). This also avoids the `if ...: x = "a"` / `else: x = "ab"` hiccup Sebastian brings up.
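A sketch of the explicit-cast workflow described above. The function `code_point` is hypothetical: it stands in for `builtins.ord` as it might look if typeshed annotated it as taking a `Chr`. Strings that don't come from iteration or indexing (here a separator literal) are wrapped explicitly.

```python
from typing import NewType

Chr = NewType("Chr", str)


def code_point(c: Chr) -> int:
    """Hypothetical stand-in for builtins.ord if its parameter were typed as Chr."""
    return ord(c)


sep = "/"  # a plain str literal, not a Chr under this proposal
# An explicit Chr(...) wrapper is needed before passing it where a Chr is required.
n = code_point(Chr(sep))

# Caveat: the wrapper is unchecked. code_point(Chr("ab")) would type-check,
# but ord() raises TypeError at runtime for a string of length 2.
```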
You'd also probably have to arrange things so that `.lower()` on a `Chr` instance returns a `Chr` rather than a `str` (and so on for many string methods).
Agreed. I was envisioning a `TypeVar("Str", "str", Chr)` to annotate the `self` argument and return value of these methods on typeshed's `str`.
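A rough sketch of the constrained-`TypeVar` approach, written as free functions (`lower` and `upper` are hypothetical wrappers, not the typeshed method stubs) so it can run. The intent is "`Chr` in, `Chr` out; `str` in, `str` out". It also shows one reason the problem is not as simple as it looks: Unicode case mapping can change a string's length, so a `Chr` annotation on the result is not always true.

```python
from typing import NewType, TypeVar, cast

Chr = NewType("Chr", str)

# Constrained TypeVar along the lines Brandt describes.
Str = TypeVar("Str", str, Chr)


def lower(s: Str) -> Str:
    # str.lower() is annotated as returning str, so the cast tells a checker the
    # result has the same type as the argument. cast() is a no-op at runtime.
    return cast(Str, s.lower())


def upper(s: Str) -> Str:
    return cast(Str, s.upper())


# Statically typed as Chr, but "ß".upper() is "SS", which is two characters long.
sharp_s = upper(Chr("ß"))
```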
Finally, in my experience (having given this some thought in the past) the "relative simplicity of the problem" is a fallacy -- the problem is anything but simple.
You're right, and I wasn't trying to make light of prior efforts in this area. I should have said something like "We probably don't need this much specialized machinery to get *most* of the benefit here."

Sebastian Rittau wrote:
Personally I would prefer if Chr was a subtype of str, so that Chr can be used where str is expected, but not vice-versa. This would not need any "Union[str, Chr]" annotations and would be mostly compatible with current behavior.
Just to be clear, this is indeed what I am proposing with `Chr = NewType("Chr", str)`. `Chr` is a subtype of `str` with no additional functionality.

I believe this is similar to a proposal I made here[0], and I fully support it. I'll see if I can make the changes public, but fwiw I was able to catch actual, real, bugs of this form with only some basic changes to pytype's stdlib and builtin type hints[1].

That said, there are a few obstacles that are worth considering:

1. This is technically backwards incompatible. It will cause typechecking errors for currently working code, and probably a good bit of them. I'm not sure if there are norms for dealing with that with changes to mypy/typeshed, but I wanted to raise the question. (for example, could we hide this behavior behind a flag for a while, is that something we want to do?)
2. There's a strong opinion question of what methods the char type should support (at typecheck time). I'm of the opinion that it shouldn't support slicing, any of the find, partition, or split methods, nor iter/len/a few others. I presume others will disagree about some of these.
3. 2/3 is a consideration here. Personally, I've found that type checking is a super useful tool to help converting code to python3 and to avoid and fix str/bytes issues. However, in python2 you'd need a char and bchar type, while in python3 you don't (since bchar is just int). I can't believe I'm making an argument to do more work to maintain py2 compatibility in November of 2019, but providing `typing.Text` like compatibility classes for char/bchar (or char/unichar if you prefer) might be a nice thing.

I'm really glad to see more discussion happening here :) Translating the changes from [1] to mypy/typeshed stubs is unfortunately not a straight copy-paste, but hopefully it provides a jumping off point.

[0]: https://mail.python.org/archives/list/typing-sig@python.org/thread/ENTSMRILZ...
[1]: https://github.com/google/pytype/commit/5cd7a15613b883b3ff6acdbaf0fdde032c85...

On Fri, Nov 8, 2019 at 10:39 AM Brandt Bucher <brandtbucher@gmail.com> wrote:

Joshua Morton wrote:
I believe this is similar to a proposal I made here[0], and I fully support it. I'll see if I can make the changes public, but fwiw I was able to catch actual, real, bugs of this form with only some basic changes to pytype's stdlib and builtin type hints[1].
That's great to hear, I'd love to see some examples!
This is technically backwards incompatible. It will cause typechecking errors for currently working code, and probably a good bit of them. I'm not sure if there are norms for dealing with that with changes to mypy/typeshed, but I wanted to raise the question. (for example, could we hide this behavior behind a flag for a while, is that something we want to do?)
True, but the only errors mandated by the proposal in its current form are when a `str` (of any length) is assigned to a variable initialized by `str` iteration/indexing/`builtins.chr`/etc., or passed to a function like `builtins.ord`, which requires a `Chr`. I don't think these cases are all that common, and they are exactly the cases where I believe an explicit cast/annotation would likely be a benefit.
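A sketch of the assignment case described above. Current typeshed still types `s[0]` as `str`, so the example wraps it in `Chr(...)` to simulate the proposed inference; the variable names are illustrative only. Assuming the common inference rule where a variable's type comes from its first assignment (as mypy does), the commented-out line is the kind of error the proposal introduces, and an explicit `str` annotation is the fix.

```python
from typing import NewType

Chr = NewType("Chr", str)

s = "hello"

# Chr(...) simulates what indexing would produce if typeshed typed it as Chr.
head = Chr(s[0])
# head = s[1:]  # a checker would reject this: a plain str assigned to a Chr variable

# Declaring the wider type up front accepts both, because Chr is a subtype of str.
token: str = Chr(s[0])
token = s[1:]
```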
There's a strong opinion question of what methods the char type should support (at typecheck time). I'm of the opinion that it shouldn't support slicing, any of the find, partition, or split methods, nor iter/len/a few others. I presume others will disagree about some of these.
I think this is up to the type checker / configuration. I don't think mandating any Liskov violations makes sense here (in other words, a `Chr` should have *all* of the inherited `str` methods, because it *is* a `str`).
2/3 is a consideration here. Personally, I've found that type checking is a super useful tool to help converting code to python3 and to avoid and fix str/bytes issues. However, in python2 you'd need a char and bchar type, while in python3 you don't (since bchar is just int). I can't believe I'm making an argument to do more work to maintain py2 compatibility in November of 2019, but providing `typing.Text` like compatibility classes for char/bchar (or char/unichar if you prefer) might be a nice thing.
Haven't given this much thought, but changing the definition to `NewType("Chr", Text)` and making the necessary typeshed updates should suffice, no?
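A small runtime check of the two facts this exchange relies on. In Python 3, `typing.Text` is simply an alias for `str` (kept mainly for Python 2 compatibility), so `NewType("Chr", Text)` behaves exactly like `NewType("Chr", str)` there. Joshua's remark that a Python 3 "bchar" is just `int` reflects the fact that indexing `bytes` yields an `int`, while slicing yields `bytes`.

```python
from typing import NewType, Text

# On Python 3, Text is str, so this is equivalent to NewType("Chr", str).
Chr = NewType("Chr", Text)

data = b"ab"
first_byte = data[0]  # indexing bytes gives an int, not a length-1 bytes object
first_slice = data[:1]  # slicing gives bytes
```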
participants (4)
- Brandt Bucher
- Guido van Rossum
- Joshua Morton
- Sebastian Rittau