I believe this is similar to a proposal I made here[0], and I fully support it. I'll see if I can make the changes public, but FWIW I was able to catch real bugs of this form with only some basic changes to pytype's stdlib and builtin type hints[1]. That said, there are a few obstacles worth considering:

1. This is technically backwards incompatible. It will cause type-checking errors in code that currently works, and probably a good number of them. I'm not sure whether there are norms for handling that kind of change in mypy/typeshed, but I wanted to raise the question. (For example, could we hide this behavior behind a flag for a while? Is that something we want to do?)

2. There's a strongly opinion-driven question of which methods the char type should support (at typecheck time). I'm of the opinion that it shouldn't support slicing, any of the find, partition, or split methods, or iter/len and a few others. I presume others will disagree about some of these.

3. Python 2/3 compatibility is a consideration here. Personally, I've found that type checking is a super useful tool for converting code to Python 3 and for avoiding and fixing str/bytes issues. However, in Python 2 you'd need a char and a bchar type, while in Python 3 you don't (since bchar is just int). I can't believe I'm making an argument to do more work to maintain py2 compatibility in November of 2019, but providing `typing.Text`-like compatibility classes for char/bchar (or char/unichar if you prefer) might be a nice thing.
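
To make the "bchar is just int" point concrete, here is a small Python 3 illustration (the variable names are only for this example): indexing a `str` yields another `str`, while indexing `bytes` yields an `int`.

```python
# Python 3 only: compare what indexing returns for text and for bytes.
text_item = "abc"[0]    # a length-1 str, i.e. what a char type would describe
bytes_item = b"abc"[0]  # an int, so no separate bchar type is needed

print(type(text_item).__name__, type(bytes_item).__name__)
```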

I'm really glad to see more discussion happening here :)

Translating the changes from [1] to mypy/typeshed stubs is unfortunately not a straight copy-paste, but hopefully it provides a jumping-off point.

[0]: https://mail.python.org/archives/list/typing-sig@python.org/thread/ENTSMRILZN5YERQFSTWJXLDGX7KGH5DG/
[1]: https://github.com/google/pytype/commit/5cd7a15613b883b3ff6acdbaf0fdde032c85519c

On Fri, Nov 8, 2019 at 10:39 AM Brandt Bucher <brandtbucher@gmail.com> wrote:
Thanks for the helpful thoughts. A few replies:

Guido van Rossum wrote:
> One concern would seem that under the definition of `Chr = NewType("Chr", str)`, a string literal of length 1 does not have type `Chr`, so you'd have to modify all type checkers to make string literals of length 1 instances of `Chr` rather than `str`.

While this might be a good feature for type checkers, I don't consider it a necessary requirement of the proposal. I *think* it would be sufficient for `Chr`s to be produced only by string iteration/indexing (or possibly `builtins.chr` / `os.sep` / etc.), with an explicit cast to `Chr(c)` when needed for other strings (for example, when passing to `builtins.ord` or similar). This also avoids the `if ...: x = "a"` / `else: x = "ab"` hiccup Sebastian brings up.
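
A minimal sketch of that idea, assuming nothing beyond `typing.NewType`. The helpers `chars` and `code_point` are hypothetical stand-ins for what typeshed's `str.__iter__` and `builtins.ord` stubs might look like under the proposal; they are not existing APIs.

```python
from typing import Iterator, NewType

Chr = NewType("Chr", str)


def chars(s: str) -> Iterator[Chr]:
    """Hypothetical stand-in for string iteration that yields Chr."""
    for c in s:
        yield Chr(c)


def code_point(c: Chr) -> int:
    """Hypothetical stand-in for an ord() stub that accepts only Chr."""
    return ord(c)


# Characters produced by iteration are already Chr.
points = [code_point(c) for c in chars("abc")]

# A plain str literal needs the explicit Chr(...) cast.
single = code_point(Chr("a"))
```

At runtime `Chr(...)` just returns its argument, so the cast costs essentially nothing; the distinction exists only for the type checker.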

> You'd also probably have to arrange things so that `.lower()` on a `Chr` instance returns a `Chr` rather than a `str` (and so on for many string methods).

Agreed, I was envisioning a `TypeVar("Str", "str", Chr)` to annotate the `self` argument and return value for these on typeshed's `str`.
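
Roughly, as a sketch: the free function `lower` below is a hypothetical stand-in for a stub method annotated as `def lower(self: Str) -> Str: ...`. Whether a given type checker resolves a `Chr` argument to the `Chr` constraint rather than `str` is an assumption here, not something this snippet verifies.

```python
from typing import NewType, TypeVar, cast

Chr = NewType("Chr", str)

# Constrained TypeVar: the result type should match the argument type.
Str = TypeVar("Str", str, Chr)


def lower(s: Str) -> Str:
    # The cast is needed in this sketch because the real str.lower
    # is annotated as returning str.
    return cast(Str, s.lower())


c = lower(Chr("A"))  # intended to be typed as Chr
s = lower("AB")      # intended to be typed as str
```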

> Finally, in my experience (having given this some thought in the past) the "relative simplicity of the problem" is a fallacy -- the problem is anything but simple.

You're right, and I wasn't trying to make light of prior efforts in this area. I should have said something like "We probably don't need this much specialized machinery to get *most* of the benefit here."

Sebastian Rittau wrote:
> Personally I would prefer if Chr was a subtype of str, so that Chr can be used where str is expected, but not vice-versa. This would not need any "Union[str, Chr]" annotations and would be mostly compatible with current behavior.

Just to be clear, this is indeed what I am proposing with `Chr = NewType("Chr", str)`. `Chr` is a subtype of `str` with no additional functionality.
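
A short sketch of the subtype direction (the functions `shout` and `is_vowel` are made up for illustration):

```python
from typing import NewType

Chr = NewType("Chr", str)


def shout(s: str) -> str:
    """Accepts any str, so a Chr is fine here too."""
    return s.upper() + "!"


def is_vowel(c: Chr) -> bool:
    """Accepts only Chr; a checker should reject a plain str argument."""
    return c in "aeiou"


loud = shout(Chr("a"))      # OK: Chr is usable where str is expected
vowel = is_vowel(Chr("e"))  # OK: explicitly a Chr
# is_vowel("e")             # expected to be flagged by a type checker
```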