I was wondering what our plan is for using Literal types to write more expressive stubs for the standard library.
Now that PEP 586 has been accepted, is it time for us to update the stubs for open, etc.?
I know there was previous discussion about how to resolve the issues caused
by str also being an Iterable[str]
(https://github.com/python/typing/issues/256), and I want to bring it up
again because I have a suggestion that doesn't seem to have been explored
much in those discussions.
Many of the requests amount to "special-case length-1 strings at runtime",
which is tricky.
But in that issue, a few people proposed another option: have str.__iter__
return a type that isn't str, and that is itself non-iterable. I don't think
that alternative was explored enough, so I want to bring it up for more
discussion.
Compared to many of the other suggestions, this one has properties that make
it easier to implement: it requires no changes to Python the language, and
no changes to mypy or other type checkers. As far as I can tell, it only
requires changes to the type stubs for the standard library and builtins.
Whether this stricter behavior should be hidden behind a flag is a separate
question that I don't address here.
So the problem is that, given a function like

    def many_strs(strs: Sequence[str]) -> str: ...

the call `many_strs('aaa')` will type-check successfully, even though it's
exceedingly unlikely, given the annotations, that this is the behavior the
user wanted or expected.
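A minimal runtime illustration, assuming a hypothetical body for many_strs
(only the signature is given above): both calls pass a type checker today,
because str is itself a Sequence[str], but the second silently treats each
character as a separate string.

```python
from typing import Sequence


def many_strs(strs: Sequence[str]) -> str:
    # Hypothetical body: join the given strings with commas.
    return ",".join(strs)


# Intended use: a sequence of separate strings.
print(many_strs(["aaa", "bbb"]))  # aaa,bbb

# Also accepted by today's stubs, since str is a Sequence[str];
# each character is treated as its own string.
print(many_strs("aaa"))  # a,a,a
```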
So let's add a new type, which I'll call "OpaqueStr". It is meant to be
non-iterable and not container-like, because most instances will just be
length-1 strings produced by str.__iter__. It should implement only a small
subset of the str methods (capitalize, __add__, join, upper, lower, and all
of the isXYZ methods), and none of the container-like ones: no __iter__, no
__len__, no strip, translate, format, find, etc. Those don't make sense for
a type like this.
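A stub-style sketch of such a declaration, written as ordinary Python so it
can be executed. Which isXYZ methods to list and the return types are
assumptions, not part of the proposal as stated; returning str from upper()
and friends is one defensible choice, since case mapping can change length
('ß'.upper() == 'SS').

```python
from __future__ import annotations

from typing import Iterable, Union


class OpaqueStr:
    """Hypothetical stub-style declaration of the proposed type.

    In the proposal this would exist only in the type stubs; at runtime the
    values are still ordinary one-character str objects.
    """

    # Non-container str methods (return types are assumptions).
    def capitalize(self) -> str: ...
    def upper(self) -> str: ...
    def lower(self) -> str: ...
    def isalpha(self) -> bool: ...
    def isdigit(self) -> bool: ...
    def isspace(self) -> bool: ...
    def isupper(self) -> bool: ...
    def islower(self) -> bool: ...
    def join(self, iterable: Iterable[Union[str, OpaqueStr]]) -> str: ...
    def __add__(self, other: Union[str, OpaqueStr]) -> str: ...

    # Deliberately omitted: __iter__, __len__, __contains__, __getitem__,
    # strip, translate, format, find, and other container-like methods.
```

Because nothing declares __iter__, a type checker would reject
`for x in ch` whenever ch is an OpaqueStr.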
Similarly, other functions, such as ord, str.join and str.format, need to be
updated to also accept an OpaqueStr. There are also Unicode-related changes,
and some additional type plumbing that I haven't run into yet. Auditing the
places where `str` is currently accepted and replacing them with
`Union[str, OpaqueStr]` is probably the most annoying part of this, but it
forces people to opt in to the ambiguous behavior.
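Ordinary character-level code shows why ord and str.join need widening: it
feeds the results of iteration straight into them. In this sketch the
function names are hypothetical, and the stub changes appear only as
comments because OpaqueStr would exist only in the stubs; at runtime nothing
changes.

```python
from typing import List

# Hypothetical stub changes this code would rely on (not real typeshed):
#   ord(c) widened to also accept OpaqueStr
#   str.join(self, iterable) widened to Iterable[str | OpaqueStr]
# At runtime, iterating a str still yields one-character strs.


def code_points(text: str) -> List[int]:
    # Under the proposed stubs, ch would be inferred as OpaqueStr.
    return [ord(ch) for ch in text]


def reverse(text: str) -> str:
    # reversed(text) would presumably yield OpaqueStr items, so join must accept them.
    return "".join(reversed(text))


print(code_points("ab"))  # [97, 98]
print(reverse("abc"))  # cba
```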
The result is that as soon as you iterate over a string, you get an
OpaqueStr, which is not part of the normal str hierarchy; it's an unrelated
type (though in Python 2 it might subclass `basestring`?). So attempts to
iterate over an OpaqueStr, or to use it in other non-intuitive ways, fail to
type-check. Similarly, since str is now defined as
Iterable/Container[OpaqueStr] instead of Iterable[str], you avoid the
recursive type problem.
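The recursive type problem has a runtime counterpart: a one-character string
iterates to itself, so str's element type is str again, indefinitely. The
hypothetical flatten helper below recurses into anything iterable and never
bottoms out on text. Giving str the element type OpaqueStr, which declares
no __iter__, would end that chain after one step in the type system.

```python
from collections.abc import Iterable


def flatten(items):
    """Hypothetical helper: recursively yield the leaves of nested iterables."""
    for item in items:
        if isinstance(item, Iterable):
            yield from flatten(item)
        else:
            yield item


# A one-character string iterates to itself.
assert next(iter("a")) == "a"

print(list(flatten([[1, 2], [3]])))  # [1, 2, 3]

try:
    list(flatten(["ab"]))
except RecursionError:
    # "ab" -> "a" -> "a" -> ... never reaches a non-iterable leaf.
    print("RecursionError")
```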
If this sounds fairly concrete, it's because I wrote an initial
implementation after running into this problem, basically following the
design outlined above. It's by no means perfect (it likely errs on the side
of false negatives), but I was able to run it on a lot of the code at
Google. Here's what I found:
- The only false positives I saw came from functions that could return
either str or Iterable[str] depending on, say, a flag or another argument. I
may have missed some; there were some suspicious cases, but most of those
were in situations where I couldn't obviously say that
- The breakages were, I think, a roughly even mix of logical errors in the
code and misleading type annotations. For example, I saw a function that
could only ever return `False` because of a logical error, and this change
revealed it; but I also saw logically correct functions with bad
annotations. One example was (paraphrased, in a class):

      def _GetThing(self) -> str:
          x, y = self.GetThing()

the subclass correctly overrode the type signature of _GetThing to return
- Mis-typing Dict[str, List[str]] as Dict[str, str] is weirdly common, I
think this was the cause of maybe a third of the changes I had to make,
although it's hard to say with certainty, since in many cases I just
disabled the type errors.
- I ran this over a lot of code (I'm not sure exactly what I can say about
how much, but it was a lot), and while I haven't finished going through all
of it, it looks like there are maybe 100 locations that need updating,
including within the standard library (re, hexlify, etc.), open source
libraries that I could detect, and Google's Python code. Suppressing the
warnings took me only a few hours. Fixing them will obviously take longer,
but largely because the code in question is subtly broken and needs fixing.
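A hypothetical illustration of the Dict[str, List[str]] vs. Dict[str, str]
mix-up mentioned above. When the values arrive untyped (here from
json.loads, which returns Any), the call itself isn't checked, and the inner
loop type-checks today because iterating a str yields str. If str.__iter__
returned OpaqueStr, appending tag to a List[str] would be reported,
pointing at the wrong annotation.

```python
import json
from typing import Dict, List


def all_tags(tags_by_user: Dict[str, str]) -> List[str]:
    # Annotation is wrong: the values are really lists of tags.
    result: List[str] = []
    for tags in tags_by_user.values():
        for tag in tags:
            # Today tag is inferred as str, so this append type-checks.
            # Under the proposed stubs tag would be OpaqueStr and this
            # line would be flagged.
            result.append(tag)
    return result


data = json.loads('{"alice": ["red", "blue"]}')  # Any, so the call is unchecked
print(all_tags(data))  # ['red', 'blue']
```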
tl;dr: Add OpaqueStr, which is like str, but not a container. str.__iter__
returns OpaqueStr. This catches bugs.
I'm looking forward to any feedback or thoughts you all have. Thanks,