[Python-ideas] Type hints for text/binary data in Python 2+3 code
Jukka Lehtosalo
jlehtosalo at gmail.com
Sat Mar 26 08:55:11 EDT 2016
On Fri, Mar 25, 2016 at 12:00 AM, Andrey Vlasovskikh <
andrey.vlasovskikh at gmail.com> wrote:
> Upon further investigation of the problem I've come up with an alternative
> idea that looks simpler and yet still capable of finding most text/binary
> conversion errors.
>
...
> ## TL;DR
>
> * Introduce `typing.Text` for text data in Python 2+3
> * `bytes`, `str`, `unicode`, `typing.Text` in type hints mean whatever they
> mean at runtime for Python 2 or 3
> * Allow `str -> unicode` and `unicode -> str` promotions for Python 2
>
I'm against this, as it would seem to make str and unicode pretty much the
same type in Python 2, and thus Python 2 mode seems much weaker than
necessary. I wrote a more detailed reply in the mypy issue tracker (
https://github.com/python/mypy/issues/1141#issuecomment-201799761). I'm not
copying it all here since much of that is somewhat mypy-specific and
related to the rest of the discussion on that issue, but I'll summarize my
main points here.
I prefer the idea of doing better type checking in Python 2 mode for str
and unicode, though I suspect we need to implement a prototype to decide
whether it will be practical.
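As a minimal sketch of the concern (not taken from the proposal or from mypy), consider a hypothetical helper `to_utf8` annotated with a Python 2+3 type comment. The function name and the sample values are assumptions made for illustration.

```python
from typing import Text


def to_utf8(s):
    # type: (Text) -> bytes
    """Encode text as UTF-8 (hypothetical helper, for illustration only)."""
    return s.encode('utf-8')


# Fine in both Python 2 and Python 3: the argument is genuinely text.
encoded = to_utf8(u'caf\xe9')

# Under a `str -> unicode` promotion, a Python 2 checker would also accept
# to_utf8('caf\xc3\xa9'), which passes a byte string. In Python 2 that call
# first decodes the bytes implicitly as ASCII and raises UnicodeDecodeError.
```

With stricter Python 2 checking, the second call would be reported instead of silently accepted.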
> * Type checking for Python 2 *and* Python 3 actually finds most text/binary
> errors
>
This may be true, but I'm worried about usability for Python 2 code bases.
Also, the effort needed to pass type checking in both modes (which is
likely pretty close to the effort of a full Python 3 migration, if the
entire code base is annotated) might be impractical for a large Python 2
code base.
> ## Summary for authors of type checkers
>
> The semantics of types `bytes`, `str`, `unicode`, `typing.Text` and the type
> checking rules for them should match the *runtime behavior* of these types in
> Python 2 and Python 3 depending on Python 2 or 3 modes. Using the runtime
> semantics for the types is easy to understand while it still allows to catch
> most errors. The Python 2+3 compatibility mode is just a sum of Python 2 and
> Python 3 warnings.
>
At least for mypy, the Python 2+3 compatibility mode would likely take
twice as much CPU to run, which is a pretty high cost as type checking
speed is one of the biggest open issues we have right now.
> ## Runtime type compatibility
>
...
> Each cell contains two characters: the result in Python 2 and in Python 3
> respectively. Abbreviations:
>
...
> * `*` — types are compatible, ignoring implicit ASCII conversions
>
Am I reading this right if I understand this as "considered valid during
type checking but may fail at runtime"?
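To illustrate what "valid during type checking but may fail at runtime" would mean, here is a rough sketch that mimics Python 2's implicit ASCII conversion in Python 3 code. The helper `py2_style_concat` is hypothetical; it only imitates what Python 2 does when evaluating `unicode + str`.

```python
def py2_style_concat(text, data):
    """Mimic Python 2's `unicode + str`: decode the bytes as ASCII first."""
    return text + data.decode('ascii')


# ASCII-only bytes: the implicit conversion succeeds.
ok = py2_style_concat(u'id: ', b'abc')

# Non-ASCII bytes: same types, so a checker that allows the promotion sees
# nothing wrong, but the implicit ASCII decode fails at runtime.
try:
    py2_style_concat(u'name: ', b'caf\xc3\xa9')
    failed = False
except UnicodeDecodeError:
    failed = True
```

Both calls have identical argument types, so the outcome depends on the data rather than on anything a type checker can see.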
> For non-ASCII text literals passed to functions that expect `Text` or `str` in
> Python 2 a type checker can analyze the contents of the literal and show
> additional warnings based on this information. For non-ASCII data coming from
> sources other than literals this check would be more complicated.
>
I wonder what the check would look like in the latter case? I can't imagine
how this would work for non-literals.
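The literal case, by contrast, is easy to picture. The following is only a sketch under assumptions: it uses the standard `ast` module on Python 3 source (Python 3.8 or later) rather than a checker's own parse tree, and the function name `non_ascii_literals` is made up for the example.

```python
import ast


def non_ascii_literals(source):
    """Return (line, value) pairs for str/bytes literals with non-ASCII content."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Constant) and isinstance(node.value, (str, bytes)):
            # Iterating str yields characters; iterating bytes yields ints.
            codes = [c if isinstance(c, int) else ord(c) for c in node.value]
            if any(code > 127 for code in codes):
                found.append((node.lineno, node.value))
    return found


# Two calls with literal arguments; only the second contains non-ASCII text.
sample = 'f("plain")\ng("caf\\u00e9")\n'
hits = non_ascii_literals(sample)
```

Nothing comparable is available for values read from files, sockets, or other functions, because their contents are unknown until runtime.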
Jukka