[Python-ideas] Type hints for text/binary data in Python 2+3 code
Guido van Rossum
guido at python.org
Sat Mar 19 14:51:40 EDT 2016
I like the way this is going. I think it needs to be a separate PEP;
PEP 484 is already too long and this topic deserves being written up
carefully (like you have done here).
I have a few remarks.
* Do we really need _AsciiUnicode? I see the point of _AsciiStr,
because Python 2 accepts 'x' + u'' but fails '\xff' + u'', so 'x'
needs to be of type _AsciiStr while '\xff' should not (it should be
just str). However there's no difference in how u'x' is treated from
how u'\u1234' or u'\xff' are treated -- none of them can be
concatenated to '\xff' and all of them can be concatenated to 'x'.
* It would be helpful to spell out exactly what is and isn't allowed
when different core types (bytes, str, unicode, Text) meet in Python 2
and in Python 3. Something like a table with a row and a column for
each and the type of x+y (or "error") in each of the cells.
* I propose that Python 2+3 mode is just the intersection of what
Python 2 and Python 3 mode allow. (In mypy, I don't think we'll
implement this -- users will just have to run mypy twice with and
without --py2. But for PyCharm it makes sense to be able to declare
this. Yet I think it would be good not to have to spell out separately
which rules it uses, defining it as the intersection of 2 and 3 is all
we need.)
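
A rough sketch of such a table, and of the 2+3 intersection, might look like
the following. It runs on Python 3 and only models Python 2's implicit ASCII
decoding, so the Python 2 results are an assumption of the model, not output
from a real 2.7 interpreter. The names py2_concat, py3_concat, SAMPLES, table
and intersection are hypothetical; Python 2 str is modeled as bytes and
Python 2 unicode as Python 3 str.

```python
def py2_concat(x, y):
    """Model Python 2's `+` for str (here: bytes) and unicode (here: str)."""
    if isinstance(x, bytes) and isinstance(y, bytes):
        return x + y
    # Mixed operands: Python 2 first decodes the byte string as ASCII.
    if isinstance(x, bytes):
        x = x.decode('ascii')  # raises UnicodeDecodeError for b'\xff'
    if isinstance(y, bytes):
        y = y.decode('ascii')
    return x + y


def py3_concat(x, y):
    """Python 3's `+`: mixing bytes and str raises TypeError."""
    return x + y


def result_kind(concat, x, y):
    """Classify the result of x + y as 'text', 'binary' or 'error'."""
    try:
        return 'text' if isinstance(concat(x, y), str) else 'binary'
    except (TypeError, UnicodeDecodeError):
        return 'error'


# One sample value per row/column of the table (assumed categories).
SAMPLES = {
    'ascii bytes': b'x',
    'non-ascii bytes': b'\xff',
    'text': u'\u1234',
}


def table(concat):
    """Build {(row, column): kind} for every pair of sample values."""
    return {(a, b): result_kind(concat, x, y)
            for a, x in SAMPLES.items()
            for b, y in SAMPLES.items()}


def intersection(t2, t3):
    """2+3 mode: a cell is allowed only if both modes agree on it."""
    return {k: (t2[k] if t2[k] == t3[k] else 'error') for k in t2}


py2 = table(py2_concat)
py3 = table(py3_concat)
both = intersection(py2, py3)

for (a, b), kind in sorted(both.items()):
    print('%-16s + %-16s -> py2: %-6s py3: %-6s 2+3: %s'
          % (a, b, py2[(a, b)], py3[(a, b)], kind))
```

In this model 'x' + u'' yields text under the Python 2 rules while
'\xff' + u'' is an error, and every mixed bytes/text cell becomes an error
once the Python 3 rules are intersected in.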
On Fri, Mar 18, 2016 at 6:45 PM, Andrey Vlasovskikh
<andrey.vlasovskikh at gmail.com> wrote:
> With the addition of the comment-based syntax [3] for Python 2.7 + 3 in PEP 0484, having a Python 2/3 compatible way of adding type hints for text and binary values becomes important.
>
> Following issue #1141 at the Mypy GitHub site [1], I've come up with a draft proposal based on those ideas that I'd like to discuss here.
>
>
> # Abstract
>
> This proposal contains recommendations on how to annotate text/binary data in
> newly added PEP 0484 comment-based type hints in order to make them Python 2/3
> compatible when the single-source approach to porting from 2 to 3 is used.
>
> It introduces a new type `typing.Text` that represents text data in both Python
> 2 and 3, deprecates `str -> unicode` promotion used in type checkers, suggests
> an approach for type checkers to find implicit conversion errors by tracking ASCII
> text/binary values, and recommends that type checkers warn about `unicode` in
> the 2+3 mode.
>
>
> # Rationale
>
> With the addition of the comment-based syntax for Python 2.7 + 3, having a Python
> 2/3 compatible way of annotating types of text and binary values becomes
> important. Currently having a single-source code base is the main approach to
> 2/3 compatibility, so it is highly desirable to have 2/3 compatible
> comment-based type hints that would help porting code from 2 to 2+3 to 3.
>
> While migrating their code from Python 2 to 3, users are most likely to discover
> the following types of text/binary errors (presumably in descending order
> of their frequency in typical code):
>
> 1. Implicit text/binary conversions removed in Python 3
> 2. Calling changed APIs that accept or return text/binary data
> 3. Calling removed/changed methods of text/binary types
> 4. Overriding special text/binary methods and using the related built-ins
> (`str()`, `repr()`, `unicode()`)
>
> Only the first two types of errors -- implicit conversions and calling changed
> text/binary APIs -- depend on being able to express the semantics of Python 2+3
> compatible text/binary interfaces using type hints.
>
> PEP 0484 doesn't contain any recommendations on how to document various typical
> cases in text/binary APIs in order to make type hints 2+3 compatible.
>
>
> # Proposal
>
> This document is based on some text/binary handling options and the problems
> associated with them proposed at python/mypy#1141 by Jukka Lehtosalo, Guido van
> Rossum, and others [1]. It also takes into account the experience of the PyCharm
> team with their pre-PEP484 notation for type hints [2] and handling Python 2/3
> issues reported by users in PyCharm code inspections.
>
>
> ## Handling removed implicit conversions
>
> In addition to the existing types (`bytes`, `str`, `unicode`, `typing.AnyStr`)
> let's introduce a new type for *2+3 compatible text data* -- `typing.Text` (should
> we add a fake built-in `unicode` type for Python 3 to type checkers instead of
> introducing a new name?):
>
> * `typing.Text`: text data
>     * Python 2: `unicode`
>     * Python 3: `str`
>
> As a reminder, the semantics of the existing types are:
>
> * `bytes`: binary data
>     * Python 2: `bytes` (== `str`)
>     * Python 3: `bytes`
> * `str`: "native" string, `type('foo')`
>     * Python 2: `str`
>     * Python 3: `str`
> * `unicode`: Python 2-only text data
>     * Python 2: `unicode`
>     * Python 3: error
> * `typing.AnyStr`: type variable constrained to both text and binary data
>
> With the addition of `typing.Text` it is possible to express the type analogous
> to `typing.AnyStr` that doesn't impose any type constraints (should we call it
> `typing.BaseString`?):
>
> * `typing.Union[typing.Text, bytes]`: both text and binary data when a type
>   variable isn't needed
>
> Using only `typing.Text`, `bytes`, `str`, and `typing.AnyStr` in the type hints
> for an API would mean that this API is Python 2 and 3 compatible with respect to
> implicit text/binary conversions.
>
> For Python 2 we should *not* have the implicit `str` -> `unicode` promotion
> since it hides errors related to implicit conversions.
>
> For 7-bit ASCII string literals in Python 2, type checkers should infer special
> internal types `typing._AsciiStr` and `typing._AsciiUnicode` that are compatible
> with both `str` and `unicode` (a *special type-checking rule* is needed):
>
>     class _AsciiStr(str):
>         pass
>
>     class _AsciiUnicode(unicode):
>         pass
>
> The details of inferring ASCII types are up to specific type checkers.
>
> In the 2+3 mode, type checkers should show errors when comment- or stub-based
> type hints contain `unicode`.
>
>
> ## Examples of typical 2+3 functions
>
> A function that accepts "native" strings. It uses implicit ASCII
> unicode-to-str conversion at runtime in Python 2 and accepts only text data in
> Python 3:
>
>     def getattr(o: Any, name: str, default: Any = None) -> Any: ...
>
> A function that does implicit str-to-unicode conversion at runtime in Python 2
> and accepts only text data in Python 3:
>
>     def hello_rus(name: Text) -> Text:
>         return u'Привет, ' + name
>
> A function that transforms text-to-text or binary-to-binary or handles both text
> and binary data in some other way in both Python 2 and 3:
>
>     def listdir(path: AnyStr) -> AnyStr: ...
>
> A function that works with both text and binary data in Python 2 and 3, where
> the author of the function for some reason doesn't want to have a type variable
> associated with `AnyStr`:
>
>     def upper_len(s: Union[bytes, Text]) -> int:
>         return len(s.upper())
>
> A PEP-3333 compatible WSGI app function that uses "native" strings for environ
> and headers data while returning an iterable over binary data in both Python 2
> and 3:
>
>     def app(environ: Dict[str, Any],
>             start_response: Callable[[str, List[Tuple[str, str]]], None]) \
>             -> Iterable[bytes]: ...
>
> A type inference example that features a type checker being able to infer
> `typing._AsciiStr` or `typing._AsciiUnicode` types for Python 2 using the
> functions defined above:
>
>     method_name = u'update'  # _AsciiUnicode
>     getattr({}, method_name)  # OK, implicit ASCII-only unicode-to-bytes in Py2
>
>     nonascii_data = b'\xff'  # str (non-ASCII, so not _AsciiStr)
>     hello_rus(nonascii_data)  # Type checker warning
>                               # Non-ASCII bytes are not compatible with Text
>
>     # _AsciiUnicode + bytes
>     u'foo' + b'\xff'  # Type checker warning
>                       # Non-ASCII bytes are not compatible with Text
>
>     def f(x: AnyStr, y: AnyStr) -> AnyStr:
>         return os.path.join('base', x, y)  # _AsciiStr compatible with AnyStr
>                                            # since it's compatible with
>                                            # both str and unicode
>
> There are cases mentioned in [1] where more advanced type inference rules are
> required in order to be able to handle ASCII types. It remains unclear if
> these rules would be easy enough to implement in type checkers.
>
>
> ## Handling other types of text/binary errors
>
> No new types besides `typing.Text` are needed in order to find the other
> types of errors listed in the Rationale section.
>
> Based on the type hints that use the above text/binary types, type checkers
> in the 2+3 mode should show errors when the user accesses attributes of
> these types that are not available in both Python 2 and Python 3.
>
>
> [1]: https://github.com/python/mypy/issues/1141
> [2]: https://github.com/JetBrains/python-skeletons#types
> [3]: https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code
>
>
> --
> Andrey Vlasovskikh
>
> Web: http://pirx.ru/
--
--Guido van Rossum (python.org/~guido)