[Python-ideas] Type hints for text/binary data in Python 2+3 code
Wes Turner
wes.turner at gmail.com
Fri Mar 18 22:12:30 EDT 2016
This sounds like a more correct approach, thanks.
Looking at MarkupSafe (and, now, f-strings), would/will it be possible to
use typing.Text as a base class for even-more-abstract string types
("strypes"), e.g. XML, XHTML, HTML4, HTML5, HTML5.1, SQL? There are implicit
casts and contextual adaptations/transformations (which MarkupSafe specs a
bit). (I've no real code here, just a general idea that we're not tracking
enough string metadata to be safe here.)
On Mar 18, 2016 8:45 PM, "Andrey Vlasovskikh" <andrey.vlasovskikh at gmail.com>
wrote:
> With the addition of the comment-based syntax [3] for Python 2.7 + 3 in
> PEP 0484, having a Python 2/3 compatible way of adding type hints for text
> and binary values becomes important.
>
> Following issue #1141 at the Mypy GitHub site [1], I've come up with a
> draft proposal based on those ideas that I'd like to discuss here.
>
>
> # Abstract
>
> This proposal contains recommendations on how to annotate text/binary data
> in newly added PEP 0484 comment-based type hints in order to make them
> Python 2/3 compatible when the single-source approach to porting from 2 to
> 3 is used.
>
> It introduces a new type `typing.Text` that represents text data in both
> Python 2 and 3, deprecates the `str -> unicode` promotion used in type
> checkers, suggests an approach for type checkers to find implicit
> conversion errors by tracking ASCII text/binary values, and recommends that
> type checkers warn about `unicode` in the 2+3 mode.
>
>
> # Rationale
>
> With the addition of the comment-based syntax for Python 2.7 + 3, having a
> Python 2/3 compatible way of annotating types of text and binary values
> becomes important. Currently, having a single-source code base is the main
> approach to 2/3 compatibility, so it is highly desirable to have 2/3
> compatible comment-based type hints that would help porting code from 2 to
> 2+3 to 3.
>
> While migrating their code from Python 2 to 3, users are most likely to
> discover the following types of text/binary errors (presumably in
> descending order of their frequency in typical code):
>
> 1. Implicit text/binary conversions removed in Python 3
> 2. Calling changed APIs that accept or return text/binary data
> 3. Calling removed/changed methods of text/binary types
> 4. Overriding special text/binary methods and using the related built-ins
>    (`str()`, `repr()`, `unicode()`)
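
To make item 1 concrete, here is a minimal sketch of how a removed implicit conversion might surface, assuming the code runs on Python 3. The function names are made up for illustration and are not part of the proposal.

```python
def concat_implicit():
    # Python 2 would silently decode the ASCII-only bytes here.
    # Python 3 has no implicit bytes -> text conversion and raises TypeError.
    try:
        return u'foo' + b'bar'
    except TypeError:
        return 'TypeError'


def concat_explicit():
    # The 2+3 compatible spelling: decode explicitly before concatenating.
    return u'foo' + b'bar'.decode('ascii')
```

The explicit version behaves the same way on both major versions, which is the property the type hints discussed below are meant to check statically.
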
>
> Only the first two types of errors -- implicit conversions and calling
> changed text/binary APIs -- depend on being able to express the semantics
> of Python 2+3 compatible text/binary interfaces using type hints.
>
> PEP 0484 doesn't contain any recommendations on how to document various
> typical cases in text/binary APIs in order to make type hints 2+3
> compatible.
>
>
> # Proposal
>
> This document is based on some text/binary handling options, and the
> problems associated with them, proposed at python/mypy#1141 by Jukka
> Lehtosalo, Guido van Rossum, and others [1]. It also takes into account the
> experience of the PyCharm team with their pre-PEP484 notation for type
> hints [2] and handling Python 2/3 issues reported by users in PyCharm code
> inspections.
>
>
> ## Handling removed implicit conversions
>
> In addition to the existing types (`bytes`, `str`, `unicode`,
> `typing.AnyStr`) let's introduce a new type for *2+3 compatible text data*
> -- `typing.Text` (should we add a fake built-in `unicode` type for Python 3
> to type checkers instead of introducing a new name?):
>
> * `typing.Text`: text data
>     * Python 2: `unicode`
>     * Python 3: `str`
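
A minimal sketch of how the proposed name could be used with comment-based hints, assuming a `typing` module that provides `Text` as an alias for `str` on Python 3 and `unicode` on Python 2. The `greet` function is hypothetical.

```python
from typing import Text


def greet(name):
    # type: (Text) -> Text
    # Text-only API: both the literal and the argument are text data,
    # so no implicit text/binary conversion is involved.
    return u'Hello, ' + name
```
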
>
> As a reminder, the semantics of the existing types are:
>
> * `bytes`: binary data
>     * Python 2: `bytes` (== `str`)
>     * Python 3: `bytes`
> * `str`: "native" string, `type('foo')`
>     * Python 2: `str`
>     * Python 3: `str`
> * `unicode`: Python 2-only text data
>     * Python 2: `unicode`
>     * Python 3: error
> * `typing.AnyStr`: type variable constrained to both text and binary data
>
> With the addition of `typing.Text` it is possible to express a type
> analogous to `typing.AnyStr` that doesn't impose any type constraints
> (should we call it `typing.BaseString`?):
>
> * `typing.Union[typing.Text, bytes]`: both text and binary data when a type
>   variable isn't needed
>
> Using only `typing.Text`, `bytes`, `str`, and `typing.AnyStr` in the type
> hints for an API would mean that this API is Python 2 and 3 compatible with
> respect to implicit text/binary conversions.
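
As an illustration of an API restricted to these types, the sketch below accepts either kind of data and always returns text. `to_text` is a hypothetical helper, not something defined by the proposal or by `typing`.

```python
from typing import Text, Union


def to_text(value, encoding='utf-8'):
    # type: (Union[Text, bytes], str) -> Text
    # Binary data is decoded explicitly; text is returned unchanged.
    # No step relies on an implicit conversion, so the behaviour is the
    # same on Python 2 and 3.
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value
```

The `encoding` parameter is annotated as `str` because codec names are "native" strings in both versions.
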
>
> For Python 2 we should *not* have the implicit `str` -> `unicode` promotion
> since it hides errors related to implicit conversions.
>
> For 7-bit ASCII string literals in Python 2 type checkers should infer
> special internal types `typing._AsciiStr` and `typing._AsciiUnicode` that
> are compatible with both `str` and `unicode` (a *special type-checking
> rule* is needed):
>
>     class _AsciiStr(str):
>         pass
>
>     class _AsciiUnicode(unicode):
>         pass
>
> The details of inferring ASCII types are up to specific type checkers.
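
One possible shape for the literal rule, as a sketch only: the helper name `infer_literal_type` and the idea of returning type names as strings are assumptions for illustration, and a real checker would work on its own AST and type representations.

```python
def infer_literal_type(literal):
    """Return the name of the Python 2 type a checker might infer.

    Hypothetical helper: a bytes literal models Python 2 `str`, a text
    literal models Python 2 `unicode`.
    """
    if isinstance(literal, bytes):
        # bytearray yields integers on both Python 2 and 3.
        is_ascii = all(byte < 128 for byte in bytearray(literal))
        return '_AsciiStr' if is_ascii else 'str'
    is_ascii = all(ord(char) < 128 for char in literal)
    return '_AsciiUnicode' if is_ascii else 'unicode'
```

Only literals whose every byte or code point is below 128 get the special ASCII types; anything else falls back to the ordinary `str` or `unicode`.
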
>
> In the 2+3 mode type checkers should show errors when comment- or
> stub-based type hints contain `unicode`.
>
>
> ## Examples of typical 2+3 functions
>
> A function that accepts "native" strings. It uses implicit ASCII
> unicode-to-str conversion at runtime in Python 2 and accepts only text data
> in Python 3:
>
>     def getattr(o: Any, name: str, default: Any = None) -> Any: ...
>
> A function that does implicit str-to-unicode conversion at runtime in
> Python 2 and accepts only text data in Python 3:
>
>     def hello_rus(name: Text) -> Text:
>         return u'Привет, ' + name
>
> A function that transforms text-to-text or binary-to-binary or handles both
> text and binary data in some other way in both Python 2 and 3:
>
>     def listdir(path: AnyStr) -> AnyStr: ...
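
For a function with an actual body, an `AnyStr` implementation has to keep its own literals in the same "kind" as the argument. A minimal sketch using the comment-based syntax; `strip_trailing_newline` is a made-up example, not an API from the proposal.

```python
from typing import AnyStr


def strip_trailing_newline(line):
    # type: (AnyStr) -> AnyStr
    # Choose a newline literal of the same kind as the argument so that
    # text and binary data are never mixed.
    newline = b'\n' if isinstance(line, bytes) else u'\n'
    if line.endswith(newline):
        return line[:-1]
    return line
```
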
>
> A function that works with both text and binary data in Python 2 and 3,
> where the author of the function for some reason doesn't want to have a
> type variable associated with `AnyStr`:
>
>     def upper_len(s: Union[bytes, Text]) -> int:
>         return len(s.upper())
>
> A PEP-3333 compatible WSGI app function that uses "native" strings for
> environ and headers data while returning an iterable over binary data in
> both Python 2 and 3:
>
>     def app(environ: Dict[str, Any],
>             start_response: Callable[[str, List[Tuple[str, str]]], None]) \
>             -> Iterable[bytes]: ...
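
A runnable sketch of such an app written with the comment-based syntax. The signature is the simplified one quoted above (no `exc_info` argument, no `write` callable), and the response body is an arbitrary placeholder.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def app(environ, start_response):
    # type: (Dict[str, Any], Callable[[str, List[Tuple[str, str]]], None]) -> Iterable[bytes]
    # The body is binary data; status and header values are "native" strings.
    body = b'Hello, world\n'
    headers = [('Content-Type', 'text/plain'),
               ('Content-Length', str(len(body)))]
    start_response('200 OK', headers)
    return [body]
```
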
>
> A type inference example that features a type checker being able to infer
> `typing._AsciiStr` or `typing._AsciiUnicode` types for Python 2 using the
> functions defined above:
>
>     method_name = u'update'  # _AsciiUnicode
>     getattr({}, method_name)  # OK, implicit ASCII-only unicode-to-bytes in Py2
>
>     nonascii_data = b'\xff'  # str: not 7-bit ASCII, so not _AsciiStr
>     hello_rus(nonascii_data)  # Type checker warning
>                               # Non-ASCII bytes are not compatible with Text
>
>     # _AsciiUnicode + bytes
>     u'foo' + b'\xff'  # Type checker warning
>                       # Non-ASCII bytes are not compatible with Text
>
>     def f(x: AnyStr, y: AnyStr) -> AnyStr:
>         return os.path.join('base', x, y)  # _AsciiStr compatible with AnyStr
>                                            # since it's compatible with
>                                            # both str and unicode
>
> There are cases mentioned in [1] where more advanced type inference rules
> are required in order to be able to handle ASCII types. It remains unclear
> if these rules would be easy enough to implement in type checkers.
>
>
> ## Handling other types of text/binary errors
>
> No new types besides `typing.Text` are needed in order to find the other
> types of errors listed in the Rationale section.
>
> Based on the type hints that use the above text/binary types, type checkers
> in the 2+3 mode should show errors when the user accesses attributes of
> these types that are not available in both Python 2 and Python 3.
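
A rough sketch of what such an attribute check could look like. The table below is hand-written and deliberately incomplete, and both `NOT_IN_BOTH` and `check_attribute` are hypothetical names; a real checker would derive this information from its stubs for both versions.

```python
from typing import Optional

# Attributes that exist on the type in only one of Python 2 and Python 3.
# Illustrative subset only.
NOT_IN_BOTH = {
    'Text': {'decode', 'casefold', 'isidentifier', 'format_map'},
    'bytes': {'encode', 'format', 'hex'},
}


def check_attribute(type_name, attr):
    # type: (str, str) -> Optional[str]
    # Return a warning message, or None if the attribute is fine in 2+3 mode.
    if attr in NOT_IN_BOTH.get(type_name, set()):
        return "'{}' is not available on {} in both Python 2 and 3".format(
            attr, type_name)
    return None
```

For example, `check_attribute('Text', 'decode')` produces a warning because Python 3 `str` has no `decode()` method, while `check_attribute('Text', 'upper')` returns `None`.
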
>
>
> [1]: https://github.com/python/mypy/issues/1141
> [2]: https://github.com/JetBrains/python-skeletons#types
> [3]: https://www.python.org/dev/peps/pep-0484/#suggested-syntax-for-python-2-7-and-straddling-code
>
>
> --
> Andrey Vlasovskikh
>
> Web: http://pirx.ru/
>