PEP 688: Request for feedback on bytes/bytearray promotion
I recently revised PEP 688 (https://peps.python.org/pep-0688/), which proposes a mechanism to make the buffer protocol accessible to the type system. The most technically challenging part of the PEP is in the interaction with the C API, so I opened a primary discussion thread on the core dev Discourse at https://discuss.python.org/t/pep-688-take-2-making-the-buffer-protocol-acces... .

However, the PEP also proposes a change that affects primarily static type checkers: removing the implicit promotion of memoryview and bytearray to bytes. For context, the CPython docs currently specify that a type annotation of "bytes" should also include bytearray and memoryview values, similar to how "float" implicitly includes int. Mypy and pyright implement this rule; pyre does not.

I have a draft mypy PR removing the implicit promotion at https://github.com/python/mypy/pull/12661. The mypy-primer output shows that it is not uncommon for projects to rely on the implicit promotion. (Though it's worth noting that a maintainer of the most impacted project, psycopg, asked mypy to remove the implicit promotion because it caused mypy to miss bugs: https://github.com/python/mypy/issues/12643#issuecomment-1105914159.) Several people commented on Discourse that they feel it is more intuitive for newcomers if a bytes type annotation also includes bytearray.

The more I think about it, the more I'm convinced that it's better in the long term to remove this implicit promotion. The most basic rule in Python typing is that if you put a type X in an annotation, only instances of that type are accepted. Exceptions to that rule are counterintuitive and lead to confusing edge cases. If you want either bytes or bytearray, you should write "bytes | bytearray"; if you want just bytes, you should write "bytes".

But I'm willing to change the PEP if the consensus in the typing community is that it's better to keep the implicit bytes/bytearray promotion. What do you think?
I'm less willing to keep accepting memoryview as compatible with bytes. bytearray really does have almost perfect interoperability with bytes (the main exception is that it's not hashable), but memoryview has little in common with bytes beyond being a buffer.
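A minimal sketch of the kind of mistake the promotion can hide (the function names are hypothetical, not taken from psycopg or the PEP): `remember` is annotated `bytes` and uses its argument as a dict key. While the implicit promotion is in effect, a type checker accepts a `bytearray` argument here, but `bytearray` is not hashable, so such a call fails at runtime. `payload_size` instead spells out `bytes | bytearray` and only uses an operation both types support.

```python
from __future__ import annotations

# Hypothetical cache keyed by bytes; dict keys must be hashable.
_seen: dict[bytes, int] = {}


def remember(key: bytes) -> int:
    """Count how often each key was seen. Only hashable (bytes) keys work."""
    _seen[key] = _seen.get(key, 0) + 1
    return _seen[key]


def payload_size(data: bytes | bytearray) -> int:
    """Accept either type explicitly; len() works on both."""
    return len(data)
```

With the promotion removed, a call such as `remember(bytearray(b"abc"))` would be reported statically instead of raising `TypeError: unhashable type: 'bytearray'` at runtime.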
FWIW, in my dynamic typing library bytes ≠ bytearray, and you must express bytes | bytearray if you want to support both.

On Wed, 2022-10-26 at 18:31 -0700, Jelle Zijlstra wrote:
On Wed, Oct 26, 2022 at 7:32 PM Jelle Zijlstra <jelle.zijlstra@gmail.com> wrote:
The most basic rule in Python typing is that if you put a type X in an annotation, only instances of that type are accepted. Exceptions to that rule are counterintuitive and lead to confusing edge cases. If you want either bytes or bytearray, you should write "bytes | bytearray"; if you want just bytes, you should write "bytes".
+1 Carl
On 27.10.22 at 03:31, Jelle Zijlstra wrote:
Speaking with my typeshed maintainer hat on: if we are removing that promotion, we need a transition strategy. Here's my suggestion for typeshed:

1. Introduce a TypeAlias `_typeshed.OldBytes = bytes | bytearray | memoryview` (or similar).
2. Document that `OldBytes` must not be used manually.
3. Programmatically replace all occurrences of "bytes" in argument types with `OldBytes`. (We can only do this for third-party stubs once all type checkers include `_typeshed.OldBytes`. Ideally this should happen *before* type checkers remove support for the automatic promotion.)
4. Check all occurrences of `OldBytes` manually and replace them with the appropriate types. This can be done over time, whenever a particular stub file is touched anyway.
5. Eventually (in a few years) remove `_typeshed.OldBytes`.

- Sebastian
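Steps 1 and 3 could look roughly like the sketch below. It is a hypothetical illustration, not typeshed's actual tooling: `OldBytes` is written with `typing.Union` so it also evaluates on Pythons older than 3.10, and `rewrite_arg_annotations` is an invented helper built on the standard `ast` module (Python 3.9+ for `ast.unparse`). It renames `bytes` only inside parameter annotations and leaves return types alone; it does not handle string annotations or qualified names such as `builtins.bytes`.

```python
import ast
from typing import Union

# Step 1 (sketch): the transitional alias, spelling out what the promotion allowed.
OldBytes = Union[bytes, bytearray, memoryview]


class _RenameBytes(ast.NodeTransformer):
    """Replace every bare `bytes` name in the visited subtree with `OldBytes`."""

    def visit_Name(self, node: ast.Name) -> ast.AST:
        if node.id == "bytes":
            return ast.copy_location(ast.Name(id="OldBytes", ctx=node.ctx), node)
        return node


def rewrite_arg_annotations(source: str) -> str:
    """Step 3 (sketch): use OldBytes in argument types; keep return types as-is."""
    tree = ast.parse(source)
    renamer = _RenameBytes()
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            args = node.args
            # Collect every kind of parameter, including *args and **kwargs.
            params = args.posonlyargs + args.args + args.kwonlyargs
            params += [a for a in (args.vararg, args.kwarg) if a is not None]
            for param in params:
                if param.annotation is not None:
                    param.annotation = renamer.visit(param.annotation)
    return ast.unparse(tree)
```

For example, `rewrite_arg_annotations("def f(x: bytes, *, y: bytes | None = None) -> bytes: ...")` rewrites both parameter annotations to use `OldBytes` while the return type stays `bytes`, which matches step 3's restriction to argument types.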
Things I have a clear opinion on:

I am strongly in favour of dropping memoryview implicit promotion. It's too unsound.

Skimming the mypy_primer output, I think we can still fix some stuff in typeshed, e.g. https://github.com/python/typeshed/pull/8995 . The lower we get the cost, the less hard we have to think about removing implicit promotion.

We should deprecate collections.abc.ByteString, which is currently useless. We could maybe retcon it to something that's useful (e.g. either the equivalent of `bytes & bytearray` or `Buffer & Sequence[int]`), but that's probably considered too breaking a change (not to mention isinstance for the whole `bytes & bytearray` interface would be slow).

Things I am less clear on:

A main reason to change the promotion status quo is to be able to reflect hashability. But there's no way to type whether a memoryview is hashable, or whether a Sequence[int] is hashable... in general I sort of view hashability as a bit of a losing cause in typing.

I think "breaks the most basic rule of typing" overstates the case a little. Everyone uses int and float happily (and those are less compatible than bytes and bytearray). I.e. "instances of that type" is doing some heavy lifting in that sentence... for example, bytes promotion probably feels more intuitive than list's invariance.

On Thu, 27 Oct 2022 at 01:08, Sebastian Rittau <srittau@rittau.biz> wrote:
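A small runtime illustration of the hashability point: both views below have the static type `memoryview`, and both sequences are valid `Sequence[int]` values, yet only some of them can be hashed. `is_hashable` is a hypothetical helper written for this example, not part of any library.

```python
from collections.abc import Sequence


def is_hashable(obj: object) -> bool:
    """Return True if hash(obj) succeeds at runtime."""
    try:
        hash(obj)
    except (TypeError, ValueError):
        # Unhashable types such as list raise TypeError;
        # a writable memoryview raises ValueError.
        return False
    return True


readonly_view: memoryview = memoryview(b"abc")             # view over immutable bytes
writable_view: memoryview = memoryview(bytearray(b"abc"))  # view over a mutable bytearray
as_tuple: Sequence[int] = (1, 2, 3)
as_list: Sequence[int] = [1, 2, 3]
```

Here `is_hashable(readonly_view)` and `is_hashable(as_tuple)` are true while `is_hashable(writable_view)` and `is_hashable(as_list)` are false, even though each pair shares one annotation.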
On Thu, 27 Oct 2022 at 1:08, Sebastian Rittau (<srittau@rittau.biz>) wrote:
I'm hoping we can avoid this dance by simply reviewing all `bytes` annotations now (following https://github.com/python/typeshed/issues/9001).
On Thu, Oct 27, 2022 at 1:08 AM Sebastian Rittau <srittau@rittau.biz> wrote:
Sounds like a good transition plan. If it's for internal use only, why not name it `_typeshed._OldBytes`? And what would be lost if even that dropped `memoryview`? (Probably the world would explode. Never change something that's deprecated anyway, no matter how much you want to. :-)

--
--Guido van Rossum (python.org/~guido)
Pronouns: he/him (why is my pronoun here?) <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
participants (6)
- Carl Meyer
- Guido van Rossum
- Jelle Zijlstra
- Paul Bryan
- Sebastian Rittau
- Shantanu Jain