PEP 688: Request for feedback on bytes/bytearray promotion
I recently revised PEP 688 (https://peps.python.org/pep-0688/), which proposes a mechanism to make the buffer protocol accessible to the type system. The most technically challenging part of the PEP is in the interaction with the C API, so I opened a primary discussion thread on the core dev Discourse at https://discuss.python.org/t/pep-688-take-2-making-the-buffer-protocol-acces... .

However, the PEP also proposes a change that affects primarily static type checkers: removing the implicit promotion of memoryview and bytearray to bytes. For context, the CPython docs currently specify that a type annotation of "bytes" should also include bytearray and memoryview values, similar to how "float" implicitly includes int. Mypy and pyright implement this rule; pyre does not.

I have a draft mypy PR removing the implicit promotion at https://github.com/python/mypy/pull/12661. The mypy-primer output shows that it is not uncommon for projects to rely on the implicit promotion. (Though it's worth noting that a maintainer of the most impacted project, psycopg, asked mypy to remove the implicit promotion because it caused mypy to miss bugs: https://github.com/python/mypy/issues/12643#issuecomment-1105914159.) Several people commented on Discourse that they feel it is more intuitive for newcomers if a bytes type annotation also includes bytearray.

The more I think about it, the more I'm convinced that it's better in the long term to remove this implicit promotion. The most basic rule in Python typing is that if you put a type X in an annotation, only instances of that type are accepted. Exceptions to that rule are counterintuitive and lead to confusing edge cases. If you want either bytes or bytearray, you should write "bytes | bytearray"; if you want just bytes, you should write "bytes".

But I'm willing to change the PEP if the consensus in the typing community is that it's better to keep the implicit bytes/bytearray promotion. What do you think?
I'm less willing to keep accepting memoryview as compatible with bytes. bytearray really does have almost perfect interoperability with bytes (the main exception is that it's not hashable), but memoryview has little in common with bytes beyond being a buffer.
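To make the trade-off concrete, here is a minimal sketch of the kind of bug the implicit promotion can hide. The function names `cache_key`, `try_cache_key`, and `first_byte` are hypothetical and exist only for illustration; the runtime behaviour shown is that of CPython's built-in types.

```python
from typing import Union


def cache_key(data: bytes) -> int:
    # Annotated as plain bytes. With implicit promotion, a type checker
    # also accepts bytearray and memoryview arguments here.
    return hash(data)


def try_cache_key(value) -> str:
    # Report what happens at runtime for each argument type.
    try:
        cache_key(value)
        return "ok"
    except TypeError:
        return "TypeError"


def first_byte(data: Union[bytes, bytearray]) -> int:
    # Explicit spelling of "bytes | bytearray": only indexing is used,
    # which both types support.
    return data[0]


bytes_result = try_cache_key(b"abc")
bytearray_result = try_cache_key(bytearray(b"abc"))  # bytearray is unhashable
memoryview_has_startswith = hasattr(memoryview(b"abc"), "startswith")

print(bytes_result, bytearray_result, memoryview_has_startswith)
```

With the promotion in place, a checker would accept the `bytearray` call to `cache_key` even though it fails at runtime; without it, the caller would have to pass `bytes` or the function would have to declare the union explicitly, as `first_byte` does.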
On 27.10.22 at 03:31, Jelle Zijlstra wrote:
Speaking with my typeshed maintainer hat on: If we are removing that promotion, we need a transition strategy. Here's my suggestion for typeshed:

1. Introduce a TypeAlias `_typeshed.OldBytes = bytes | bytearray | memoryview` (or similar).
2. Document that `OldBytes` must not be used manually.
3. Programmatically replace all occurrences of "bytes" in argument types with `OldBytes`. (We can only do this for third-party stubs once all typecheckers include `_typeshed.OldBytes`. This should ideally happen *before* typecheckers remove support for the automatic promotion.)
4. Check all occurrences of `OldBytes` manually and replace them with the appropriate types. This can be done over time when touching a particular stub file anyway.
5. Eventually (in a few years) remove `_typeshed.OldBytes`.

- Sebastian
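As a rough illustration of steps 1, 3, and 4, the sketch below defines a local stand-in for the proposed alias. Note that `_typeshed.OldBytes` is only a suggestion in this thread, so `OldBytes` here is a plain module-level alias, and `write_all` and `digest_key` are hypothetical functions invented for the example.

```python
from typing import Union

# Local stand-in for the proposed `_typeshed.OldBytes` (an assumption; the
# alias is only suggested in this thread). Union[...] is used so the sketch
# also runs on Python versions without the `X | Y` syntax.
OldBytes = Union[bytes, bytearray, memoryview]


def write_all(data: OldBytes) -> int:
    # After step 3: the annotation was mechanically widened from `bytes`,
    # so the signature accepts exactly what the implicit promotion allowed.
    return len(bytes(data))


def digest_key(data: bytes) -> int:
    # After step 4: a manual review found that this function needs
    # hashability, so the annotation was narrowed back to plain `bytes`.
    return hash(data)


written = [write_all(b"ab"), write_all(bytearray(b"abc")), write_all(memoryview(b"abcd"))]
print(written)
```

The point of the two-phase approach is that step 3 changes no behaviour for users, while step 4 can tighten each signature individually once someone has checked what the function actually requires.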
Things I have a clear opinion on:

- I am strongly in favour of dropping memoryview implicit promotion. It's too unsound.
- Skimming the mypy_primer output, I think we can still fix some stuff in typeshed, e.g. https://github.com/python/typeshed/pull/8995 . The lower we get the cost, the less hard we have to think about removing implicit promotion.
- We should deprecate collections.abc.ByteString, which is currently useless. We could maybe retcon it to something that's useful (e.g. either the equivalent of `bytes & bytearray` or `Buffer & Sequence[int]`), but that's probably considered too breaking a change (not to mention isinstance for the whole `bytes & bytearray` interface would be slow).

Things I am less clear on:

- A main reason to change the promotion status quo is to be able to reflect hashability. But there's no way to type whether a memoryview is hashable, or whether a Sequence[int] is hashable... in general I sort of view hashability as a bit of a losing cause in typing.
- I think "breaks the most basic rule of typing" overstates the case a little. Everyone uses int and float happily (and those are less compatible than bytes and bytearray). I.e. "instances of that type" is doing some heavy lifting in that sentence... for example, bytes promotion probably feels more intuitive than list's invariance.
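The hashability point can be seen directly at runtime. In this small sketch (the helper `is_hashable` is hypothetical, written only for this example), two objects of the same static type differ in whether `hash()` succeeds, which is why an annotation alone cannot express it.

```python
def is_hashable(obj) -> bool:
    # hash() raises TypeError for unhashable types such as list, and
    # ValueError for a memoryview over a writable buffer.
    try:
        hash(obj)
    except (TypeError, ValueError):
        return False
    return True


readonly_view = memoryview(b"abc")             # backed by immutable bytes
writable_view = memoryview(bytearray(b"abc"))  # backed by a mutable bytearray

memoryview_results = (is_hashable(readonly_view), is_hashable(writable_view))

# Both values below satisfy Sequence[int] for a type checker.
sequence_results = (is_hashable((1, 2, 3)), is_hashable([1, 2, 3]))

print(memoryview_results, sequence_results)
```

Both memoryview objects have the static type `memoryview`, and both sequences are valid `Sequence[int]` values, yet only one of each pair is hashable.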
On Thu, 27 Oct 2022 at 1:08, Sebastian Rittau (<srittau@rittau.biz>) wrote:
I'm hoping we can avoid this dance by simply reviewing all `bytes` annotations now (following https://github.com/python/typeshed/issues/9001).
On Thu, Oct 27, 2022 at 1:08 AM Sebastian Rittau <srittau@rittau.biz> wrote:
Sounds like a good transition plan. If it's for internal use only, why not name it `_typeshed._OldBytes`? And what would be lost if even that dropped `memoryview`? (Probably the world would explode, never change something that's deprecated anyway no matter how much you want to. :-)

--
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)* <http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...>
participants (6)
- Carl Meyer
- Guido van Rossum
- Jelle Zijlstra
- Paul Bryan
- Sebastian Rittau
- Shantanu Jain