Why does _pyio.*.readinto have to work with 'b' arrays?

Hello,

The _pyio.BufferedIOBase class contains the following hack to make sure that you can read into array objects with format 'b':

    try:
        b[:n] = data
    except TypeError as err:
        import array
        if not isinstance(b, array.array):
            raise err
        b[:n] = array.array('b', data)

I am now wondering if I should implement the same hack in BufferedReader (cf. issue 20578). Is there anything special about 'b' arrays that justifies treating them this way?

Note that readinto is supposed to work with any object implementing the buffer protocol, but the Python implementation only works with bytearrays and (with the above hack) 'b' arrays. Even using a 'B' array fails:
    >>> import _pyio
    >>> from array import array
    >>> buf = array('b', b'x' * 10)
    >>> _pyio.open('/dev/zero', 'rb').readinto(buf)
    10
    >>> buf = array('B', b'x' * 10)
    >>> _pyio.open('/dev/zero', 'rb').readinto(buf)
    Traceback (most recent call last):
      File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 662, in readinto
        b[:n] = data
    TypeError: can only assign array (not "bytes") to array slice

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/home/nikratio/clones/cpython/Lib/_pyio.py", line 667, in readinto
        b[:n] = array.array('b', data)
    TypeError: bad argument type for built-in operation

It seems to me that a much cleaner solution would be to simply declare _pyio's readinto to only work with bytearrays, and to explicitly raise a (more helpful) TypeError if anything else is passed in.

Best,
-Nikolaus

--
GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F
Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

»Time flies like an arrow, fruit flies like a Banana.«

On Sat, Jun 14, 2014, at 15:39, Nikolaus Rath wrote:
> It seems to me that a much cleaner solution would be to simply declare _pyio's readinto to only work with bytearrays, and to explicitly raise a (more helpful) TypeError if anything else is passed in.
That seems reasonable. I don't think _pyio's behavior is terribly important compared to the C _io module.
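
For illustration only, the stricter behaviour proposed here might look roughly like the sketch below. The helper name `readinto_bytearray_only` and the wording of the error message are assumptions made for this example; neither comes from _pyio or from the thread.

```python
from array import array


def readinto_bytearray_only(b, data):
    """Copy data into the start of b, accepting only bytearray targets.

    Hypothetical helper: it mimics the proposal of rejecting every
    non-bytearray argument with an explicit TypeError.
    """
    if not isinstance(b, bytearray):
        raise TypeError(
            "readinto() requires a bytearray, not %r" % type(b).__name__
        )
    data = data[:len(b)]   # never write more than the target can hold
    n = len(data)
    b[:n] = data           # same-length slice assignment, size unchanged
    return n


buf = bytearray(5)
copied = readinto_bytearray_only(buf, b'abc')

try:
    readinto_bytearray_only(array('B', b'x' * 10), b'abc')
    rejected = False
except TypeError:
    rejected = True
```

The rest of the thread argues for a different approach (accepting every buffer exporter), so this only shows what the "explicit TypeError" option would mean in practice.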

On 15 June 2014 10:41, Benjamin Peterson <benjamin@python.org> wrote:
> That seems reasonable. I don't think _pyio's behavior is terribly important compared to the C _io module.
_pyio was written before the various memoryview fixes that were implemented in Python 3.3 - it seems to me it would make more sense to use memoryview to correctly handle arbitrary buffer exporters (we implemented similar fixes for the base64 module in 3.4).

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 06/14/2014 09:31 PM, Nick Coghlan wrote:
> _pyio was written before the various memoryview fixes that were implemented in Python 3.3 - it seems to me it would make more sense to use memoryview to correctly handle arbitrary buffer exporters (we implemented similar fixes for the base64 module in 3.4).
Definitely. But is there a way to do that without writing C code? My attempts failed:
    >>> from array import array
    >>> a = array('b', b'x'*10)
    >>> am = memoryview(a)
    >>> am[:3] = b'foo'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: memoryview assignment: lvalue and rvalue have different structures
    >>> am[:3] = memoryview(b'foo')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: memoryview assignment: lvalue and rvalue have different structures
    >>> am.format = 'B'
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: attribute 'format' of 'memoryview' objects is not writable

The only thing that works is:

    >>> am[:3] = array('b', b'foo')

but that's again specific to a being a 'b'-array.

Best,
-Nikolaus
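
A short sketch of why these assignments are rejected, as far as I understand it: the array exports signed bytes (format 'b') while a bytes object exports unsigned bytes (format 'B'), and memoryview slice assignment requires both sides to have the same format. The variable names below are chosen only for this example.

```python
from array import array

a = array('b', b'x' * 10)
am = memoryview(a)

# Formats reported by the two buffer exporters involved.
target_format = am.format                   # 'b' (signed char)
source_format = memoryview(b'foo').format   # 'B' (unsigned char)

# Because the formats differ, the slice assignment is refused.
try:
    am[:3] = b'foo'
    assignment_failed = False
except ValueError:
    assignment_failed = True
```

The item size is 1 on both sides, so the data would fit; it is only the format mismatch that blocks the copy.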

On 15 June 2014 14:57, Nikolaus Rath <Nikolaus@rath.org> wrote:
> Definitely. But is there a way to do that without writing C code?
Yes, Python level reshaping and typecasting of memory views is one of the key enhancements Stefan implemented for 3.3.
    >>> from array import array
    >>> a = array('b', b'x'*10)
    >>> am = memoryview(a)
    >>> a
    array('b', [120, 120, 120, 120, 120, 120, 120, 120, 120, 120])
    >>> am[:3] = memoryview(b'foo').cast('b')
    >>> a
    array('b', [102, 111, 111, 120, 120, 120, 120, 120, 120, 120])

Cheers,
Nick.
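
Building on the cast shown above, a readinto-style copy that works for any writable, contiguous buffer could be sketched as follows. Note that this variant casts the destination to 'B' rather than casting the source to 'b'; that choice and the helper name `copy_into` are assumptions for illustration, not something stated in the thread.

```python
from array import array


def copy_into(b, data):
    """Copy bytes-like data into the start of writable buffer b.

    Hypothetical helper. The target is viewed as unsigned bytes, so its
    original format ('b', 'B', 'i', ...) no longer matters.
    """
    view = memoryview(b).cast('B')
    data = data[:len(view)]   # do not write past the end of the target
    n = len(data)
    view[:n] = data           # formats now match: both sides are 'B'
    return n


signed = array('b', b'x' * 10)
unsigned = array('B', b'x' * 10)
n_signed = copy_into(signed, b'foo')
n_unsigned = copy_into(unsigned, b'foo')
```

Casting the destination means the caller does not need to know the target's format in advance, which is what makes the approach independent of 'b' arrays.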

Nick Coghlan <ncoghlan@gmail.com> writes:
> Yes, Python level reshaping and typecasting of memory views is one of the key enhancements Stefan implemented for 3.3. [..]
Ah, nice. I'll use that. Thank you Stefan :-).

Best,
-Nikolaus

On 15 June 2014 at 02:42, "Benjamin Peterson" <benjamin@python.org> wrote:
> That seems reasonable. I don't think _pyio's behavior is terribly important compared to the C _io module.
Which types are accepted by the readinto() method of the C io module? If the C module only accepts bytearray, the array hack must be removed from _pyio.

The _pyio module is mostly used for testing purposes; it is much slower. I hope that nobody uses it in production, since the module is private (underscore prefix). So it's fine to break backward compatibility to get the same behaviour as the C module.

Victor

Victor Stinner <victor.stinner@gmail.com> writes:
> Which types are accepted by the readinto() method of the C io module?

Everything implementing the buffer protocol.

> If the C module only accepts bytearray, the array hack must be removed from _pyio.
_pyio currently accepts only bytearray and 'b'-type arrays. But it seems with memoryview.cast() we now have a way to make it behave like the C module.

Best,
-Nikolaus
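
To make the target behaviour concrete, here is a small sketch that calls readinto() on the C-backed io.BytesIO with several writable buffer exporters. It assumes CPython, where io.BytesIO is provided by the C _io module; the helper name `read_into_each` is made up for this example.

```python
import io
from array import array


def read_into_each(payload, targets):
    """Read payload into each target via io.BytesIO.readinto().

    Hypothetical helper: returns the byte count reported for each target.
    """
    return [io.BytesIO(payload).readinto(target) for target in targets]


targets = [
    bytearray(4),
    array('b', bytes(4)),
    array('B', bytes(4)),
    memoryview(bytearray(4)),
]
counts = read_into_each(b'abcd', targets)
```

Each target type is accepted without special-casing, which is the behaviour a memoryview-based _pyio.readinto would need to match.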
participants (4)
- Benjamin Peterson
- Nick Coghlan
- Nikolaus Rath
- Victor Stinner