[Python-ideas] struct.unpack should support open files

Cameron Simpson cs at cskk.id.au
Wed Dec 26 18:02:09 EST 2018


On 26Dec2018 12:18, Andrew Svetlov <andrew.svetlov at gmail.com> wrote:
>On Wed, Dec 26, 2018 at 11:26 AM Steven D'Aprano <steve at pearwood.info>
>wrote:
>
>> On Wed, Dec 26, 2018 at 09:48:15AM +0200, Andrew Svetlov wrote:
>> > The perfect demonstration of io objects complexity.
>> > `stream.read(N)` can return None by spec if the file is non-blocking
>> > and has no data ready.
>> >
>> > Confusing but still possible and documented behavior.
>>
>> https://docs.python.org/3/library/io.html#io.RawIOBase.read
>>
>> Regardless, my point doesn't change. That has nothing to do with the
>> behaviour of unpack. If you pass a non-blocking file-like object which
>> returns None, you get exactly the same exception as if you wrote
>>
>>     unpack(fmt, f.read(size))
>>
>> and the call to f.read returned None. Why is it unpack's responsibility
>> to educate the caller that f.read can return None?
[...]
>> > You need to repeat reads until collecting the value of enough size.
>>
>> That's not what the OP has asked for, it isn't what the OP's code does,
>> and it's not what I've suggested.
>>
>> Do pickle and json block and repeat the read until they have a complete
>> object? I'm pretty sure they don't [...]
>
>json is correct: if `read()` is called without an argument it reads the
>whole content until EOF.
>But with a size argument the behavior is different for interactive and
>non-interactive streams.

Oh, it is better than that. At the low level, even blocking streams can 
return short reads - particularly serial streams like ttys and TCP 
connections.
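
To make the short read problem concrete, here is a minimal sketch of the
loop a caller needs in order to get exactly `size` bytes. The names
`read_exactly` and `DribbleReader` are hypothetical, invented for this
illustration; the latter just simulates a stream which hands back at most
a few bytes per read:

```python
import io
import struct


def read_exactly(f, size):
    """Read exactly `size` bytes from binary stream `f` (hypothetical helper)."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = f.read(remaining)
        if chunk is None:
            # Non-blocking raw stream with no data ready right now.
            raise BlockingIOError("no data ready on non-blocking stream")
        if not chunk:
            # EOF before we collected enough bytes.
            raise EOFError("wanted %d bytes, got %d" % (size, size - remaining))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


class DribbleReader:
    """Stand-in stream: hands back at most `max_chunk` bytes per read()."""

    def __init__(self, data, max_chunk=3):
        self._f = io.BytesIO(data)
        self._max_chunk = max_chunk

    def read(self, size=-1):
        if size < 0 or size > self._max_chunk:
            size = self._max_chunk
        return self._f.read(size)


S = struct.Struct("<HI")
stream = DribbleReader(S.pack(1, 2) + b"x")
# Two short reads of 3 bytes each are stitched into one 6 byte record.
fields = S.unpack(read_exactly(stream, S.size))
```

Whether None should raise, retry or wait is exactly the sort of policy
decision that is hard to agree on; raising here is just one choice.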

>RawIOBase and BufferedIOBase also have slightly different behavior for
>`.read()`.
>
>Restriction fp to BufferedIOBase looks viable though, but it is not a
>file-like object.
>
>Also I'm thinking about type annotations in typeshed.
>Now the type is Union[array[int], bytes, bytearray, memoryview]
>Should it be Union[io.BinaryIO, array[int], bytes, bytearray, 
>memoryview] ?

And this is why I, personally, think augmenting struct.unpack and 
json.read and a myriad of other arbitrary methods to accept both 
file-like things and bytes is an open-ended can of worms.

And it is why I wrote myself my CornuCopyBuffer class (see my other post 
in this thread).

Its entire purpose is to wrap an iterable of bytes-like objects and do 
all that work via convenient methods. It also has factory methods to 
make these from files or other common things. Given a CornuCopyBuffer 
`bfr`:

    from struct import Struct

    S = Struct('spec-here...')
    sbuf = bfr.take(S.size)
    result = S.unpack(sbuf)

Under the covers `bfr` takes care of short "reads" (iteration values) etc 
in the underlying iterable. The return from .take is typically a 
memoryview from `bfr`'s internal buffer - it is _always_ exactly `size` 
bytes long if you don't pass short_ok=True, or it raises an exception. 
And so on.
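
As a rough illustration of the idea only (this is _not_ the real 
CornuCopyBuffer code; the class name `TakeBuffer` and the `from_file` 
factory are hypothetical, and this sketch returns bytes rather than a 
memoryview), a buffer over an iterable of bytes-like chunks might look 
like this:

```python
import io
import struct


class TakeBuffer:
    """Hypothetical minimal buffer over an iterable of bytes-like chunks."""

    def __init__(self, chunks):
        self._chunks = iter(chunks)
        self._buf = bytearray()

    @classmethod
    def from_file(cls, f, chunk_size=4096):
        # iter(callable, sentinel) calls f.read until it returns b"" (EOF).
        return cls(iter(lambda: f.read(chunk_size), b""))

    def take(self, size, short_ok=False):
        """Return exactly `size` bytes, or fewer at EOF if `short_ok`."""
        # Pull chunks until we hold enough bytes or the iterable runs dry.
        while len(self._buf) < size:
            try:
                chunk = next(self._chunks)
            except StopIteration:
                break
            self._buf += chunk
        if len(self._buf) < size and not short_ok:
            raise EOFError(
                "wanted %d bytes, only %d available" % (size, len(self._buf)))
        taken = bytes(self._buf[:size])
        del self._buf[:size]
        return taken


S = struct.Struct("<HI")
# The 6 byte record is deliberately spread across three uneven chunks.
bfr = TakeBuffer([b"\x01", b"\x00\x02\x00", b"\x00\x00", b"tail"])
result = S.unpack(bfr.take(S.size))
rest = bfr.take(4)
```

The caller never sees the chunk boundaries: the struct code stays a plain 
bytes consumer and all the short read handling lives in one place.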

The point here is: make a class to get what you actually need, and 
_don't_ stuff variable and hard to agree on extra semantics inside 
multiple basic utility classes like struct.

For myself, the CornuCopyBuffer is now my universal interface to byte 
streams (binary files, TCP connections, whatever) which need binary 
parsing, and it has the methods and internal logic to provide that, 
including presenting a simple read only file-like interface with read 
and seek-forward, should I need to pass it to a file-expecting object.

Do it _once_, and don't megacomplicatise all the existing utility 
classes.

Cheers,
Cameron Simpson <cs at cskk.id.au>

