[Python-ideas] struct.unpack should support open files
cs at cskk.id.au
Wed Dec 26 18:02:09 EST 2018
On 26Dec2018 12:18, Andrew Svetlov <andrew.svetlov at gmail.com> wrote:
>On Wed, Dec 26, 2018 at 11:26 AM Steven D'Aprano <steve at pearwood.info> wrote:
>> On Wed, Dec 26, 2018 at 09:48:15AM +0200, Andrew Svetlov wrote:
>> > The perfect demonstration of io objects complexity.
>> > `stream.read(N)` can return None by spec if the file is non-blocking
>> > and have no ready data.
>> > Confusing but still possible and documented behavior.
>> Regardless, my point doesn't change. That has nothing to do with the
>> behaviour of unpack. If you pass a non-blocking file-like object which
>> returns None, you get exactly the same exception as if you wrote
>> unpack(fmt, f.read(size))
>> and the call to f.read returned None. Why is it unpack's responsibility
>> to educate the caller that f.read can return None?
>> > You need to repeat reads until collecting the value of enough size.
>> That's not what the OP has asked for, it isn't what the OP's code does,
>> and its not what I've suggested.
>> Do pickle and json block and repeat the read until they have a complete
>> object? I'm pretty sure they don't [...]
>json is correct: if `read()` is called without an argument it reads the
>content until EOF.
>But with a size argument the behaviour differs for interactive and
>non-interactive streams.
Oh, it is better than that. At the low level, even blocking streams can
return short reads - particularly serial streams like ttys and TCP
connections.
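A sketch of the repeated-read loop under discussion, assuming a blocking
binary stream `f` (the name `read_exactly` is illustrative, not any
stdlib API):

```python
import io

def read_exactly(f, n):
    """Read exactly n bytes from f, looping over the short reads
    that even blocking streams (ttys, sockets) can return."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = f.read(remaining)
        if not chunk:
            raise EOFError("EOF after %d of %d bytes" % (n - remaining, n))
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)

f = io.BytesIO(b"\x01\x00\x02\x00")
data = read_exactly(f, 4)
```

This is exactly the kind of policy (loop? block? raise?) that is hard to
agree on for a general-purpose unpack.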
>RawIOBase and BufferedIOBase also have slightly different behavior for
>`read(size)`.
>Restricting fp to BufferedIOBase looks viable, though it is not a full
>solution.
>Also I'm thinking about type annotations in typeshed.
>Now the type is Union[array[int], bytes, bytearray, memoryview]
>Should it be Union[io.BinaryIO, array[int], bytes, bytearray, memoryview]?
And this is why I, personally, think augmenting struct.unpack and
json.read and a myriad of other arbitrary methods to accept both
file-like things and bytes is an open ended can of worms.
And it is why I wrote myself my CornuCopyBuffer class (see my other post
in this thread).
Its entire purpose is to wrap an iterable of bytes-like objects and do
all that work via convenient methods. It also has factory methods to
make these from files and other common things. Given a CornuCopyBuffer
`bfr`:
S = struct.Struct('spec-here...')
sbuf = bfr.take(S.size)
result = S.unpack(sbuf)
Under the covers `bfr` takes care of short "reads" (iteration values)
etc. in the underlying iterable. The return from .take is typically a
memoryview from `bfr`'s internal buffer - it is _always_ exactly `size`
bytes long, or it raises an exception, unless you pass short_ok=True.
And so on.
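A toy sketch of the idea (the real CornuCopyBuffer is far richer; the
class and names here are illustrative, not its actual API):

```python
import struct

class ChunkBuffer:
    """Toy take()-style buffer over an iterable of bytes-like chunks."""

    def __init__(self, chunks):
        self._it = iter(chunks)
        self._buf = bytearray()

    def take(self, size):
        # Accumulate iteration values until we have `size` bytes,
        # hiding short chunks from the caller.
        while len(self._buf) < size:
            try:
                self._buf.extend(next(self._it))
            except StopIteration:
                raise EOFError(
                    "short data: wanted %d, have %d" % (size, len(self._buf)))
        out = bytes(self._buf[:size])
        del self._buf[:size]
        return out

S = struct.Struct("<HH")
bfr = ChunkBuffer([b"\x01\x00", b"\x02", b"\x00"])
result = S.unpack(bfr.take(S.size))
```

The point being: the buffering policy lives in one class, and struct
stays a simple bytes-in utility.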
The point here is: make a class to get what you actually need, and
_don't_ stuff variable, hard-to-agree-on extra semantics into
multiple basic utility modules like struct.
For myself, the CornuCopyBuffer is now my universal interface to byte
streams (binary files, TCP connections, whatever) which need binary
parsing, and it has the methods and internal logic to provide that,
including presenting a simple read only file-like interface with read
and seek-forward, should I need to pass it to a file-expecting object.
Do it _once_, and don't megacomplicatise all the existing utility
classes.
Cameron Simpson <cs at cskk.id.au>