[Python-ideas] struct.unpack should support open files

Steven D'Aprano steve at pearwood.info
Wed Dec 26 00:11:35 EST 2018

On Tue, Dec 25, 2018 at 01:28:02AM +0200, Andrew Svetlov wrote:

> The proposal can generate cryptic messages like
> `a bytes-like object is required, not 'NoneType'`

How will it generate such a message? That's not obvious to me.

The message doesn't seem cryptic to me. It seems perfectly clear: a 
bytes-like object is required, but you provided None instead.

The only thing which is sub-optimal is the use of "NoneType" (the name 
of the class) instead of None.

> To produce more informative exception text all mentioned cases should be
> handled:

Why should they? How are the standard exceptions not good enough? The 
standard library is full of implementations which use ducktyping, and if 
you pass a chicken instead of a duck you get errors like

    AttributeError: 'Chicken' object has no attribute 'bill'

Why isn't that good enough for this function too?

We already have a proof-of-concept implementation, given by the OP. 
Here is it again:

import io, struct
def unpackStruct(fmt, frm):
    if isinstance(frm, io.IOBase):
        return struct.unpack(fmt, frm.read(struct.calcsize(fmt)))
        return struct.unpack(fmt, frm)

Here's the sort of exceptions it generates. For brevity, I have cut the 
tracebacks down to only the final line:

py> unpackStruct("ddd", open("/tmp/spam", "w"))
io.UnsupportedOperation: not readable

Is that not clear enough? (This is not a rhetorical question.) In what 
way do you think that exception needs enhancing? It seems perfectly fine 
to me.

Here's another exception that may be fine as given. If the given file 
doesn't contain enough bytes to fill the struct, you get this:

py> __ = open("/tmp/spam", "wb").write(b"\x10")
py> unpackStruct("ddd", open("/tmp/spam", "rb"))
struct.error: unpack requires a bytes object of length 24

It might be *nice*, but hardly *necessary*, to re-word the error message 
to make it more obvious that we're reading from a file, but honestly 
that should be obvious from context. There are certainly worse error 
messages in Python.

Here is one exception which should be reworded:

py> unpackStruct("ddd", open("/tmp/spam", "r"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in unpackStruct
TypeError: a bytes-like object is required, not 'str'

For production use, that should report that the file needs to be opened 
in binary mode, not text mode.

Likewise similar type errors should report "bytes-like or file-like" 

These are minor enhancements to exception reporting, and aren't what I 
consider to be adding complexity in any meaningful sense. Of course we 
should expect that library-quality functions will have more error 
checking and better error reporting than a simple utility function for 
you own use.

The OP's simple implementation is a five line function. Adding more 
appropriate error messages might, what? Triple it? That surely is an 
argument for *doing it right, once* in the standard library, rather than 
having people re-invent the wheel over and over.

def unpackStruct(fmt, frm):
    if isinstance(frm, io.IOBase):
        if isinstance(frm, io.TextIOBase):
            raise TypeError('file must be opened in binary mode, not text')
        n = struct.calcsize(fmt)
        value = frm.read(n)
        assert isinstance(value, bytes)
        if len(value) != n:
            raise ValueError(
                    'expected %d bytes but only got %d' 
                    % (n, len(value))
        return struct.unpack(fmt, value)
        return struct.unpack(fmt, frm)

I think this is a useful enhancement to unpack(). If we were designing 
the struct module from scratch today, we'd surely want unpack() to read 
from files and unpacks() to read from a byte-string, mirroring the API 
of json, pickle, and similar.

But given the requirement for backwards compatibility, we can't change 
the fact that unpack() works with byte-strings. So we can either add a 
new function, unpack_from_file() or simply make unpack() a generic 
function that accepts either a byte-like interface or a file-like 
interface. I vote for the generic function approach.

(Or do nothing, of course.)

So far, I'm not seeing any substantial arguments for why this isn't 
useful, or too difficult to implement.

If anything, the biggest argument against it is that it is too simple to 
bother with (but that argument would apply equally to the pickle and 
json APIs). "Not every ~~one~~ fifteen line function needs to be in 
the standard library."


More information about the Python-ideas mailing list