[Python-ideas] struct.unpack should support open files

Andrew Svetlov andrew.svetlov at gmail.com
Wed Dec 26 02:48:15 EST 2018


On Wed, Dec 26, 2018 at 7:12 AM Steven D'Aprano <steve at pearwood.info> wrote:

> On Tue, Dec 25, 2018 at 01:28:02AM +0200, Andrew Svetlov wrote:
>
> > The proposal can generate cryptic messages like
> > `a bytes-like object is required, not 'NoneType'`
>
> How will it generate such a message? That's not obvious to me.
>
> The message doesn't seem cryptic to me. It seems perfectly clear: a
> bytes-like object is required, but you provided None instead.
>
> The only thing which is sub-optimal is the use of "NoneType" (the name
> of the class) instead of None.
>
>
The perfect demonstration of io objects complexity.
`stream.read(N)` can return None by spec if the file is non-blocking
and have no ready data.

Confusing but still possible and documented behavior.


>
> > To produce more informative exception text all mentioned cases should be
> > handled:
>
> Why should they? How are the standard exceptions not good enough? The
> standard library is full of implementations which use ducktyping, and if
> you pass a chicken instead of a duck you get errors like
>
>     AttributeError: 'Chicken' object has no attribute 'bill'
>
> Why isn't that good enough for this function too?
>
> We already have a proof-of-concept implementation, given by the OP.
> Here is it again:
>
>
> import io, struct
> def unpackStruct(fmt, frm):
>     if isinstance(frm, io.IOBase):
>         return struct.unpack(fmt, frm.read(struct.calcsize(fmt)))
>     else:
>         return struct.unpack(fmt, frm)
>
>
> Here's the sort of exceptions it generates. For brevity, I have cut the
> tracebacks down to only the final line:
>
>
> py> unpackStruct("ddd", open("/tmp/spam", "w"))
> io.UnsupportedOperation: not readable
>
>
> Is that not clear enough? (This is not a rhetorical question.) In what
> way do you think that exception needs enhancing? It seems perfectly fine
> to me.
>
> Here's another exception that may be fine as given. If the given file
> doesn't contain enough bytes to fill the struct, you get this:
>
>
> py> __ = open("/tmp/spam", "wb").write(b"\x10")
> py> unpackStruct("ddd", open("/tmp/spam", "rb"))
> struct.error: unpack requires a bytes object of length 24
>
>
> It might be *nice*, but hardly *necessary*, to re-word the error message
> to make it more obvious that we're reading from a file, but honestly
> that should be obvious from context. There are certainly worse error
> messages in Python.
>
> Here is one exception which should be reworded:
>
> py> unpackStruct("ddd", open("/tmp/spam", "r"))
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
>   File "<stdin>", line 3, in unpackStruct
> TypeError: a bytes-like object is required, not 'str'
>
> For production use, that should report that the file needs to be opened
> in binary mode, not text mode.
>
> Likewise similar type errors should report "bytes-like or file-like"
> object.
>
> These are minor enhancements to exception reporting, and aren't what I
> consider to be adding complexity in any meaningful sense. Of course we
> should expect that library-quality functions will have more error
> checking and better error reporting than a simple utility function for
> you own use.
>
>
> The OP's simple implementation is a five line function. Adding more
> appropriate error messages might, what? Triple it? That surely is an
> argument for *doing it right, once* in the standard library, rather than
> having people re-invent the wheel over and over.
>
>
> def unpackStruct(fmt, frm):
>     if isinstance(frm, io.IOBase):
>         if isinstance(frm, io.TextIOBase):
>             raise TypeError('file must be opened in binary mode, not text')
>         n = struct.calcsize(fmt)
>         value = frm.read(n)
>         assert isinstance(value, bytes)
>         if len(value) != n:
>             raise ValueError(
>                     'expected %d bytes but only got %d'
>                     % (n, len(value))
>                     )
>         return struct.unpack(fmt, value)
>     else:
>         return struct.unpack(fmt, frm)
>
> You need to repeat reads until collecting the value of enough size.
`.read(N)` can return less bytes by definition,
that's true starting from very low-level read(2) syscall.

Otherwise a (low) change of broken code with very non-obvious error message
exists.


>
> I think this is a useful enhancement to unpack(). If we were designing
> the struct module from scratch today, we'd surely want unpack() to read
> from files and unpacks() to read from a byte-string, mirroring the API
> of json, pickle, and similar.
>
> But given the requirement for backwards compatibility, we can't change
> the fact that unpack() works with byte-strings. So we can either add a
> new function, unpack_from_file() or simply make unpack() a generic
> function that accepts either a byte-like interface or a file-like
> interface. I vote for the generic function approach.
>
> (Or do nothing, of course.)
>
> So far, I'm not seeing any substantial arguments for why this isn't
> useful, or too difficult to implement.
>
> If anything, the biggest argument against it is that it is too simple to
> bother with (but that argument would apply equally to the pickle and
> json APIs). "Not every ~~one~~ fifteen line function needs to be in
> the standard library."
>
>
>
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Thanks,
Andrew Svetlov
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20181226/36182e9a/attachment.html>


More information about the Python-ideas mailing list