[Python-ideas] struct.unpack should support open files

Andrew Svetlov andrew.svetlov at gmail.com
Mon Dec 24 18:28:02 EST 2018


The proposal can generate cryptic messages like
`a bytes-like object is required, not 'NoneType'`

To produce more informative exception text, all of the mentioned cases
should be handled:

> - read partial structs from non-blocking files without failing
> - deal with file system errors without failing
> - support reading from text files when bytes are required without failing
> - if an exception occurs, the state of the file shouldn't change
I could add a couple more cases, but the list is already long enough for
demonstration purposes.

When a user calls
    unpack(fmt, f.read(calcsize(fmt)))
the user is responsible for handling all edge cases (or, more likely, for
ignoring them).

If it is part of a library, robustness is the library's responsibility.
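To make the point concrete, here is roughly what "handling all edge cases"
looks like once spelled out. `unpack_from_file` is a hypothetical helper of
my own, not the proposed API; it only illustrates the checks each caller
would otherwise repeat inline:

```python
import io
import struct

def unpack_from_file(fmt, f):
    """Hypothetical sketch: the edge cases a robust caller must cover."""
    size = struct.calcsize(fmt)
    data = f.read(size)
    if data is None:
        # Non-blocking file with no data currently available:
        # raw/buffered read() may return None instead of bytes.
        raise BlockingIOError("no data available on non-blocking file")
    if isinstance(data, str):
        # Text-mode file: read() returned str, not bytes.
        raise TypeError("binary file required, got text file")
    if len(data) != size:
        # Short read: EOF, pipe, or partial data.
        raise struct.error(
            "unpack requires a buffer of %d bytes, got %d"
            % (size, len(data)))
    return struct.unpack(fmt, data)

# Example with an in-memory binary file:
unpack_from_file("<HH", io.BytesIO(b"\x01\x00\x02\x00"))
```

And that still ignores OS-level errors from read() itself, which is the
point: the error handling is real, whoever ends up owning it.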

On Mon, Dec 24, 2018 at 11:23 PM Steven D'Aprano <steve at pearwood.info>
wrote:

> On Mon, Dec 24, 2018 at 03:36:07PM +0000, Paul Moore wrote:
>
> > > There should be no difference whether the text comes from a literal, a
> > > variable, or is read from a file.
> >
> > One difference is that with a file, it's (as far as I can see)
> > impossible to determine whether or not you're going to get bytes or
> > text without reading some data (and so potentially affecting the state
> > of the file object).
>
> Here are two ways: look at the type of the file object, or look at the
> mode of the file object:
>
> py> f = open('/tmp/spam.binary', 'wb')
> py> g = open('/tmp/spam.text', 'w')
> py> type(f), type(g)
> (<class '_io.BufferedWriter'>, <class '_io.TextIOWrapper'>)
>
> py> f.mode, g.mode
> ('wb', 'w')
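(As an aside: that type check can be written without reading from, or
otherwise touching, the file. A minimal sketch using the `io` ABCs; the
helper name is mine:)

```python
import io

def is_binary_file(f):
    # Text files are instances of io.TextIOBase; raw and buffered
    # binary files are not. No read() needed, so the file's position
    # and state are untouched.
    return not isinstance(f, io.TextIOBase)

# Works for on-disk files and in-memory file objects alike:
is_binary_file(io.BytesIO())   # binary
is_binary_file(io.StringIO())  # text
```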
>
>
> > This might be considered irrelevant
>
> Indeed :-)
>
>
> > (personally,
> > I don't see a problem with a function definition that says "parameter
> > fd must be an object that has a read(length) method that returns
> > bytes" - that's basically what duck typing is all about) but it *is* a
> > distinguishing feature of files over in-memory data.
>
> But it's not a distinguishing feature between the proposal, and writing:
>
> unpack(fmt, f.read(size))
>
> which will also read from the file and affect the file state before
> failing. So it's a difference that makes no difference.
>
>
> > There is also the fact that read() is only defined to return *at most*
> > the requested number of bytes. Non-blocking reads and objects like
> > pipes that can return additional data over time add extra complexity.
>
> How do they add extra complexity?
>
> According to the proposal, unpack() attempts the read. If it returns the
> correct number of bytes, the unpacking succeeds. If it doesn't, you get
> an exception, precisely the same way you would get an exception if you
> manually did the read and passed it to unpack().
>
> It's the caller's responsibility to provide a valid file object. If your
> struct needs 10 bytes, and you provide a file that returns 6 bytes, you
> get an exception. There's no promise made that unpack() should repeat
> the read over and over again, hoping that it's a pipe and more data
> becomes available. It either works with a single read, or it fails.
>
> Just like similar APIs as those provided by pickle, json etc which
> provide load() and loads() functions.
>
> In hindsight, the precedent set by pickle, json, etc suggests that we
> ought to have an unpack() function that reads from files and an
> unpacks() function that takes a string, but that ship has sailed.
>
>
> > Again, not insoluble, and potentially simple enough to handle with
> > "read N bytes, if you got something other than bytes or fewer than N
> > of them, raise an error", but still enough that the special cases
> > start to accumulate.
>
> I can understand the argument that the benefit of this is trivial over
>
>     unpack(fmt, f.read(calcsize(fmt)))
>
> Unlike reading from a pickle or json record, it's pretty easy to know how
> much to read, so there is an argument that this convenience method
> doesn't gain us much convenience.
>
> But I'm just not seeing where all the extra complexity and special case
> handling is supposed to be, except by having unpack make promises that
> the OP didn't request:
>
> - read partial structs from non-blocking files without failing
> - deal with file system errors without failing
> - support reading from text files when bytes are required without failing
> - if an exception occurs, the state of the file shouldn't change
>
> Those promises *would* add enormous amounts of complexity, but I don't
> think we need to make those promises. I don't think the OP wants them,
> I don't want them, and I don't think they are reasonable promises to
> make.
>
>
> > The suggestion is a nice convenience method, and probably a useful
> > addition for the majority of cases where it would do exactly what was
> > needed, but still not completely trivial to actually implement and
> > document (if I were doing it, I'd go with the naive approach, and just
> > raise a ValueError when read(N) returns anything other than N bytes,
> > for what it's worth).
>
> Indeed. Except that we should raise precisely the same exception type
> that struct.unpack() currently raises in the same circumstances:
>
> py> struct.unpack("ddd", b"a")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> struct.error: unpack requires a bytes object of length 24
>
> rather than ValueError.
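Spelled out, that naive approach (a single read(), and struct.error rather
than ValueError on a short read) might look like the sketch below. The
bytes-or-file dispatch via a read attribute is my own assumption about how
the overload would be written, not part of the proposal:

```python
import io
import struct

def unpack(fmt, source):
    """Sketch of the proposed overload: accept bytes or a readable file."""
    if hasattr(source, "read"):
        size = struct.calcsize(fmt)
        data = source.read(size)
        # One read only; anything other than exactly `size` bytes
        # (short read, None, text-mode str) is an error, reported the
        # same way struct.unpack() reports an undersized buffer.
        if not isinstance(data, bytes) or len(data) != size:
            raise struct.error(
                "unpack requires a bytes object of length %d" % size)
        return struct.unpack(fmt, data)
    # Fall through to the existing bytes-based behaviour.
    return struct.unpack(fmt, source)

# Both call styles work:
unpack("<I", b"\x2a\x00\x00\x00")
unpack("<I", io.BytesIO(b"\x2a\x00\x00\x00"))
```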
>
>
>
> --
> Steve
> _______________________________________________
> Python-ideas mailing list
> Python-ideas at python.org
> https://mail.python.org/mailman/listinfo/python-ideas
> Code of Conduct: http://python.org/psf/codeofconduct/
>


-- 
Thanks,
Andrew Svetlov

