[Python-ideas] struct.unpack should support open files
Steven D'Aprano
steve at pearwood.info
Mon Dec 24 16:17:33 EST 2018
On Mon, Dec 24, 2018 at 03:36:07PM +0000, Paul Moore wrote:
> > There should be no difference whether the text comes from a literal, a
> > variable, or is read from a file.
>
> One difference is that with a file, it's (as far as I can see)
> impossible to determine whether or not you're going to get bytes or
> text without reading some data (and so potentially affecting the state
> of the file object).
Here are two ways: look at the type of the file object, or look at the
mode of the file object:
py> f = open('/tmp/spam.binary', 'wb')
py> g = open('/tmp/spam.text', 'w')
py> type(f), type(g)
(<class '_io.BufferedWriter'>, <class '_io.TextIOWrapper'>)
py> f.mode, g.mode
('wb', 'w')
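A minimal sketch of the same check wrapped in a function, assuming a hypothetical helper name is_binary_file (nothing like it exists in the stdlib). It looks only at the object's type and its mode attribute, so nothing is read and the file position is untouched:

```python
import io

def is_binary_file(f):
    """Guess whether f yields bytes, without reading from it.

    Hypothetical helper, for illustration only.
    """
    # Text-mode files returned by open() derive from io.TextIOBase.
    if isinstance(f, io.TextIOBase):
        return False
    # Binary files derive from io.RawIOBase or io.BufferedIOBase.
    if isinstance(f, (io.RawIOBase, io.BufferedIOBase)):
        return True
    # For other file-like objects, fall back to a string mode if present.
    mode = getattr(f, 'mode', '')
    return isinstance(mode, str) and 'b' in mode
```

For duck-typed objects that have neither the io base classes nor a string mode, this guesses "not binary"; that is an assumption of the sketch, not a rule.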
> This might be considered irrelevant
Indeed :-)
> (personally,
> I don't see a problem with a function definition that says "parameter
> fd must be an object that has a read(length) method that returns
> bytes" - that's basically what duck typing is all about) but it *is* a
> distinguishing feature of files over in-memory data.
But it's not a distinguishing feature between the proposal, and writing:
unpack(fmt, f.read(size))
which will also read from the file and affect the file state before
failing. So it's a difference that makes no difference.
> There is also the fact that read() is only defined to return *at most*
> the requested number of bytes. Non-blocking reads and objects like
> pipes that can return additional data over time add extra complexity.
How do they add extra complexity?
According to the proposal, unpack() attempts the read. If it returns the
correct number of bytes, the unpacking succeeds. If it doesn't, you get
an exception, precisely the same way you would get an exception if you
manually did the read and passed it to unpack().
It's the caller's responsibility to provide a valid file object. If your
struct needs 10 bytes, and you provide a file that returns 6 bytes, you
get an exception. There's no promise made that unpack() should repeat
the read over and over again, hoping that it's a pipe and more data
becomes available. It either works with a single read, or it fails.
Just like the similar APIs provided by pickle, json etc, which
provide load() and loads() functions.
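A rough sketch of the behaviour described above, assuming a hypothetical helper name unpack_from_file (it is not part of the struct module, and the proposal itself may spell this differently). It does a single read and lets struct.unpack() complain if the read came up short:

```python
import io
import struct

def unpack_from_file(fmt, f):
    """Read one struct from f using a single read() call.

    Hypothetical helper; the name is an assumption, not a struct API.
    """
    size = struct.calcsize(fmt)
    data = f.read(size)  # one read, no retries
    # struct.unpack() raises struct.error if len(data) != size.
    return struct.unpack(fmt, data)

# Usage with an in-memory binary file:
f = io.BytesIO(struct.pack("<hi", 1, 2))
result = unpack_from_file("<hi", f)
```

A second call on the same file would find no data left and raise struct.error, with no attempt to wait for more.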
In hindsight, the precedent set by pickle, json, etc suggests that we
ought to have an unpack() function that reads from files and an
unpacks() function that takes a string, but that ship has sailed.
> Again, not insoluble, and potentially simple enough to handle with
> "read N bytes, if you got something other than bytes or fewer than N
> of them, raise an error", but still enough that the special cases
> start to accumulate.
I can understand the argument that the benefit of this is trivial over
unpack(fmt, f.read(calcsize(fmt)))
Unlike reading from a pickle or json record, it's pretty easy to know how
much to read, so there is an argument that this convenience method
doesn't gain us much convenience.
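To illustrate why the size is easy to know: it depends only on the format string, so it can be computed before touching the file. A small sketch, where the format "<hi" is just an assumed example layout:

```python
import struct

# Hypothetical record layout: little-endian, 2-byte short + 4-byte int.
HEADER_FMT = "<hi"

# With "<" there is no alignment padding, so this is 2 + 4 bytes.
HEADER_SIZE = struct.calcsize(HEADER_FMT)
```

There is no equivalent for a pickle or a json document, where you cannot tell how many bytes a record occupies without parsing it.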
But I'm just not seeing where all the extra complexity and special-case
handling is supposed to be, except by having unpack make promises that
the OP didn't request:
- read partial structs from non-blocking files without failing
- deal with file system errors without failing
- support reading from text files when bytes are required without failing
- if an exception occurs, the state of the file shouldn't change
Those promises *would* add enormous amounts of complexity, but I don't
think we need to make those promises. I don't think the OP wants them,
I don't want them, and I don't think they are reasonable promises to
make.
> The suggestion is a nice convenience method, and probably a useful
> addition for the majority of cases where it would do exactly what was
> needed, but still not completely trivial to actually implement and
> document (if I were doing it, I'd go with the naive approach, and just
> raise a ValueError when read(N) returns anything other than N bytes,
> for what it's worth).
Indeed. Except that we should raise precisely the same exception type
that struct.unpack() currently raises in the same circumstances:
py> struct.unpack("ddd", b"a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: unpack requires a bytes object of length 24
rather than ValueError.
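A short sketch of why the distinction matters, using an in-memory file that holds only one byte (the BytesIO object stands in for a real file). struct.error is not a subclass of ValueError, so code written to catch one will not catch the other:

```python
import io
import struct

f = io.BytesIO(b"a")  # only 1 byte available, "ddd" needs more
try:
    struct.unpack("ddd", f.read(struct.calcsize("ddd")))
except struct.error as exc:
    caught = type(exc)  # the short read surfaces as struct.error

# An "except ValueError" handler would not have caught it.
is_value_error = issubclass(struct.error, ValueError)
```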
--
Steve