[Python-ideas] struct.unpack should support open files

Steven D'Aprano steve at pearwood.info
Mon Dec 24 16:17:33 EST 2018

On Mon, Dec 24, 2018 at 03:36:07PM +0000, Paul Moore wrote:

> > There should be no difference whether the text comes from a literal, a
> > variable, or is read from a file.
> One difference is that with a file, it's (as far as I can see)
> impossible to determine whether or not you're going to get bytes or
> text without reading some data (and so potentially affecting the state
> of the file object).

Here are two ways: look at the type of the file object, or look at the 
mode of the file object:

py> f = open('/tmp/spam.binary', 'wb')
py> g = open('/tmp/spam.text', 'w')
py> type(f), type(g)
(<class '_io.BufferedWriter'>, <class '_io.TextIOWrapper'>)

py> f.mode, g.mode
('wb', 'w')
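
The same check can be wrapped in a small helper. This is only a sketch: the name is_binary_file is hypothetical (nothing like it exists in struct or io), and the fallback on the mode attribute is an assumption about how third-party file-like objects behave. It inspects the object without reading from it:

```python
import io

def is_binary_file(f):
    """Hypothetical helper: guess whether f.read() returns bytes, without reading."""
    # Text-mode files derive from io.TextIOBase.
    if isinstance(f, io.TextIOBase):
        return False
    # Raw and buffered binary files derive from these two base classes.
    if isinstance(f, (io.RawIOBase, io.BufferedIOBase)):
        return True
    # Assumption: other file-like objects expose a conventional mode string.
    return "b" in getattr(f, "mode", "")
```

An object that lies about its type or mode would still fool this, but then it would equally fool a manual f.read() followed by unpack().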

> This might be considered irrelevant 

Indeed :-)

> (personally,
> I don't see a problem with a function definition that says "parameter
> fd must be an object that has a read(length) method that returns
> bytes" - that's basically what duck typing is all about) but it *is* a
> distinguishing feature of files over in-memory data.

But it's not a distinguishing feature between the proposal and writing:

unpack(fmt, f.read(size))

which will also read from the file and affect the file state before 
failing. So it's a difference that makes no difference.

> There is also the fact that read() is only defined to return *at most*
> the requested number of bytes. Non-blocking reads and objects like
> pipes that can return additional data over time add extra complexity.

How do they add extra complexity?

According to the proposal, unpack() attempts the read. If it returns the 
correct number of bytes, the unpacking succeeds. If it doesn't, you get 
an exception, precisely the same way you would get an exception if you 
manually did the read and passed it to unpack().

It's the caller's responsibility to provide a valid file object. If your 
struct needs 10 bytes, and you provide a file that returns 6 bytes, you 
get an exception. There's no promise made that unpack() should repeat 
the read over and over again, hoping that it's a pipe and more data 
becomes available. It either works with a single read, or it fails.

This is just like the similar APIs provided by pickle, json, etc., which 
provide load() and loads() functions.
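
The single-read behaviour described above can be sketched in a few lines. The name unpack_from_file is hypothetical and this is not the OP's implementation, only one way the stated semantics might look: one read, no retries, and any failure is whatever struct.unpack() itself raises.

```python
import io
import struct

def unpack_from_file(fmt, f):
    """Hypothetical sketch: read exactly one struct from f, or fail."""
    size = struct.calcsize(fmt)
    data = f.read(size)  # a single read; no loop waiting for more data
    # struct.unpack raises struct.error if data is shorter than size.
    return struct.unpack(fmt, data)

# Enough data: the struct is unpacked.
print(unpack_from_file("<HH", io.BytesIO(b"\x01\x00\x02\x00")))

# Too little data: the same exception as a manual read plus unpack.
try:
    unpack_from_file("<HH", io.BytesIO(b"\x01\x00\x02"))
except struct.error as exc:
    print("struct.error:", exc)
```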

In hindsight, the precedent set by pickle, json, etc suggests that we 
ought to have an unpack() function that reads from files and an 
unpacks() function that takes a string, but that ship has sailed.

> Again, not insoluble, and potentially simple enough to handle with
> "read N bytes, if you got something other than bytes or fewer than N
> of them, raise an error", but still enough that the special cases
> start to accumulate.

I can understand the argument that the benefit of this is trivial over 

    unpack(fmt, f.read(calcsize(fmt)))

Unlike reading from a pickle or json record, it's pretty easy to know how 
much to read, so there is an argument that this convenience method 
doesn't gain us much convenience.
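
For comparison, here is the existing idiom applied to a run of fixed-size records. The format string and the in-memory file are made up for illustration. The point is that calcsize() gives the record length before any data is read:

```python
import io
import struct

fmt = "<HI"                  # little-endian: unsigned short, unsigned int
size = struct.calcsize(fmt)  # record length, known before any read

# Stand-in for a real binary file holding two records.
f = io.BytesIO(struct.pack(fmt, 1, 2) + struct.pack(fmt, 3, 4))

records = []
while True:
    chunk = f.read(size)
    if not chunk:            # clean end of file
        break
    # A truncated final record would raise struct.error here.
    records.append(struct.unpack(fmt, chunk))
```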

But I'm just not seeing where all the extra complexity and special case 
handling is supposed to be, except by having unpack make promises that 
the OP didn't request:

- read partial structs from non-blocking files without failing
- deal with file system errors without failing
- support reading from text files when bytes are required without failing
- if an exception occurs, the state of the file shouldn't change

Those promises *would* add enormous amounts of complexity, but I don't 
think we need to make those promises. I don't think the OP wants them, 
I don't want them, and I don't think they are reasonable promises to 
make.

> The suggestion is a nice convenience method, and probably a useful
> addition for the majority of cases where it would do exactly what was
> needed, but still not completely trivial to actually implement and
> document (if I were doing it, I'd go with the naive approach, and just
> raise a ValueError when read(N) returns anything other than N bytes,
> for what it's worth).

Indeed. Except that we should raise precisely the same exception type 
that struct.unpack() currently raises in the same circumstances:

py> struct.unpack("ddd", b"a")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
struct.error: unpack requires a bytes object of length 24

rather than ValueError.
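
The distinction matters to callers because struct.error derives directly from Exception rather than from ValueError, so code written to catch one will not catch the other. A small check, using the same call as the traceback above:

```python
import struct

try:
    struct.unpack("ddd", b"a")   # needs 24 bytes, gets 1
except ValueError:
    caught = "ValueError"
except struct.error:
    caught = "struct.error"

print(caught)
```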

