[Python-ideas] struct.unpack should support open files
steve at pearwood.info
Wed Dec 26 19:42:30 EST 2018
On Wed, Dec 26, 2018 at 01:32:38PM +0000, Paul Moore wrote:
> On Wed, 26 Dec 2018 at 09:26, Steven D'Aprano <steve at pearwood.info> wrote:
> > Regardless, my point doesn't change. That has nothing to do with the
> > behaviour of unpack. If you pass a non-blocking file-like object which
> > returns None, you get exactly the same exception as if you wrote
> > unpack(fmt, f.read(size))
> > and the call to f.read returned None. Why is it unpack's responsibility
> > to educate the caller that f.read can return None?
> Abstraction, basically - once the unpack function takes responsibility
> for doing the read, and hiding the fact that there's a read going on
> behind an API unpack(fmt, f), it *also* takes on responsibility for
> managing all of the administration of that read call.
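To make the equivalence concrete (my demonstration, not code from any patch): if a non-blocking file-like object's read() returns None, passing that result to unpack raises a plain TypeError, exactly as if unpack had done the read itself.

```python
import struct

# Simulate a non-blocking file-like object whose read() returns None
# when no data is available (my illustration; any such object will do).
class NonBlockingFile:
    def read(self, n=-1):
        return None  # no data available yet

f = NonBlockingFile()
try:
    # Equivalent to the two-step unpack(fmt, f.read(size)) spelling.
    struct.unpack("<i", f.read(4))
except TypeError as e:
    print("unpack raised:", type(e).__name__)
```

The caller gets the same exception either way; unpack adds no new failure mode.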
As I keep pointing out, the json.load and pickle.load functions don't
take on all that added administration. Neither does marshal, or
zipfile, and I daresay there are others.
Why does "abstraction" apply to this proposal but not the others?
If you pass a file-like object to marshal.load that returns less than a
full record, it simply raises an exception. There's no attempt to handle
non-blocking streams and re-read until it has a full record:
py> import marshal
py> class MyFile:
...     def read(self, n=-1):
...         return marshal.dumps([1, "a"])[:5]
...
py> marshal.load(MyFile())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
EOFError: EOF read where object expected
The use-case for marshal.load is to read a valid, complete marshal
record from a file on disk. Likewise for json.load and pickle.load.
There's no need to complicate the implementation by handling streams
from ttys and other exotic file-like objects.
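json.load behaves the same way when handed a truncated stream (again, my demonstration): it simply raises, with no retry loop and no special handling for short reads.

```python
import json

# A file-like object that returns an incomplete JSON document,
# as a short or interrupted read might (my illustration).
class PartialFile:
    def read(self, n=-1):
        return '{"key": [1, 2'  # deliberately truncated

try:
    json.load(PartialFile())
except json.JSONDecodeError as e:
    print("json.load raised:", type(e).__name__)
```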
Likewise there's zipfile, which also doesn't take on this extra
responsibility. It doesn't try to support non-blocking streams which
return None, for example. It assumes the input file is seekable, and
doesn't raise a dedicated error for the case that it isn't. Nor does it
support non-blocking streams by looping until it has read the data it
needs.
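When zipfile gets input it can't use, it just raises (my demonstration):

```python
import io
import zipfile

# Feed zipfile bytes that are not a valid archive; it simply raises
# BadZipFile rather than retrying or special-casing exotic streams.
try:
    zipfile.ZipFile(io.BytesIO(b"these bytes are not a zip archive"))
except zipfile.BadZipFile as e:
    print("zipfile raised:", e)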
The use-case for unpack with a file object argument is the same. Why
should we demand that it alone take on this unnecessary, unwanted,
unused extra responsibility?
It seems to me that the only people insisting that unpack() take on this
extra responsibility are those who are opposed to the proposal. We're
asking for a battery, and they're insisting that we actually need a
nuclear reactor, and rejecting the proposal because nuclear reactors are
too complex. Here are some of the features that have been piled on to
the proposal:
- you need to deal with non-blocking streams that return None;
- if you read an incomplete struct, you need to block and read
in a loop until the struct is complete;
- you need to deal with OS errors in some unspecified way, apart from
just letting them bubble up to the caller.
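For contrast, here's roughly what the "nuclear reactor" would look like if unpack really did take on all three responsibilities. This is a hypothetical sketch, not anything I'm proposing; the helper name `read_exactly` is my invention:

```python
import struct

def read_exactly(f, n):
    # Hypothetical helper: loop until n bytes arrive, coping with
    # non-blocking reads that return None and with short reads.
    buf = bytearray()
    while len(buf) < n:
        chunk = f.read(n - len(buf))
        if chunk is None:        # non-blocking stream: no data yet
            continue             # busy-wait (real code would select/poll)
        if chunk == b"":         # genuine EOF before a full record
            raise EOFError("short read: expected %d bytes, got %d"
                           % (n, len(buf)))
        buf += chunk
    return bytes(buf)

def unpack_from_file(fmt, f):
    # Read exactly one struct's worth of bytes, then unpack it.
    return struct.unpack(fmt, read_exactly(f, struct.calcsize(fmt)))
```

All of that machinery is precisely what I'm arguing is out of scope.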
The response to all of these is:
No we don't need to do these things, they are all out of scope for the
proposal and other similar functions in the standard library don't do
them. These are examples of over-engineering and YAGNI.
*If* (a very big if!) somebody requests these features in the future,
then they'll be considered as enhancement requests. The effort required
versus the benefit will be weighed up, and if the benefit exceeds the
costs, then the function may be enhanced to support streams which return
None.
The benefit will need to be more than just "abstraction".
If there are objective, rational reasons for unpack() taking on these
extra responsibilities, when other stdlib code doesn't, then I wish
people would explain what those reasons are. Why does "abstraction"
apply to struct.unpack() but not json.load()?
I'm willing to be persuaded, I can change my mind. When Andrew suggested
that unpack would need extra code to generate better error messages, I
tested a few likely exceptions, and ended up agreeing that at least one
and possibly two such enhancements were genuinely necessary. Those
better error messages ended up in my subsequent proof-of-concept
implementations, tripling the size from five lines to fifteen. (A second
implementation reduced it to twelve.)
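For reference, a version of that shape might look like the following. This is my reconstruction of the general idea from the thread, not the actual posted patch, and the error messages are illustrative:

```python
import struct

def unpack(fmt, source):
    # Sketch only: accept bytes as before, or any object with read().
    if hasattr(source, "read"):
        n = struct.calcsize(fmt)
        data = source.read(n)
        # The two extra checks below give better error messages than
        # letting struct.unpack fail on its own (Andrew's point).
        if not isinstance(data, bytes):
            raise TypeError("read() did not return bytes, got %s"
                            % type(data).__name__)
        if len(data) != n:
            raise struct.error("expected %d bytes but read() returned %d"
                               % (n, len(data)))
        source = data
    return struct.unpack(fmt, source)
```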
But it irks me when people unnecessarily demand that new proposals are
written to standards far beyond what the rest of the stdlib is written
to. (I'm not talking about some of the venerable old, crufty parts of
the stdlib dating back to Python 1.4, I'm talking about actively
maintained, modern parts like json.)
Especially when they seem unwilling or unable to explain *why* we need
to apply such a high standard. What's so special about unpack() that
it has to handle these additional use-cases?
If an objection to a proposal equally applies to parts of the stdlib
that are in widespread use without actually being a problem in practice,
then the objection is probably invalid.
Remember the Zen:

    Now is better than never.
    Although never is often better than *right* now.
Even if we do need to deal with rare, exotic or unusual input, we don't
need to deal with them *right now*. When somebody submits an enhancement
request "support non-blocking streams", we can deal with it then.
Probably by rejecting it.