[Python-ideas] struct.unpack should support open files

Steven D'Aprano steve at pearwood.info
Wed Dec 26 20:59:39 EST 2018

On Thu, Dec 27, 2018 at 10:02:09AM +1100, Cameron Simpson wrote:

> >Also I'm thinking about type annotations in typeshed.
> >Now the type is Union[array[int], bytes, bytearray, memoryview]
> >Should it be Union[io.BinaryIO, array[int], bytes, bytearray, 
> >memoryview] ?
> And this is why I, personally, think augmenting struct.unpack and 
> json.read and a myriad of other arbitrary methods to accept both 
> file-like things and bytes is an open ended can of worms.

I presume you mean json.load(), not read, except that it already reads 
from files.
Nobody is talking about augmenting "a myriad of other arbitrary methods" 
except for you. We're talking about enhancing *one* function to be a 
simple generic function.

I assume you have no objection to the existence of json.load() and 
json.loads() functions. (If you do think they're a bad idea, I don't 
know what to say.) Have they led to "an open ended can of worms"?

If we wrote a simple wrapper:

def load(obj, *args, **kwargs):
    if isinstance(obj, str):
        return json.loads(obj, *args, **kwargs)
    return json.load(obj, *args, **kwargs)

would that lead to "an open ended can of worms"?
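For concreteness, here is a self-contained version of that wrapper, with a
usage example (the dispatch-on-str logic is the whole trick):

```python
import io
import json

def load(obj, *args, **kwargs):
    # Dispatch on type: strings go to json.loads(), anything else is
    # assumed to be a file-like object and goes to json.load().
    if isinstance(obj, str):
        return json.loads(obj, *args, **kwargs)
    return json.load(obj, *args, **kwargs)

print(load('{"a": 1}'))               # from a string  -> {'a': 1}
print(load(io.StringIO('{"a": 1}')))  # from a file    -> {'a': 1}
```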

These aren't rhetorical questions. I'd like to understand your 
objection. You have dismissed what seems to be a simple enhancement with 
a vague statement about hypothetical problems. Please explain in 
concrete terms what these figurative worms are.

Let's come back to unpack. Would you object to having two separate 
functions that matched (apart from the difference in name) the API used 
by json, pickle, marshal etc?

- unpack() reads from files
- unpacks() reads from strings

Obviously this breaks backwards compatibility, but if we were designing 
struct from scratch today, would this API open a can of worms?

(Again, this is not a rhetorical question.)

Let's save backwards compatibility:

- unpack() reads from strings
- unpackf() reads from files

Does this open a can of worms?

Or we could use a generic function. There is plenty of precedent for 
generic files in the stdlib. For example, zipfile accepts either 
a file name, or an open file object.

def unpack(fmt, frm):
    if hasattr(frm, "read"):
        return _unpack_file(fmt, frm)
    return _unpack_bytes(fmt, frm)

Does that generic function wrapper create "an open ended can of worms"? 
If so, in what way?

I'm trying to understand where the problem lies, between the existing 
APIs used by json etc (presumably they are fine) and the objections to 
using what seems to be a very similar API for unpack, offering the same 
functionality but differing only in spelling (a single generic function 
instead of two similarly-named functions).

> And it is why I wrote myself my CornuCopyBuffer class (see my other post 
> in this thread).
> The return from .take is typically a 
> memoryview from `bfr`'s internal buffer - it is _always_ exactly `size` 
> bytes long if you don't pass short_ok=True, or it raises an exception. 

That's exactly the proposed semantics for unpack, except there's no 
"short_ok" parameter. If the read is short, you get an exception.

> And so on.
> The point here is: make a class to get what you actually need

Do you know better than the OP (Drew Warwick) and James Edwards what 
they "actually need"?

How would you react if I told you that your CornuCopyBuffer class is an 
over-engineered, over-complicated, over-complex class that you don't 
need? You'd probably be pretty pissed off at my arrogance in telling you 
what you do or don't need for your own use-cases. (Especially since I 
don't know your use-cases.)

Now consider that you are telling Drew and James that they don't know 
their own use-cases, despite the fact that they've been working 
successfully with this simple enhancement for years.

I'm happy for you that CornuCopyBuffer solves real problems for you, 
and if you want to propose it for the stdlib I'd be really interested 
to learn more about it.

But this is actually irrelevant to the current proposal. Even if we had 
a CornuCopyBuffer in the std lib, how does that help? We will still need 
to call struct.calcsize(format) by hand, still need to call read(size) 
by hand. Your CornuCopyBuffer does nothing to avoid that.

The point of this proposal is to avoid that tedious make-work, not 
increase it by having to wrap our simple disk files in a CornuCopyBuffer 
before doing precisely the same make-work we didn't want to do in the 
first place.

Drew has asked for a better hammer, and you're telling him he really 
wants a space shuttle.
