Add function for reading a specific number of bytes/characters from a file object that fails noisily
The code I'm currently working on involves parsing binary data. If I ask for, say, 4 bytes, it's because I actually need 4 bytes and if the file doesn't have 4 bytes for me, it's malformed. Because `f.read(4)` can silently return less than 4 bytes and I don't want to have to explicitly double check every read, I'm using a wrapper function.

    def read_exact(f, size):
        data = f.read(size)
        if len(data) < size:
            raise EOFError(f"expected read of size {size}, got {len(data)}")
        return data

I don't think my scenario of "give me exactly the number of bytes/characters I asked for or fail noisily" is particularly uncommon, so I think that a similar function should be added to the standard library somewhere. I guess as a function in `io`?

Admittedly, as my own code demonstrates, implementing it yourself if you need it is trivial, so it may not actually be worth adding.
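As a minimal sketch of how such a wrapper might be used when parsing binary data: the length-prefixed record layout and the `parse_record` helper below are hypothetical, chosen only to show a truncated input failing noisily instead of yielding a short payload.

```python
import io
import struct


def read_exact(f, size):
    """Read exactly `size` bytes/characters from `f` or raise EOFError."""
    data = f.read(size)
    if len(data) < size:
        raise EOFError(f"expected read of size {size}, got {len(data)}")
    return data


def parse_record(f):
    # Hypothetical layout: big-endian uint32 length, then that many payload bytes.
    (length,) = struct.unpack(">I", read_exact(f, 4))
    return read_exact(f, length)


# A well-formed record parses normally.
payload = parse_record(io.BytesIO(b"\x00\x00\x00\x03abc"))

# A truncated record (claims 5 bytes, has 2) raises instead of returning b"ab".
try:
    parse_record(io.BytesIO(b"\x00\x00\x00\x05ab"))
    error = None
except EOFError as exc:
    error = str(exc)
```

Every read in the parser is checked without any per-call `len()` test at the call site.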
On Thu, Mar 03, 2022 at 08:27:50AM -0000, Kevin Mills wrote:
The code I'm currently working on involves parsing binary data. If I ask for, say, 4 bytes, it's because I actually need 4 bytes and if the file doesn't have 4 bytes for me, it's malformed. Because `f.read(4)` can silently return less than 4 bytes and I don't want to have to explicitly double check every read, I'm using a wrapper function.
This is not a terrible idea. Other languages, like Pascal, have facilities for reading fixed-size chunks, integers, or floats from a file. But there are some design questions that need to be sorted out.

If you're reading from, say, a serial port, and it only has three bytes, should it hang until a fourth byte is available, or raise an exception (thus losing the first three bytes)?

-- Steve
On Thu, 3 Mar 2022 at 20:28, Steven D'Aprano <steve@pearwood.info> wrote:
If you're reading from, say, a serial port, and it only has three bytes, should it hang until a fourth byte is available, or raise an exception (thus losing the first three bytes)?
My preferred colour for this particular bikeshed: Block.

This API doesn't make sense with non-blocking FDs (just use normal read() and accept short returns), so what it does in that situation probably doesn't matter; I'd be inclined to have it return the three bytes, and document that for non-blocking FDs, this is equivalent to read(n) and may not return the full length.

I'd be most likely to use this sort of API with pipes, where the normal thing to do is to block until the other process has pushed the next block of data out.

ChrisA
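A rough sketch of the blocking behaviour described above, using a pipe. The name `read_exact_blocking` and the `slow_writer` helper are made up for illustration; raising EOFError when the writer closes early is an assumption, not something settled in the thread.

```python
import os
import threading
import time


def read_exact_blocking(fd, size):
    """Keep calling os.read() on a blocking fd until `size` bytes arrive."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = os.read(fd, remaining)  # may return fewer bytes than requested
        if not chunk:  # writer closed the pipe: genuine end of data
            raise EOFError(f"expected {size} bytes, got {size - remaining}")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def slow_writer(fd):
    os.write(fd, b"abc")  # three bytes are available at first
    time.sleep(0.1)
    os.write(fd, b"d")    # the fourth byte arrives later
    os.close(fd)


r, w = os.pipe()
writer = threading.Thread(target=slow_writer, args=(w,))
writer.start()
result = read_exact_blocking(r, 4)  # waits for the fourth byte
writer.join()
os.close(r)
```

The reader simply waits for the other end to push the rest of the data, so the first three bytes are never lost.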
Rather than a new function, maybe a flag?

    read(n, strict=True)

(Not sure if a block flag would also be useful, but maybe)

-CHB

--
Christopher Barker, PhD (Chris)
Python Language Consulting
- Teaching
- Scientific Software Development
- Desktop GUI and Web Development
- wxPython, numpy, scipy, Cython
I actually initially was going to suggest a `strict` flag get added, but I figured that would be impractical. I was mostly concerned about classes that mimic file objects, because (obviously) their read methods wouldn't include a `strict` flag and you couldn't pass such objects to functions using the flag.
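The compatibility concern can be sketched as follows. Note that `strict` is purely hypothetical here: no such parameter exists on the standard library's `read()` methods, and `StrictBytesIO`, `MinimalReader`, and `parse_header` are invented names for illustration.

```python
import io


class StrictBytesIO(io.BytesIO):
    """Hypothetical: a BytesIO whose read() accepts a `strict` keyword."""

    def read(self, size=-1, *, strict=False):
        data = super().read(size)
        if strict and size is not None and size >= 0 and len(data) < size:
            raise EOFError(f"expected read of size {size}, got {len(data)}")
        return data


class MinimalReader:
    """A file-like object that only knows the classic read(size) signature."""

    def __init__(self, data):
        self._buf = io.BytesIO(data)

    def read(self, size=-1):
        return self._buf.read(size)


def parse_header(f):
    # Code written against the hypothetical flag.
    return f.read(4, strict=True)


header = parse_header(StrictBytesIO(b"RIFFdata"))

# An existing file-like class rejects the unknown keyword outright.
try:
    parse_header(MinimalReader(b"RIFFdata"))
    flag_supported = True
except TypeError:
    flag_supported = False
```

Any code that passes the flag stops working with file-like objects written before the flag existed.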
On Thu, Mar 3, 2022 at 9:58 AM Kevin Mills <kevin.mills226@gmail.com> wrote:
I actually initially was going to suggest a `strict` flag get added, but I figured that would be impractical. I was mostly concerned about classes that mimic file objects, because (obviously) their read methods wouldn't include a `strict` flag and you couldn't pass such objects to functions using the flag.
You couldn't pass them to functions using a new method, either.

No matter how you slice it, this would be an extension to the existing file-like object "protocol".

I use quotes because I don't think it's a clearly defined protocol.

-CHB
_______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/2GYOXX... Code of Conduct: http://python.org/psf/codeofconduct/
On Fri, 4 Mar 2022 at 12:10, Christopher Barker <pythonchb@gmail.com> wrote:
you couldn't pass them to functions using a new method, either.
No matter how you slice it, this would be an extension to the existing file-like object "protocol"
I use quotes because I don't think it's a clearly defined protocol.
For what it's worth, there IS a way to slice it that isn't an extension. Create a separate function like this:

    def read_chunk(f, size):
        data = f.read(size)
        # Keep reading until we have the full amount.
        while len(data) < size:
            more = f.read(size - len(data))
            if not more:  # end of file: stop rather than loop forever
                break
            data += more
        return data

Stick that into os or something, and then it would work on anything with a read method (and this has been deliberately written to not care whether it's working in characters or bytes).

I think it's better as a method, but if there's an issue with extending the protocol, this might be an alternative.

ChrisA
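To illustrate the "works on anything with a read method" point, here is a sketch that runs such a helper against both bytes and text. The `Dribble` class is a hypothetical stand-in for a source that returns short reads, and stopping at end of input (rather than raising) is an assumption made for this example.

```python
import io


def read_chunk(f, size):
    """Accumulate reads until `size` items arrive or the stream ends."""
    data = f.read(size)
    while len(data) < size:
        more = f.read(size - len(data))
        if not more:  # end of stream: return what we have
            break
        data += more  # works for both bytes and str
    return data


class Dribble:
    """Hypothetical file-like object returning at most 2 items per read."""

    def __init__(self, inner):
        self._inner = inner

    def read(self, size):
        return self._inner.read(min(size, 2))


as_bytes = read_chunk(Dribble(io.BytesIO(b"abcdefgh")), 5)
as_text = read_chunk(Dribble(io.StringIO("abcdefgh")), 5)
short = read_chunk(Dribble(io.BytesIO(b"abc")), 5)
```

Because the helper only calls `read()` and `len()` and concatenates with `+=`, it never needs to know whether it is handling characters or bytes.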
On Thu, Mar 03, 2022 at 08:09:00AM -0800, Christopher Barker wrote:
Rather than a new function, maybe a flag?
https://martinfowler.com/bliki/FlagArgument.html
https://alexkondov.com/should-you-pass-boolean-to-functions/

-- Steve
+0.1?
Without taking a position on whether this is worth adding to the standard library, I'll note that Rust's stdlib does include a read_exact function that works in much the same way (though of course it returns a Result<T> instead of raising an exception, and Rust, as a systems language, is far more likely to be used to parse binary formats).

--
Lincoln Auster
They/Them
participants (5)
- Chris Angelico
- Christopher Barker
- Kevin Mills
- Lincoln Auster
- Steven D'Aprano