On Sun, Jan 24, 2021 at 10:43:54PM -0500, Matt Wozniski wrote:
On Sun, Jan 24, 2021 at 9:53 AM 2QdxY4RzWzUUiLuE@potatochowder.com wrote:
On 2021-01-25 at 00:29:41 +1100, Steven D'Aprano steve@pearwood.info wrote:
On Sat, Jan 23, 2021 at 03:24:12PM +0000, Barry Scott wrote:
First problem I see is that the file may be a pipe and then you will block until you have enough data to do the auto detect.
Can you use `open('filename')` to read a pipe?
Yes. Named pipes are files, at least on POSIX.
And no. Unnamed pipes are identified by OS-level file descriptors, so you can't open them with `open('filename')`.
The `open` function takes either a file path as a string, or a file descriptor as an integer. So you can use `open` to read an unnamed pipe or a socket.
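For what it's worth, here is a minimal sketch of the file-descriptor form, assuming a POSIX-like platform where `os.pipe()` is available. The helper name `read_from_unnamed_pipe` is made up for illustration, and the payload is kept small so that it fits in the pipe's kernel buffer before anything reads it.

```python
import os


def read_from_unnamed_pipe(payload: bytes) -> str:
    """Write payload into an anonymous pipe, then read it back as text."""
    # os.pipe() returns two integer file descriptors; there is no filename.
    read_fd, write_fd = os.pipe()

    # open() accepts the integer descriptor directly. Closing the file
    # object also closes the descriptor (closefd=True is the default).
    with open(write_fd, "wb") as writer:
        writer.write(payload)

    # With the write end closed, read() returns at EOF instead of blocking.
    with open(read_fd, "r", encoding="utf-8") as reader:
        return reader.read()


result = read_from_unnamed_pipe(b"hello from a pipe")
```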
Okay, but I was asking about using open with a filename string. In any case, the existence of named pipes answers my question.
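A rough sketch of the named-pipe case, assuming a POSIX system where `os.mkfifo` exists. The FIFO name `demo.fifo` and the helper `read_via_named_pipe` are hypothetical. The writer runs in a thread because opening a FIFO for reading blocks until something opens it for writing, which is the blocking behaviour raised at the start of the thread.

```python
import os
import tempfile
import threading


def read_via_named_pipe(payload: str) -> str:
    """Create a FIFO, feed it from a thread, and read it with open(path)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.fifo")  # hypothetical name
        os.mkfifo(path)

        def writer() -> None:
            # Blocks until the reader below has opened the FIFO.
            with open(path, "w", encoding="utf-8") as w:
                w.write(payload)

        thread = threading.Thread(target=writer)
        thread.start()

        # An ordinary open() with a filename string, but the "file" is a pipe.
        with open(path, encoding="utf-8") as r:
            data = r.read()  # returns once the writer closes its end

        thread.join()
        return data


fifo_result = read_via_named_pipe("hello from a FIFO")
```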
[...]
It's possible to do a `f.read(1)` on a file opened in text mode. If the first two bytes of the file are 0xC2 0x99, that's either the C1 control character U+0099 if the file is UTF-8, or 슙 if the file is UTF-16BE, or 駂 if the file is UTF-16LE.
Or Â followed by the SGC control code in Latin-1. Or Â™ in Windows-1252, or ¬ô in MacRoman. Etc.
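The ambiguity is easy to check directly. This sketch decodes the same two bytes with each encoding mentioned, using Python's standard codec names (`mac_roman` is the stdlib spelling for MacRoman). The variable names are only illustrative.

```python
RAW = b"\xc2\x99"

CODECS = ["utf-8", "utf-16-be", "utf-16-le", "latin-1", "cp1252", "mac_roman"]

# Same bytes, six different answers depending on the assumed encoding.
decoded = {name: RAW.decode(name) for name in CODECS}

for name, text in decoded.items():
    # Show code points rather than glyphs, since U+0099 is a control character.
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in text)
    print(f"{name:10} {len(text)} char(s): {code_points}")
```

Note that the single-byte encodings yield two characters where UTF-8 and UTF-16 yield one, so even the result of a one-character read depends on the guess.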
And `f.read(1)` needs to pick one of those and return it immediately. It can't wait for more information. The contract of `read` is "Read from underlying buffer until we have n characters or we hit EOF."
In text mode, reads are always buffered:
https://docs.python.org/3/library/functions.html#open
so `f.read(1)` will read as much as needed, so long as it only returns a single character.
A typical buffer size is 4096 bytes, or more.
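As a small illustration, assuming CPython's `io.TextIOWrapper` over an in-memory byte stream: asking for one character pulls a whole chunk from the underlying stream, and only the first decoded character is handed back. The exact chunk size is an implementation detail, so the sketch only checks that more than one character's worth of bytes was consumed.

```python
import io

# "é" takes two bytes in UTF-8, followed by 99 single-byte characters.
raw = io.BytesIO(("é" + "a" * 99).encode("utf-8"))
text = io.TextIOWrapper(raw, encoding="utf-8")

first = text.read(1)   # one character is returned to the caller
consumed = raw.tell()  # bytes actually pulled from the underlying stream

print(repr(first), "returned;", consumed, "bytes read from the raw stream")
```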
In any case, I believe the intention of this proposal is for *open*, not read, to perform the detection.