On Sat, Jan 23, 2021 at 03:24:12PM +0000, Barry Scott wrote:
> I think that you are going to create a bug magnet if you attempt to auto-detect the encoding.
> First problem I see is that the file may be a pipe, and then you will block until you have enough data to do the auto-detect.
Can you use `open('filename')` to read a pipe? Is blocking a problem in practice? If you try to open a network file, that could block too if there are network issues. And since you're likely to follow the open with a read, the read is likely to block anyway. So overall, I don't think that blocking is an issue.
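For what it's worth, a Python file object over a pipe supports the same read() interface as a file on disk. A minimal sketch, using os.pipe() purely as a stand-in for a named FIFO opened with open('filename', 'rb'):

```python
import os

# Illustrative sketch: a file object wrapping a pipe behaves like any
# other file object. os.pipe() stands in here for a named FIFO.
r, w = os.pipe()
os.write(w, 'héllo\n'.encode('utf-8'))  # the writer end supplies data
os.close(w)

with os.fdopen(r, 'rb') as f:
    # If the writer had not yet written anything, this read would block,
    # just as the read following a plain open() would.
    sample = f.read()

print(sample.decode('utf-8'))
```

So the blocking behaviour an auto-detecting open would have on a pipe is the same blocking behaviour the subsequent read already has.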
> Second problem is that the first N bytes are all in ASCII and only later do you see a Windows code page signature (odd lack of UTF-8 signature).
UTF-8 is a strict superset of ASCII, so if the file is actually ASCII, there is no harm in using UTF-8. The bigger issue is if you have N bytes of pure ASCII followed by bytes in some non-UTF-8 ASCII superset, such as one of the ISO-8859-* encodings. Then you end up detecting what you think is ASCII/UTF-8 but is actually some legacy encoding. But if N is large, say 512 bytes, that's unlikely in practice.
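A minimal sketch of that heuristic, assuming a sample of the first 512 bytes. The name `guess_encoding` and the particular BOM checks are illustrative only, not part of any proposed API:

```python
def guess_encoding(sample: bytes) -> str:
    """Guess an encoding from the first bytes of a file (illustrative)."""
    # BOM signatures are unambiguous, so check them first.
    if sample.startswith(b'\xef\xbb\xbf'):
        return 'utf-8-sig'
    if sample.startswith((b'\xff\xfe', b'\xfe\xff')):
        return 'utf-16'
    try:
        # UTF-8 is a strict superset of ASCII, so pure-ASCII samples
        # decode cleanly here as well.
        sample.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        # Fall back to a legacy 8-bit encoding. Every byte is valid in
        # Latin-1, so this never fails -- though it may well be wrong,
        # which is exactly the failure mode described above. (A sample
        # cut mid-way through a multibyte UTF-8 character would also
        # land here spuriously.)
        return 'latin-1'
```

This is where the "N bytes of ASCII, then legacy bytes" problem bites: if the non-ASCII bytes fall outside the sample, the function happily returns 'utf-8'.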
That auto-detection behaviour could be enough to differentiate it from the regular open(), thus answering the "but in ten years' time it will be redundant and will need to be deprecated" objection.
Having said that, I can't say I'm very keen on the name "open_text", but I can't think of any other bikeshed colour I prefer.
> Given that the function's purpose is to open Unicode text, use a name that reflects that it is the encoding that is set, not the mode (binary vs. text).
> open_unicode maybe?
I guess that depends on whether the auto-detection is intended to support non-Unicode legacy encodings or not.
> If you are teaching open_text then do you also need to have open_binary?
No. There are no frustrating, difficult, platform-specific encoding issues when reading binary files. Bytes are bytes. -- Steve