On 23 Jan 2021, at 11:00, Steven D'Aprano firstname.lastname@example.org wrote:
On Sat, Jan 23, 2021 at 12:40:55AM -0500, Random832 wrote:
On Fri, Jan 22, 2021, at 20:34, Inada Naoki wrote:
- Default encoding is "utf-8".
it might be worthwhile to be a little more sophisticated than this.
Notepad itself uses character set detection [it might not be reasonable to do this on the whole file as Notepad does, but maybe on the first 512 bytes, or the result of read1(512)?] when opening a file of unknown encoding, and msvcrt's "ccs=UTF-8" option to fopen will at least detect the presence of UTF-8 and UTF-16 BOMs [and treat the file as UTF-16 in the latter case].
I like Random's idea. If we add a new "open text file" builtin function, we should seriously consider having it attempt to auto-detect the encoding. It need not be as sophisticated as `chardet`.
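For concreteness, here is a minimal sketch of what BOM-only detection (far less than `chardet`) could look like. The names `sniff_bom` and `open_text_sniffed` are hypothetical and are not part of any proposal in this thread; the fallback to "utf-8" is an assumption taken from the "Default encoding is utf-8" point above.

```python
import codecs

# Hypothetical helper for illustration only.
# UTF-32 BOMs are checked before UTF-16 because the UTF-32-LE BOM
# (FF FE 00 00) begins with the UTF-16-LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_bom(head: bytes, default: str = "utf-8") -> str:
    """Return a codec name based on a leading BOM, else the default."""
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            return encoding
    return default


def open_text_sniffed(path, default="utf-8"):
    """Open a regular file as text, picking the codec from its BOM."""
    # Read at most 4 bytes: enough for the longest BOM (UTF-32).
    with open(path, "rb") as f:
        head = f.read(4)
    # The "utf-8-sig", "utf-16" and "utf-32" codecs consume the BOM
    # themselves, so the file is simply reopened from the start.
    return open(path, "r", encoding=sniff_bom(head, default))
```

This only works for seekable, regular files, since it opens the path twice, and it cannot tell a BOM-less legacy code page apart from UTF-8.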
I think that you are going to create a bug magnet if you attempt to auto-detect the encoding.
The first problem I see is that the file may be a pipe, in which case you will block until enough data has arrived to do the auto-detection.
The second problem is that the first N bytes may all be ASCII, and only later do you see a Windows code page signature (an odd lack of a UTF-8 signature).
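The second problem can be reproduced with a small sketch. The detector `guess_from_prefix` below is hypothetical and deliberately naive: it assumes UTF-8 whenever the first N bytes decode as UTF-8 and falls back to cp1252 otherwise. The 512-byte window and the cp1252 fallback are assumptions chosen for illustration.

```python
def guess_from_prefix(data: bytes, n: int = 512) -> str:
    """Hypothetical detector: trust UTF-8 if the first n bytes decode."""
    try:
        data[:n].decode("utf-8")
    except UnicodeDecodeError:
        return "cp1252"
    return "utf-8"


# 600 bytes of pure ASCII followed by cp1252-encoded text.
# The only non-ASCII byte (0xE9, "é") lies beyond the 512-byte window.
data = b"x" * 600 + "café".encode("cp1252")

encoding = guess_from_prefix(data)  # the prefix is ASCII, so UTF-8 is guessed

try:
    data.decode(encoding)
    failed = False
except UnicodeDecodeError:
    # The wrong guess only surfaces once the whole file is decoded.
    failed = True
```

The guess looks fine at open time, and the error only appears later, during a read, far from the code that chose the encoding.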
That auto-detection behaviour could be enough to differentiate it from the regular open(), thus solving the "but in ten years time it will be redundant and will need to be deprecated" objection.
Having said that, I can't say I'm very keen on the name "open_text", but I can't think of any other bikeshed colour I prefer.
Given that the function's purpose is to open Unicode text, use a name that reflects that it is the encoding that is being set, not the mode (binary vs. text).
If you are teaching open_text then do you also need to have open_binary?
-- Steve