[Python-Dev] Quick sum up about open() + BOM

Victor Stinner victor.stinner at haypocalc.com
Sat Jan 9 13:37:06 CET 2010


On Saturday 09 January 2010 02:23:07, Martin v. Löwis wrote:
> While I would support combining BOM detection in the case where a file
> is opened for reading and no encoding is specified, I see two problems:
> a) if a seek operation is performed before having looked at the BOM,
>    no determination would have been made

TextIOWrapper doesn't support seeking to an arbitrary byte offset. It uses a 
"cookie", which is an opaque value. Reusing a cookie from another file, or a 
stale cookie, is forbidden (but it doesn't raise an error). This is not 
specific to BOM checking: the problem already exists for encodings that use a 
BOM (e.g. UTF-16).
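
To illustrate the cookie behaviour, here is a minimal sketch using an 
in-memory UTF-16 stream. The sample text and variable names are only 
illustrative; the point is that the value returned by tell() is treated as 
opaque and only handed back to seek() on the same file object.

```python
import io

# In-memory UTF-16 data; the "utf-16" codec prepends a BOM when encoding.
raw = io.BytesIO("héllo\nworld\n".encode("utf-16"))
text = io.TextIOWrapper(raw, encoding="utf-16")

first = text.readline()

# tell() returns an opaque cookie, not necessarily a plain byte offset.
cookie = text.tell()
rest = text.read()

# The only supported use: give the cookie back to seek() on the same file.
text.seek(cookie)
reread = text.read()
```

Doing arithmetic on the cookie, or passing it to another file, is the 
unsupported case mentioned above.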

> b) what encoding should it use on writing?

Don't change anything for writing.

With Antoine's choice: open('file.txt', 'w', encoding=None) continues to use 
the current heuristic (os.device_encoding() or the system locale).
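
As a rough sketch of what "no change for writing" means, the snippet below 
opens a temporary file with encoding=None and inspects what was picked. The 
temporary path and variable names are only illustrative, and the encoding 
that is chosen depends on the platform, the locale and the Python version.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w", encoding=None) as f:
        # Encoding selected by the existing heuristic (platform dependent).
        chosen = f.encoding
        # A regular file is not a terminal, so there is no device encoding.
        device = os.device_encoding(f.fileno())
        f.write("hello")
```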

With Guido's choice, encoding="BOM": it raises an error, because the BOM 
check is not supported when writing to a file. How could the BOM be checked 
when creating a new (empty) file!?
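
A minimal sketch of that behaviour, assuming a hypothetical helper named 
open_bom() (it is not an existing Python API, and encoding="BOM" is only a 
proposal here): detect the BOM when reading, refuse any other mode. The 
fallback parameter and the BOM table are assumptions made for the example.

```python
import codecs
import os
import tempfile

# Longer BOMs first: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def open_bom(path, mode="r", fallback="utf-8"):
    """Hypothetical helper: choose the encoding from the BOM, read-only."""
    if mode != "r":
        # A new (empty) file has no BOM to look at.
        raise ValueError("BOM detection is only possible when reading")
    with open(path, "rb") as raw:
        head = raw.read(4)
    for bom, encoding in _BOMS:
        if head.startswith(bom):
            # These codecs consume the BOM themselves while decoding.
            return open(path, mode, encoding=encoding)
    return open(path, mode, encoding=fallback)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "file.txt")
    with open(path, "w", encoding="utf-16") as f:  # writes a BOM
        f.write("hello")
    with open_bom(path) as f:
        detected, content = f.encoding, f.read()
    try:
        open_bom(path, "w")
    except ValueError:
        write_rejected = True
    else:
        write_rejected = False
```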

-- 
Victor Stinner
http://www.haypocalc.com/


