[Python-Dev] Improve open() to support reading file starting with a Unicode BOM

Victor Stinner victor.stinner at haypocalc.com
Fri Jan 8 11:27:43 CET 2010


On Friday 08 January 2010 05:21:04, Guido van Rossum wrote:
(...)
> (And yes, I know this happens. Doesn't mean we need to auto-guess by
> default; there are lots of issues e.g. what should happen after
> seeking to offset 0?)

I wrote a new version of my patch (version 3):

 * don't change the default behaviour: use open(filename, encoding="BOM") to 
check for a BOM, if there is one
 * fix for seek(0): always ignore the BOM
 * add a unit test: check that the right encoding is detected, and also that the 
BOM is ignored (especially after a seek(0))

The "BOM" encoding can't be used for writing to a file, so open(filename, "w", 
encoding="BOM") raises a ValueError.
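
To illustrate the behaviour described above without the patch, here is a rough 
sketch that only uses the existing codecs. It is not the patch itself: 
open_with_bom(), detect_bom_codec() and the "fallback" parameter are 
hypothetical names, and falling back to UTF-8 when there is no BOM is an 
assumption of this example. It relies on the "utf-8-sig", "utf-16" and "utf-32" 
codecs consuming the BOM themselves, so the BOM is skipped again after a 
seek(0).

```python
import codecs

# Hypothetical table mapping each BOM to a codec that consumes the BOM itself.
# UTF-32 is tested before UTF-16 because the UTF-32-LE BOM begins with the
# same bytes as the UTF-16-LE BOM.
_BOM_CODECS = (
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)


def detect_bom_codec(head, fallback):
    """Return a codec name for the BOM at the start of `head`, else `fallback`."""
    for bom, codec in _BOM_CODECS:
        if head.startswith(bom):
            return codec
    return fallback


def open_with_bom(filename, mode="r", fallback="utf-8"):
    """Open a text file for reading, choosing the encoding from its BOM.

    The `fallback` encoding (an assumption of this sketch) is used when the
    file has no BOM.
    """
    # As with encoding="BOM" in the patch, writing is rejected.
    if mode not in ("r", "rt"):
        raise ValueError("BOM detection is only supported for reading")
    with open(filename, "rb") as raw:
        head = raw.read(4)  # the longest BOM (UTF-32) is 4 bytes
    return open(filename, mode, encoding=detect_bom_codec(head, fallback))
```

With this sketch, open_with_bom(filename).read() returns the text without the 
BOM, and open_with_bom(filename, "w") raises a ValueError.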

-- 
Victor Stinner
http://www.haypocalc.com/


