[Python-Dev] Quick sum up about open() + BOM

Victor Stinner victor.stinner at haypocalc.com
Sat Jan 9 14:50:28 CET 2010


On Saturday 9 January 2010 at 13:45:58, you wrote:
> > Note: I implemented the BOM check in TextIOWrapper; so it's already
> > usable for any file-like object.
> Yes, but the implementation is limited to just BOM checking
> and thus only supports UTF-8-SIG, UTF-16 and UTF-32.

Sure, but that's already better than no BOM check :-) It looks like many 
people would appreciate UTF-8-SIG detection, since this encoding is common on 

> BTW: I haven't looked at your implementation, but what happens
> when your BOM check fails ? Will the implementation add the
> already read bytes back to a buffer ?

My check runs between buffer.read() and decoder.decode(data). If there is a 
BOM, it sets the encoding and removes the BOM bytes from the byte string. 
Otherwise, it uses another algorithm to choose the encoding and leaves the 
byte string unchanged.

It can be seen as a codec: it works like UTF-16 and UTF-32 codecs ;-)
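
To make the step between buffer.read() and decoder.decode(data) concrete, 
here is a minimal sketch in pure Python. It is not the actual TextIOWrapper 
patch: the function name strip_bom() and its fallback argument are 
hypothetical, and the "other algorithm" used when no BOM is found is reduced 
to a fixed default because it is not specified here.

```python
import codecs

# Longer BOMs are listed first: the UTF-32-LE BOM begins with the same
# two bytes as the UTF-16-LE BOM, so the order of the checks matters.
_BOMS = (
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def strip_bom(data, fallback="utf-8"):
    """Return (encoding, data without BOM) for the first chunk of a file.

    `fallback` is a hypothetical stand-in for whatever algorithm would
    choose the encoding when the chunk does not start with a BOM.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            # BOM found: pick the encoding and drop the BOM bytes.
            return encoding, data[len(bom):]
    # No BOM: the byte string is left unchanged.
    return fallback, data


# Usage: the chunk would come from buffer.read() in the real code.
chunk = codecs.BOM_UTF8 + "hello".encode("utf-8")
encoding, payload = strip_bom(chunk)
text = payload.decode(encoding)
```

Since the bytes are only trimmed in memory when a BOM matches, nothing has 
to be pushed back into the buffer when the check fails.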

> AFAIK, we currently have a moratorium on changes to Python
> builtins. How does that match up with the proposed changes ?

Oh yes, I forgot the moratorium. Some of the proposed solutions don't change 
the API. E.g. Antoine proposed to leave the API unchanged: open(file) => 
open(file) :-) I don't know if it's compatible with the moratorium or not.

Victor Stinner
