[Python-Dev] Improve open() to support reading file starting with an unicode BOM

Georg Brandl g.brandl at gmx.net
Sat Jan 9 00:10:24 CET 2010


On 08.01.2010 22:14, Tres Seaver wrote:

>> FWIW, I'm personally in favor of using the UTF-8 signature. If people
>> consider them crazy talk, that may be because UTF-8 can't possibly have
>> a byte order - hence I call it a signature, not the BOM. As a signature,
>> I don't consider it crazy at all. There is a long tradition of having
>> magic bytes in files (executable files, PostScript, PDF, ... - see
>> /etc/magic). Having a magic byte sequence for plain text to denote the
>> encoding is useful and helps reduce mojibake. This is the reason it's
>> used on Windows: Notepad would normally assume that text is in the ANSI
>> code page, and for compatibility, it can't stop doing that. So the UTF-8
>> signature gives them an exit strategy.
> 
> Agreed.  Having that marker at the start of the file makes interop with
> other tools *much* easier.

Except if only 50% of the other tools support the signature.
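
To make the trade-off concrete, here is a minimal sketch using Python 3's standard `codecs` module and the built-in `open()`. It is an illustration, not code proposed in this thread. The helper names `sniff_utf8_signature` and `decode_text` are hypothetical. The sketch shows the signature acting as a magic byte sequence (EF BB BF). A reader that understands it, such as the `utf-8-sig` codec, strips it. A reader that only knows plain UTF-8 keeps it as a stray U+FEFF character at the start of the text, and that is the interop cost whenever a tool does not support the signature.

```python
import codecs
import os
import tempfile


def sniff_utf8_signature(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 signature (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def decode_text(data: bytes) -> str:
    # 'utf-8-sig' drops a leading signature if present and otherwise
    # decodes exactly like plain 'utf-8'.
    return data.decode("utf-8-sig")


# Encoding with 'utf-8-sig' prepends the signature automatically.
sample = "héllo".encode("utf-8-sig")

print(sniff_utf8_signature(sample))   # True
print(repr(sample.decode("utf-8")))   # '\ufeffhéllo'  (signature leaks through)
print(repr(decode_text(sample)))      # 'héllo'        (signature stripped)

# The same applies when reading a file through open().
with tempfile.NamedTemporaryFile("wb", delete=False, suffix=".txt") as f:
    f.write(sample)
    path = f.name
try:
    with open(path, encoding="utf-8-sig") as fh:
        print(repr(fh.read()))        # 'héllo'
finally:
    os.remove(path)
```

Because `utf-8-sig` also accepts input without a signature, it is a fairly safe choice for reading. Whether to *write* the signature is the contested part, since tools that decode as plain UTF-8 will see the extra character.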

Georg

-- 
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less.
Four shall be the number of spaces thou shalt indent, and the number of thy
indenting shall be four. Eight shalt thou not indent, nor either indent thou
two, excepting that thou then proceed to four. Tabs are right out.
