Python unicode and Windows cmd.exe
metolone+gmane at gmail.com
Mon Mar 15 01:02:15 CET 2010
"Terry Reedy" <tjreedy at udel.edu> wrote in message
news:hnjkuo$n16$1 at dough.gmane.org...
On 3/14/2010 4:40 PM, Guillermo wrote:
> Adding the byte that some call a 'utf-8 bom' makes the file an invalid
> utf-8 file.
Not true. From http://unicode.org/faq/utf_bom.html:
Q: When a BOM is used, is it only in 16-bit Unicode text?
A: No, a BOM can be used as a signature no matter how the Unicode text is
transformed: UTF-16, UTF-8, UTF-7, etc. The exact bytes comprising the BOM
will be whatever the Unicode character FEFF is converted into by that
transformation format. In that form, the BOM serves to indicate both that it
is a Unicode file, and which of the formats it is in. Examples:
    00 00 FE FF    UTF-32, big-endian
    FF FE 00 00    UTF-32, little-endian
    FE FF          UTF-16, big-endian
    FF FE          UTF-16, little-endian
    EF BB BF       UTF-8
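
As a rough illustration (mine, not from the FAQ), the same byte sequences can be checked from Python 3. The BOM_* constants come from the standard codecs module; the names BOMS and sniff_bom are hypothetical helpers made up for this sketch, and the sample text is arbitrary.

```python
import codecs

# Encoding U+FEFF in each transformation format yields the BOM bytes
# listed in the FAQ excerpt above.
BOMS = [
    (codecs.BOM_UTF32_BE, "UTF-32, big-endian"),
    (codecs.BOM_UTF32_LE, "UTF-32, little-endian"),
    (codecs.BOM_UTF16_BE, "UTF-16, big-endian"),
    (codecs.BOM_UTF16_LE, "UTF-16, little-endian"),
    (codecs.BOM_UTF8, "UTF-8"),
]


def sniff_bom(data):
    """Return (label, bom_length) for a leading BOM, or (None, 0) if absent.

    Hypothetical helper for illustration only.
    """
    # The UTF-32 entries are tried first because FF FE 00 00 also
    # starts with the UTF-16 little-endian BOM FF FE.
    for bom, label in BOMS:
        if data.startswith(bom):
            return label, len(bom)
    return None, 0


# A UTF-8 file that starts with the signature bytes EF BB BF.
data = codecs.BOM_UTF8 + "héllo".encode("utf-8")

label, bom_length = sniff_bom(data)
as_plain_utf8 = data.decode("utf-8")      # valid UTF-8; U+FEFF is kept
as_utf8_sig = data.decode("utf-8-sig")    # the signature is stripped
```

Decoding the sample with the plain "utf-8" codec succeeds and leaves U+FEFF as the first character, which is consistent with the file being valid UTF-8; the "utf-8-sig" codec drops it. Note that FF FE 00 00 could also be a UTF-16 little-endian BOM followed by a NUL character, so a sniffer like this is only a heuristic.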