BOM should be ignored by Python

Neil Hodgson neilh at scintilla.org
Tue May 2 01:40:40 CEST 2000


   Unicode files may contain an initial Byte Order Mark (BOM) to describe the
way that the file is encoded. In UTF-8 this is the byte sequence EF BB BF. One
current editor, the Win2K version of Notepad, adds this BOM to the front of
files saved as UTF-8. I would like to see the Python interpreter accept but
ignore this at the start of a file. The current behaviour is to throw a
SyntaxError.
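
   A minimal sketch of the requested "accept but ignore" behaviour, assuming
the source is available as bytes before it is compiled. The helper name
strip_utf8_bom is hypothetical and this is an illustration, not the
interpreter's actual tokenizer logic. It relies only on codecs.BOM_UTF8 from
the standard library, which is the byte string EF BB BF.

```python
import codecs


def strip_utf8_bom(source: bytes) -> bytes:
    """Return source without a leading UTF-8 BOM (EF BB BF), if one is present.

    Hypothetical helper for illustration only.
    """
    # codecs.BOM_UTF8 == b'\xef\xbb\xbf'
    if source.startswith(codecs.BOM_UTF8):
        return source[len(codecs.BOM_UTF8):]
    # No BOM: leave the bytes untouched.
    return source


# A file as Notepad might save it: a BOM followed by ordinary UTF-8 source.
data = codecs.BOM_UTF8 + b"print('hello')\n"
code = compile(strip_utf8_bom(data), "<notepad-file>", "exec")
exec(code)  # prints: hello
```

Only the three leading bytes are removed; a file without the mark passes
through unchanged, so existing files keep their current behaviour.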

   The BOM can then be used by editing environments such as Pythonwin to
mark files that are stored in UTF-8 and to choose appropriate display
behaviour, such as displaying files with this mark as Unicode and files
without this mark as a sequence of 8-bit characters in a preferred locale.
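
   A rough sketch of how an editor might make that display choice, assuming
it reads the file as raw bytes. The function decode_for_display and its
fallback parameter are hypothetical, and this is not Pythonwin's actual code.
The standard "utf-8-sig" codec decodes UTF-8 and discards a leading BOM;
without a BOM the bytes are decoded in the locale's preferred encoding (or an
explicit fallback), with undecodable bytes replaced rather than rejected.

```python
import codecs
import locale
from typing import Optional, Tuple


def decode_for_display(raw: bytes, fallback: Optional[str] = None) -> Tuple[str, str]:
    """Return (text, encoding used) for displaying raw file bytes.

    Hypothetical editor helper: BOM-marked files are shown as Unicode,
    unmarked files as 8-bit text in a locale (or explicitly given) encoding.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # 'utf-8-sig' decodes UTF-8 and drops the leading BOM.
        return raw.decode("utf-8-sig"), "utf-8"
    # No BOM: assume an 8-bit encoding for the user's preferred locale.
    enc = fallback or locale.getpreferredencoding(False)
    return raw.decode(enc, errors="replace"), enc


marked = codecs.BOM_UTF8 + "caf\u00e9".encode("utf-8")
print(decode_for_display(marked))                        # ('café', 'utf-8')
print(decode_for_display(b"caf\xe9", fallback="cp1252")) # ('café', 'cp1252')
```

Returning the encoding alongside the text lets the editor write the file back
the same way it was read.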

   In the future, the BOM could also be used to change the behaviour of the
interpreter.

   Neil