[I18n-sig] UTF-8 and BOM

Guido van Rossum guido@digicool.com
Sat, 19 May 2001 11:35:18 -0400


> The problem with BOMs is that they are supposed to appear at
> the start of a string.

Taken out of context, this strikes me as nonsense.  Strings in memory
(Python Unicode strings anyway) have absolutely no need for a byte
order mark since they are always in the right (native) byte order.

It is *files* that are supposed to have a BOM at the start.

I think the difference is worth noting: I don't mind if apps that read
files have to deal with the BOM (including, of course, using the
proper byte order to read the rest of the file).  But it is absurd to
expect code dealing with *strings* to handle BOMs.
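
To illustrate the distinction, here is a minimal sketch of what the file-reading side might look like. The helper name decode_file_bytes and the default-encoding fallback are assumptions made for this example, not something from the message above. It uses only the BOM constants in the standard codecs module, and it leaves out UTF-32, whose little-endian BOM starts with the same two bytes as the UTF-16 one.

```python
import codecs

# Hypothetical helper for illustration only.
# Each entry pairs a BOM byte sequence with the codec that decodes the
# bytes following it. UTF-32 is deliberately not handled here.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
)


def decode_file_bytes(data, default="utf-8"):
    """Decode raw file contents, consuming a leading BOM if one is present.

    The BOM picks the byte order for the rest of the data and is then
    dropped, so the returned string never contains it.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    # No BOM found: fall back to the caller's assumed encoding.
    return data.decode(default)


# The same text stored in either byte order decodes to the same string.
big = codecs.BOM_UTF16_BE + "hello".encode("utf-16-be")
little = codecs.BOM_UTF16_LE + "hello".encode("utf-16-le")
assert decode_file_bytes(big) == decode_file_bytes(little) == "hello"
```

Once the bytes are decoded at the file boundary, the rest of the program only ever sees plain strings and needs no BOM logic at all.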

--Guido van Rossum (home page: http://www.python.org/~guido/)