[I18n-sig] UTF-8 and BOM
Paul Prescod
paulp@ActiveState.com
Thu, 17 May 2001 09:46:12 -0700
"Martin v. Loewis" wrote:
>
>...
>
> There probably is none, although giving them a .txt extension is a
> good starting point. What is the standard for tagging KOI8-R documents
> on the Windows file system?
There isn't one. But UTF-8 is an encoding that is growing in popularity
and KOI8-R is one that is shrinking. The unreliability of "code pages"
is a big part of what Unicode is supposed to fix.
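
To make the tagging point concrete, here is a minimal sketch in present-day Python of what a BOM gives a reader that a code page does not: a signature that can be checked from the bytes alone. The function name has_utf8_signature is made up for this illustration; codecs.BOM_UTF8 is the standard library constant for the bytes EF BB BF.

```python
import codecs


def has_utf8_signature(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF).

    Hypothetical helper for illustration only. A missing BOM proves
    nothing: the data may still be UTF-8, KOI8-R, or anything else.
    """
    return data.startswith(codecs.BOM_UTF8)


# A BOM-prefixed UTF-8 file identifies itself from its first three bytes.
tagged = codecs.BOM_UTF8 + "привет".encode("utf-8")

# The same text in KOI8-R carries no marker of its own encoding.
untagged = "привет".encode("koi8-r")

print(has_utf8_signature(tagged))
print(has_utf8_signature(untagged))
```

The check only works in one direction. It can confirm a tagged UTF-8 file, but it cannot say what an untagged file is.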
> > So what if there is a BOM in the middle of the data stream. MAL's
> > decoder will just remove it anyhow. :)
>
> Yes, and I think this is a bug.
Nevertheless, I don't see how concatenating two BOM-prefixed UTF-8
streams is any more or less problematic than concatenating two
BOM-prefixed UTF-16 streams.
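
The comparison can be sketched in present-day Python, assuming the standard "utf-8-sig" and "utf-16" codecs (neither is necessarily the decoder discussed in this thread). Both codecs write a BOM when encoding and strip only the leading one when decoding, so the second BOM survives as U+FEFF in both cases. The helper name concat_with_bom is invented for this example.

```python
import codecs


def concat_with_bom(parts, encoding):
    """Encode each part separately (each gets its own BOM) and join the bytes.

    Hypothetical helper: it mimics naively concatenating two files that
    were each written with a BOM.
    """
    return b"".join(part.encode(encoding) for part in parts)


utf8_stream = concat_with_bom(["abc", "def"], "utf-8-sig")
utf16_stream = concat_with_bom(["abc", "def"], "utf-16")

# Each decoder consumes the first BOM only; the second one is decoded
# as the character U+FEFF in the middle of the text.
utf8_text = utf8_stream.decode("utf-8-sig")
utf16_text = utf16_stream.decode("utf-16")

print(repr(utf8_text))
print(repr(utf16_text))
print(utf8_text == utf16_text)
```

Under these assumptions the two cases behave the same way: whatever policy a decoder adopts for a stray U+FEFF applies equally to both encodings.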
I'll repeat that I'm not saying that the UTF-8 encoder should add a BOM.
Until this convention is more common, we shouldn't try to be innovative.
But I still think that BOMs on UTF-8 are a good idea.
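
One way to read that position, sketched in present-day Python: accept a BOM on input, but do not produce one on output. The function names read_utf8 and write_utf8 are invented for this example, and the "utf-8-sig" codec used here is an assumption about a modern standard library, not something that existed when this was written.

```python
import codecs


def read_utf8(data: bytes) -> str:
    """Decode UTF-8, tolerating (and dropping) an optional leading BOM."""
    # "utf-8-sig" accepts input with or without the signature.
    return data.decode("utf-8-sig")


def write_utf8(text: str) -> bytes:
    """Encode as plain UTF-8 without adding a BOM."""
    return text.encode("utf-8")


with_bom = codecs.BOM_UTF8 + b"hello"
without_bom = b"hello"

# Both inputs decode to the same text.
print(read_utf8(with_bom) == read_utf8(without_bom))

# The output never starts with the signature.
print(write_utf8("hello").startswith(codecs.BOM_UTF8))
```

This keeps the encoder conservative while still letting readers benefit from a BOM when some other tool has written one.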
--
Take a recipe. Leave a recipe.
Python Cookbook! http://www.ActiveState.com/pythoncookbook