[I18n-sig] UTF-8 and BOM

Martin v. Loewis martin@loewis.home.cs.tu-berlin.de
Thu, 17 May 2001 06:28:56 +0200

> "M.-A. Lemburg" wrote:
> > 
> >...
> > 
> > Note that BYTE ORDER MARK is only a comment for char point
> > '\ufeff'. The real name is: ZERO WIDTH NO-BREAK SPACE. 

No, and yes. "BYTE ORDER MARK" is not in the comment field of the
database, but in its "Unicode 1.0 name" field.
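To make the field distinction concrete, here is a minimal sketch that
splits the UnicodeData.txt record for U+FEFF into its semicolon-separated
fields. The record string is copied in by hand and is assumed to match the
database; the field positions (1 = name, 10 = Unicode 1.0 name,
11 = comment) and the variable names are illustrative, not taken from the
original message. As far as I know, Python's unicodedata module exposes
only the current name, not the Unicode 1.0 name.

```python
import unicodedata

# Record for U+FEFF, assumed to be as it appears in UnicodeData.txt.
RECORD = "FEFF;ZERO WIDTH NO-BREAK SPACE;Cf;0;BN;;;;;N;BYTE ORDER MARK;;;;"

fields = RECORD.split(";")

# Assumed field positions in a UnicodeData.txt record.
NAME, UNICODE_1_NAME, COMMENT = 1, 10, 11

current_name = fields[NAME]
old_name = fields[UNICODE_1_NAME]
comment = fields[COMMENT]

print("name:            ", current_name)
print("Unicode 1.0 name:", old_name)
print("comment:         ", repr(comment))

# The standard library reports the current character name.
print("unicodedata.name:", unicodedata.name("\ufeff"))
```

Under that assumption the comment field is empty, and "BYTE ORDER MARK"
shows up only in the Unicode 1.0 name field.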

> I'm not sure I buy that, but one could argue that a Zero width no-break
> space character is a legitimate character whether you can see it on a
> computer screen or not...but I don't care enough to make that argument.

I do. A reader must not remove the BOM, unless it is clearly meant to
indicate the encoding of a document.
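
One way a reader could follow that rule is sketched below. This is only
an illustration: decode_utf8 and its strip_signature flag are hypothetical
names, and deciding whether the leading U+FEFF is "clearly meant" as an
encoding signature is left to the caller. Only a single leading U+FEFF is
ever dropped; any later occurrence is kept as an ordinary ZERO WIDTH
NO-BREAK SPACE.

```python
import codecs


def decode_utf8(data: bytes, strip_signature: bool) -> str:
    """Decode UTF-8 bytes, dropping a leading U+FEFF only on request.

    strip_signature is a hypothetical flag: the caller sets it when the
    leading U+FEFF is known to be an encoding signature, not content.
    """
    text = data.decode("utf-8")
    if strip_signature and text.startswith("\ufeff"):
        # Remove exactly one leading U+FEFF; interior ones are untouched.
        return text[1:]
    return text


# Sample input: a UTF-8 signature followed by text with an interior U+FEFF.
sample = codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8")

as_signature = decode_utf8(sample, strip_signature=True)
as_content = decode_utf8(sample, strip_signature=False)

print(ascii(as_signature))
print(ascii(as_content))
```

With strip_signature=False the reader passes every U+FEFF through
unchanged, which is the safe default when the purpose of the character is
not known.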