On 9/9/06, <b class="gmail_sendername">Marcin 'Qrczak' Kowalczyk</b> <<a href="mailto:firstname.lastname@example.org">email@example.com</a>> wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">
> Note that there are plenty of other characters that should be<br>> treated as ignorable, so the applications that are broken for BOMs<br>> are broken more generally.<br><br>I disagree. UTF-8 BOM should not be used on Unix. It's not a reliable
<br>method of encoding detection in general (applies only to Unicode),<br>and it breaks the simplicity of text streams.</blockquote><div> <br>We're off-topic, but treating these decisions as operating-system-specific is a big part of what caused the current mess,
e.g. with Japanese Windows users and Japanese Unix users using different encodings. The Unicode Consortium should address the issue of encoding auto-detection and recommend how the encoding of "raw" text files can be detected. A combination of BOM, coding declaration and fall-back to UTF-8 would cover the vast majority of the world's languages and accommodate many national encodings.
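<br><br>To make that combination concrete, here is a rough sketch of the detection order I have in mind: BOM first, then a coding declaration, then fall back to UTF-8. The names detect_encoding and decode_text are made up for illustration, and I am assuming the coding declaration looks like the Emacs-style comment that PEP 263 uses for Python source; a real recommendation would have to pin that syntax down.

```python
import codecs
import re

# UTF-32 BOMs are listed before UTF-16 ones because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Assumed declaration syntax, e.g. "-*- coding: euc-jp -*-" (as in PEP 263).
_DECLARATION = re.compile(rb"coding[:=]\s*([-\w.]+)")


def detect_encoding(data):
    """Return (encoding name, number of leading BOM bytes to skip)."""
    # 1. A BOM, if present, wins.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)

    # 2. Otherwise look for a coding declaration in the first two lines.
    head = b"\n".join(data.split(b"\n", 2)[:2])
    match = _DECLARATION.search(head)
    if match:
        name = match.group(1).decode("ascii")
        try:
            codecs.lookup(name)  # ignore declarations naming unknown codecs
            return name, 0
        except LookupError:
            pass

    # 3. Fall back to UTF-8.
    return "utf-8", 0


def decode_text(data):
    """Decode raw bytes using the detection order above."""
    encoding, skip = detect_encoding(data)
    return data[skip:].decode(encoding)
```

A file with neither a BOM nor a declaration is simply read as UTF-8, so legacy files in a national encoding only need a one-line declaration to stay readable.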
<br><br>Are you defending the status quo, wherein text data cannot even be reliably processed on the desktop on which it was created (yes, even on Unix: look back in this thread)? Do you have a positive prescription?<br><br>