On 9/9/06, <b class="gmail_sendername">Marcin 'Qrczak' Kowalczyk</b> &lt;<a href="mailto:qrczak@knm.org.pl">qrczak@knm.org.pl</a>&gt; wrote:<div><span class="gmail_quote"></span><blockquote class="gmail_quote" style="border-left: 1px solid rgb(204, 204, 204); margin: 0pt 0pt 0pt 0.8ex; padding-left: 1ex;">

&gt; Note that there are plenty of other characters that should be<br>&gt; treated as ignorable, so the applications that are broken for BOMs<br>&gt; are broken more generally.<br><br>I disagree. UTF-8 BOM should not be used on Unix. It's not a reliable

<br>method of encoding detection in general (applies only to Unicode),<br>and it breaks the simplicity of text streams.</blockquote><div>&nbsp;<br>We're off-topic, but treating these decisions as operating-system-specific is a big part of what caused the current mess,

e.g. Japanese Windows users and Japanese Unix users using different encodings. The Unicode Consortium should address the issue of encoding auto-detection and make a recommendation for how &quot;raw&quot; text files can have their encoding detected. A combination of BOM, coding declaration and fall-back to UTF-8 would cover the vast majority of the world's languages and accommodate many national encodings.
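<br><br>To make that combination concrete, here is a rough Python sketch of the order of checks I have in mind: BOM first, then a coding declaration, then UTF-8 as the default. It is only an illustration, not anything the Unicode Consortium or any library specifies. The function name guess_encoding, the two-line search window and the PEP 263-style "coding:" pattern are all assumptions made for the example.<br>
<pre>
```python
import codecs
import re

# BOM table; UTF-32 comes first because its little-endian BOM
# begins with the same two bytes as the UTF-16 little-endian BOM.
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]

# Hypothetical declaration syntax, modelled on PEP 263 ("coding: name").
DECLARATION = re.compile(rb"coding[:=]\s*([-\w.]+)")


def guess_encoding(data):
    """Return a codec name for raw bytes: BOM, then declaration, then UTF-8."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    # Assumption: only the first two lines may carry a declaration.
    for line in data.splitlines()[:2]:
        match = DECLARATION.search(line)
        if match:
            name = match.group(1).decode("ascii")
            try:
                # Normalise the declared name to Python's codec name.
                return codecs.lookup(name).name
            except LookupError:
                break  # unknown codec name: fall through to the default
    return "utf-8"
```
</pre>
The returned name can be passed straight to data.decode(); the "utf-8-sig", "utf-16" and "utf-32" codecs consume the BOM themselves, so the decoded text does not start with U+FEFF.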

<br><br>Are you defending the status quo, wherein text data cannot even be reliably processed on the desktop on which it was created (yes, even on Unix: look back in this thread)? Do you have a positive prescription?<br><br>

&nbsp;Paul Prescod<br><br></div></div>