[Python-3000] Offtopic: declaring encoding

Sat Sep 9 19:41:51 CEST 2006

On 9/9/06, Marcin 'Qrczak' Kowalczyk <qrczak at knm.org.pl> wrote:
>
> > Note that there are plenty of other characters that should be
> > treated as ignorable, so the applications that are broken for BOMs
> > are broken more generally.
>
> I disagree. UTF-8 BOM should not be used on Unix. It's not a reliable
> method of encoding detection in general (applies only to Unicode),
> and it breaks the simplicity of text streams.

We're off-topic, but treating these decisions as operating-system-specific
is a big part of what caused the current mess, e.g. Japanese Windows users
and Japanese Unix users using different encodings. The Unicode Consortium
should address the issue of encoding auto-detection and recommend how the
encoding of "raw" text files can be detected. A combination of BOM, coding
declaration, and fall-back to UTF-8 would cover the vast majority of the
world's languages and incorporate many national encodings.
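
To make the detection order suggested above concrete, here is a minimal
Python sketch: check for a BOM first, then look for a coding declaration,
then fall back to UTF-8. The function names are hypothetical, and the
declaration syntax is an assumption borrowed from PEP 263 (the first two
lines are searched for "coding: name" or "coding=name"); no particular
syntax for plain text files is specified here.

```python
import codecs
import re

# BOMs paired with the codec to use once the BOM is stripped. UTF-32 LE
# comes before UTF-16 LE because its BOM starts with the same two bytes.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

# Assumption: a PEP 263-style declaration such as "coding: latin-1".
_DECLARATION = re.compile(rb"coding[:=]\s*([-\w.]+)")


def guess_encoding(data):
    """Return (encoding, bom_length) for a bytes object."""
    # 1. A BOM, if present, wins.
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    # 2. Otherwise look for a declaration in the first two lines.
    for line in data.splitlines()[:2]:
        match = _DECLARATION.search(line)
        if match:
            name = match.group(1).decode("ascii")
            try:
                codecs.lookup(name)
            except LookupError:
                break  # unknown codec name: fall through to the default
            return name, 0
    # 3. Fall back to UTF-8.
    return "utf-8", 0


def decode_text(data):
    """Decode bytes using the guessed encoding, dropping any BOM."""
    encoding, skip = guess_encoding(data)
    return data[skip:].decode(encoding)
```

With this sketch, decode_text(open(path, "rb").read()) would decode a file
whose bytes carry either marker, and would raise UnicodeDecodeError if the
UTF-8 fall-back turns out to be wrong rather than silently guessing.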

Are you defending the status quo, wherein text data cannot even be reliably
processed on the desktop on which it was created (yes, even on Unix: look
back in this thread)? Do you have a positive prescription?

 Paul Prescod