To unicode or not to unicode
Thorsten Kampe
thorsten at thorstenkampe.de
Sat Feb 21 16:05:39 EST 2009
* Ross Ridge (Sat, 21 Feb 2009 14:52:09 -0500)
> Thorsten Kampe <thorsten at thorstenkampe.de> wrote:
>> It's all about declaring your charset. In Python as well as in your
>> newsreader. If you don't declare your charset it's ASCII for you - in
>> Python as well as in your newsreader.
>
> Except in practice unlike Python, many newsreaders don't assume ASCII.
They do assume ASCII unless you declare your charset (the exceptions
being Outlook Express and a few other Windows newsreaders). Everything
else is "guessing".
> The original article displayed fine for me. Google Groups displays it
> correctly too:
>
> http://groups.google.com/group/comp.lang.python/msg/828fefd7040238bc
Your understanding of the principles of Unicode is at least as
nonexistent as the OP's.
> I could just as easily argue that assuming ISO 8859-1 is the de facto
> standard, and that it's your newsreader that's broken.
There is no "standard" in regard to guessing (this is what you call
"assuming"). The need for explicit declaration of an encoding is exactly
the same in Python as in any Usenet article.
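
To illustrate the point about guessing versus declaring, here is a minimal sketch in Python 3 syntax (the sample text and the choice of cp437 as the "wrong guess" are assumptions made for illustration, not taken from the original article). The same bytes fail under a strict ASCII assumption, decode to garbage under a wrong guess, and decode correctly only when the declared charset is used.

```python
# Sample text "Gruesse" with u-umlaut and sharp s, encoded as ISO 8859-1.
# The text itself is a made-up example.
raw = "Gr\u00fc\u00dfe".encode("iso-8859-1")

# No declaration: a strict reader assumes ASCII, and decoding fails.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# A wrong guess does not fail; it silently produces different text.
guessed = raw.decode("cp437")

# Using the declared charset recovers the original text.
declared = raw.decode("iso-8859-1")

print(ascii_ok, repr(guessed), repr(declared))
```

The wrong guess is the dangerous case: nothing raises an error, the reader just sees the wrong characters.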
> The reality however is that RFC 1036 is the only standard for Usenet
> messages, de facto or otherwise, and so there's nothing wrong with
> anyone's newsreader.
The reality is that all non-broken newsreaders use MIME headers to
declare and interpret the charset being used. I suggest you read at
least http://www.joelonsoftware.com/articles/Unicode.html to get an idea
of Unicode and associated topics.
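
As a rough sketch of what "declare and interpret" means in practice, the following uses Python's standard `email` package (Python 3 syntax) to read the charset from a MIME `Content-Type` header and decode the body with it. The article bytes, the address, and the subject line are invented for the example; falling back to ASCII when no charset is declared is the behaviour described above, not something the library does on its own.

```python
from email import message_from_bytes

# A made-up article that declares its charset in the MIME headers.
article = (
    b"From: poster@example.invalid\r\n"
    b"Subject: charset test\r\n"
    b"MIME-Version: 1.0\r\n"
    b"Content-Type: text/plain; charset=ISO-8859-1\r\n"
    b"Content-Transfer-Encoding: 8bit\r\n"
    b"\r\n"
    b"Gr\xfc\xdfe\r\n"
)

msg = message_from_bytes(article)

# get_content_charset() returns the declared charset in lower case,
# or None if the article does not declare one.
charset = msg.get_content_charset()

# Decode the raw body bytes with the declared charset; assume ASCII
# only when nothing was declared.
body = msg.get_payload(decode=True).decode(charset or "ascii")

print(charset, repr(body))
```

Without the `Content-Type` line, `charset` would be `None` and the ASCII fallback would raise `UnicodeDecodeError` on this body instead of guessing.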
Thorsten