To unicode or not to unicode

Thorsten Kampe thorsten at thorstenkampe.de
Sat Feb 21 22:05:39 CET 2009


* Ross Ridge (Sat, 21 Feb 2009 14:52:09 -0500)
> Thorsten Kampe  <thorsten at thorstenkampe.de> wrote:
>> It's all about declaring your charset. In Python as well as in your
>> newsreader. If you don't declare your charset it's ASCII for you - in
>> Python as well as in your newsreader.
> 
> Except in practice unlike Python, many newsreaders don't assume ASCII.

They assume ASCII - unless you declare your charset (the exception being 
Outlook Express and a few Windows newsreaders). Everything else is 
"guessing".

> The original article displayed fine for me. Google Groups displays it
> correctly too:
> 
> 	http://groups.google.com/group/comp.lang.python/msg/828fefd7040238bc

Your understanding of the principles of Unicode is at least as 
non-existent as the OP's.
 
> I could just as easily argue that assuming ISO 8859-1 is the de facto
> standard, and that it's your newsreader that's broken.

There is no "standard" in regard to guessing (this is what you call 
"assuming"). The need for explicit declaration of an encoding is exactly 
the same in Python as in any Usenet article.
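
To illustrate the point, here is a minimal sketch in Python 3 terms. The 
helper name decode_declared and the sample string are my own assumptions 
for illustration, not anything from the original article. It treats 
undeclared bytes as ASCII, decodes correctly once the charset is 
declared, and shows that a guessed charset can "succeed" while producing 
the wrong text.

```python
# Bytes as they might arrive on the wire. The sender encoded them as
# UTF-8, but the receiver only knows that if the sender says so.
raw = "Gr\u00fc\u00dfe".encode("utf-8")  # the German word "Gruesse" with umlaut and sharp s


def decode_declared(data, charset=None):
    """Decode with the declared charset; with no declaration, assume ASCII.

    Hypothetical helper for illustration only.
    """
    return data.decode(charset or "ascii")


# No declaration: ASCII is assumed and the non-ASCII bytes are rejected.
try:
    decode_declared(raw)
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Correct declaration: the original text comes back.
declared = decode_declared(raw, "utf-8")

# A guess: decoding does not fail, but the result is not the original text.
guessed = decode_declared(raw, "iso-8859-1")
```

The guessed decode is the worst case of the three: it raises no error, so 
nothing tells the reader that the text is wrong.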

> The reality, however, is that RFC 1036 is the only standard for Usenet
> messages, de facto or otherwise, and so there's nothing wrong with
> anyone's newsreader.

The reality is that all non-broken newsreaders use MIME headers to 
declare and interpret the charset being used. I suggest you read at 
least http://www.joelonsoftware.com/articles/Unicode.html to get an idea 
of Unicode and associated topics.
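
As a rough sketch of what "declare and interpret" means, the following 
uses Python 3's standard email package to read the charset from the MIME 
Content-Type header and fall back to ASCII when none is declared. The 
function name article_text and the two sample articles are assumptions 
made up for this example.

```python
from email import message_from_bytes


def article_text(raw_article):
    """Return (charset, text) for a raw article given as bytes.

    Hypothetical helper: uses the declared MIME charset, and assumes
    ASCII when the article declares nothing.
    """
    msg = message_from_bytes(raw_article)
    charset = msg.get_content_charset() or "ascii"
    body = msg.get_payload(decode=True)  # body as bytes
    return charset, body.decode(charset)


# Made-up sample articles; the body is the same ISO 8859-1 bytes in both.
headers = b"From: poster@example.invalid\nSubject: test\n"
body = "Gr\u00fc\u00dfe".encode("iso-8859-1")

declared_article = (
    headers
    + b"MIME-Version: 1.0\n"
    + b"Content-Type: text/plain; charset=iso-8859-1\n"
    + b"Content-Transfer-Encoding: 8bit\n"
    + b"\n"
    + body
)
undeclared_article = headers + b"\n" + body

charset, text = article_text(declared_article)

# Without a declaration the same bytes cannot be interpreted as ASCII.
try:
    article_text(undeclared_article)
    undeclared_ok = True
except UnicodeDecodeError:
    undeclared_ok = False
```

A newsreader that displays the undeclared article "fine" is doing 
something other than this: it is picking a charset on its own.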

Thorsten


