To unicode or not to unicode
rridge at csclub.uwaterloo.ca
Sat Feb 21 20:52:09 CET 2009
Thorsten Kampe <thorsten at thorstenkampe.de> wrote:
>> RFC 1036 doesn't require nor give a meaning to a Content-Type header
>> in a Usenet message
>Well, /maybe/ the reason for that is that RFC 1036 was written in 1987
>and the first MIME RFC in 1992...?
>"Son of RFC 1036" mentions MIME more often than you can count.
Since "Son of RFC 1036" was never submitted and accepted, RFC 1036 remains current.
>> so there's nothing wrong with the original poster's newsreader.
>If you follow RFC 1036 (which was written before anyone even thought of
>MIME) then all content has to be ASCII. The OP used non-ASCII letters.
RFC 1036 doesn't place any restrictions on the content of the body of
an article. On the other hand, "Son of RFC 1036" does have restrictions
on the characters used in the body of a message:

    Articles MUST not contain any octet with value exceeding 127,
    i.e. any octet that is not an ASCII character

Which means that merely adding a Content-Encoding header wouldn't
be enough to conform to "Son of RFC 1036"; the original poster would
also have had to either switch to a 7-bit character set or use a 7-bit
compatible transfer encoding. If you're trying to claim that "Son of RFC
1036" is the new de facto standard, then that would mean your newsreader
is broken too.
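
To make the octet rule concrete, here is a minimal Python sketch. The function name and the sample body are my own, purely for illustration, and quoted-printable is just one example of a 7-bit compatible transfer encoding. It checks a body for octets above 127 and shows that the transfer encoding removes them:

```python
import quopri

def has_8bit_octets(body: bytes) -> bool:
    """Return True if any octet in the body has a value exceeding 127."""
    return any(octet > 127 for octet in body)

# Hypothetical article body with non-ASCII letters, encoded as ISO 8859-1.
raw_body = "Gr\u00fc\u00dfe aus K\u00f6ln\n".encode("iso-8859-1")

# Quoted-printable rewrites each 8-bit octet as an =XX escape,
# so the encoded body contains only ASCII octets.
qp_body = quopri.encodestring(raw_body)

print(has_8bit_octets(raw_body))  # True
print(has_8bit_octets(qp_body))   # False
```

Decoding with `quopri.decodestring(qp_body)` gives back the original octets, so nothing is lost by the 7-bit encoding.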
>It's all about declaring your charset. In Python as well as in your
>newsreader. If you don't declare your charset it's ASCII for you - in
>Python as well as in your newsreader.
Except that in practice, unlike Python, many newsreaders don't assume ASCII.
The original article displayed fine for me, and Google Groups displays it
fine too.
I could just as easily argue that assuming ISO 8859-1 is the de facto
standard, and that it's your newsreader that's broken. The reality, however,
is that RFC 1036 is the only standard for Usenet messages, de facto or
otherwise, and so there's nothing wrong with anyone's newsreader.
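
For what it's worth, the difference between the two assumptions is easy to show in Python. This is only a sketch: the sample text and both helper names are made up, and it doesn't model what any particular newsreader actually does. The same undeclared octets fail under a strict ASCII assumption and decode cleanly under an ISO 8859-1 assumption:

```python
# Hypothetical undeclared article body: ISO 8859-1 octets, no charset label.
body = "na\u00efve caf\u00e9".encode("iso-8859-1")

def decode_strict_ascii(data: bytes) -> str:
    """Assume ASCII, as Python does: any octet above 127 is an error."""
    return data.decode("ascii")

def decode_assuming_latin1(data: bytes) -> str:
    """Assume ISO 8859-1: every octet value maps to some character."""
    return data.decode("iso-8859-1")

try:
    decode_strict_ascii(body)
except UnicodeDecodeError as exc:
    print("ASCII assumption fails:", exc.reason)

print(decode_assuming_latin1(body))
```

Neither assumption is mandated by RFC 1036; the sketch only shows why the same article can look broken in one reader and fine in another.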
l/ // Ross Ridge -- The Great HTMU
[oo][oo] rridge at csclub.uwaterloo.ca