ASCII and Unicode

Steven D'Aprano steve+comp.lang.python at pearwood.info
Sun Dec 8 18:22:34 CET 2013


On Sat, 07 Dec 2013 17:05:34 +0100, giacomo boffi wrote:

> Steven D'Aprano <steve+comp.lang.python at pearwood.info> writes:
> 
>> Ironically, your post was not Unicode.  [...] Your post was sent using
>> a legacy encoding, Windows-1252, also known as CP-1252
> 
> i access rusi's post using a NNTP server, and in his post i see
> 
> Content-Type: text/plain; charset=UTF-8

But *which post* are you looking at?


I have just looked at three posts from him:

Rusi's original post, where he used the ellipsis characters:

  Subject: Re: Managing Google Groups headaches
  Date: Thu, 5 Dec 2013 23:13:54 -0800 (PST)
  Content-Type: text/plain; charset=windows-1252

Then his reply to me:

  Subject: Re: ASCII and Unicode [was Re: Managing Google Groups headaches]
  Date: Fri, 6 Dec 2013 18:33:39 -0800 (PST)
  Content-Type: text/plain; charset=UTF-8

And finally, his reply to you:

  Subject: Re: ASCII and Unicode
  Date: Sun, 8 Dec 2013 08:41:10 -0800 (PST)
  Content-Type: text/plain; charset=ISO-8859-1
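
Anyone wanting to check headers like these for themselves can read the declared charset with Python's standard email package. This is only a minimal sketch: the sample message and the helper name declared_charset are made up for illustration, not taken from Rusi's actual posts.

```python
from email import message_from_string


def declared_charset(raw_message):
    """Return the charset named in the Content-Type header, or None."""
    msg = message_from_string(raw_message)
    # get_content_charset() returns the charset parameter lower-cased.
    return msg.get_content_charset()


# Hypothetical message, modelled on the headers quoted above.
sample = (
    "Subject: Re: ASCII and Unicode\n"
    "Content-Type: text/plain; charset=ISO-8859-1\n"
    "\n"
    "body text\n"
)

print(declared_charset(sample))
```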

It seems to me that whatever client he is using to post (I believe it is 
the Google Groups web interface?) varies the encoding depending on which 
characters are included in his post.
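
One plausible way a client could end up behaving like this is to try a list of charsets in order and use the first one that can encode the whole message. The sketch below is a guess at that kind of logic, not Google Groups' actual algorithm; the candidate list and the function name pick_charset are assumptions.

```python
# Assumed preference order: narrowest legacy charset first, UTF-8 last.
CANDIDATES = ["us-ascii", "iso-8859-1", "windows-1252", "utf-8"]


def pick_charset(text):
    """Return the first candidate charset able to encode all of `text`."""
    for charset in CANDIDATES:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue
        return charset
    # Not reached for ordinary text: UTF-8 can encode any character
    # except lone surrogates.
    raise ValueError("no candidate charset can encode this text")


for body in ["plain text", "caf\u00e9", "wait\u2026", "\u65e5\u672c\u8a9e"]:
    print(repr(body), "->", pick_charset(body))
```

Under these assumptions, a post containing an ellipsis (U+2026) falls through to windows-1252, because ISO-8859-1 has no ellipsis character, which would be consistent with the three different headers above.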


> is it possible that what you see is an artifact of the gateway?

I doubt it. Unfortunately the email mailing list archive doesn't display 
all the email headers, but for the record here is his original post as 
seen by the email mailing list:

https://mail.python.org/pipermail/python-list/2013-December/661782.html

If you view source, you'll see that Mailman (the mailing list software) 
sets the webpage encoding to US-ASCII and encodes the ellipses as &#8230;, 
which is a perfectly reasonable thing for a web page to do. So we can be 
confident that when Mailman saw Rusi's post, it was able to correctly 
decode the message and see ellipses.
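
The same kind of escaping can be reproduced in Python with the xmlcharrefreplace error handler. This is only an illustration of the technique; it makes no claim about how Mailman implements it internally.

```python
import html

text = "wait\u2026"  # ends with U+2026 HORIZONTAL ELLIPSIS

# Characters that don't fit in ASCII become numeric character references.
escaped = text.encode("ascii", "xmlcharrefreplace")
print(escaped)

# A browser reverses this step; html.unescape() does the same here.
restored = html.unescape(escaped.decode("ascii"))
print(restored == text)
```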

Although I think that (probably) Google Groups is being stupid by varying 
the charset (why not just use UTF-8 always?), at least it is setting the 
charset correctly. 
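
Why the declared charset matters can be seen by decoding one and the same byte in different ways. In Windows-1252 the ellipsis is the single byte 0x85; the snippet below is a small sketch of what happens when that byte is read under the right label and under two wrong ones.

```python
raw = b"\x85"  # the ellipsis as encoded in Windows-1252

# Correct label: the byte decodes to U+2026 HORIZONTAL ELLIPSIS.
as_cp1252 = raw.decode("windows-1252")

# Wrong label: ISO-8859-1 maps 0x85 to the C1 control character U+0085.
as_latin1 = raw.decode("iso-8859-1")

# Wrong label: a lone 0x85 byte is not valid UTF-8 at all.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(ascii(as_cp1252), ascii(as_latin1), utf8_ok)
```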



-- 
Steven


