[Python-ideas] Py3 unicode impositions

Terry Reedy tjreedy at udel.edu
Mon Feb 13 04:41:15 CET 2012


On 2/12/2012 7:54 AM, Paul Moore wrote:

> No. I know that a lot of Unix people advocate UTF-8, and I gather it's
> rapidly becoming standard in the Unix world. But I work on Windows,

Unicode and UTF-8 are standards for the world, not just Unix. UTF-8 
surpassed US-ASCII as the most used character encoding on the WWW about 
4 years ago. https://en.wikipedia.org/wiki/ASCII

XML is unicode based. I think it is fair to say that UTF-8 and UTF-16 
are the preferred encodings, as 'Encodings other than UTF-8 and UTF-16 
will not necessarily be recognized by every XML parser'.
https://en.wikipedia.org/wiki/Xml#Encoding_detection
OpenDocument is one of many XML-based formats.

Any modern database program that intends to store arbitrary text must 
store unicode (or at least the BMP subset).

So any text-oriented Windows program that gets input from the rest of 
the world has to handle unicode and at least the UTF-8 encoding thereof. 
My impression is that Windows itself now uses unicode for text storage. 
It is a shame that it still somewhat hides that behind codepage facades 
that each expose only a limited subset.

None of this minimizes the problem of dealing with text in the 
multiplicity of national and language encodings. But that is not the 
fault of unicode, and unicode makes dealing with multiple encodings at 
the same time much easier. It is too bad that unicode was only developed 
in the 1990s instead of the 1960s.
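
To illustrate the point about multiple encodings, here is a minimal 
Python 3 sketch. The sample strings and the choice of latin-1, cp1251 
and shift_jis are my own assumptions for illustration, not anything from 
a real data feed. Each byte string is decoded with its own codec, after 
which everything is ordinary unicode text that can be mixed freely and 
written out as UTF-8.

```python
# Hypothetical samples: text that arrives as bytes in three legacy encodings.
samples = [
    ("café", "latin-1"),
    ("Привет", "cp1251"),
    ("日本語", "shift_jis"),
]

# Simulate the incoming byte strings, each tagged with its encoding.
incoming = [(text.encode(enc), enc) for text, enc in samples]

# Decode each one with its own codec; the results are all plain str.
decoded = [raw.decode(enc) for raw, enc in incoming]

# Once decoded, the pieces can be combined regardless of their origin.
combined = " / ".join(decoded)

# UTF-8 can represent the combined text and round-trips it losslessly.
utf8_bytes = combined.encode("utf-8")
assert utf8_bytes.decode("utf-8") == combined

# A single legacy code page cannot hold the combined text.
try:
    combined.encode("cp1252")
    fits_cp1252 = True
except UnicodeEncodeError:
    fits_cp1252 = False

print(combined)
print("fits in cp1252:", fits_cp1252)
```

The only encoding-specific knowledge sits at the decode step; the rest 
of the program never needs to know where a given piece of text came from.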

-- 
Terry Jan Reedy



