[Python-ideas] Py3 unicode impositions
Terry Reedy
tjreedy at udel.edu
Mon Feb 13 04:41:15 CET 2012
On 2/12/2012 7:54 AM, Paul Moore wrote:
> No. I know that a lot of Unix people advocate UTF-8, and I gather it's
> rapidly becoming standard in the Unix world. But I work on Windows,
Unicode and utf-8 are standards for the world, not just Unix. UTF-8
surpassed us-ascii as the most used character encoding for the WWW about
4 years ago. https://en.wikipedia.org/wiki/ASCII
XML is unicode based. I think it is fair to say that UTF-8 and UTF-16
are the preferred encodings, as 'Encodings other than UTF-8 and UTF-16
will not necessarily be recognized by every XML parser'
https://en.wikipedia.org/wiki/Xml#Encoding_detection
OpenDocument is one of many xml-based formats.
Any modern database program that intends to store arbitrary text must
store unicode (or at least the BMP subset).
So any text-oriented Windows program that gets input from the rest of
the world has to handle unicode and at least the utf-8 encoding thereof.
My impression is that Windows itself now uses unicode for text storage.
It is a shame that it still somewhat hides that behind limited-subset
codepage facades.
None of this minimizes the problem of dealing with text in the
multiplicity of national and language encodings. But that is not the
fault of unicode, and unicode makes dealing with multiple encodings at
the same time much easier. It is too bad that unicode was only developed
in the 1990s instead of the 1960s.
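
A minimal Python 3 sketch of that last point, assuming the encoding of
each input is already known (in practice that is often the hard part).
The sample strings and the decode_all helper are made up for
illustration; only the standard codecs are used. Once everything is
decoded to str, text from different legacy encodings can be mixed
freely and written out again as utf-8.

```python
# Simulated inputs (hypothetical): text arriving as bytes in three
# different legacy encodings, each paired with its known encoding name.
incoming = [
    ("café".encode("latin-1"), "latin-1"),      # Western European
    ("Привет".encode("cp1251"), "cp1251"),      # Windows Cyrillic codepage
    ("日本".encode("shift_jis"), "shift_jis"),  # Japanese
]


def decode_all(items):
    """Decode each (bytes, encoding) pair into a unicode str."""
    return [data.decode(encoding) for data, encoding in items]


texts = decode_all(incoming)

# After decoding, the original encodings no longer matter: the strings
# can be combined in one document.
combined = " / ".join(texts)

# A single output encoding covers all three scripts.
utf8_bytes = combined.encode("utf-8")
assert utf8_bytes.decode("utf-8") == combined
```

Without unicode, the combined line could not be represented in any one
of the three input encodings; for example, combined.encode("latin-1")
raises UnicodeEncodeError.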
--
Terry Jan Reedy