<html>
<head>
<meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#330033">
<div class="moz-cite-prefix">On 7/16/2014 7:27 AM, Frank Millman
wrote:<br>
</div>
<blockquote cite="mid:lq6255$gi5$1@ger.gmane.org" type="cite">
<pre wrap="">I just tried an experiment in my own project. Ned Batchelder, in his
Pragmatic Unicode presentation, <a class="moz-txt-link-freetext" href="http://nedbatchelder.com/text/unipain.html">http://nedbatchelder.com/text/unipain.html</a>,
suggests that you always have some unicode characters in your data, just to
ensure that they are handled correctly. He has a tongue-in-cheek example
which spells the word PYTHON using various exotic unicode characters. I used
this to populate a field in my database, to see if it would display in my
browser-based client.
The hardest part was getting it in. There are 6 characters, but utf-8
requires 16 bytes to store it -
b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8')
However, that was it. Without any changes to my program, it read it from the
database and displayed it on the screen. IE8 could only display 2 out of the
6 characters correctly, and Chrome could display 5 out of 6, but that is a
separate issue. Python3 handled it perfectly.
</pre>
</blockquote>
<br>
Wrapping the above in a print(), on Windows, I get:<br>
<br>
Traceback (most recent call last):<br>
File "D:\my\py\python-utf8.py", line 1, in &lt;module&gt;<br>
print(b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8'))<br>
File "C:\Python33\lib\encodings\cp437.py", line 19, in encode<br>
return codecs.charmap_encode(input,self.errors,encoding_map)[0]<br>
UnicodeEncodeError: 'charmap' codec can't encode characters in
position 0-5: character maps to &lt;undefined&gt;<br>
<br>
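The failure can be reproduced without a console at all, because the
exception comes from the encoding step that print() performs. A
minimal sketch, assuming the console code page is cp437 as in the
traceback above (the helper name try_encode is made up for
illustration):<br>
<br>
<pre wrap="">
```python
import unicodedata

# The byte string from the quoted message: 16 bytes of UTF-8.
data = b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'
text = data.decode('utf-8')


def try_encode(s, encoding):
    """Return the encoded bytes, or the UnicodeEncodeError if encoding fails."""
    try:
        return s.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc


# Official Unicode names of the six code points.
names = [unicodedata.name(ch) for ch in text]

# print() must encode text for the console; with cp437 that step fails.
result = try_encode(text, 'cp437')

# ASCII-only output, so this script itself is safe on any console.
print(len(data), len(text))
print(ascii(text))
print(names)
print(type(result).__name__)
```
</pre>
The decode itself succeeds; only the encode to the code page raises.<br>
<br>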
So Python 3 doesn't handle it perfectly on Windows. I saw
someone blame the Windows console for that, but the Windows
console can properly display all those characters if the proper APIs
are used. The bug is 7 years old: <a class="moz-txt-link-freetext" href="http://bugs.python.org/issue1602">http://bugs.python.org/issue1602</a>
and hasn't been fixed, although the technology for fixing it is
available, various workarounds (with limitations) have been
available for 5 years, and patches that work pretty well have been
available for 3 years. However, just a few days ago, 26 July 2014,
Drekin had an insight that may lead to a patch that works
well enough to be integrated into some future version of
Python. I hope he follows up on it. This is a serious limitation,
and it is, and always has been, a bug in Python 3 Unicode handling
on Windows.<br>
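<br>
For anyone who just needs to avoid the traceback in the meantime,
here is a rough sketch of one limited workaround. It is not the
approach from the bug tracker: it does not display the characters, it
only substitutes backslash escapes for whatever the console encoding
cannot represent. The function name safe_text is my own invention,
and cp437 is assumed only because it appears in the traceback:<br>
<br>
<pre wrap="">
```python
import sys


def safe_text(s, encoding):
    """Replace characters that `encoding` cannot represent with backslash escapes."""
    return s.encode(encoding, errors='backslashreplace').decode(encoding)


text = b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8')

# What a cp437 console would be given: escapes instead of an exception.
escaped = safe_text(text, 'cp437')

# Use the real console encoding when known; fall back to ASCII otherwise.
console_encoding = sys.stdout.encoding or 'ascii'
print(safe_text(text, console_encoding))
print(escaped)
```
</pre>
The limitation is obvious: on a cp437 console you see escapes, not
PYTHON in exotic characters.<br>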
</body>
</html>