<html>

  <head>

    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">

  </head>

  <body bgcolor="#FFFFFF" text="#330033">

    <div class="moz-cite-prefix">On 7/16/2014 7:27 AM, Frank Millman

      wrote:<br>

    </div>

    <blockquote cite="mid:lq6255$gi5$1@ger.gmane.org" type="cite">

      <pre wrap="">I just tried an experiment in my own project. Ned Batchelder, in his 

Pragmatic Unicode presentation, <a class="moz-txt-link-freetext" href="http://nedbatchelder.com/text/unipain.html">http://nedbatchelder.com/text/unipain.html</a>, 

suggests that you always have some unicode characters in your data, just to 

ensure that they are handled correctly. He has a tongue-in-cheek example 

which spells the word PYTHON using various exotic unicode characters. I used 

this to populate a field in my database, to see if it would display in my 

browser-based client.

The hardest part was getting it in. There are 6 characters, but utf-8 

requires 16 bytes to store it -

    b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8')

However, that was it. Without any changes to my program, it read it from the 

database and displayed it on the screen. IE8 could only display 2 out of the 

6 characters correctly, and Chrome could display 5 out of 6, but that is a 

separate issue. Python3 handled it perfectly.

</pre>

    </blockquote>

    <br>

    Wrapping the above in a print() call on Windows, I get:<br>

    <br>

    Traceback (most recent call last):<br>

      File "D:\my\py\python-utf8.py", line 1, in &lt;module&gt;<br>

print(b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8'))<br>

      File "C:\Python33\lib\encodings\cp437.py", line 19, in encode<br>

        return codecs.charmap_encode(input,self.errors,encoding_map)[0]<br>

    UnicodeEncodeError: 'charmap' codec can't encode characters in

    position 0-5: character maps to &lt;undefined&gt;<br>

    <br>
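    The console is not needed to reproduce this. The traceback shows
    print() failing while encoding to cp437, so encoding the string to
    cp437 directly should fail the same way. A minimal sketch (the
    names data, text and error are mine, not from the original
    message):<br>
    <pre wrap="">
```python
# The string from the quoted message, rebuilt from its UTF-8 bytes.
data = b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'
text = data.decode('utf-8')

print(len(text), len(data))  # 6 16

# Encode to cp437 by hand, as print() does for a cp437 console.
error = None
try:
    text.encode('cp437')
except UnicodeEncodeError as exc:
    error = exc

# start/end form a half-open range, so 0 and 6 mean positions 0-5.
print(error.start, error.end)
```
</pre>
    None of the six characters exists in cp437, which is why the error
    covers the whole string.<br>
    <br>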

    So Python 3 doesn't handle it perfectly on Windows. I have seen

    people blame the Windows console for that, but the console can

    display all of those characters properly if the proper APIs are

    used. The bug is 7 years old (<a class="moz-txt-link-freetext" href="http://bugs.python.org/issue1602">http://bugs.python.org/issue1602</a>)

    and still hasn't been fixed, even though the technology for fixing

    it exists, various workarounds (with limitations) have been

    available for 5 years, and patches that work pretty well have been

    available for 3 years. However, just a few days ago, on 26 July

    2014, Drekin had an insight that may lead to a patch that works

    well enough to be integrated into some future version of Python.

    I hope he follows up on it. This is a serious limitation, and it

    is, and always has been, a bug in Python 3's Unicode handling on

    Windows.<br>
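    <br>
    For anyone who only needs to avoid the traceback, here is a rough
    sketch of one lossy workaround. It is my own illustration, not one
    of the patches attached to issue 1602: it wraps the output stream
    with the backslashreplace error handler, so characters that cp437
    lacks come out as escapes rather than being displayed. The helper
    name escaping_writer is hypothetical, and a BytesIO object stands
    in for the console so that the example runs anywhere:<br>
    <pre wrap="">
```python
import io

# Same six-character string as in the quoted message.
text = b'\xe2\x84\x99\xc6\xb4\xe2\x98\x82\xe2\x84\x8c\xc3\xb8\xe1\xbc\xa4'.decode('utf-8')

def escaping_writer(raw, encoding):
    """Hypothetical helper: wrap a binary stream so that characters the
    encoding lacks are written as backslash escapes instead of raising."""
    return io.TextIOWrapper(raw, encoding=encoding,
                            errors='backslashreplace', newline='\n')

# Stand-in for a cp437 console, so the sketch runs anywhere.
raw = io.BytesIO()
out = escaping_writer(raw, 'cp437')
print(text, file=out)
out.flush()
result = raw.getvalue()
print(result)  # ASCII-only escapes, no UnicodeEncodeError
```
</pre>
    On a real console the same wrapper would presumably go around
    sys.stdout.buffer and be assigned back to sys.stdout. That only
    hides the error; it does nothing to get the characters onto the
    screen, which is what the bug report is about.<br>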

  </body>

</html>