[docs] [issue20906] Issues in Unicode HOWTO

Graham Wideman report at bugs.python.org
Fri Mar 21 04:49:52 CET 2014


Graham Wideman added the comment:

At the moment I've run out of time to exert much forward push on this.

By way of a temporary summary and a suggestion for regrouping: focus on what this page is intended to deliver. What concepts should readers of this page be able to distinguish and understand when they are finished?

To scope out the needed concepts, I suggest identifying representative Unicode-related stumbling blocks (possibly from Stack Overflow questions).

Here's an example case: just trying to get trivial "beyond ASCII" functionality to work on Windows (Win7, Python 3.3):

--------------------
s = 'knight \u265E'
print('Hello ' + s)
--------------------

... which fails with:

"UnicodeEncodeError: 'charmap' codec can't encode character '\u265e' in position 13: character maps to <undefined>".
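
The failure can be reproduced on any platform by doing explicitly the encoding step that print() does implicitly. This is only a sketch: cp1252 is my assumption, standing in for whatever code page the Windows console was actually using, and try_encode is a hypothetical helper name.

```python
s = 'knight \u265E'
message = 'Hello ' + s

def try_encode(text, encoding):
    """Return (True, encoded bytes) on success, or (False, the exception)."""
    try:
        return True, text.encode(encoding)
    except UnicodeEncodeError as exc:
        return False, exc

# Assumed stand-in for the console code page: no slot for U+265E.
ok_cp, detail_cp = try_encode(message, 'cp1252')
# UTF-8 can represent every code point, so this succeeds.
ok_utf8, detail_utf8 = try_encode(message, 'utf-8')

if not ok_cp:
    # start is the index of the offending character in the str.
    print(detail_cp.start, detail_cp.reason)
print(ok_utf8, detail_utf8)
```

The exception comes from the str-to-bytes conversion at the output boundary, not from the string itself, which is a perfectly valid str.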

A naive attempt to fix this by using s.encode() makes the "+" operation fail instead, because s.encode() returns bytes and Python 3 will not concatenate str with bytes.
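
A small sketch of that second failure, and of the two consistent alternatives (all bytes, or all str). The variable names are mine, chosen for illustration.

```python
s = 'knight \u265E'
encoded = s.encode()  # bytes; the default encoding in Python 3 is UTF-8

error = None
try:
    greeting = 'Hello ' + encoded  # str + bytes
except TypeError as exc:
    error = exc  # Python 3 refuses to mix the two types

# Either stay in bytes throughout...
greeting_bytes = b'Hello ' + encoded
# ...or decode back to str before concatenating.
greeting_text = 'Hello ' + encoded.decode()
```

Neither alternative fixes the original print() problem by itself; it only shows that str and bytes are distinct types that have to be converted explicitly.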

What paths forward do programmers explore in an effort to have this code (a) not throw an exception, and produce at least some output, and (b) make it produce the correct output?
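
As a rough sketch of paths (a) and (b), assuming the text stream wraps a byte stream the way sys.stdout wraps sys.stdout.buffer: an io.BytesIO stands in for the console here so the result can be inspected, cp1252 is again an assumed code page, and write_with is a hypothetical helper.

```python
import io

text = 'Hello knight \u265E'

def write_with(encoding, errors):
    """Print text through a TextIOWrapper and return the bytes produced."""
    raw = io.BytesIO()  # stand-in for the console's byte stream
    stream = io.TextIOWrapper(raw, encoding=encoding, errors=errors,
                              newline='\n')
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()

# (a) No exception, some output: substitute or escape what cannot be encoded.
lossy = write_with('cp1252', 'replace')
escaped = write_with('cp1252', 'backslashreplace')

# (b) Correct output: use an encoding that can represent the character.
exact = write_with('utf-8', 'strict')

print(lossy)
print(escaped)
print(exact)
```

For real output the same idea would apply to a wrapper around sys.stdout.buffer, or to the PYTHONIOENCODING environment variable. Whether the console then displays the bytes correctly is a separate question about the console and its font.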

And why does it work as intended on Linux?

The set of concepts identified and explained in this article needs to be sufficient to underpin an understanding of the distinct data types, encodings, decodings, translations, settings, etc. relevant to this problem, and how to use them to get a desired result.
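
One way a reader might investigate that question is to look at the encoding Python has chosen for standard output on each system. A sketch, with hypothetical helper names; my assumption is that the Linux terminal in question runs under a UTF-8 locale, which is common but not guaranteed.

```python
import locale
import sys

def describe_output_encoding():
    """Collect the settings that decide how print() encodes text."""
    return {
        'platform': sys.platform,
        # May be None if stdout has been replaced by an object without it.
        'stdout_encoding': getattr(sys.stdout, 'encoding', None),
        'preferred_encoding': locale.getpreferredencoding(False),
    }

def can_encode(text, encoding):
    """Report whether text survives a strict encode to the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

info = describe_output_encoding()
print(info)

enc = info['stdout_encoding'] or 'ascii'
print(enc, can_encode('knight \u265E', enc))
```

If the reported encoding is UTF-8, the original two-line example prints without error; if it is a legacy code page, the same code raises UnicodeEncodeError.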

There are similar problems that occur at other Python-system boundaries, which would further illuminate the set of necessary concepts.

Thanks for all comments.

-- Graham

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20906>
_______________________________________

