unable to print Unicode characters in Python 3
sjmachin at lexicon.net
Mon Jan 26 23:38:50 CET 2009
On Jan 27, 8:38 am, Jean-Paul Calderone <exar... at divmod.com> wrote:
> On Mon, 26 Jan 2009 13:26:56 -0800 (PST), jefm <jef.mangelsch... at gmail.com> wrote:
> >>As Benjamin Kaplin said, Windows terminals use the old cp1252 character
> >>set, which cannot display the euro sign. You'll either have to run it in
> >> something more modern like the cygwin rxvt terminal, or output some
> >>other way, such as through a GUI.
> >>With the standard console, I get the same. But with IDLE, using the
> >>same Python build but through a different interface
> >>Scream at Microsoft or try to find or encourage a console
> >>replacement that Python could use. In the meanwhile, use IDLE. Not
> >>perfect for Unicode, but better.
> >So, if I understand it correctly, it should work as long as you run
> >your Python code on something that can actually print the Unicode
> >characters. Apparently, the Windows command line cannot.
> >I mainly program command line tools to be used by Windows users. So I
> >guess I am screwed.
> >Other than converting my tools to have a graphic interface, is there
> >any other solution, other than give Bill Gates a call and bring his
> >command line up to the 21st century ?
> cp1252 can represent the euro sign (<http://en.wikipedia.org/wiki/Windows-1252>). Apparently the chcp command can be used to change the code page
> active in the console (<http://technet.microsoft.com/en-us/library/bb490874.aspx>). I've never tried this myself, though.
Short answer: it doesn't work.
Test [Windows XP SP3, Python 2.6.1]:
Active code page: 850
Active code page: 1252
Active code page: 1252
Python 2.6.1 (r261:67517, Dec 4 2008, 16:51:00) [MSC v.1500 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys; sys.stdout.encoding; sys.stderr.encoding
# So far, so good
>>> import unicodedata as ucd
>>> for b in range(128, 256):
...     c = chr(b)
...     u = c.decode('cp1252', 'replace')
...     name = ucd.name(u)
...     print hex(b), c, repr(u), name
0x80 u'\u20ac' EURO SIGN
0x81 u'\ufffd' REPLACEMENT CHARACTER
0x82 u'\u201a' SINGLE LOW-9 QUOTATION MARK
0xfb û u'\xfb' LATIN SMALL LETTER U WITH CIRCUMFLEX
0xfc ü u'\xfc' LATIN SMALL LETTER U WITH DIAERESIS
0xfd ý u'\xfd' LATIN SMALL LETTER Y WITH ACUTE
Ignore what you are seeing in the second field of each line above; it
could well look OK. However, what I see on the console is:
capital C with cedilla
small u with diaeresis (umlaut)
small e with acute
superscript two [yes, out of order]
IOW, the bridge might think it's in cp1252 mode, but nobody told the
engine room, which is still churning out cp850.
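The mismatch can be reproduced without a Windows console by decoding the same byte under both code pages. This is only a rough sketch in Python 3 syntax, not what the console does internally; the helper name console_view and the choice of sample bytes are mine, and it assumes the renderer really is applying cp850 to bytes that Python encoded as cp1252:

```python
import unicodedata

def console_view(byte_value, intended="cp1252", actual="cp850"):
    """Return (character Python meant, character the console would draw).

    Hypothetical helper: 'intended' is the code page Python believes is
    active, 'actual' is the one assumed to be used for rendering.
    """
    raw = bytes([byte_value])
    meant = raw.decode(intended, "replace")   # undefined bytes become U+FFFD
    shown = raw.decode(actual, "replace")
    return meant, shown

# Sample bytes taken from the transcript above.
for b in (0x80, 0x81, 0x82, 0xFD):
    meant, shown = console_view(b)
    print(hex(b),
          unicodedata.name(meant, "?"),
          "->",
          unicodedata.name(shown, "?"))
```

For 0x80 this pairs EURO SIGN with LATIN CAPITAL LETTER C WITH CEDILLA, and for 0xfd it pairs LATIN SMALL LETTER Y WITH ACUTE with SUPERSCRIPT TWO, which matches what I see on the console.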