Unicode issue on Windows cmd line

Gabriel Genellina gagsl-py2 at yahoo.com.ar
Thu Feb 12 04:00:45 CET 2009


On Wed, 11 Feb 2009 23:11:37 -0200, jeffg <jeffgemail at gmail.com> wrote:
> On Feb 11, 6:30 pm, "Martin v. Löwis" <mar.. at v.loewis.de> wrote:

>> > Thanks, I ended up using encode('iso-8859-15', "replace")
>> > Perhaps more up to date than cp1252...??
>> If you encode as iso-8859-15, but this is not what your terminal
>> expects, it certainly won't print correctly. To get correct printing,
>> the output encoding must be the same as the terminal encoding. If the
>> terminal encoding is not up to date (as you consider cp1252), then
>> the output encoding should not be up to date, either.
> I did try UTF-8 but it produced the upper case character instead of
> the proper lower case, so the output was incorrect for the unicode
> supplied.
> I think both 8859-15 and cp1252 produced the correct output, but I
> figured 8859-15 would have additional character support (though I'm not
> sure this is the case; if it is not, please let me know and I'll use
> 1252). I'm dealing with large data sets and this just happened to be
> one small example. I want the best chance of writing future Unicode
> characters correctly when running from the Windows command line
> (unless there is a better way to do it on Windows).

As Martin v. Löwis already said, the encoding Python uses when writing to
the console must match the encoding the console expects. (You should also
use a font capable of displaying such characters.)
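
To see why the two encodings must match, it can help to simulate both halves separately: the script turns text into bytes with its output encoding, and the console turns those bytes back into characters with its own code page. The sketch below is Python 3 (unlike the Python 2.6 session further down), and `console_view` is a hypothetical helper written only for illustration, not part of any library.

```python
def console_view(text, script_encoding, console_encoding):
    """Approximate what a console displays for `text` (hypothetical helper).

    The script encodes with `script_encoding`; the console interprets the
    resulting bytes with `console_encoding`.
    """
    # errors="replace": characters the script's encoding lacks become b"?"
    data = text.encode(script_encoding, "replace")
    # errors="replace": bytes the console's code page lacks become U+FFFD
    return data.decode(console_encoding, "replace")


matching = console_view("5 €", "cp1252", "cp1252")       # "5 €"
mismatch = console_view("5 €", "iso-8859-15", "cp1252")  # "5 ¤"
missing = console_view("x → y", "cp1252", "cp1252")      # "x ? y" (no arrow in cp1252)
```

Note that errors="replace" only hides characters the chosen encoding cannot represent; it does nothing to fix a mismatch between the script and the console.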

windows-1252 and iso-8859-15 are similar but not identical. This table
shows the differences (fewer than 30 printable characters):
http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
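
The comparison in that table can also be reproduced with Python's own codecs. This Python 3 sketch (the helper name `printable_repertoire` is made up for illustration) decodes every byte value under each encoding, skips bytes the code page leaves undefined, and keeps only printable characters. With the standard `cp1252` and `iso8859_15` codecs it should find 27 printable characters that only windows-1252 has and none that only iso-8859-15 has, so for printable text iso-8859-15 does not add anything windows-1252 lacks.

```python
def printable_repertoire(encoding):
    """Return the set of printable characters a single-byte encoding can produce."""
    chars = set()
    for value in range(256):
        try:
            ch = bytes([value]).decode(encoding)
        except UnicodeDecodeError:
            continue  # byte is undefined in this code page (e.g. 0x81 in cp1252)
        if ch.isprintable():  # drops C0/C1 control codes, NBSP and soft hyphen
            chars.add(ch)
    return chars


cp1252 = printable_repertoire("cp1252")
latin9 = printable_repertoire("iso8859_15")

only_cp1252 = sorted(cp1252 - latin9)  # e.g. "…", "™", "½"
only_latin9 = sorted(latin9 - cp1252)  # expected to be empty
print(len(only_cp1252), len(only_latin9))  # 27 0
```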
If your script encodes its output using iso-8859-15, the corresponding
console code page should be 28605.
"Western European" (whatever that means exactly) Windows versions use
windows-1252 as the "ANSI code page" (used by GUI applications) and cp850
as the "OEM code page" (used by console applications); US versions use
cp437 as the OEM code page instead.
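
On the Python side, the code page number reported by `chcp` has to be turned into a codec name. The mapping below is a hand-written assumption covering only the code pages mentioned in this message; `CODEC_FOR_CODEPAGE` and `encode_for_codepage` are hypothetical names, and the snippet is Python 3. It also illustrates why the ANSI/OEM split matters: the same "é" is a different byte in windows-1252 and in cp850.

```python
# Hypothetical mapping from Windows code page numbers to Python codec names,
# limited to the code pages discussed in this thread.
CODEC_FOR_CODEPAGE = {
    437: "cp437",         # OEM code page, US Windows
    850: "cp850",         # OEM code page, Western European Windows
    1252: "cp1252",       # ANSI code page, Western European and US Windows
    28605: "iso8859_15",  # iso-8859-15, also known as latin9
}


def encode_for_codepage(text, codepage):
    """Encode text for a console set with `chcp <codepage>`, replacing unknowns."""
    return text.encode(CODEC_FOR_CODEPAGE[codepage], "replace")


ansi = encode_for_codepage("\u00e9", 1252)  # b'\xe9'
oem = encode_for_codepage("\u00e9", 850)    # b'\x82'
# ANSI bytes displayed on an OEM console decode to a different character
misread = ansi.decode(CODEC_FOR_CODEPAGE[850])
print(ansi, oem, misread == "\u00e9")  # b'\xe9' b'\x82' False
```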

C:\Documents and Settings\Gabriel>chcp 1252
Active code page: 1252

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("windows-1252")
'\x9c'
py> print _
œ
py> ^Z

C:\Documents and Settings\Gabriel>chcp 28605
Active code page: 28605

C:\Documents and Settings\Gabriel>python
Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
py> unichr(0x0153).encode("iso-8859-15")
'\xbd'
py> print _
œ
py> unichr(0x0153).encode("latin9")
'\xbd'
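
For readers on Python 3, where `unichr` no longer exists and `encode` returns `bytes`, the byte values from the sessions above can be checked without changing the console code page. This is only a sketch: it verifies the encoded bytes, not what any particular console displays.

```python
import codecs

oe = "\u0153"  # LATIN SMALL LIGATURE OE, i.e. unichr(0x0153) in Python 2

print(oe.encode("windows-1252"))  # b'\x9c'
print(oe.encode("iso-8859-15"))   # b'\xbd'
print(oe.encode("latin9"))        # b'\xbd' ("latin9" is an alias)

# Both names resolve to the same codec
print(codecs.lookup("latin9").name == codecs.lookup("iso-8859-15").name)  # True
```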

-- 
Gabriel Genellina