Unicode issue on Windows cmd line

jeffg jeffgemail at gmail.com
Wed Feb 11 23:16:09 EST 2009


On Feb 11, 10:00 pm, "Gabriel Genellina" <gagsl-... at yahoo.com.ar>
wrote:
> On Wed, 11 Feb 2009 23:11:37 -0200, jeffg <jeffgem... at gmail.com> wrote:
>
>
>
> > On Feb 11, 6:30 pm, "Martin v. Löwis" <ma... at v.loewis.de> wrote:
> >> > Thanks, I ended up using encode('iso-8859-15', "replace")
> >> > Perhaps more up to date than cp1252...??
> >> If you encode as iso-8859-15, but this is not what your terminal
> >> expects, it certainly won't print correctly. To get correct printing,
> >> the output encoding must be the same as the terminal encoding. If the
> >> terminal encoding is not up to date (as you consider cp1252), then
> >> the output encoding should not be up to date, either.
> > I did try UTF-8 but it produced the upper case character instead of
> > the proper lower case, so the output was incorrect for the unicode
> > supplied.
> > I think both 8859-15 and cp1252 produced the correct output, but I
> > figured 8859-15 would have additional character support (though not
> > sure this is the case - if it is not, please let me know and I'll use
> > 1252).  I'm dealing with large data sets and this just happened to be
> > one small example.  I want to have the best ability to write future
> > Unicode characters properly based on running from the Windows command
> > line (unless there is a better way to do it on Windows).
>
> As Martin v. Löwis already said, the encoding Python uses when writing
> to the console must match the encoding the console expects. (You should
> also use a font capable of displaying such characters.)
>
> windows-1252 and iso-8859-15 are similar, but not identical. This table
> shows the differences (fewer than 30 printable characters):  http://en.wikipedia.org/wiki/Western_Latin_character_sets_(computing)
> If your script encodes its output using iso-8859-15, the corresponding  
> console code page should be 28605.
> "Western European" (whatever that means exactly) Windows versions use the  
> windows-1252 encoding as the "Ansi code page" (GUI applications), and  
> cp850 as the "OEM code page" (console applications) -- cp437 in the US  
> only.
>
> C:\Documents and Settings\Gabriel>chcp 1252
> Active code page: 1252
>
> C:\Documents and Settings\Gabriel>python
> Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> py> unichr(0x0153).encode("windows-1252")
> '\x9c'
> py> print _
> œ
> py> ^Z
>
> C:\Documents and Settings\Gabriel>chcp 28605
> Active code page: 28605
>
> C:\Documents and Settings\Gabriel>python
> Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
> Type "help", "copyright", "credits" or "license" for more information.
> py> unichr(0x0153).encode("iso-8859-15")
> '\xbd'
> py> print _
> œ
> py> unichr(0x0153).encode("latin9")
> '\xbd'
>
> --
> Gabriel Genellina
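
For anyone who wants to check the windows-1252 / iso-8859-15 differences without the Wikipedia table, here is a rough sketch. It assumes Python 3 (the transcripts above are Python 2.6), and the names printable_chars, codec_differences and show are made up for illustration. It only looks at bytes 0x80-0xFF and ignores control characters.

```python
import unicodedata


def printable_chars(codec):
    """Map each printable character in bytes 0x80-0xFF of `codec` to its byte."""
    chars = {}
    for byte in range(0x80, 0x100):
        try:
            ch = bytes([byte]).decode(codec)
        except UnicodeDecodeError:
            continue  # this byte is unassigned in the codec
        if unicodedata.category(ch) != "Cc":  # skip C1 control characters
            chars[ch] = byte
    return chars


def codec_differences(a="cp1252", b="iso-8859-15"):
    """Return {char: (byte_in_a, byte_in_b)}; None means the codec lacks it."""
    in_a, in_b = printable_chars(a), printable_chars(b)
    diffs = {}
    for ch in sorted(set(in_a) | set(in_b)):
        pair = (in_a.get(ch), in_b.get(ch))
        if pair[0] != pair[1]:
            diffs[ch] = pair
    return diffs


def show(byte):
    """Format a byte value, or '--' when the codec has no such character."""
    return "--" if byte is None else "0x%02X" % byte


if __name__ == "__main__":
    # ASCII-only output, so the listing itself prints on any console.
    for ch, (x, y) in codec_differences().items():
        name = unicodedata.name(ch, "UNNAMED")
        print("U+%04X  %-45s %s  %s" % (ord(ch), name, show(x), show(y)))
```

The row for U+0153 should show 0x9C for cp1252 and 0xBD for iso-8859-15, which agrees with the two interpreter sessions quoted above.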

Thanks, switched it to windows-1252.
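
Rather than hard-coding one code page, it may be possible to ask Python which encoding it thinks the console uses and encode to that, keeping the "replace" error handler so unsupported characters come out as "?". This is only a sketch, assuming Python 3; console_safe is a hypothetical helper name, and whether sys.stdout.encoding really matches the active code page depends on the Python version and on how the script is launched.

```python
import sys


def console_safe(text, encoding=None):
    """Return `text` with characters the console cannot encode replaced by '?'."""
    if encoding is None:
        # The attribute may be missing or None if stdout has been replaced.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace").decode(encoding)


if __name__ == "__main__":
    # U+0153 survives on a cp1252 console and becomes '?' where unsupported.
    print(console_safe("c\u0153ur"))
```

Passing the encoding explicitly, for example console_safe(s, "cp1252"), gives the same effect as the encode('...', "replace") call mentioned earlier in the thread, but returns text instead of bytes.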


