Python and Jython inconsistencies when encoding strings

Fri Sep 6 11:43:45 EDT 2002

Martin v. Löwis <loewis at informatik.hu-berlin.de> wrote in message
j4ofbbclfp.fsf at informatik.hu-berlin.de...
> >>> s
> u"\u0153"
>
> Now, U+0153 is LATIN SMALL LIGATURE OE. It so happens that \x9c (what
> the terminal sends) is U+0153 in CP 1252 (which is the ANSI code page
> on your Windows installation). This might be a bug in Java, which
> assumes that bytes sent by the terminal are in the ANSI code page,
> when they are really in the OEM code page.

No, it's more the Jython parser that does that. Things can be fixed by running
Jython as

jython -Dpython.console.encoding=cp850
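
To illustrate why the console encoding matters here, a minimal sketch of how the same byte decodes under the two code pages. It is written for a current CPython 3 rather than Jython, and the variable names are only illustrative:

```python
# Sketch only: modern CPython 3 syntax, not Jython 2.1.
# The variable names are made up for this illustration.
raw = b"\x9c"  # the byte the terminal sends, per the quoted message

as_ansi = raw.decode("cp1252")  # read as the ANSI code page
as_oem = raw.decode("cp850")    # read as the OEM code page

print(ascii(as_ansi))  # '\u0153' (LATIN SMALL LIGATURE OE)
print(ascii(as_oem))   # '\xa3'   (POUND SIGN)
```

So the byte is the same in both cases; only the code page assumed when decoding it changes which character ends up in the unicode string.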

On the other hand, output seems buggy for:

print s.encode("cp850")

[I have reported that on our SF bug tracker]
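
For comparison, a rough sketch of what the encode step is expected to produce, again in CPython 3 syntax and not a reproduction of the Jython behaviour reported above. It assumes the string was decoded as cp850 in the first place; the names are hypothetical:

```python
# Sketch only: CPython 3, shows the expected round trip, not the Jython bug.
s = b"\x9c".decode("cp850")   # U+00A3 when the console byte is read as cp850
encoded = s.encode("cp850")   # back to the single original byte
print(encoded)                # b'\x9c'

# U+0153 (what the cp1252 reading produced) has no mapping in cp850,
# so encoding that string to cp850 fails in CPython.
try:
    "\u0153".encode("cp850")
    oe_encodable = True
except UnicodeEncodeError:
    oe_encodable = False
print(oe_encodable)           # False
```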

> > Does anybody know what is causing this inconsistency? Is there any way to
> > avoid it?
>
> Yes. Don't use the console.

Sticking to ASCII there can avoid some trouble :).

regards