[pypy-issue] [issue1139] Wrong encoding on Windows command line (and source loading)

mgibsonbr tracker at bugs.pypy.org
Sun May 6 18:02:47 CEST 2012


New submission from mgibsonbr <mgibsonbr at gmail.com>:

If a unicode string is entered on the command line (pypy-1.8.0 on Windows XP), say:

>>>> u'áÇñ'
u'\xa0\u20ac\xa4'

An incorrect string is created. If the same is typed in PyPy on Linux, in
CPython on any platform, or read from a file using codecs.open with the right
encoding, the result is:

u'\xe1\xc7\xf1'

Which is the correct interpretation of that input. A related issue (which might
already be covered by Issue402, I don't know) is that PyPy doesn't seem to
recognize coding headers:

# -*- coding:utf-8 -*-

So the same problem manifests if there are unescaped unicode strings in the
source files (or doctests, for that matter). For a quick way to replicate
that, try running doctest on this file
[difnet.com.br/opensource/unicode_hack.py] (the relevant part is at the end of
it). It should work fine on CPython or on PyPy on Linux, but not on PyPy on
Windows.
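
For what it's worth, both symptoms can be imitated in a few lines. This is only
a sketch under assumptions that are not confirmed anywhere in this report: that
the console delivers the typed characters as cp850 bytes, that something then
decodes them as cp1252, and that ignoring the coding header amounts to decoding
UTF-8 source bytes with cp1252. It says nothing about where in PyPy this
actually happens. The second part relies on tokenize.detect_encoding, which
exists only in Python 3.

```python
# -*- coding: utf-8 -*-
import io
import tokenize

expected = u'\xe1\xc7\xf1'  # the three characters from the report

# Part 1: console input.
# Assumption: the console hands over cp850 bytes and they get decoded as cp1252.
console_bytes = expected.encode('cp850')
misread = console_bytes.decode('cp1252')
# Same wrong value as shown in the report.
assert misread == u'\xa0\u20ac\xa4'

# Part 2: source file with a coding header.
# Build the raw bytes of a tiny UTF-8 source file with unescaped characters.
source = (u"# -*- coding:utf-8 -*-\n"
          u"s = u'\xe1\xc7\xf1'\n").encode('utf-8')

# Read the declared encoding from the header (PEP 263 style).
declared, _ = tokenize.detect_encoding(io.BytesIO(source).readline)

honoured = source.decode(declared)  # header respected
ignored = source.decode('cp1252')   # header ignored (assumed fallback)

print(declared)
print(repr(misread))
print(expected in honoured, expected in ignored)
```

If the first assertion holds, a cp850/cp1252 mix-up would at least be
consistent with the output above, though other code page pairs may produce the
same result for these three characters.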

----------
messages: 4303
nosy: mgibsonbr, pypy-issue
priority: bug
status: unread
title: Wrong encoding on Windows command line (and source loading)

________________________________________
PyPy bug tracker <tracker at bugs.pypy.org>
<https://bugs.pypy.org/issue1139>
________________________________________
