Help needed: are you running Windows with a non-ASCII interface?

Someone on Stack Overflow asked why PyPy cannot run pandas. Here is the error, reformatted from the garbled original: https://gist.github.com/mattip/374e8ba49e2dd2e2b0d5c46a5cd612ed

While there was an off-by-one error in the conversion when building time.tzname from the OS C call, I think the issue might be deeper and related to this still-open CPython bug, https://bugs.python.org/issue16322, where non-ASCII tznames are a mess to decode.

Could someone with a non-ASCII (Russian, Chinese, Czech, French, ...) interface in Windows:

- check what pypy3 returns for time.tzname? There is no code to decode it, so it is probably a string of bytes. What encoding is it in?
- try to reproduce the pandas failure on pypy3 (you will need a compiler and a fair amount of time)
- see if 6569684f0955 (available after tonight's build on downloads) fixes pandas?

Thanks,
Matti
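For the first item, one way to report what time.tzname contains is to print the type and the code points (or byte values) of each entry. This is only a minimal sketch; `describe_tzname` is a hypothetical helper name, not something provided by PyPy or CPython.

```python
import time

def describe_tzname(names):
    """Return one report line per entry: type, length, and code points or byte values."""
    lines = []
    for name in names:
        if isinstance(name, bytes):
            # Raw bytes: show the byte values so the encoding can be guessed.
            detail = " ".join("%02x" % b for b in name)
        else:
            # Already decoded: show each code point to expose mangled characters.
            detail = " ".join("U+%04X" % ord(ch) for ch in name)
        lines.append("%s len=%d %s" % (type(name).__name__, len(name), detail))
    return lines

if __name__ == "__main__":
    for line in describe_tzname(time.tzname):
        print(line)
```

Running this under both CPython and pypy3 on the same machine and comparing the lines should show whether the entries differ in type, length, or individual code points.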

Hi Matti, On Wed, 26 Feb 2020 at 11:59, Matti Picus <matti.picus@gmail.com> wrote:
- check what pypy3 returns for time.tzname? There is no code to decode it, so it is probably a string of bytes. What encoding is it in?
On a French Windows I get, in CPython 3.6, a tuple of two unicode strings that seem correct; on PyPy3 I get instead a tuple of two unicode strings that are very incorrect. CPython 3.6 (first line) versus PyPy3.6:

('Europe de l\x92Ouest', 'Europe de l\x92Ouest (heure d\x92été)')
('Europe de l\Ufffff44fuest (heure d\Ufffff4e9t\u79c0', 'Europe de l\Ufffff44fuest (heure d\Ufffff4e9t\x39')

In particular the first escaped character \Ufffff44f really should be two characters, '\x92O', and there is similar mangling later. Also, the first of the two unicode strings is much shorter on CPython 3. Finally, the very last character is rendered as '\x39', but I have no clue why it is rendered as '\x39' instead of just the ASCII '9'.

So yes, we have more than one bug here.

Armin
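On the question of encoding: in Windows-1252, byte 0x92 is the right single quotation mark (U+2019), so the '\x92' in the CPython result may be a cp1252 byte that was mapped one-to-one to a code point, as a Latin-1 decode would do. The sketch below assumes the OS name was in cp1252, which this thread does not establish; `redecode` is a hypothetical helper name.

```python
def redecode(text, wrong="latin-1", right="cp1252"):
    """Undo a decode done with `wrong` and redo it with `right`."""
    return text.encode(wrong).decode(right)

# The second entry of the CPython 3.6 tuple quoted above.
cpython_dst = 'Europe de l\x92Ouest (heure d\x92été)'

# ascii() keeps the output printable on any console encoding.
print(ascii(redecode(cpython_dst)))
```

If the assumption holds, the apostrophes come out as U+2019 and the accented letters are unchanged, since 0xE9 maps to 'é' in both encodings.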

Hi again, On Wed, 26 Feb 2020 at 14:28, Armin Rigo <armin.rigo@gmail.com> wrote:
In particular the first escaped character \Ufffff44f really should be two characters, '\x92O', and there is similar mangling later. Also the first of the two unicodes is much shorter on CPython3. Finally, the very last character is rendered as '\x39' but I have no clue why it is even rendered as '\x39' instead of just the ascii '9'.
So yes, we have more than one bug here.
Uh, in fact time.tzname[0] == time.tzname[1], and if I print time.tzname[0] and time.tzname[1] separately I get the same thing twice:

'Europe de l\Ufffff44fuest (heure d\Ufffff4e9t\u79c0'

But if I print the tuple, or manually (time.tzname[0], time.tzname[1]) or even (time.tzname[0], time.tzname[0]), then I see the result above, where the second item is repr'ed differently from the first one.

Armin
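The repr discrepancy described above can be checked mechanically: the repr of a tuple should be the reprs of its items joined by ", " inside parentheses. A minimal sketch, with `tuple_repr_matches` as a hypothetical helper name:

```python
import time

def tuple_repr_matches(items):
    """Return True if repr(tuple) equals the item reprs joined the usual way."""
    items = tuple(items)
    parts = [repr(x) for x in items]
    if len(parts) == 1:
        # One-element tuples carry a trailing comma.
        expected = "(" + parts[0] + ",)"
    else:
        expected = "(" + ", ".join(parts) + ")"
    return repr(items) == expected

if __name__ == "__main__":
    first = time.tzname[0]
    # Same pairs as tried above: the real tuple, and the first item twice.
    print(tuple_repr_matches(time.tzname))
    print(tuple_repr_matches((first, first)))
```

A False result on the affected pypy3 build would confirm that the same string is repr'ed differently depending on its position in the tuple.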
Participants (2): Armin Rigo, Matti Picus