[New-bugs-announce] [issue4474] PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)
Mark Dickinson
report at bugs.python.org
Sun Nov 30 19:54:08 CET 2008
New submission from Mark Dickinson <dickinsm at gmail.com>:
On systems (Linux, OS X) where sizeof(wchar_t) is 4 and wchar_t arrays are
usually encoded as UTF-32, it looks as though PyUnicode_FromWideChar
simply truncates each 32-bit character to 16 bits, giving incorrect
results for characters outside the BMP. I expected it to convert the
UTF-32 encoding to UTF-16.
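To illustrate the difference between the two behaviours, here is a minimal Python sketch. It is a model of the arithmetic only, not the C implementation of PyUnicode_FromWideChar; the helper names truncate_to_16_bits and utf32_to_utf16 are made up for this example.

```python
def truncate_to_16_bits(code_points):
    """Model the reported behaviour: keep only the low 16 bits of each value."""
    return [cp & 0xFFFF for cp in code_points]


def utf32_to_utf16(code_points):
    """Convert UTF-32 code points to UTF-16 code units (surrogate pairs above the BMP)."""
    units = []
    for cp in code_points:
        if cp < 0x10000:
            # BMP character: one 16-bit code unit, value unchanged.
            units.append(cp)
        else:
            # Non-BMP character: split the 20-bit offset into two surrogates.
            offset = cp - 0x10000
            units.append(0xD800 + (offset >> 10))    # high (lead) surrogate
            units.append(0xDC00 + (offset & 0x3FF))  # low (trail) surrogate
    return units


# "test" + U+1016D + ".py", as a UTF-32 wchar_t array would hold it.
wide = [ord(c) for c in "test"] + [0x1016D] + [ord(c) for c in ".py"]

truncated = truncate_to_16_bits(wide)
converted = utf32_to_utf16(wide)

print([hex(u) for u in truncated])
print([hex(u) for u in converted])
```

Under truncation, U+1016D collapses to the single unit 0x016D, which is a different character; the conversion yields the surrogate pair 0xD800, 0xDD6D instead.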
Note that PyUnicode_FromWideChar is used to process command-line
arguments, so strange things can happen when passing filenames with
non-BMP characters to a Python script.
Here's an OS X 10.5 Terminal session (current directory is the root of the
py3k tree).
dickinsm$ cat test𐅭.py
from sys import argv
print("My arguments are: ",argv)
dickinsm$ ./python.exe test𐅭.py
My arguments are:  ['testŭ.py']
dickinsm$ ./python.exe Lib/tabnanny.py test𐅭.py
'testŭ.py': I/O Error: [Errno 2] No such file or directory: 'testŭ.py'
(In case the character after 'test' and before '.py' isn't showing up
correctly, it's chr(65901), 'GREEK ACROPHONIC TROEZENIAN FIVE HUNDRED'.)
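
As a quick check of which characters are involved, the following sketch uses the standard unicodedata module to name both the intended character and the one that results from keeping only its low 16 bits. The masking step is an assumption that models the truncation described in this report.

```python
import unicodedata

original = chr(65901)             # U+1016D, the character in the filename
truncated = chr(65901 & 0xFFFF)   # what remains if only the low 16 bits survive

# Print each code point with its Unicode name.
print(f"U+{ord(original):04X}", unicodedata.name(original))
print(f"U+{ord(truncated):04X}", unicodedata.name(truncated))
```

The second line names U+016D, LATIN SMALL LETTER U WITH BREVE, which matches the character shown in the output of the session.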
----------
components: Interpreter Core
messages: 76651
nosy: marketdickinson
severity: normal
status: open
title: PyUnicode_FromWideChar incorrect for characters outside the BMP (unix only)
type: behavior
versions: Python 3.1
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue4474>
_______________________________________