[issue20485] Enable 'import <Non-ASCII>.pyd'

Suzumizaki report at bugs.python.org
Wed Feb 5 09:52:28 CET 2014


Suzumizaki added the comment:

Thank you, Victor, for msg210125. I read the May 2011 discussion on the mailing list.

Those messages refer to an earlier discussion on the tracker:
"On Windows, don't encode filenames in the import machinery"
http://bugs.python.org/issue11619

Here are my notes, which may be helpful when reviewing those discussions.

-- About Windows CE --
* Windows CE originally had only GetProcAddress().
* GetProcAddressA() was added in Windows CE 3.0.
* but the Python community chose the 'A' version to support Windows CE.
* Windows CE continues today as Windows Embedded Compact.
* but Python 3 for Windows CE does not seem to be distributed.

-- About Windows Desktop and Servers --
* Windows desktop and server editions have only GetProcAddress(), with neither an A nor a W suffix.
* GetProcAddress() on Windows desktop and server editions takes LPCSTR as the 2nd parameter.
* but in this case the parameter is just a null-terminated block of bytes, interpreted as neither MBCS nor UTF-8.
* Visual C++ 2010 encodes non-ASCII export symbols as UTF-8.
* Because of the two points above, we can pass a UTF-8 encoded string to GetProcAddress().

I checked that passing a UTF-8 encoded string works on my Japanese editions of Windows:
* XP Home Edition (32bit)
* Vista Home Premium (64bit)
* Windows 8.1 Pro (64bit)
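
A check like the one above could be sketched from Python with ctypes. This is only an illustration under assumptions: the helper name lookup_export, the DLL path and the symbol name passed to it are hypothetical, and the function simply returns None when it is not running on Windows.

```python
import ctypes
import sys


def lookup_export(dll_path, symbol):
    """Look up `symbol` in the DLL at `dll_path`, passing the name as UTF-8 bytes.

    Returns the address as an int, or None if the DLL or the symbol
    cannot be found, or if this is not Windows. (Hypothetical helper.)
    """
    if sys.platform != "win32":
        return None
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    # LoadLibraryW takes a wide-character path.
    kernel32.LoadLibraryW.argtypes = [ctypes.c_wchar_p]
    kernel32.LoadLibraryW.restype = ctypes.c_void_p
    # GetProcAddress takes LPCSTR: a null-terminated block of bytes.
    kernel32.GetProcAddress.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    kernel32.GetProcAddress.restype = ctypes.c_void_p
    handle = kernel32.LoadLibraryW(dll_path)
    if not handle:
        return None
    # ctypes appends the terminating NUL to the bytes object.
    return kernel32.GetProcAddress(handle, symbol.encode("utf-8"))
```

For example, lookup_export("example.pyd", "PyInit_\u65e5\u672c\u8a9e") would return a non-None address only if that (hypothetical) file exports a symbol whose name is stored as exactly those UTF-8 bytes.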

GetProcAddress (Windows CE)
The type of the 2nd parameter is LPC"W"STR, and the documentation says an LPCSTR version was added in CE 3.0.
http://msdn.microsoft.com/en-us/library/ms885634.aspx

GetProcAddress (Windows Desktop/Server)
The type of the 2nd parameter is LPCSTR, neither LPC"T"STR nor LPC"W"STR.
Note that the example there seems to be wrong in using the TEXT macro.
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683212(v=vs.85).aspx

PythonCE (development seems to have stopped at Python 2.5 compatibility)
http://pythonce.sourceforge.net/

Symbols seem to be encoded as UTF-8 inside Windows executables
https://mail.python.org/pipermail/python-dev/2011-May/111325.html
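
To make the encoding point concrete, here is a minimal sketch of the byte block that would be handed to GetProcAddress() for a non-ASCII name. It assumes, purely for illustration, that the exported symbol is "PyInit_" followed by the module name; the helper export_symbol_bytes and the sample name are hypothetical, and cp932 stands in for the MBCS code page of a Japanese Windows.

```python
def export_symbol_bytes(module_name, encoding="utf-8"):
    """Return the null-terminated byte block for a hypothetical init symbol."""
    return ("PyInit_" + module_name).encode(encoding) + b"\x00"


name = "\u65e5\u672c\u8a9e"  # three CJK characters used as a sample module name

utf8_block = export_symbol_bytes(name)           # what a UTF-8 export table would hold
mbcs_block = export_symbol_bytes(name, "cp932")  # what an MBCS encoding would produce

print(utf8_block)
print(mbcs_block)
print(utf8_block == mbcs_block)  # the two byte blocks differ
```

Because GetProcAddress() compares bytes without interpreting them, only the block that matches what the linker actually wrote into the export table would be found.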

-- About C/C++ Standards --
* C99 guarantees at least 63 significant initial characters for internal identifiers (only 31 for external ones).
* C99 allows Unicode characters in identifiers.
* but it does not define how the \uNNNN or \UNNNNNNNN forms used inside "quotations" are translated.
* C++11 defines u8"" literals, so we can build a UTF-8 char* string inside u8"quotes" using \u escapes.
* but the encoding of the source file is platform dependent.
* also, how symbols are exported is platform dependent.

-- About C/C++ tool kits --
* A Windows executable can contain up to 2048 chars per exported symbol.
* Visual C++ 2010 seems to encode exported symbols as UTF-8.
* gcc has no built-in limit on the length of identifiers.
* Currently, Visual C++ 2010 and LLVM/Clang support using UTF-8 throughout the source code.
* gcc only supports the \uNNNN or \UNNNNNNNN form.
* For the GetProcAddress() functions, see the notes about Windows above.

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue20485>
_______________________________________

