[issue5604] imp.find_module() mixes UTF8 and MBCS

Andrew Svetlov report at bugs.python.org
Sun Apr 5 06:59:12 CEST 2009


Andrew Svetlov <andrew.svetlov at gmail.com> added the comment:

Continuing work on this problem, I figured out the following:
* on Windows it's impossible to convert filenames to the file system 
encoding without losing something.
* Windows can work properly only with Unicode (wchar_t) characters.
* all other systems are fine with utf-8 (or another filesystem 
encoding).
* it's very error-prone to change 'char*' to 'PyUnicode*'.
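
A minimal sketch of the first point above, in Python rather than C. The
filename is made up, and cp1252 is only a stand-in for a Windows ANSI
("MBCS") code page, since the 'mbcs' codec itself exists only on Windows:

```python
def roundtrips(name: str, encoding: str) -> bool:
    """Return True if `name` survives encode/decode in `encoding` unchanged."""
    # errors="replace" turns unrepresentable characters into '?', roughly
    # like a best-effort conversion to an ANSI code page.
    encoded = name.encode(encoding, errors="replace")
    return encoded.decode(encoding) == name


# Hypothetical module filename containing Cyrillic characters.
name = "модуль.py"

print(roundtrips(name, "utf-8"))   # UTF-8 can represent every character
print(roundtrips(name, "cp1252"))  # cp1252 has no Cyrillic letters
print(name.encode("cp1252", errors="replace"))
```

Once the name has been reduced to '?' characters, the original path cannot
be recovered, which is why the conversion is lossy.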

To solve this problem I assume: 
* all char* in the Python API are utf-8.
* if there is a need to call an operating system API like fopen, call 
imp_fopen instead; this function will do the needed conversions. Inside 
import.c there are 4 such calls: fopen, open_exclusive, unlink, stat. I 
want to write stubs for these calls. 
* also, loaders for dynamic modules aka 'C extensions' have to expect 
utf-8 as the pathname parameter, not 'filesystem encoded'.
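
The stub idea above can be sketched in Python (the actual patch is C code
in import.c and is not reproduced here). Only the name imp_fopen comes from
this message; imp_stat, imp_unlink and _to_os_path are hypothetical names
used for illustration. The assumption is that every path arrives as UTF-8
bytes and is converted exactly once, at the OS boundary:

```python
import os


def _to_os_path(path_utf8: bytes) -> str:
    """Hypothetical helper: decode a UTF-8 path into a Unicode string."""
    # Python 3 passes str paths to the wide-character API on Windows and
    # encodes them with the filesystem encoding on other systems.
    return path_utf8.decode("utf-8")


def imp_fopen(path_utf8: bytes, mode: str = "rb"):
    """Stub for fopen: open a file named by UTF-8 bytes."""
    return open(_to_os_path(path_utf8), mode)


def imp_stat(path_utf8: bytes) -> os.stat_result:
    """Stub for stat (hypothetical name)."""
    return os.stat(_to_os_path(path_utf8))


def imp_unlink(path_utf8: bytes) -> None:
    """Stub for unlink (hypothetical name)."""
    os.unlink(_to_os_path(path_utf8))
```

Keeping the conversion in one helper means callers never need to know which
encoding the operating system expects.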

The patch for Windows is attached (STILL NOT CONVERTED TO OTHER OSes).
On Windows it works (regression tests pass).

If this solution is applicable for 3.1 (as far as I know, Cannon is 
working on the excellent importlib, but that library will replace imp 
functionality only in 3.2), I can continue switching. Unfortunately I 
cannot test the py3k trunk on non-Windows machines, but I can 'make all 
OS calls as expected' and wait for the buildbot results.

Please review import_patch_4th_edition.zip, and if I went in the wrong 
direction, let me know.

----------
nosy: +brett.cannon
Added file: http://bugs.python.org/file13618/import_patch_4th_edition.zip

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue5604>
_______________________________________

