Re: [Python-Dev] Unicode strings as filenames
Recently, "Martin v. Loewis" wrote:
This change works for me on Windows 2000 and allows access to all files no matter what the current code page is set to. On Windows 9x (not yet tested), the _wfopen call should fail causing a fallback to fopen. Possibly the OS should be detected instead and _wfopen not attempted on 9x.
Now that you have that change, please try to extend it to posixmodule.c. This is where I gave up. Notice that, by changing Py_FileSystemDefaultEncoding and open() alone, you have worsened the situation: os.stat will now fail on files with non-ASCII names on which it works under the mbcs encoding, because Windows won't find the file (correct me if I'm wrong).
Could someone who really understands this issue (Martin?) perhaps
write a test case for this? I think something like creating a file
with some nonascii chars in the name, and verifying that open(),
readdir(), os.stat() and various others work as expected is what would
be needed (but I'm not sure I fully understand it:-).
--
- Jack Jansen
Could someone who really understands this issue (Martin?) perhaps write a test case for this? I think something like creating a file with some nonascii chars in the name, and verifying that open(), readdir(), os.stat() and various others work as expected is what would be needed (but I'm not sure I fully understand it:-).
I'll attach a script below. It contains UTF-8 encoded data, so to prevent transmission errors, it comes as a base-64 attachment. Running it creates three additional files in the current directory; I recommend running it in an empty directory. In case you cannot view the source code properly, I attach a screenshot of my editor. Regards, Martin
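For readers without the attachment, here is a minimal sketch of the kind of check discussed above. It is not Martin's uni.py: the sample names and the function name `check_unicode_filenames` are assumptions made for illustration, it uses os.listdir() where the thread says readdir(), and it is written for a current Python 3 rather than the interpreter under discussion. Names are compared after NFC normalization because some file systems return decomposed forms.

```python
import os
import tempfile
import unicodedata

# Hypothetical sample names; the real uni.py used its own set.
SAMPLE_NAMES = ["abc", "Gr\u00fc\u00df-Gott", "\u0393\u03b5\u03b9\u03ac-\u03c3\u03b1\u03c2"]


def nfc(text):
    """Normalize to NFC so names compare equal on normalizing file systems."""
    return unicodedata.normalize("NFC", text)


def check_unicode_filenames(names, directory):
    """Create each name in `directory` and verify the basic file APIs agree."""
    checked = []
    for name in names:
        path = os.path.join(directory, name)
        payload = name.encode("utf-8")
        with open(path, "wb") as f:
            f.write(payload)
        # The directory listing should report the name we created.
        listed = [nfc(entry) for entry in os.listdir(directory)]
        assert nfc(name) in listed, name
        # os.stat must find the file through the same name.
        assert os.stat(path).st_size == len(payload)
        # os.rename was the call that still failed in the thread.
        renamed = path + ".new"
        os.rename(path, renamed)
        assert os.path.exists(renamed)
        os.remove(renamed)
        checked.append(name)
    return checked


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(check_unicode_filenames(SAMPLE_NAMES, tmp))
```

The sketch runs in a temporary directory rather than the current one, so nothing is left behind if a check fails.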
Martin v. Loewis:
I'll attach a script below. It contains UTF-8 encoded data, so to prevent transmission errors, it comes as a base-64 attachment. Running it creates three additional files in the current directory; I recommend running it in an empty directory.
I have added some more cases to your example, Martin: Hebrew, Chinese, Japanese, and a combination. The combination is an interesting case because it will not work with mbcs under any particular code page, as no code page (to my knowledge) contains all the characters. This works using my modifications except for the calls to os.rename. Neil
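The point about the combined name can be sketched with Python's own codecs. This is an illustration only: the sample strings, the choice of code pages, and the helper name `encodable_code_pages` are assumptions, and Python's codec tables stand in for what mbcs would do under each Windows code page.

```python
# A few Windows code pages, as named by Python's codec registry.
CODE_PAGES = ["cp1252", "cp1255", "cp936", "cp932"]

# Hypothetical sample names, not the ones from the attached script.
HEBREW = "\u05e9\u05dc\u05d5\u05dd"
CHINESE = "\u4e2d\u6587"
JAPANESE = "\u306b\u307b\u3093\u3054"
MIXED = HEBREW + CHINESE + JAPANESE


def encodable_code_pages(name, code_pages=CODE_PAGES):
    """Return the code pages that can represent every character in `name`."""
    usable = []
    for page in code_pages:
        try:
            name.encode(page)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this code page
        usable.append(page)
    return usable


if __name__ == "__main__":
    for label, name in [("Hebrew", HEBREW), ("mixed", MIXED)]:
        print(label, encodable_code_pages(name))
```

A name that survives in at least one code page can still be reached through the narrow APIs when that code page is active; the mixed name cannot, which is why it needs the wide-character calls.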
I have added some more cases to your example, Martin: Hebrew, Chinese, Japanese, and a combination. The combination is an interesting case because it will not work with mbcs under any particular code page, as no code page (to my knowledge) contains all the characters.
This works using my modifications except for the calls to os.rename.
This looks interesting :) Any chance of putting all this together in a patch at SourceForge? Ultimately uni.py should be rolled into test/test_unicode_filename.py, and it is unclear whether http://pythoncard.sourceforge.net/posixmodule.c is the latest version with Martin's comments. It also appears that posix_open may leave 'fd' uninitialized before comparing it with < 0. Thanks, Mark.
Mark Hammond:
This looks interesting :) Any chance of putting all this together in a patch at SourceForge?
Eventually, although I'm not yet sure the direction is sound. It does expand the code horribly. I'm also not sure I'll have the determination to push this through to completion, as there are still plenty of issues to be resolved. For me, just having open work is the most important bit; all the others are far less used.
Ultimately uni.py should be rolled into test/test_unicode_filename.py,
Directory tests added to uni.py.
and it is unclear whether http://pythoncard.sourceforge.net/posixmodule.c is the latest version with Martin's comments. It also appears that posix_open may leave 'fd' uninitialized before comparing it with < 0.
A new version fixing that has just been uploaded to http://scintilla.sourceforge.net/winunichanges.zip Neil
This looks interesting :) Any chance of putting all this together in a patch at SourceForge?
I do hope Neil will create a patch eventually; so far, it seems to be more convenient for him to post snippets. This is fine with me, since this project still has some way to go before completion. Regards, Martin
Participants (4): Jack Jansen, Mark Hammond, Martin v. Loewis, Neil Hodgson