[Python-Dev] [Python-3000] New proposition for Python3 bytes filename issue
Victor Stinner
victor.stinner at haypocalc.com
Tue Sep 30 01:29:24 CEST 2008
On Monday 29 September 2008 19:06:01, Guido van Rossum wrote:
> >> - listdir(unicode) -> unicode and raise an error on invalid filename
>
> I know I keep flipflopping on this one, but the more I think about it
> the more I believe it is better to drop those names than to raise an
> exception. Otherwise a "naive" program that happens to use
> os.listdir() can be rendered completely useless by a single non-UTF-8
> filename. Consider the use of os.listdir() by the glob module. If I am
> globbing for *.py, why should the presence of a file named b'\xff'
> cause it to fail?
It would be hard for a newbie programmer to understand why he's unable to find
his very important file ("important r?port.doc") using os.listdir(). And yes,
if your file system is broken, glob(<unicode>) will fail.
If we choose to support bytes on Linux, a robust and portable program has to
use only bytes filenames on Linux to always be able to list and open files.
A full example to list files and display filenames:
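import os
import os.path
import sys
# Use str names where the platform handles them natively, bytes elsewhere
if os.path.supports_unicode_filenames:
    cwd = os.getcwd()
else:
    cwd = os.getcwdb()
encoding = sys.getfilesystemencoding()
for filename in os.listdir(cwd):
    if os.path.supports_unicode_filenames:
        # listdir(str) already returned str
        text = filename
    else:
        # listdir(bytes) returned bytes: decode for display only
        text = str(filename, encoding, "replace")
    print("=== File {0} ===".format(text))
    # Open with the original name, not the decoded display text
    for line in open(filename):
        ...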
We need an "if" to choose the directory. The second "if" is only needed to
display the filename. Using bytes, it would be possible to write better code:
detect the real charset (e.g. ISO-8859-1 in a UTF-8 file system) and so
display the filename correctly and/or propose to rename the file. Would that
be possible using UTF-8b / PUA hacks?
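To illustrate what "detect the real charset" could look like when the name is
kept as bytes, here is a minimal sketch. The function name
guess_filename_text and the candidate list are hypothetical, chosen only for
this example. It assumes that trying UTF-8 first and ISO-8859-1 second is
good enough. Real charset detection would need better heuristics.

```python
def guess_filename_text(raw, candidates=("utf-8", "iso-8859-1")):
    """Return (text, encoding) for a bytes filename.

    Hypothetical helper: tries each candidate encoding in order and
    returns the first strict decode that succeeds.
    """
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # No candidate matched: lossy decode, suitable for display only.
    return raw.decode("ascii", "replace"), None


# A Latin-1 name stored on a UTF-8 file system is not valid UTF-8,
# so the second candidate is used.
text, encoding = guess_filename_text(b"important r\xe9port.doc")
print(text, encoding)
```

Because ISO-8859-1 maps every byte to a character, it always succeeds and
must stay last in the list. The original bytes name is still what gets passed
to open() or os.rename().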
--
Victor Stinner aka haypo
http://www.haypocalc.com/blog/