[issue3187] os.listdir can return byte strings
report at bugs.python.org
Tue Aug 26 20:15:21 CEST 2008
Dwayne Litzenberger <dlitz at dlitz.net> added the comment:
I think Guido already understands this, but I haven't seen it stated
very clearly here:
** Different systems use different "things" to identify files. **
On Linux/ext3, all filenames are *octet strings* (i.e. bytes), and
*only* the following caveats apply:
- a filename/pathname cannot contain the zero-octet (b"\x00").
- a filename/pathname cannot be empty.
- a filename cannot contain the slash (b"/"); in a pathname, the slash
is used to separate filenames.
- the filenames b"." and b".." have special meanings; they cannot be
created, deleted, or renamed.
All filenames that meet these criteria are valid, and calling them
"invalid" amounts to plugging one's ears and shouting "LA LA LA" while
imagining Unicode having pre-dated Unix.
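The caveats above are few enough to state as code. This is a minimal sketch that checks a single filename (one path component) against only the rules listed here. It deliberately ignores anything the comment does not mention, such as a maximum name length. The helper name `is_creatable_component` is hypothetical:

```python
def is_creatable_component(name: bytes) -> bool:
    """Return True if `name` may be used as a new filename (a single
    path component) under the Linux/ext3 rules listed above."""
    if not name:                # a filename cannot be empty
        return False
    if b"\x00" in name:         # the zero octet is never allowed
        return False
    if b"/" in name:            # the slash separates components
        return False
    if name in (b".", b".."):   # reserved names; cannot be created
        return False
    return True                 # every other octet string is valid


# Octets that are not valid UTF-8 still form a perfectly good filename.
print(is_creatable_component(b"caf\xe9.txt"))  # True
print(is_creatable_component(b"a/b"))          # False
```

Note that nothing in the check refers to an encoding. Validity is purely a property of the octets.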
It is sometimes convenient to imagine filenames on Linux/ext3 as
sequences of Unicode code points (where the encoding is specified by
LC_CTYPE, and is not necessarily UTF-8), but other times (e.g. in backup
tools that need to be robust in the face of mischievous users) it is an
unnecessary abstraction that introduces bugs.
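The following sketch shows how the "sequence of code points" view depends on the locale. It assumes two example encodings, Latin-1 and UTF-8. A real program would take the encoding from LC_CTYPE, for example via `locale.getpreferredencoding()`. The helper `decode_or_none` is hypothetical:

```python
def decode_or_none(raw: bytes, encoding: str):
    """Decode a raw filename as text, or return None if the octets are
    not valid in the given encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


raw = b"caf\xe9.txt"  # octets written by a program running under Latin-1

print(decode_or_none(raw, "latin-1"))  # café.txt
print(decode_or_none(raw, "utf-8"))    # None: same file, but no text name
```

A backup tool that silently skips names it cannot decode would fail to back up this file under a UTF-8 locale, even though the file is perfectly valid on disk.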
On Windows/NTFS, the situation is entirely different: Filenames are
actually sequences of Unicode code points, and if you pretend they are
octet strings, Windows will happily invent phantom filenames for you
that will show up in the output of os.listdir(), but that will return
"File not found" if you try to open them for reading (if you open them
for writing, you risk clobbering other files that happen to have the same name).
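The phantom-name effect can be imitated on any platform. This sketch assumes that the octet-string view of a Unicode name behaves like encoding it to the cp1252 code page, with "?" substituted for characters the code page cannot represent. Real Windows may apply best-fit mappings instead, so treat this as an illustration only. The helper `to_ansi` is hypothetical:

```python
def to_ansi(name: str) -> bytes:
    """Approximate the octet-string view of a Unicode filename.
    Assumption: cp1252 code page, '?' for unrepresentable characters."""
    return name.encode("cp1252", errors="replace")


names = ["\u65e5.txt", "\u672c.txt", "notes.txt"]  # two CJK names, one ASCII
ansi = [to_ansi(n) for n in names]
print(ansi)  # [b'?.txt', b'?.txt', b'notes.txt']
```

Two distinct files collapse onto one invented name that matches neither of them. Opening that name for reading cannot find either file. Opening it for writing may hit the wrong one, depending on the mapping.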
To avoid bugs, it should be possible to work exclusively with filenames
in the platform's native representation. It was possible in Python 2
(though you had to be very careful). Ideally, Python 3 would recognize
and enforce the difference instead of trying to guess the translations;
"Explicit is better than implicit" and all that.
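In present-day Python 3, `os.listdir()` returns bytes when given a bytes path and str when given a str path. This makes the "native representation" approach expressible. The following minimal sketch uses a hypothetical helper, `listdir_native`. It assumes the two-way split described above: octet strings on POSIX systems and Unicode strings on Windows, with no translation in either direction:

```python
import os
import tempfile


def listdir_native(path: str):
    """List `path` using the platform's native filename type:
    bytes on POSIX systems, str on Windows (hypothetical helper)."""
    if os.name == "nt":
        return os.listdir(path)           # NTFS names are Unicode
    return os.listdir(os.fsencode(path))  # POSIX names are octet strings


with tempfile.TemporaryDirectory() as d:
    open(os.path.join(d, "a.txt"), "w").close()
    print(listdir_native(d))  # [b'a.txt'] on Linux, ['a.txt'] on Windows
```

Names obtained this way can be passed straight back to `open()` or `os.stat()`, joined with a directory path of the same type. As a result, no decoding step can lose a file or invent one. Mixing the two types in `os.path.join()` raises `TypeError`, which makes any accidental guess at a translation fail loudly.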
Python tracker <report at bugs.python.org>