[Python-Dev] os.path.normcase rationale?
steve at pearwood.info
Sat Sep 25 05:25:26 CEST 2010
On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:
> I think that, like os.path.realpath(), it should not fail if the file
> does not exist.
> Maybe the API could be called os.path.unnormpath(), since it is in a
> sense the opposite of normpath() (which removes case) ? But I would
> want to write it so that even on Unix it scans the filesystem, in
> case the filesystem is case-preserving (like the default fs on OS X).
It is not entirely clear to me what this function is actually meant to
do. Should it:
1. Return the case of a filename in some canonical form which depends
on the file system?
2. Return the case of a filename as it is actually stored on disk?
3. Something else?
and just for completeness:
4. Return the case of a filename in some arbitrarily-chosen canonical
form which does not depend on the file system?
These are not the same, either conceptually or in practice.
If you want #4, you already have it in os.path.normcase.
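To illustrate what I mean by #4, here is a small sketch. It calls the ntpath and posixpath implementations directly so the result doesn't depend on the platform it runs on; the example paths are made up. Neither call touches the disk.

```python
import ntpath
import posixpath

# Hypothetical paths, chosen only for illustration.
windows_path = "C:/Users/Steve/ReadMe.TXT"
posix_path = "/home/Steve/ReadMe.TXT"

# The Windows flavour lowercases and converts forward slashes to
# backslashes, without ever looking at the file system.
windows_form = ntpath.normcase(windows_path)

# The POSIX flavour returns the path unchanged.
posix_form = posixpath.normcase(posix_path)

print(windows_form)  # c:\users\steve\readme.txt
print(posix_form)    # /home/Steve/ReadMe.TXT
```

The result is a function of the string alone, which is exactly why it can't tell you anything about how the name is stored.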
I think that the OP, Chris, wants #1, but it isn't entirely clear to me.
It's possible that he wants #2.
Various people have posted links to recipes that solve case #2. Note
though that this necessarily demands that if the file doesn't exist, it
should raise an exception.
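For reference, a recipe for #2 might look roughly like the following. This is my own minimal sketch, not any of the posted recipes: the name stored_case is hypothetical, it assumes every directory along the path is readable, and it makes no attempt to handle UNC paths or other Windows oddities.

```python
import os

def stored_case(path):
    """Return path with each component spelled as the directory
    listing reports it. Raise FileNotFoundError if it doesn't exist."""
    if not os.path.lexists(path):
        raise FileNotFoundError(path)
    drive, rest = os.path.splitdrive(os.path.abspath(path))
    current = drive + os.sep
    for part in (p for p in rest.split(os.sep) if p):
        names = os.listdir(current)
        if part in names:
            # Exact spelling is present; prefer it.
            match = part
        else:
            # The file system accepted a different spelling, so look
            # for the entry that differs only in case.
            folded = [n for n in names if n.lower() == part.lower()]
            if not folded:
                raise FileNotFoundError(path)
            match = folded[0]
        current = os.path.join(current, match)
    return current
```

Because the answer comes from directory listings, there is nothing sensible to return for a file that isn't there, hence the exception.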
In the case of #1, if the file doesn't exist, we can't predict
what the canonical form should be.
The very concept of canonical form for file names is troublesome. If the
file system is case-preserving, the file system doesn't define a
canonical form: the case of the file name will depend on how the file
is initially named. If the file system is case-destructive, the
behaviour will depend on the file system itself: e.g. FAT12 and ISO
9660 both uppercase file names, but other file systems may make other
choices. For some arbitrary path, where we don't know what file system
it is, or if the path doesn't actually exist, we have no way of telling
what the file system's canonical form will be, or even whether it will
have one.
Note that I've been talking about case preservation, not case
sensitivity. That's because case preservation is orthogonal to
sensitivity. You can see three of the four combinations, e.g.:
Preserving + insensitive: FAT32, NTFS under Win32, normally HFS+
Preserving + sensitive: ext3, NTFS under POSIX, optionally HFS+
Destructive + insensitive: FAT12, FAT16 without long file name support
To the best of my knowledge, destructive + sensitive doesn't exist. It
could, in principle, but it would be silly to do so.
Note that just knowing the file system type is not enough to tell what
its behaviour will be. Given an arbitrary file system, there's no
obvious way to determine what it will do to file names short of trying
to create a file and seeing what happens.
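The "try it and see" approach could be sketched along these lines. The function name probe_case_behaviour and the probe file name are my own inventions, and it assumes you are allowed to create files under the directory in question. It only tells you about that one directory at that one moment.

```python
import os
import tempfile

def probe_case_behaviour(directory):
    """Return (preserving, sensitive) for the file system that holds
    directory, found by creating a scratch file and inspecting it."""
    name = "CaseProbe.tmp"  # hypothetical mixed-case probe name
    with tempfile.TemporaryDirectory(dir=directory) as scratch:
        with open(os.path.join(scratch, name), "w"):
            pass
        # How did the file system actually record the name?
        stored = os.listdir(scratch)[0]
        preserving = (stored == name)
        # Do differently-cased spellings refer to the same file?
        variants = {name.upper(), name.lower()} - {stored}
        insensitive = all(
            os.path.exists(os.path.join(scratch, v)) for v in variants
        )
    return preserving, not insensitive

preserving, sensitive = probe_case_behaviour(tempfile.gettempdir())
print("preserving:", preserving, "sensitive:", sensitive)
```

Note that this has to write to the disk, so it is no use for a read-only volume or for a path that doesn't exist yet.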