[Python-Dev] os.path.normcase rationale?

Steven D'Aprano steve at pearwood.info
Sat Sep 25 05:25:26 CEST 2010


On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:

> I think that, like os.path.realpath(), it should not fail if the file
> does not exist.
>
> Maybe the API could be called os.path.unnormpath(), since it is in a
> sense the opposite of normpath() (which removes case) ? But I would
> want to write it so that even on Unix it scans the filesystem, in
> case the filesystem is case-preserving (like the default fs on OS X).

It is not entirely clear to me what this function is actually meant to
do. Should it:

1. Return the case of a filename in some canonical form which depends 
   on the file system?
2. Return the case of a filename as it is actually stored on disk?
3. Something else?

and just for completeness:

4. Return the case of a filename in some arbitrarily-chosen canonical 
   form which does not depend on the file system?

These are not the same, either conceptually or in practice.

If you want #4, you already have it in os.path.normcase.
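To make #4 concrete: normcase() is purely lexical. It lowercases (and
converts "/" to "\") under the Windows convention and leaves the string
untouched under the POSIX one, whatever file system the path actually
lives on. A minimal sketch, importing both platform modules directly:

```python
import ntpath
import posixpath

# ntpath and posixpath are the Windows and POSIX implementations that
# os.path refers to; either can be imported on any platform. Neither
# normcase() touches the disk: the result depends only on the convention.
windows_form = ntpath.normcase("C:/Users/Chris/ReadMe.TXT")
posix_form = posixpath.normcase("/home/Chris/ReadMe.TXT")

print(windows_form)  # c:\users\chris\readme.txt
print(posix_form)    # /home/Chris/ReadMe.TXT
```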

I think that the OP, Chris, wants #1, but it isn't entirely clear to me. 
It's possible that he wants #2.

Various people have posted links to recipes that solve #2. Note, though,
that #2 necessarily requires raising an exception if the file doesn't
exist, since there is nothing on disk to read the stored case from.
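A rough sketch of what such a #2 recipe might look like (not any of
the posted recipes; stored_case() is a hypothetical name). It lets the
OS decide whether the path exists at all, then walks the components and
reads each one's stored spelling from the parent directory's listing.
Matching uses str.casefold(), which is only an approximation of a given
file system's folding rules, and Windows 8.3 short names are not handled.

```python
import errno
import os


def stored_case(path):
    """Hypothetical helper (option #2): return `path` with every component
    spelled the way it is stored on disk. Raises FileNotFoundError if the
    path does not exist, since there is nothing to read the case from."""
    path = os.path.abspath(path)
    # Let the OS decide whether the name resolves; on a case-sensitive
    # file system "readme.txt" must not silently become "README.txt".
    if not os.path.lexists(path):
        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)
    return _fix_case(path)


def _fix_case(path):
    parent, name = os.path.split(path)
    if not name:
        # Reached the root ("/" or a drive such as "C:\\"); left as given.
        return path
    fixed_parent = _fix_case(parent)
    entries = os.listdir(fixed_parent)
    if name in entries:
        return os.path.join(fixed_parent, name)
    # Assumption: casefold() approximates the file system's own folding.
    folded = name.casefold()
    for entry in entries:
        if entry.casefold() == folded:
            return os.path.join(fixed_parent, entry)
    # e.g. a Windows 8.3 short name, which os.listdir() never reports.
    raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)
```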

In the case of #1, if the file doesn't exist, we can't predict what the
canonical form should be.
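If the function is instead to behave like realpath() and not fail on
a missing file, as Guido suggests, one possible compromise is to fix the
case only of the leading components that exist and pass the rest through
exactly as the caller spelled them, since there is nothing to discover
for them. A sketch under that assumption (best_effort_case() is a
hypothetical name; casefold() again only approximates the file system):

```python
import os


def best_effort_case(path):
    """Hypothetical helper: fix the case of the leading components that
    exist on disk and return the rest verbatim. Never raises merely
    because the path does not exist (in the spirit of realpath)."""
    path = os.path.abspath(path)
    # Split into the root and a list of components, e.g.
    # "/home/chris/New/f.txt" -> "/", ["home", "chris", "New", "f.txt"].
    parts = []
    root = path
    while True:
        root, name = os.path.split(root)
        if not name:
            break
        parts.append(name)
    parts.reverse()

    result = root
    for i, name in enumerate(parts):
        if not os.path.lexists(os.path.join(result, name)):
            # Nothing on disk to consult: no canonical form can be found
            # for this component or anything below it, so keep them as-is.
            return os.path.join(result, *parts[i:])
        result = os.path.join(result, _stored_name(result, name))
    return result


def _stored_name(directory, name):
    """Return the on-disk spelling of `name` inside `directory`, or `name`
    itself if the directory can't be listed or no entry matches."""
    try:
        entries = os.listdir(directory)
    except OSError:
        return name
    if name in entries:
        return name
    folded = name.casefold()  # assumption: approximates the fs's folding
    for entry in entries:
        if entry.casefold() == folded:
            return entry
    return name
```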

The very concept of a canonical form for file names is troublesome. If
the file system is case-preserving, it doesn't define a canonical form:
the case of the file name depends on how the file was originally named.
If the file system is case-destructive, the behaviour depends on the
file system itself: e.g. FAT12 and ISO 9660 both uppercase file names,
but other file systems may make other choices. For an arbitrary path,
where we don't know what file system it lives on, or if the path doesn't
actually exist, we have no way of telling what the file system's
canonical form will be, or even whether it will have one.

Note that I've been talking about case preservation, not case
sensitivity, because the two are orthogonal. Three of the four
combinations occur in practice, for example:

Preserving + insensitive:   FAT32, NTFS under Win32, normally HFS+
Preserving + sensitive:     ext3, NTFS under POSIX, optionally HFS+
Destructive + insensitive:  FAT12, FAT16 without long file name support

To the best of my knowledge, destructive + sensitive doesn't exist. It 
could, in principle, but it would be silly to do so.

Note that just knowing the file system type is not enough to tell what
its behaviour will be. Given an arbitrary file system, there's no
obvious way to determine what it will do to file names short of
creating a file and seeing what happens.
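Creating a file and seeing what happens could be sketched roughly as
below (probe_case_behaviour() is a hypothetical name). It assumes the
directory is writable, that the file system accepts a name longer than
8.3, and that no unrelated file happens to have the case-flipped name.

```python
import os
import tempfile


def probe_case_behaviour(directory):
    """Return (preserving, sensitive) for the file system holding
    `directory`, by creating a throwaway file and seeing what happens."""
    # Mixed-case prefix so the name contains letters whose case can change.
    fd, path = tempfile.mkstemp(prefix="CaseProbe_", dir=directory)
    os.close(fd)
    try:
        name = os.path.basename(path)
        # Preserving: the listing shows exactly the name we asked for.
        preserving = name in os.listdir(directory)
        # Sensitive: the same name with every letter's case flipped does
        # not resolve to our file.
        flipped = os.path.join(directory, name.swapcase())
        sensitive = not os.path.exists(flipped)
        return preserving, sensitive
    finally:
        os.remove(path)
```

(True, False) corresponds to preserving + insensitive, (True, True) to
preserving + sensitive, and (False, False) to destructive + insensitive.
The answer only speaks for the directory probed: other directories may
be on different mounts, and some systems (e.g. Windows 10's
per-directory case-sensitivity flag) can vary within a single volume.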



-- 
Steven D'Aprano

