[Python-Dev] os.path.normcase rationale?

Guido van Rossum guido at python.org
Sat Sep 25 16:45:38 CEST 2010

On Fri, Sep 24, 2010 at 8:25 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Sat, 25 Sep 2010 09:22:47 am Guido van Rossum wrote:
>> I think that, like os.path.realpath(), it should not fail if the file
>> does not exist.
>> Maybe the API could be called os.path.unnormpath(), since it is in a
>> sense the opposite of normpath() (which removes case) ? But I would
>> want to write it so that even on Unix it scans the filesystem, in
>> case the filesystem is case-preserving (like the default fs on OS X).
> It is not entirely clear to me what this function is meant to actually
> do. Should it:
> 1. Return the case of a filename in some canonical form which depends
>   on the file system?
> 2. Return the case of a filename as it is actually stored on disk?

This one. This is actually useful (on case-preserving filesystems).
There is no doubt in my mind that this is the requested and needed
behavior.

> 3. Something else?
> and just for completeness:
> 4. Return the case of a filename in some arbitrarily-chosen canonical
>   form which does not depend on the file system?
> These are not the same, either conceptually or in practice.
> If you want #4, you already have it in os.path.normcase.
> I think that the OP, Chris, wants #1, but it isn't entirely clear to me.

I don't think this is where the issue lies.

> It's possible that he wants #2.
> Various people have posted links to recipes that solve case #2. Note
> though that this necessarily demands that if the file doesn't exist, it
> should raise an exception.

No it needn't; realpath() uses the filesystem but leaves non-existing
parts alone. Also some of the path may exist (e.g. a parent
directory).
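
As a rough illustration of the realpath() point, here is a small sketch.
The helper name resolve_missing() and the "no/such/file.txt" components are
made up for the example; only os.path.realpath() and tempfile are real APIs.
It assumes a Python version where realpath() is non-strict by default.

```python
import os
import tempfile


def resolve_missing(base):
    """Resolve a path whose last three components do not exist under base.

    resolve_missing() is a hypothetical helper for this example only.
    """
    missing = os.path.join(base, "no", "such", "file.txt")
    # realpath() resolves the part of the path that exists and passes the
    # non-existing tail through instead of raising.
    return os.path.realpath(missing)


with tempfile.TemporaryDirectory() as tmp:
    # Resolve the temp dir first so symlinked temp locations do not
    # confuse the comparison below.
    base = os.path.realpath(tmp)
    resolved = resolve_missing(base)
    expected = os.path.join(base, "no", "such", "file.txt")
    print(resolved == expected)
```

No exception is raised for the missing components; the existing prefix is
resolved and the rest is appended as given.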

> In the case of #1, if the file system doesn't exist, we can't predict
> what the canonical form should be.
> The very concept of canonical form for file names is troublesome. If the
> file system is case-preserving, the file system doesn't define a
> canonical form: the case of the file name will depend on how the file
> is initially named. If the file system is case-destructive the
> behaviour will depend on the file system itself: e.g. FAT12 and ISO
> 9660 both uppercase file names, but other file systems may make other
> choices. For some arbitrary path, where we don't know what file system
> it is, or if the path doesn't actually exist, we have no way of telling
> what the file system's canonical form will be, or even whether it will
> have one.
> Note that I've been talking about case preservation, not case
> sensitivity. That's because case preservation is orthogonal to
> sensitivity. You can see three of the four combinations, e.g.:
> Preserving + insensitive:  FAT32, NTFS under Win32, normally HFS+
> Preserving + sensitive:  ext3, NTFS under POSIX, optionally HFS+
> Destructive + insensitive:  FAT12, FAT16 without long file name support
> To the best of my knowledge, destructive + sensitive doesn't exist. It
> could, in principle, but it would be silly to do so.
> Note that just knowing the file system type is not enough to tell what
> its behaviour will be. Given an arbitrary file system, there's no
> obvious way to determine what it will do to file names short of trying
> to create a file and see what happens.

This operation should not do any writes.

The solution may well be OS specific. Solutions for Windows and OS X
have already been pointed out. If it can't be done for other Unix
versions, I think returning the input unchanged on those platforms is a
fine fallback (as it is for non-existent filenames).
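
To make the constraints above concrete, here is a minimal, portable sketch
of one possible shape for such a function. The name actual_case() is
hypothetical, and this is not the Windows or OS X solution mentioned in the
thread: it assumes that os.listdir() reports names as stored on disk, it
only reads, and it returns any component it cannot match (including
non-existent ones) unchanged.

```python
import os


def actual_case(path):
    """Return path with existing components spelled as stored on disk.

    Hypothetical sketch: read-only, never raises for missing files.
    """
    head, tail = os.path.split(path)
    if not tail:
        if head == path:
            # Root, bare drive, or empty string: nothing left to look up.
            return path
        # Trailing separator(s): fix the rest and put them back.
        return actual_case(head) + path[len(head):]

    parent = actual_case(head) if head else head
    candidate = os.path.join(parent, tail)

    # Leave "." and ".." alone, and leave non-existing parts unchanged.
    # On a case-sensitive filesystem a differently-cased name does not
    # exist, so it is not rewritten either.
    if tail in (os.curdir, os.pardir) or not os.path.lexists(candidate):
        return candidate

    try:
        entries = os.listdir(parent or os.curdir)
    except OSError:
        # Unreadable directory: fall back to the input spelling.
        return candidate

    if tail in entries:
        return candidate  # already spelled as stored

    folded = tail.casefold()
    matches = [name for name in entries if name.casefold() == folded]
    # Only substitute when the match is unambiguous. Names that differ by
    # Unicode normalization will not match and stay unchanged.
    if len(matches) == 1:
        return os.path.join(parent, matches[0])
    return candidate
```

For example, on a case-preserving, case-insensitive filesystem that stores
"ReadMe.txt", actual_case("readme.txt") would be expected to return
"ReadMe.txt"; on a case-sensitive filesystem the same call returns the input
as given, since "readme.txt" does not exist there.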

--Guido van Rossum (python.org/~guido)

