[Python-Dev] os.path.normcase rationale?

Steven D'Aprano steve at pearwood.info
Tue Oct 5 13:04:39 CEST 2010


On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:
> On 25/09/2010 04:25, Steven D'Aprano wrote:
> > 1. Return the case of a filename in some canonical form which
> > depends on the file system?
> > 2. Return the case of a filename as it is actually stored on disk?
>
> How do 1 and 2 differ?

Case #1 imposes a particular canonical form, regardless of what is 
actually stored on disk. It is similar to normpath, except that we 
could have different canonical forms depending on what the file system 
was. normpath merely generalises from the operating system, and never 
looks at the file system.
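
As a rough illustration of a canonical form chosen without ever consulting 
the disk: the standard library's ntpath and posixpath modules can be 
imported on any platform, and their normcase functions are pure string 
operations. A minimal sketch (the sample file name is only an example):

```python
import ntpath
import posixpath

name = "pArRoT.tXt"

# Windows-flavoured rule: lowercase the name (and turn / into \).
# No file system is consulted; the file need not even exist.
windows_form = ntpath.normcase(name)

# POSIX-flavoured rule: the name is returned unchanged.
posix_form = posixpath.normcase(name)

print(windows_form)
print(posix_form)
```

Both results depend only on which module is asked, not on what kind of 
file system the path would live on.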

Some file systems are case-preserving, and don't have a canonical form. 
We might choose to arbitrarily impose one, as normcase already does. 
Some are case-folding, in which case it might be sensible to choose the 
same canonical form as the file system actually uses. However, this may 
be implementation-dependent: e.g. under FAT12 or FAT16, the file system 
will take a file name like pArRoT.tXt and fold it to PARROT.TXT, or 
possibly parrot.txt, or Parrot.txt. Even if that's not the case for 
FAT12, it may be the case for other case-folding file systems. And the 
behaviour of FAT16 will differ according to whether or not it has been 
built with support for long file names.


Case #2 says to actually look at the file and see what the file system 
considers its name to be. Consider an NTFS file system. By default it 
is case-preserving and case-insensitive, although that can be changed. 
(Just because a file system is NTFS doesn't mean that it will be 
case-insensitive. NTFS can also run in a POSIX mode which is 
case-sensitive. But I digress.)

For simplicity, suppose you're on Windows using NTFS with the standard 
non-POSIX behaviour. You create a file named pArRoT.tXt. This will be 
stored on disk using the exact characters that you typed. The file 
system does no case-folding and merely uses whatever characters are fed 
to it, which in the case of Windows apps is likely to be whatever 
characters the user types. In this case, we don't try to impose a 
particular case on file names, but return whatever actually exists on 
disk.
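
One way this could be sketched is to list the containing directory and 
pick out the entry that matches ignoring case. The helper name 
stored_case is hypothetical, not an existing os.path function, and this 
version assumes that checking only the final path component is enough:

```python
import os


def stored_case(path):
    """Return path with its last component spelled as the directory lists it.

    Hypothetical helper. Only the final component is checked, and None
    is returned when no directory entry matches case-insensitively.
    """
    directory, wanted = os.path.split(path)
    for entry in os.listdir(directory or os.curdir):
        # Compare ignoring case, but return the spelling found on disk.
        if entry.lower() == wanted.lower():
            return os.path.join(directory, entry)
    return None
```

On a case-sensitive file system more than one entry may match; this 
sketch simply returns the first one that listdir reports.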


> FWIW, the use case that setuptools has (and 
> for which it currently incorrectly uses normpath) is number 2.
>
> > 4. Return the case of a filename in some arbitrarily-chosen
> > canonical form which does not depend on the file system?
>
> This is what normpath does, but only if you're on Windows ;-)

Not quite. macpath.normcase() also lowercases the path. So does the 
module for OS/2.

In any case, Windows is not a file system. It is quite possible to have 
virtually any combination of case-destroying, case-preserving, 
-sensitive and -insensitive file systems on the one Windows system. Say, 
a FAT12 floppy, an NTFS partition, and an ext2 USB stick. Windows 
doesn't ship with native support for ext2, but that doesn't mean it 
can't be installed with third party drivers.

normcase pays no attention to any of this, and just lowercases the path. 
At least that's cheap, and consistent, even if it solves the wrong 
problem :)



-- 
Steven D'Aprano

