[Python-Dev] os.path.normcase rationale?
Steven D'Aprano
steve at pearwood.info
Tue Oct 5 13:04:39 CEST 2010
On Tue, 5 Oct 2010 07:21:15 pm Chris Withers wrote:
> On 25/09/2010 04:25, Steven D'Aprano wrote:
> > 1. Return the case of a filename in some canonical form which
> > depends on the file system?
> > 2. Return the case of a filename as it is actually stored on disk?
>
> How do 1 and 2 differ?
Case #1 imposes a particular canonical form, regardless of what is
actually stored on disk. It is similar to normpath, except that we
could have different canonical forms depending on what the file system
was. normpath merely generalises from the operating system, and never
looks at the file system.
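
To illustrate the point that normpath is purely lexical, here is a small sketch. It calls the ntpath and posixpath modules directly so that it runs on any platform; the example paths are made up, and nothing is ever looked up on disk:

```python
import ntpath
import posixpath

# normpath only rewrites the string using the conventions of the
# operating system; it never consults the file system, and it
# leaves the case of every component alone.
win = ntpath.normpath("C:/Spam/../Eggs/pArRoT.tXt")
nix = posixpath.normpath("/spam/../eggs//pArRoT.tXt")

print(win)  # C:\Eggs\pArRoT.tXt
print(nix)  # /eggs/pArRoT.tXt
```

Neither call cares whether the path exists, or what kind of file system it would live on.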
Some file systems are case-preserving, and don't have a canonical form.
We might choose to arbitrarily impose one, as normcase already does.
Some are case-folding, in which case it might be sensible to choose the
same canonical form as the file system actually uses. However, this may
be implementation-dependent, e.g. under FAT12 or FAT16, the file system
will take a file name like pArRoT.tXt and fold it to PARROT.TXT, or
possibly parrot.txt, or Parrot.txt. Even if that's not the case for
FAT12, it may be the case for other case-folding file systems. And the
behaviour of FAT16 will differ according to whether or not it has been
built with support for long file names.
Case #2 says to actually look at the file and see what the file system
considers its name to be. Consider an NTFS file system. By default it
is case-preserving and case-insensitive, although that can be changed.
(Just because a file system is NTFS doesn't mean that it will be
case-insensitive. NTFS can also run in a POSIX mode which is
case-sensitive. But I digress.)
For simplicity, suppose you're on Windows using NTFS with the standard
non-POSIX behaviour. You create a file named pArRoT.tXt. This will be
stored on disk using the exact characters that you typed. The file
system does no case-folding and merely uses whatever characters are fed
to it, which in the case of Windows apps is likely to be whatever
characters the user types. In this case, we don't try to impose a
particular case on file names, but return whatever actually exists on
disk.
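
A minimal sketch of what case #2 could look like, assuming it is acceptable to list the parent directory and compare names case-insensitively. The helper name on_disk_name is hypothetical (nothing like it exists in os.path), and str.lower() is only an approximation of whatever case-folding rules the file system really uses:

```python
import os

def on_disk_name(path):
    """Return the last component of path as spelled in the directory
    listing, or None if there is no unambiguous match.

    Hypothetical helper, not part of the standard library.
    """
    directory, wanted = os.path.split(path)
    entries = os.listdir(directory or os.curdir)
    if wanted in entries:
        # Exact match: this is already the stored spelling.
        return wanted
    # Otherwise fall back to a case-insensitive comparison. lower() is
    # an assumption here; a real file system may fold case differently.
    matches = [e for e in entries if e.lower() == wanted.lower()]
    # On a case-sensitive file system several entries may differ only
    # in case, so refuse to guess when there is more than one.
    return matches[0] if len(matches) == 1 else None
```

Unlike normcase, this has to touch the disk, so it costs a directory listing per lookup and can fail if the directory is unreadable.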
> FWIW, the use case that setuptools has (and
> for which it currently incorrectly uses normpath) is number 2.
>
> > 4. Return the case of a filename in some arbitrarily-chosen
> > canonical form which does not depend on the file system?
>
> This is what normpath does, but only if you're on Windows ;-)
Not quite. macpath.normcase() also lowercases the path. So does the
module for OS/2.
In any case, Windows is not a file system. It is quite possible to have
virtually any combination of case-destroying, case-preserving,
-sensitive and -insensitive file systems on the one Windows system. Say,
a FAT12 floppy, an NTFS partition, and an ext2 USB stick. Windows
doesn't ship with native support for ext2, but that doesn't mean it
can't be installed with third party drivers.
normcase pays no attention to any of this, and just lowercases the path.
At least that's cheap, and consistent, even if it solves the wrong
problem :)
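
For comparison, a short sketch of the lowercasing behaviour of normcase, again calling ntpath and posixpath directly so it runs anywhere. The path is invented for the example and does not need to exist:

```python
import ntpath
import posixpath

name = "Some/Dir/pArRoT.tXt"

# The Windows flavour lowercases the whole path and converts forward
# slashes to backslashes, whatever file system the path lives on.
win = ntpath.normcase(name)

# The POSIX flavour returns the path unchanged.
nix = posixpath.normcase(name)

print(win)  # some\dir\parrot.txt
print(nix)  # Some/Dir/pArRoT.tXt
```

The choice is made by the os.path flavour in use, not by the file system holding the file.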
--
Steven D'Aprano