[Python-Dev] casefolding in pathlib (PEP 428)

Guido van Rossum guido at python.org
Fri Apr 12 19:05:32 CEST 2013

On Fri, Apr 12, 2013 at 1:39 AM, Antoine Pitrou <solipsis at pitrou.net> wrote:
> Ok, I've taken a look at the code. Right now lower() is used for two
> purposes:
> 1. comparisons (__eq__ and __ne__)
> 2. globbing and matching
> While (1) could be dropped, for (2) I think we want glob("*.py") to find
> "SETUP.PY" under Windows. Anything else will probably be surprising to
> users of that platform.

Yeah, I suppose so. But there are more crazy details. E.g. IIRC
Windows silently ignores trailing dots in filenames. Do we want
"*.py." to match SETUP.PY then?

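To make the trailing-dot question concrete, here is a minimal sketch of Windows-style matching. The helper windows_like_match and its strip_trailing_dots flag are hypothetical names invented for this illustration, not pathlib API, and lower() stands in for whatever case-folding rule is finally chosen.

```python
import fnmatch


def windows_like_match(name, pattern, strip_trailing_dots=False):
    """Hypothetical helper: case-insensitive glob-style match on one name.

    With strip_trailing_dots=True, trailing dots are removed from both the
    name and the pattern first, imitating the "silently ignored" behaviour
    recalled above. Names such as "." and ".." are not handled here.
    """
    if strip_trailing_dots:
        name = name.rstrip(".")
        pattern = pattern.rstrip(".")
    # fnmatchcase never applies platform rules, so the result is the same
    # on every OS; the case folding is done explicitly with lower().
    return fnmatch.fnmatchcase(name.lower(), pattern.lower())


print(windows_like_match("SETUP.PY", "*.py"))          # True
print(windows_like_match("SETUP.PY", "*.py."))         # False
print(windows_like_match("SETUP.PY", "*.py.", True))   # True
```

Whether the third result is what users want is exactly the open question; the sketch only shows that the two policies give different answers.
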
>> - On Linux, paths are really bytes; on Windows (at least NTFS), they
>> are really (16-bit) Unicode; on Mac, they are UTF-8 in a specific
>> normal form (except on some external filesystems).
> pathlib is just relying on Python 3's sane handling of unicode paths
> (thanks to PEP 383). Bytes paths are never used internally.

I suppose that just leaves Unicode normalization, discussed later in the thread.
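
A small sketch of why normalization matters for equivalence: the same visible filename can be spelled with different code point sequences. The helper same_name is a hypothetical name used only here, and which form a given filesystem actually stores is not something this example tries to determine.

```python
import unicodedata

# Two spellings of the same visible name: precomposed "é" (U+00E9)
# versus "e" followed by a combining acute accent (U+0301).
composed = "caf\u00e9.txt"
decomposed = "cafe\u0301.txt"


def same_name(a, b, form="NFC"):
    """Hypothetical helper: compare two names after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(composed == decomposed)           # False: different code points
print(same_name(composed, decomposed))  # True: equal once normalized
```
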

>> - On Windows, short names are still supported, making the number of
>> ways to spell the path for any given file even larger.
> They are still supported but I doubt they are still relied on (long
> filenames appeared in Windows 95!). I think in common situations we can
> ignore their existence. Specialized tools like Mercurial may have to
> know that they exist, in order to manage potential collisions (but
> Mercurial isn't really the target audience for pathlib, and I don't
> think they would be interested in such an abstraction).

Actually, I've heard of code that dynamically falls back on short
names when paths using long names exceed the system limit for path
length (either 256 or 1024 IIRC). But short names pretty much require
consulting the filesystem, so we can probably ignore them.

I guess the bottom line is that, no matter how hard pathlib tries,
apps cannot always rely on the predictions about filename validity or
equivalence made by pathlib -- we'll have to document that it may be
wrong, even though we have the moral obligation to make sure that it
is right as often as possible.
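
As an illustration of what such a prediction looks like, here is a short sketch using the pure path classes from pathlib as it later shipped in the standard library. The comparisons never touch the filesystem, so they are guesses about equivalence based on the path flavour alone; the example paths are made up.

```python
from pathlib import PurePosixPath, PureWindowsPath

# The Windows flavour compares case-insensitively and treats "/" and "\\"
# as the same separator; the POSIX flavour compares case-sensitively.
win_a = PureWindowsPath("C:/Foo/bar.txt")
win_b = PureWindowsPath("c:\\foo\\BAR.TXT")
posix_a = PurePosixPath("/Foo/bar.txt")
posix_b = PurePosixPath("/foo/BAR.TXT")

windows_equal = win_a == win_b
posix_equal = posix_a == posix_b

print(windows_equal)  # True
print(posix_equal)    # False
```

Neither answer is guaranteed to match a real volume: a case-sensitive filesystem mounted on Windows, or a case-insensitive one mounted on Linux, would contradict it.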

--Guido van Rossum (python.org/~guido)
