[Python-Dev] casefolding in pathlib (PEP 428)

Cameron Simpson cs at zip.com.au
Fri Apr 12 01:09:07 CEST 2013


On 11Apr2013 14:11, Guido van Rossum <guido at python.org> wrote:
| Some of my Dropbox colleagues just drew my attention to the occurrence
| of case folding in pathlib.py. Basically, case folding as an approach
| to comparing pathnames is fatally flawed. The issues include:
| 
| - most OSes these days allow the mounting of both case-sensitive and
| case-insensitive filesystems simultaneously
| 
| - the case-folding algorithm on some filesystems is burned into the
| disk when the disk is formatted
| 
| - case folding requires domain knowledge, e.g. Turkish dotless I
| 
| - normalization is a mess, even on OSX, where it's better defined than elsewhere

Yes, but what's the use case? Specifically, _why_ are you comparing pathnames?

To my mind case folding is just one mode of filename conflict.
Surely there are others (forbidden characters in some domains, like
colons; names significant only to a certain number of characters;
and so forth).

Thus: what specific problem are you case-folding to address?

On a personal basis I would normally address this kind of thing
with stat(), avoiding any app knowledge about pathname rules: does
this path exist, or are these paths referencing the same file? But
of course that doesn't solve the wider issue with Dropbox, where
the rules differ per platform and where work can take place disparately
on separate hosts.
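
A minimal sketch of the stat() approach I mean, for the local case
only. The helper name same_file is just illustrative; it leans on
os.stat() and os.path.samestat() and leaves every pathname rule to
the filesystem itself:

```python
import os

def same_file(path_a, path_b):
    """Return True if both paths exist and refer to the same file.

    The filesystem decides: no case folding or normalization is done
    here, so whatever rules the mounted filesystem applies are the
    rules that get used.
    """
    try:
        st_a = os.stat(path_a)
        st_b = os.stat(path_b)
    except OSError:
        # At least one path does not exist (or cannot be examined).
        return False
    # Compares device and inode numbers of the two stat results.
    return os.path.samestat(st_a, st_b)
```

On a case-insensitive filesystem, two spellings that differ only by
case would be expected to stat to the same file; on a case-sensitive
one they would not, and the code needs no knowledge of which it is on.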

Imagining Dropbox, I'd guess there's a file tree in the backing store.
What is its policy? Does it allow multiple files differing only by case?
I can imagine that would be bad when the tree is presented on a
case-insensitive platform (e.g. Windows, or Mac OS X by default).
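
To make the question concrete, here is a rough sketch of how one
might spot names in a single directory listing that differ only by
case. It is purely illustrative: the function name case_conflicts is
made up, and str.casefold() is only an approximation of what any real
filesystem does, which is exactly the objection raised above.

```python
from collections import defaultdict

def case_conflicts(names):
    """Group names that collide under Python's str.casefold().

    This is an approximation: real filesystems apply their own folding
    and normalization rules, which may differ from casefold().
    """
    groups = defaultdict(list)
    for name in names:
        groups[name.casefold()].append(name)
    # Keep only the groups where more than one spelling was seen.
    return [group for group in groups.values() if len(group) > 1]
```

For example, case_conflicts(["README", "readme", "notes.txt"]) groups
"README" and "readme" together and leaves "notes.txt" alone.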

Taking the view that Dropbox should avoid that situation (in what
are doubtless several forms), does Dropbox pre-emptively prevent
making files with specific names based on what is already in the
store, or resolve them to the same object (hard link locally, or
simply and less confusingly and more portably, diverting opens to
the existing name like a CI filesystem would)?

What about offline? That suggests that the forbidden modes should
be known to the Dropbox app too. Is this the use case for comparing
filenames instead of just doing a stat() to the local filesystem
or to the remote backing store (via a virtual stat, as it were)?

What does Dropbox do if the local app is disabled and a user runs
riot in the Dropbox directory, making conflicting names: allowed
by the local FS but conflicting in the backing store or on other
hosts?

What does Dropbox do if a user makes conflicting files independently
on different hosts, and then syncs?

I just feel you've got a name conflict issue to resolve (and how
that's done is partly just policy), and pathlib offers some
facilities related to that kind of thing, but there is a mismatch
between what you actually need to do and what pathlib offers.

Fixing your problem isn't necessarily a bugfix for pathlib.

I think we need to know the wider context.

Cheers,
-- 
Cameron Simpson <cs at zip.com.au>

I had a *bad* day. I had to subvert my principles and kowtow to an idiot.
Television makes these daily sacrifices possible. It deadens the inner core
of my being.    - Martin Donovan, _Trust_

