[New-bugs-announce] [issue32612] pathlib.(Pure)WindowsPaths can compare equal but refer to different files
report at bugs.python.org
Sun Jan 21 16:04:41 EST 2018
New submission from benrg <benrudiak at gmail.com>:
(Pure)WindowsPath uses str.lower to fold paths for comparison and hashing. This doesn't match the case folding of actual Windows file systems. There exist WindowsPath objects that compare and hash equal, but refer to different files. For example, the strings
'\xdf' (sharp S) and '\u1e9e' (capital sharp S)
'\u01c7' (LJ) and '\u01c8' (Lj)
'\u0130' (I with dot) and 'i\u0307' (i followed by combining dot)
'K' and '\u212a' (Kelvin sign)
are equal under str.lower folding but are distinct file names on NTFS volumes on my Windows 7 machine. There are hundreds of other such pairs.
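The collisions listed above can be checked with a short script. This is only a minimal sketch: the `PAIRS` list and the `collides` helper are names made up for illustration, and the strings are the ones from the examples. It exercises str.lower and PureWindowsPath comparison without touching the file system, so it cannot show that the names are distinct on any particular volume, and the PureWindowsPath result may depend on the Python version in use.

```python
from pathlib import PureWindowsPath

# Pairs from the report: distinct strings that str.lower maps to the same value.
PAIRS = [
    ("\xdf", "\u1e9e"),     # sharp S / capital sharp S
    ("\u01c7", "\u01c8"),   # LJ / Lj
    ("\u0130", "i\u0307"),  # I with dot / i followed by combining dot
    ("K", "\u212a"),        # K / Kelvin sign
]


def collides(a, b):
    """Return True if a and b differ but are identical after str.lower."""
    return a != b and a.lower() == b.lower()


for a, b in PAIRS:
    # Equality as pathlib reports it on the running interpreter.
    same_path = PureWindowsPath(a) == PureWindowsPath(b)
    print(ascii(a), ascii(b),
          "str.lower collision:", collides(a, b),
          "PureWindowsPath equal:", same_path)
```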
I think this is very bad. The reverse (paths that compare unequal but refer to the same file) is probably unavoidable and is expected by programmers. But paths that compare equal should never refer to different files as far as the OS is concerned.
How to fix this:
Unfortunately, there is no correct way to case fold Windows paths. The FAT, NTFS, and exFAT drivers on my machine all have different behavior. (The examples above work on all three, except for 'K' and '\u212a', which are equivalent on FAT volumes.) NTFS stores its case-folding map on each volume in the hidden $UpCase file, so even different NTFS volumes on the same machine can have different behavior. The contents of $UpCase have changed over time as Windows is updated to support new Unicode versions. NTFS and NFS (and possibly WebDAV) also support full case sensitivity when used with Interix/SUA and Cygwin, though this requires disabling system-wide case insensitivity via the registry.
I think that pathlib should either give up on case folding entirely, or should fold very conservatively, treating WCHARs as equivalent only if they're equivalent on all standard file systems on all supported Windows versions.
If pathlib folds case at all, there should be a solution for people who need to interoperate with Cygwin or SUA tools on a case-sensitive machine, but I suppose they can just use PosixPath.
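To make the "fold very conservatively" option concrete, here is a rough sketch of a comparison key that folds only ASCII letters. The choice of ASCII A-Z as the safe set is an assumption made for illustration, not something established in this report; the real set would have to be derived from what every supported file system and Windows version treats as equivalent. `conservative_key` and `_ASCII_FOLD` are hypothetical names, not pathlib APIs, and this does nothing for the case-sensitive Cygwin/SUA setups mentioned above.

```python
import string

# Hypothetical table: fold only ASCII A-Z to a-z and leave every other
# character untouched. ASCII-only is an assumption, not a verified safe set.
_ASCII_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def conservative_key(path):
    """Return a comparison key that folds ASCII letters only."""
    return path.translate(_ASCII_FOLD)


# Ordinary ASCII case differences still compare equal under this key.
print(conservative_key("C:\\Users\\FOO.TXT") == conservative_key("c:\\users\\foo.txt"))

# The pairs from the report no longer collide, because nothing outside
# ASCII is folded.
print(conservative_key("K") == conservative_key("\u212a"))
```

The trade-off is that more pairs of paths compare unequal while referring to the same file, which is the direction of error described above as acceptable.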
components: Library (Lib), Windows
nosy: benrg, paul.moore, steve.dower, tim.golden, zach.ware
title: pathlib.(Pure)WindowsPaths can compare equal but refer to different files
versions: Python 3.4, Python 3.5, Python 3.6, Python 3.7
Python tracker <report at bugs.python.org>