solution to cross-platform path handling problems
I will talk about separating the "mount" and "path" concepts in path handling.

In the great talk about writing cross-platform applications back in 2010 there is a good point about Python's cross-platform abstraction for path issues.
http://clanmills.com/files/dist/doc/cross_platform.html#python-batteries-inc...

The recent noise around the new pathlib and my own experience with os.path made me change my mind about Python having a convenient library for cross-platform path handling. It is much better than dealing with slashed strs (true), but there are still hidden issues (which I cannot even summarize, because I don't know what tracker query I should run to find them).

While criticizing "pathlib" to see what I dislike about it, I realized that there is a lot of ambiguity in the world of filesystem/resource paths. Every platform-specific path library fails, because on one side people don't know the differences between all operating systems, probably because they don't want to, or don't have the time or the info. On the other side, people need to write cross-platform apps.

"pathlib" does a good job by providing a PEP with the info, but I think that architecturally it doesn't solve the problem of path handling complexity. Syntax sugar - yes, explicit approach - yes, time savings - no, more readable code - "no > yes", code that frees you from thinking how "these three lines" will work on MacOS/Unix/Windows - no.

The root of the problem is in the traditional "relative" vs "absolute" path approach. Take "Definitions" from PEP 428.

"""
1. All paths can have a drive and a root. For POSIX paths, the drive is always empty.
2. A relative path has neither drive nor root.
3. A POSIX path is absolute if it has a root. A Windows path is absolute if it has both a drive and a root. A Windows UNC path (e.g. \\host\share\myfile.txt) always has a drive and a root (here, \\host\share and \, respectively).
4. A path which has either a drive or a root is said to be anchored. Its anchor is the concatenation of the drive and root. Under POSIX, "anchored" is the same as "absolute".
"""

This is a good decomposition and problem overview, but hardly a solution or a "correct" representation as I see it. All the terminology above can be reduced to just two cross-platform terms: "mount point" and "path". The "path" is always relative to the "mount point". Either can be missing. The "mount point" is system-dependent.

1. All paths may have a mount point.
2. All paths without a mount point are relative.
3. The default mount point for POSIX to make a path absolute is '/'. The default mount point on Windows is the current drive (e.g. 'c:/'), or a UNC server address (e.g. \\host\).
4. Any (absolute?) path may be a mount point itself.
5. A path without a mount point is called "relative".

I don't know what the API for that should be, but I'd be interested to try it.

One of the reasons I want this terminology is that semantically I work more with URL paths than with file system paths, and I don't see a difference between them. When I move an application from the www.com site root to some www.com/endpoint, my app doesn't stop working, because it is written to work with any www.com/endpoint - not just with absolute paths that point to the site root.

I think that is the value that Python can provide to help build apps that are architecturally more "correct" and system-independent.

--
anatoly t.
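For comparison, the "mount point" plus "path" split proposed above can be roughly approximated with what pathlib already exposes. The sketch below is only one possible reading of the proposal: `split_mount` is a hypothetical helper, not a pathlib API, and it assumes that PEP 428's "anchor" (drive plus root) can stand in for the "mount point". Under that assumption a UNC path ends up with \\host\share\ as its mount point, not \\host\.

```python
from pathlib import PurePath, PurePosixPath, PureWindowsPath


def split_mount(path: PurePath) -> tuple[str, PurePath]:
    """Hypothetical helper: split a pure path into (mount, rest).

    Assumption: pathlib's anchor (drive + root) plays the role of the
    "mount point"; the remainder is always a relative path.
    """
    anchor = path.anchor
    # When an anchor is present, it is the first element of path.parts.
    rest_parts = path.parts[1:] if anchor else path.parts
    return anchor, type(path)(*rest_parts)


examples = [
    PurePosixPath("/usr/lib"),
    PurePosixPath("docs/readme.txt"),
    PureWindowsPath(r"C:\Users\me"),
    PureWindowsPath(r"\\host\share\myfile.txt"),
]

for p in examples:
    mount, rest = split_mount(p)
    print(f"{str(p)!r}: mount={mount!r} rest={str(rest)!r}")
```

Pure paths are used here so the same code runs on any platform without touching the file system.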
On Nov 23, 2013, at 8:14, anatoly techtonik <techtonik@gmail.com> wrote:
> 1. All paths may have the mount point.
> 2. All paths without mount point are relative.
> 3. Default mount point for POSIX to make path absolute is '/'. Default mount point on Windows is current drive (e.g. 'c:/'), or UNC server address (e.g. \\host\).
> 4. Any (absolute?) path may be the mount point itself
> 5. path without mount point is called "relative"
The first problem with this is that there is already an established meaning for "mount point" in the POSIX world, and this is very different from it.

Meanwhile, treating \\host\ as a root doesn't work. If you just skim the docs on UNC pathnames, or actually use them for anything, this is obvious. It's the \\host\share\ that's a root. \\host\ is not a usable path, and doesn't refer to anything path-like at the NT objects level or in the SMB/CIFS protocol. You can't .. above the share. You can't treat it as a drive. You can't mount it. (Yes, there are various places, especially in the msvcrt posix-like wrapper functions, where \\host\share\..\othershare works, but that's only because those functions are treating paths as plain strings and ignoring the semantics. The same functions also let you do \\host\..\otherhost\share, so if they imply that the host alone makes a path, they also imply that \\ alone makes a path, so the host still isn't a root.)

Also, this completely ignores the problem with pathlib that you're trying to solve: Windows paths don't come in just two forms, relative and absolute; they also have two intermediate forms, D:foo (which is relative to the current working directory on drive D rather than on the current drive), and \foo (which is absolute on the current working drive). To handle these paths, you need to go beyond the notion of a current working directory and represent the notion Windows actually uses: a current working drive, and a current working directory on each drive. And to deal with the way cd'ing to a UNC path interacts with this... It's too complicated to spell out in one sentence, but if you read the MSDN docs they make it pretty clear. (Or just play around with Internet Explorer's file:// paths, which are Microsoft's answer to how to fit their filesystem into something cross-platform. If you think you can do better than them, you'll probably want to understand what they did and why.)
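The four Windows path forms described above can be told apart using only the `drive` and `root` attributes of `PureWindowsPath`. This is a minimal sketch: `classify` and its labels are hypothetical names chosen for illustration, and the code only inspects path syntax. It does not model the per-drive current directories that Windows tracks at run time.

```python
from pathlib import PureWindowsPath


def classify(raw: str) -> str:
    """Hypothetical helper: name the form of a Windows path string."""
    p = PureWindowsPath(raw)
    if p.drive and p.root:
        return "absolute"                  # C:\foo or \\host\share\foo
    if p.drive:
        return "drive-relative"            # D:foo: relative to D:'s cwd
    if p.root:
        return "rooted-on-current-drive"   # \foo: root of the current drive
    return "relative"                      # foo


for raw in (r"C:\foo", "D:foo", r"\foo", "foo", r"\\host\share\file.txt"):
    p = PureWindowsPath(raw)
    print(f"{raw!r}: {classify(raw)} (drive={p.drive!r}, root={p.root!r})")

# For a UNC path, pathlib treats host and share together as the drive,
# and taking the parent of the share yields the share itself.
share = PureWindowsPath(r"\\host\share")
print(share.drive, share.parent == share)
```

Note that `is_absolute()` on `PureWindowsPath` returns true only for the first form, which matches the PEP 428 definition that a Windows path needs both a drive and a root to be absolute.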
participants (2)
- anatoly techtonik
- Andrew Barnert