Add mechanism to check if a path is a junction (for Windows)

Junctions are contextually similar to symlinks on Windows. I propose adding a mechanism to both pathlib.Path and os.path to check whether a given path is a junction. Currently is_symlink/islink return False for junctions. Maybe isjunction in os.path and is_junction in pathlib.Path? Part of me thinks about adding a junction_ok param to the existing islink and is_symlink, since they are often similar in usage. Thoughts?

On 11/7/22, Charles Machalow <csm10495@gmail.com> wrote:
Junctions are contextually similar to symlinks on Windows.
Junctions (i.e. IO_REPARSE_TAG_MOUNT_POINT) are implemented to behave as mount points for local volumes, so there are a couple of important differences.
In a remote path, a junction gets resolved on the server side, which is always possible because the target of a junction must be a local volume (i.e. local to the server). Thus a junction that targets "C:\spam" resolves to the "C:" drive on the remote system. If you're resolving a junction manually via `os.readlink()`, take care to never resolve a remote junction target as a local path such as "C:\spam". That would not only be wrong but also potentially harmful if client files get mistakenly modified, replaced, or deleted. On the other hand, a remote symlink that targets "C:\spam" gets resolved by the client and thus always resolves to the local "C:" drive of the client. This depends on the client system allowing remote-to-local (R2L) symlinks, which is disabled by default for good reason. When resolving a symlink manually, at worst you'll be in violation of the system's L2L, L2R, R2L, or R2R symlink policy.
Secondly, the target of a junction does not replace the previously traversed path when the system parses a path. This affects how a relative symlink gets resolved, in which case traversed junctions behave like Unix bind mount points. Say that "E:\eggs\spamlink" is a relative symlink that targets "..\spam". When accessed directly, this symbolic link resolves to "E:\spam". Say that "C:\mount\junction" targets "E:\eggs". Then "C:\mount\junction\spamlink" resolves to "C:\mount\spam", a different file in this case. In contrast, the target of a symlink always replaces the traversed path when the system parses a path. Say that "C:\mount\symlink" targets "E:\eggs". Then "C:\mount\symlink\spamlink" resolves to "E:\spam", the same as if "E:\eggs\spamlink" had been opened directly.
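The two relative-symlink cases above can be reproduced with plain string manipulation. This is only a sketch of the parsing rule as described in this message, not of how the kernel implements it. The helper names parse_through() and resolve_relative_symlink() are hypothetical, and ntpath is used so that the snippet runs on any platform without touching the filesystem.

```python
import ntpath

def parse_through(prefix, prefix_target, kind, rest):
    """Return the path being parsed after `prefix` has been traversed."""
    if kind == "symlink":
        # A symlink's target replaces the previously traversed path.
        return ntpath.join(prefix_target, rest)
    # A junction acts like a mount point: the traversed path is kept.
    return ntpath.join(prefix, rest)

def resolve_relative_symlink(link_path, relative_target):
    """Resolve a relative symlink target against the link's parent directory."""
    parent = ntpath.dirname(link_path)
    return ntpath.normpath(ntpath.join(parent, relative_target))

# "spamlink" is a relative symlink to "..\spam", reached via two routes.
via_junction = resolve_relative_symlink(
    parse_through(r"C:\mount\junction", r"E:\eggs", "junction", "spamlink"),
    r"..\spam")
via_symlink = resolve_relative_symlink(
    parse_through(r"C:\mount\symlink", r"E:\eggs", "symlink", "spamlink"),
    r"..\spam")

print(via_junction)  # C:\mount\spam
print(via_symlink)   # E:\spam
```

The only difference between the two calls is which prefix the relative target is joined against, which is why copying a junction as a symlink could change what a relative symlink beneath it resolves to.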
Currently is_symlink/islink return False for junctions.
Some API contexts, libraries, and applications only support IO_REPARSE_TAG_SYMLINK reparse points as symlinks. For general compatibility that's the only type of reparse point that reliably counts as a "symlink".
Also, part of the rationale for this division is that currently we cannot copy a junction via os.readlink() and os.symlink(). If we were to copy a junction as a symlink, in general this could change how the target path is resolved or how the link behaves in the context of relative symlinks.
It would be less of an issue if os.readlink() returned an object type that allowed duplicating any name-surrogate reparse point via os.symlink(). Instead of calling WinAPI CreateSymbolicLinkW() in such cases, os.symlink() would create the target file/directory and directly set the reparse point via FSCTL_SET_REPARSE_POINT.

So would you be for specific methods to check if a given path is a junction?

On 11/7/22, Charles Machalow <csm10495@gmail.com> wrote:
So would you be for specific methods to check if a given path is a junction?
I'd prefer for ismount() to be modified to always return true for a junction. This would be a significant rewrite of the current implementation, which is only true for a junction that targets a system volume mount point (i.e. "\\?\Volume{GUID}\"). Of course ismount() wouldn't be true for only junctions. It would also be true for the root path of any drive, device, or UNC share if it's an existing filesystem directory.
Implementing a function that checks for only a junction is simple enough. For example:
    def isjunction(path):
        """Test whether a path is a junction.
        """
        try:
            st = os.lstat(path)
        except (OSError, ValueError, AttributeError):
            return False
        return bool(st.st_reparse_tag & stat.IO_REPARSE_TAG_MOUNT_POINT)
To be completely certain, sometimes st_file_attributes is also checked for stat.FILE_ATTRIBUTE_REPARSE_POINT. But a filesystem that sets a reparse point on a directory without also setting the latter file attribute would be dysfunctional.

I tend to prefer adding isjunction instead of changing ismount since I tend to not think about junctions as being mounts (but closer to symlinks).. but I guess either way the closeness of the concepts is a different story than the specific ask here. In other words: for clarity, adding a specific method makes the most sense to me.

On 11/8/22, Charles Machalow <csm10495@gmail.com> wrote:
I tend to prefer adding isjunction instead of changing ismount since I tend to not think about junctions as being mounts (but closer to symlinks)..
Junctions are mount points that are similar to Unix bind mounts where it counts -- in the behavior that's implemented for them in the kernel. This behavior isn't exclusive to just volume mount points. It's implemented the same for all junctions, and it's distinctly different from symlinks. There are times that I want to handle non-root mount points as if they're symlinks, such as deleting them in rmtree(). There are times where I want to handle them distinctly from symlinks, such as adding code in copytree() to copy a junction.
I guess either way the closeness of the concepts is a different story than the specific ask here. In other words: for clarity, adding a specific method makes the most sense to me.
Adding a posixpath.isjunction() function that's always false seems a waste compared to common support for os.path.ismount(). On the other hand, the realpath() call in posixpath.ismount() is expensive, so calling os.path.ismount() to decide how to handle a directory would be expensive on POSIX.

I'm not technical enough here to try to argue which it is closer to. We can say it's like so and so in implementation, but I just liken it a certain way.
I think for regular users it makes most sense to just have a specific function rather than expecting folks to know concept similarities... a simple function that does one thing well is best... But that's just my opinion.
Funny enough in PowerShell, for prints an "l" for both symlinks and junctions.. so it kind of thinks of it as a link of some sort too I guess.
Is it that much of a waste to just return False on posix? I mean it's a couple lines and just maintains api.. and in theory can be more clear to some.
An alternative is to make it just available on Windows... But I'd personally prefer a function that returns False on other than Windows to maintain api. The docs can even say that it can only return False on non-Windows.

+1 on adding Path.is_junction() that returns False on non-Windows systems. (I'm a Windows user and I use junctions as well.)

On 11/8/22, Charles Machalow <csm10495@gmail.com> wrote:
Funny enough in PowerShell, for prints an "l" for both symlinks and junctions.. so it kind of thinks of it as a link of some sort too I guess.
As does Python already in many cases. For example, os.lstat() doesn't traverse a mount point (junction). On Windows, symlinks and mount points are in a general category of name-surrogate reparse points. os.lstat() doesn't traverse them. If Python supported copying a mount point via os.symlink(os.readlink(src), dst), I'd be reluctantly in favor of just letting ntpath.islink() return true for a mount point, as a practical measure for seamless cross-platform implementations of functions like rmtree() and copytree(). In terms of POSIX that's nonsense, but not really on Windows.
Is it that much of a waste to just return False on posix? I mean it's a couple lines and just maintains api.. and in theory can be more clear to some.
I'm just thinking this through in terms of conceptual cost and usefulness in the standard library relative to how easy it is to implement one's own isjunction() or is_name_surrogate() test. Of course, a lot of the os.path tests have simple implementations, such as exists(), isdir() and isfile(). They're in the standard library because they're commonly needed. The question is whether isjunction() is needed enough generally to justify adding it.
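For comparison, a user-level is_name_surrogate() test might look like the sketch below. It assumes the name-surrogate flag is bit 29 (0x20000000) of the reparse tag, as in the Windows SDK macro IsReparseTagNameSurrogate. The helper names are hypothetical, and the literal tag values are spelled out so that the snippet also runs where the stat module doesn't define them. On platforms where os.lstat() results have no st_reparse_tag field, the function simply returns False.

```python
import os

# Bit 29 of a reparse tag marks a name surrogate (assumed from the
# Windows SDK macro IsReparseTagNameSurrogate).
NAME_SURROGATE_BIT = 0x20000000

# Well-known reparse tag values, used here only for illustration.
IO_REPARSE_TAG_MOUNT_POINT = 0xA0000003
IO_REPARSE_TAG_SYMLINK = 0xA000000C
IO_REPARSE_TAG_APPEXECLINK = 0x8000001B

def tag_is_name_surrogate(tag):
    """Test whether a reparse tag has the name-surrogate bit set."""
    return bool(tag & NAME_SURROGATE_BIT)

def is_name_surrogate(path):
    """Test whether a path is a name-surrogate reparse point."""
    try:
        st = os.lstat(path)
    except (OSError, ValueError):
        return False
    # st_reparse_tag only exists on Windows; treat its absence as "no tag".
    return tag_is_name_surrogate(getattr(st, "st_reparse_tag", 0))

print(tag_is_name_surrogate(IO_REPARSE_TAG_MOUNT_POINT))  # True
print(tag_is_name_surrogate(IO_REPARSE_TAG_SYMLINK))      # True
print(tag_is_name_surrogate(IO_REPARSE_TAG_APPEXECLINK))  # False
```

Either test is only a few lines once the tag values are known, which is the "ease of implementation" side of the trade-off being weighed here.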

I would argue that just because it was easy for one to implement doesn't mean it's easy for others. I would have had no idea how to implement this without extra Googling and confusion. Having the abstraction makes it easier for others.
- Charlie Scott Machalow

On Tue, Nov 08, 2022 at 09:55:04PM +0000, Barry wrote:
But anyone that is suitably motivated can implement this.
This is true for every function in a Turing Complete language. Perhaps we should start using iota or jot? :-)
https://en.wikipedia.org/wiki/Iota_and_Jot
A "suitably motivated" person could implement ismount, islink, the entire os and pathlib modules, and more. But they probably won't do as good a job of it as what we have.
On systems that support junction points, they are as much a fundamental file system object as symlinks, directories and mount points. Non-experts will probably have to google for hints on how to implement this, and the internet is full of bad advice. On Stackoverflow, I find this question:
https://stackoverflow.com/questions/17174703/symlinks-on-windows
which starts off by giving the false (or at least obsolete) information that Windows doesn't support symlinks, only shortcuts (NTFS has supported symlinks since at least Windows Vista, in 2006), and then later gives a solution for detecting junction points which requires ctypes.
Most Python coders are using Windows. Surely it is time to do better for them than "just roll your own"?
-- Steve

On Tue, 8 Nov 2022 at 23:34, Steven D'Aprano <steve@pearwood.info> wrote:
Most Python coders are using Windows. Surely it is time to do better for them than "just roll your own"?
While I frequently advocate on the side of "not every 3-line function needs to be in the stdlib", there are a lot of convenience functions for Unix in the stdlib (reflecting the fact that Python was initially developed on Unix) and having them for Windows as well seems only fair.
Given the existence of pathlib.Path.is_fifo(), I think it's reasonable to include is_junction() too. (There's no isfifo() in os.path, though, so the argument for os.path.isjunction() is correspondingly weaker).
Paul

Paul Moore writes:
While I frequently advocate on the side of "not every 3-line function needs to be in the stdlib", there are a lot of convenience functions for Unix in the stdlib
IMO "is_*" functions aren't exactly "convenience" functions, even if they're only a couple of lines implemented in terms of stat. I think of them as "discoverability" functions -- by pulling these distinctions out of class stat's data and documenting them as top-level functions it's a lot easier to learn about and make distinctions when you happen to be coding in the neighborhood of "things you can do with file system objects of type X but not quite with those of type X' (which is very similar...)".
I mean, I'm not willing to die on the hill of that particular terminology, but I do think it's worth making a distinction between saving a few dozen keystrokes with a function that "would be obvious to any skilled practitioner how to write it in 3 lines", and a function that would require many a skilled practitioner a google and reading a couple man pages to write in 3 lines.
Steve

On 11/7/22, Eryk Sun <eryksun@gmail.com> wrote:
    def isjunction(path):
        """Test whether a path is a junction.
        """
        try:
            st = os.lstat(path)
        except (OSError, ValueError, AttributeError):
            return False
        return bool(st.st_reparse_tag & stat.IO_REPARSE_TAG_MOUNT_POINT)
The bitwise AND check in the above is wrong. It should check whether the tag *equals* IO_REPARSE_TAG_MOUNT_POINT. Sorry, this was an editing mistake when I simplified the expression to remove a redundant check of st_file_attributes.
This idea is being developed for Python 3.12:
https://github.com/python/cpython/issues/99547
https://github.com/python/cpython/pull/99548
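With that correction applied, the example might read as in the following sketch. It is not necessarily the code that ended up in the linked pull request. The literal fallback values and the getattr() defaults are assumptions added here so that the snippet also runs on platforms where the stat module doesn't define the reparse tag constants and os.lstat() results have no st_reparse_tag field. The helper bitwise_and_check() is hypothetical and exists only to show why the original test was wrong.

```python
import os
import stat

# The stat module only defines these constants on Windows, so fall back to
# the literal tag values elsewhere (assumption for portability).
IO_REPARSE_TAG_MOUNT_POINT = getattr(stat, "IO_REPARSE_TAG_MOUNT_POINT", 0xA0000003)
IO_REPARSE_TAG_SYMLINK = getattr(stat, "IO_REPARSE_TAG_SYMLINK", 0xA000000C)

def isjunction(path):
    """Test whether a path is a junction."""
    try:
        st = os.lstat(path)
    except (OSError, ValueError, AttributeError):
        return False
    # Compare for equality: reparse tags are identifiers, not bit flags.
    return getattr(st, "st_reparse_tag", 0) == IO_REPARSE_TAG_MOUNT_POINT

def bitwise_and_check(tag):
    """The mistaken test, kept only for comparison."""
    return bool(tag & IO_REPARSE_TAG_MOUNT_POINT)

# The symlink tag shares high bits with the mount-point tag, so the
# bitwise test reports a symlink as a junction.
print(bitwise_and_check(IO_REPARSE_TAG_SYMLINK))              # True
print(IO_REPARSE_TAG_SYMLINK == IO_REPARSE_TAG_MOUNT_POINT)   # False
```

On non-Windows platforms this version always returns False, which matches the behaviour proposed elsewhere in the thread.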

On Mon, Nov 07, 2022 at 07:31:36PM -0000, Charles Machalow wrote:
I propose adding a mechanism to both pathlib.Path and os.path to check if a given path is a junction or not. Currently is_symlink/islink return False for junctions.
+1 on a function is_junction.
I am neutral on the question of whether that function should:
1. only exist on Windows,
2. or exist on other platforms but always return False.
Prior art suggests the second is probably better: when Python doesn't support symbolic links, `os.path.islink` exists but always returns False.
https://docs.python.org/3/library/os.path.html#os.path.islink
I am also neutral on whether ismount() on Windows should always return True for junctions, as well as mount points. I leave that to Windows experts to decide.
-1 on adding a flag parameter to existing functions.
-- Steve
Participants (7): Barry, Charles Machalow, Eryk Sun, Paul Moore, Ram Rachum, Stephen J. Turnbull, Steven D'Aprano