Mailman 3 shutil.symlink to allow non-race replacement of existing link targets - Python-ideas

newer
Overloading assignment concrete...

shutil.symlink to allow non-race replacement of existing link targets

Tom Hale

May 13, 2019

9:38 a.m.

As suggested by Toshio Kuratomi at https://bugs.python.org/issue36656, I am raising this here for inclusion in the shutil module. Mimicking POSIX, os.symlink() will raise FileExistsError if the link name to be created already exists. A common use case is overwriting an existing file (often a symlink) with a symlink. Naively, one would delete the file named link_name file if it exists, then call symlink(). This "solution" is already 3 lines of code, and without exception handling it introduces the race condition of a file named link_name being created between unlink and symlink. Depending on the functionality required, I suggest: * os.symlink() - the new link name is expected to NOT exist * shutil.symlink() - the new symlink replaces an existing file Handling all possible race conditions (some detailed in issue36656) is non-trivial, however this is the best that I have come up with so far: ========================================================================== import os, tempfile def symlink(target, link_name): '''Create a symbolic link link_name pointing to target. Overwrites link_name if it exists. ''' # os.replace() may fail if files are on different filesystems link_dir = os.path.dirname(link_name) # Link to a temporary filename that doesn't exist while True: temp_link_name = tempfile.mktemp(dir=link_dir) # os.* functions mimic as closely as possible system functions # The POSIX symlink() returns EEXIST if link_name already exists # https://pubs.opengroup.org/onlinepubs/9699919799/functions/symlink.html try: os.symlink(target, temp_link_name) break except FileExistsError: pass # Replace link_name with temp_link_name try: # Pre-empt os.replace on a directory with a nicer message if os.path.isdir(link_name): raise IsADirectoryError(f"Cannot symlink over existing directory: '{link_name}'") os.replace(temp_link_name, link_name) except: if os.path.islink(temp_link_name): os.remove(temp_link_name) raise ========================================================================== The documentation (https://docs.python.org/3/library/shutil.html) I suggest for this is: shutil.symlink(target, link_name) Create a symbolic link named link_name pointing to target, overwriting target if it exists. If link_name is a directory, IsADirectoryError is raised. To not overwrite target, use os.symlink() ========================================================================== It would be tempting to do: while True: try: os.symlink(target, link_name) break except FileExistsError: os.remove(link_name) But this has a race condition when replacing a symlink should should *always* exist, eg: /lib/critical.so -> /lib/critical.so.1.2 When upgrading by: symlink('/lib/critical.so.2.0', '/lib/critical.so') There is a point in time when /lib/critical.so doesn't exist. ========================================================================== One issue I see with my suggested code is that the file at temp_link_name could be changed before target is replaced with it. This is mitigated by the randomness introduced by mktemp(). While it is far less likely that a file is accessed with a random and unknown name than with an existing known name, I seek input on a solution if this is an unacceptable risk. Prior art: * https://bugs.python.org/issue36656 (already mentioned above) * https://stackoverflow.com/a/55742015/5353461 * https://git.savannah.gnu.org/cgit/coreutils.git/tree/src/ln.c -- Tom Hale

Show replies by date

Anders Hovmöller

May 2019

10:31 a.m.

...

On 13 May 2019, at 11:38, Tom Hale <tom@hale.ee> wrote:

* os.symlink() - the new link name is expected to NOT exist * shutil.symlink() - the new symlink replaces an existing file

This seems error prone to me. There is nothing about those names that hint at the difference in behavior. An optional "overwrite_if_exists=False" flag seems much nicer. / Anders

Steven D'Aprano

5:38 p.m.

On Mon, May 13, 2019 at 12:31:08PM +0200, Anders Hovmöller wrote:

...

...
On 13 May 2019, at 11:38, Tom Hale <tom@hale.ee> wrote:

* os.symlink() - the new link name is expected to NOT exist * shutil.symlink() - the new symlink replaces an existing file

This seems error prone to me. There is nothing about those names that hint at the difference in behavior.

One of them is in the `os` module, and therefore we should expect that it will be a thin wrapper around the OS functionality. The other is in the `shutil` module, and therefore we should expect that it will be a shell utility function of arbitrary complexity. Being found in different modules should always be treated as a sign that the functions may be different. (If they're the same, why do we need both?)

...

An optional "overwrite_if_exists=False" flag seems much nicer.

Aside from the argument name being too verbose, that violates the rule of thumb "avoid constant bool flags" design principle. (Note that this is a *rule of thumb*, not a hard law: there are times that it can and should be violated; its also not a well-known principle and so lots of APIs violate it even when they shouldn't.) If you have a function which takes a boolean argument to swap between two different modes, and the usual calling pattern is to pass a constant: function(arg, x, y, flag=True) rather than a flag which is not know until runtime, then it is (usually) better to use two distinct functions rather than one function with a parameter that swaps modes. Another way to put it: in general, if you have two modes, you should have two functions. For example: - str.find and rfind, not str.find with a from_right parameter; - bisect.insort_left and insort_right, not bisect.insort with a from_left parameter; - statistics.stdev and pstdev rather than population parameter; - os.setuid and setgid rather than a change_group parameter; - zip and itertools.zip_longest, not zip with longest parameter. This principle doesn't apply when the flag is typically not known until runtime. For example, this is unfortunate: if get_some_flag(*args): result = str.find(spam) else: result = str.rfind(spam) but rare. If it were very common, then it would be justified to provide a single API with a mode flag switching behaviours. The idea of "no constant bool arguments" is that if you typically know which mode you want at edit-time (that includes runtime in interactive environments), the two modes should be distinguished by function name rather than by a parameter. I expect that, typically, users of this will know ahead of time whether they want to overwrite symlinks or not. -- Steven

Anders Hovmöller

6:23 p.m.

...

On 13 May 2019, at 19:38, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Mon, May 13, 2019 at 12:31:08PM +0200, Anders Hovmöller wrote:

...
On 13 May 2019, at 11:38, Tom Hale <tom@hale.ee> wrote:

* os.symlink() - the new link name is expected to NOT exist * shutil.symlink() - the new symlink replaces an existing file

This seems error prone to me. There is nothing about those names that hint at the difference in behavior.

One of them is in the `os` module, and therefore we should expect that it will be a thin wrapper around the OS functionality.

The other is in the `shutil` module, and therefore we should expect that it will be a shell utility function of arbitrary complexity.

Being found in different modules should always be treated as a sign that the functions may be different. (If they're the same, why do we need both?)

Assuming all users know there is another. And if you "from x import y" the call site is identical for the two different functions. This is just asking for trouble.

...

...
An optional "overwrite_if_exists=False" flag seems much nicer.

Aside from the argument name being too verbose, that violates the rule of thumb "avoid constant bool flags" design principle.

Verbose is better than cryptic. Having the exact same name as something that does something else is pretty cryptic.

...

(Note that this is a *rule of thumb*, not a hard law: there are times that it can and should be violated; its also not a well-known principle and so lots of APIs violate it even when they shouldn't.)

If you have a function which takes a boolean argument to swap between two different modes, and the usual calling pattern is to pass a constant:

function(arg, x, y, flag=True)

rather than a flag which is not know until runtime, then it is (usually) better to use two distinct functions rather than one function with a parameter that swaps modes.

OK. Then a different name seems also to be in order. symlink/setsymlink seems a lot better to me for example. / Anders

Serge Matveenko

9:22 p.m.

On Mon, May 13, 2019 at 9:24 PM Anders Hovmöller <boxed@killingar.net> wrote:

...

...
On 13 May 2019, at 19:38, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Mon, May 13, 2019 at 12:31:08PM +0200, Anders Hovmöller wrote: An optional "overwrite_if_exists=False" flag seems much nicer.

Aside from the argument name being too verbose, that violates the rule of thumb "avoid constant bool flags" design principle.

Verbose is better than cryptic. Having the exact same name as something that does something else is pretty cryptic.

As a regular library user, I see and expect no obvious difference between `os.symlink` and `shutil.symlink`. Probably they should have different names if the behavior is not the same. As a constant Linux user, I'd expect a `symlink` function to do something similar to `ln -s` which also could be used as `ln -sf`. So, something like `symlink(force:bool=False)` looks like an expected and "guessable". Thanks!

Steven D'Aprano

1:33 a.m.

On Tue, May 14, 2019 at 12:22:05AM +0300, Serge Matveenko wrote:

...

As a regular library user, I see and expect no obvious difference between `os.symlink` and `shutil.symlink`.

You "see ... no obvious difference" between two functions that live in completely different modules? Isn't the fact that they live in *different modules* obvious enough? If os.symlink was exactly the same as shutil.symlink, we would not need them both. Do you also see and expect no obvious difference between list.remove and os.remove? If so, today is your opportunity to learn something and become a better programmer! *wink* Modules and classes are both namespaces, and we have namespaces with different names precisely so that they can do something different. This is why math.sqrt and cmath.sqrt are different: py> math.sqrt(-2) Traceback (most recent call last): File "<stdin>", line 1, in <module> ValueError: math domain error py> cmath.sqrt(-2) 1.4142135623730951j This is why *all of these* are different: - os.open - gzip.open - bz2.open - aifc.open - shelve.open - tokenize.open - wave.open - webbrowser.open and none of those are the same as the builtin open. A few more examples to prove this isn't an isolated thing: * re.split, shlex.split and str.split * dis.dis, pickletools.dis * copy.copy, shutil.copy The bottom line is that it is completely normal and not uncommon for functions to be distinguished by the namespace they are found in. It is both unreasonable and unnecessary to force objects in different namespaces to have unique names.

...

Probably they should have different names if the behavior is not the same.

The behaviour is broadly the same. They both create a symlink. Only the fine details are different.

...

As a constant Linux user, I'd expect a `symlink` function to do something similar to `ln -s` which also could be used as `ln -sf`. So, something like `symlink(force:bool=False)` looks like an expected and "guessable".

That's not an unreasonable suggestion. But the Windows mklink command does not seem to take a "force" parameter: https://docs.microsoft.com/en-us/windows-server/administration/windows-comma... so not so guessable to Windows users. The ln command on my Linux system takes sixteen options, and we surely don't expect to match all of them. In my previous response, I already explained why constant bool parameters were (often) an interface anti-pattern. I'm on the fence with this one: I don't think that os.symlink(... force=False) is awful, but I still prefer the more complex functionality to go into shutils and leave the os version to be a thin wrapper around the basic OS functionality. If there's no consensus here, I guess the final decision will be that of the person doing the work. -- Steven

Stephen J. Turnbull

2:43 a.m.

Steven D'Aprano writes:

...

On Tue, May 14, 2019 at 12:22:05AM +0300, Serge Matveenko wrote:

...
As a regular library user, I see and expect no obvious difference between `os.symlink` and `shutil.symlink`.

You "see ... no obvious difference" between two functions that live in completely different modules?

Isn't the fact that they live in *different modules* obvious enough?

Please, Steve, you didn't need to go there. Because it's not enough. It's not unheard of for modules to include functionality from other modules with "thin wrappers", merely to change the calling convention, or even just to avoid an import. Even within the stdlib. The point is that subtle issues like the presence or absence of race conditions require careful reading of the documentation, when some of said documentation may not even be in Python in the case of modules like os. If one of these has a race condition and the other doesn't, I think it would be reasonable for the documentation of the racy one to acknowledge the bug and point to the non-racy one. I'm not sure it's a great idea to put lots of effort into fixing shutils. I just assume that one is as risky (racy, risque) as any shell! -- Associate Professor Division of Policy and Planning Science http://turnbull.sk.tsukuba.ac.jp/ Faculty of Systems and Information Email: turnbull@sk.tsukuba.ac.jp University of Tsukuba Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN

Steven D'Aprano

3:16 a.m.

On Tue, May 14, 2019 at 11:43:32AM +0900, Stephen J. Turnbull wrote: [...]

...

...
Isn't the fact that they live in *different modules* obvious enough?

Please, Steve, you didn't need to go there.

Because it's not enough.

Of course it is enough to presume that if they come from different modules, they are probably different unless proven otherwise. As I have shown (perhaps you stopped reading and didn't bother reading through the rest of my post) it is not uncommon for different namespaces to define radically different functions with the same name.

...

It's not unheard of for modules to include functionality from other modules with "thin wrappers", merely to change the calling convention, or even just to avoid an import. Even within the stdlib.

Um, yes? And how is that relevant? If there's a "thin wrapper" changing the calling convention, then the APIs are different and they are not the same. That supports my position: functions with the same name coming from different namespaces are likely to be different. Even if a few happen to be the same that doesn't justify *assuming* that all such functions will be the same. If they're just an import, then *unless documented otherwise* they are considered an implementation detail not a public part of the API.

...

The point is that subtle issues like the presence or absence of race conditions require careful reading of the documentation, when some of said documentation may not even be in Python in the case of modules like os.

Lack of documentation means we should fix the documentation.

...

If one of these has a race condition and the other doesn't, I think it would be reasonable for the documentation of the racy one to acknowledge the bug and point to the non-racy one.

I don't think you have understood the issue here. Perhaps you should re-read the original post in the thread. There is no buggy existing version of symlink with a race condition. If the destination exists, os.symlink correctly does the right thing: it raises an error. py> os.symlink('/tmp/spam', '/tmp/eggs') Traceback (most recent call last): File "<stdin>", line 1, in <module> FileExistsError: [Errno 17] File exists: '/tmp/spam' -> '/tmp/eggs' And there is currently no shutils version at all. The problem is a lack of a symlink function that safely overwrites an existing file or symlink. We're just bike-shedding over its spelling and where it lives: - modify os.symlink and give it a "force" parameter - add a new function into shutils -- Steven

Stephen J. Turnbull

4:41 a.m.

Steven D'Aprano writes:

...

Of course it is enough to presume that if they come from different modules, they are probably different unless proven otherwise.

That mere presumption is not helpful because YOLO. Most of us have limited time, and that includes doc-reading time.

...

As I have shown (perhaps you stopped reading and didn't bother reading through the rest of my post) it is not uncommon for different namespaces to define radically different functions with the same name.

I read the whole post, as I always do for your posts.

...

I don't think you have understood the issue here. Perhaps you should re-read the original post in the thread.

The issue I'm responding to is your attitude, which is amply displayed here as well. So let me rephrase.

...

We're just bike-shedding over its spelling and where it lives[.]

And the color of the bikeshed turns on the connotations of the name. You (like me) spend a lot of effort on documentation and precise semantics. IME most developers do not spend nearly as much. And even for me, even if I recognize vaguely that shutils.symlink and os.symlink are different, I'm unlikely to know which has the semantics I need or even that the difference is relevant to me, unless I'm looking for it specifically. Serge's concern is valid, but perhaps not the most important. Even if it's not, your answer to Anders was much better (even if perhaps a bit sarcastic): Steven D'Aprano writes:

...

On Mon, May 13, 2019 at 08:23:34PM +0200, Anders Hovmöller wrote:

...

...
And if you "from x import y" the call site is identical for the two different functions. This is just asking for trouble.

...

Clearly your threshhold for "trouble" is a lot more sensitive than mine.

Whose threshold is closest to *the rest of the Python community* is the relevant consideration, and an empirical question. My guess is that Serge and Anders are closer to it than you and I are. Whether I'm right about that or not, I am now asking you to show more respect for that diversity in the way you phrase responses. Regards, Steve -- Associate Professor Division of Policy and Planning Science http://turnbull.sk.tsukuba.ac.jp/ Faculty of Systems and Information Email: turnbull@sk.tsukuba.ac.jp University of Tsukuba Tel: 029-853-5175 Tennodai 1-1-1, Tsukuba 305-8573 JAPAN

Serge Matveenko

7:26 a.m.

On Tue, May 14, 2019 at 6:19 AM Steven D'Aprano <steve@pearwood.info> wrote:

...

The problem is a lack of a symlink function that safely overwrites an existing file or symlink. We're just bike-shedding over its spelling and where it lives:

- modify os.symlink and give it a "force" parameter - add a new function into shutils

How about introducing `force=False` argument to `pathlib.Path.symlink_to` method? It looks like a good place for this as `pathlib` is actually the place where higher-level things to operate on paths live already and this method already has a different name and a slightly different signature.

Steven D'Aprano

1:29 p.m.

On Tue, May 14, 2019 at 10:26:28AM +0300, Serge Matveenko wrote:

...

How about introducing `force=False` argument to `pathlib.Path.symlink_to` method? It looks like a good place for this as `pathlib` is actually the place where higher-level things to operate on paths live already

I don't think we can say pathlib is the place for higher-level operations. It isn't what the os docs say: https://docs.python.org/3/library/os.html it explicitly says shutils is for higher-level path operations. The beauty of shutil is that it can operate on pathnames whether they are strings, bytes or Path objects, and doesn't force you to use one or the other. Aside: I'm not sure how much progress has been made on making Path objects fully usable by shutil. There's this open issue: https://bugs.python.org/issue30235 so I think progress is being made but may not be complete. -- Steven

Anders Hovmöller

4:40 a.m.

...

On 14 May 2019, at 03:33, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Tue, May 14, 2019 at 12:22:05AM +0300, Serge Matveenko wrote:

As a regular library user, I see and expect no obvious difference between `os.symlink` and `shutil.symlink`.

You "see ... no obvious difference" between two functions that live in completely different modules?

This happens due to historical accidents all the time.

...

Isn't the fact that they live in *different modules* obvious enough?

No. See above.

...

If os.symlink was exactly the same as shutil.symlink, we would not need them both.

Sure we could. See above.

...

Do you also see and expect no obvious difference between list.remove and os.remove? If so, today is your opportunity to learn something and become a better programmer! *wink*

That's rude. That might work in person but I'd doubt it. Over email the "wink" doesn't change it enough. / Anders

Serge Matveenko

7:37 a.m.

On Tue, May 14, 2019 at 4:34 AM Steven D'Aprano <steve@pearwood.info> wrote:

...

You "see ... no obvious difference" between two functions that live in completely different modules?

<skipped>

The bottom line is that it is completely normal and not uncommon for functions to be distinguished by the namespace they are found in. It is both unreasonable and unnecessary to force objects in different namespaces to have unique names.

There is no need to explain basic things to me. Thank you very much indeed! My point was that in case of `os.symlink` vs `shutil.symlink` it is not obvious how they are different even taking into account their namespaces. In the case `os.remove` vs `list.remove` the difference is obvious as namespaces clearly represent different subjects. On the other hand, it is not that easy to guess the developer intent comparing `os` and `shutil` subjects. Why there are two different implementations? Which one should I use? While I don't need "force" functionality at the moment is it ok to use the one from `os` package or I would need that in the future? Is it worth just stick with the one from `shutil` and forget that `os.symlink` exists at all?

eryk sun

8:48 a.m.

On 5/14/19, Serge Matveenko <s@matveenko.ru> wrote:

...

My point was that in case of `os.symlink` vs `shutil.symlink` it is not obvious how they are different even taking into account their namespaces.

I prefer to reserve POSIX system call names if possible, unless it's a generic name such as "open" or "close". Note that there's also the possibility of extending pathlib's `symlink_to` method.

Steven D'Aprano

12:13 p.m.

On Tue, May 14, 2019 at 10:37:57AM +0300, Serge Matveenko wrote:

...

My point was that in case of `os.symlink` vs `shutil.symlink` it is not obvious how they are different even taking into account their namespaces.

Okay, but that's not what you said. I can't respond to what you meant to say in hindsight, only what you actually said at the time: I SEE AND EXPECT no obvious difference between `os.symlink` and `shutil.symlink`. [emphasis added] Can we agree that the difference between os.symlink and shutils.symlink is visible (they are in different namespaces) and we should expect that they are different for that reason? If we *can't* agree on that point, if you truly see no difference between (for example) os.open bz2.open gzip.open io.open etc. then I don't think we will ever reach agreement. Our expectations are just too different. I am happy to agree with you that the difference in qualified names is not enough to tell *how they differ*, only that they likely do differ. To know how they differ, one needs to read the docs, or ask. I don't see this as a problem.

...

In the case `os.remove` vs `list.remove` the difference is obvious as namespaces clearly represent different subjects. On the other hand, it is not that easy to guess the developer intent comparing `os` and `shutil` subjects.

There's no need to guess. The documentation is clear: https://docs.python.org/3/library/os.html This module provides a portable way of using operating system dependent functionality. [...] and for high-level file and directory handling see the shutil module. So I guess the question comes down to whether we believe that safely replacing an existing file with a symlink is low-level OS functionality or high-level functionality. Here are three pieces of evidence to help decide: (1) The Windows mklink command has no option to overwrite files: that suggests that on Windows, "make a syslink and overwrite the destination" is not low-level OS functionality. (2) The Linux symlink system call does not override files either: man symlink This also argues against putting this in the os module. (3) The ``ln`` shell command does have an option to force overwriting. This argues in favour of treating it as a shell command and putting it in shutils. In the rest of your email, you asked a bunch of questions. I assume that you intend them as rhetorical questions, but I don't know why you included them since they don't seem relevant. We could ask precisely the same questions no matter what we called this proposed function or where we put it. -- Steven

Anders Hovmöller

12:41 p.m.

...

On 14 May 2019, at 14:13, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Tue, May 14, 2019 at 10:37:57AM +0300, Serge Matveenko wrote:

My point was that in case of `os.symlink` vs `shutil.symlink` it is not obvious how they are different even taking into account their namespaces.

Okay, but that's not what you said. I can't respond to what you meant to say in hindsight, only what you actually said at the time:

Sure you can. By responding to the mail where he clarifies what he meant but failed to get across the first time. Pointing out again that he failed to communicate to you like he tried the first time around is not very constructive. It would be better if you could let that go and just move forward. / Anders

Steven D'Aprano

2:24 a.m.

On Mon, May 13, 2019 at 08:23:34PM +0200, Anders Hovmöller wrote:

...

...
Being found in different modules should always be treated as a sign that the functions may be different. (If they're the same, why do we need both?)

Assuming all users know there is another.

How do users know the os module exists in the first place? How do they know to use list.append rather than list += [item]? If you're going to cater only for the most ignorant, know-nothing subset of users, that's an argument against providing *any* functionality at all since they won't know it exists. We ought to assume competent users who know how to *find stuff out* by reading the docs, asking on Stackoverflow or other forums, or by reading other people's code and learning from what they see.

...

And if you "from x import y" the call site is identical for the two different functions. This is just asking for trouble.

Clearly your threshhold for "trouble" is a lot more sensitive than mine. Using `from module import ...` is always a compromise between verbosity and clarity. We don't want to be like Java: configs.config.server-config.monitoring-service.module-monitoring-levels (I copied that example from the Oracle docs) or even worse. But short names can sometimes be less clear. If the coder is using `from module import` they are explicitly preferring brevity, *and that is okay*. Any even slightly experienced Python programmer will know what to do when you see an unfamiliar function in code: syslink(...) Most of the time we don't need to drill down in fine detail. "It makes symlinks" is enough for us to move on and we really don't want names which explicitly document everything about the function: create_symlink_to_file_on_posix_filesystems_or_windows_vista_or_better_but_not_if_destination_already_exists(...) *wink*. Function names are mnemonics, not documentation. But if you do need to drill down and look at the function in fine detail, we know what to do: look for a local function definition, or a from import, and take it from there. In the worst case, you might have both in the same namespace so you need to resolve which one wins. This is called programming. We all do it. It isn't any trouble except in cases of deliberately or accidently obfuscated code.

...

...
...
An optional "overwrite_if_exists=False" flag seems much nicer.

Aside from the argument name being too verbose, that violates the rule of thumb "avoid constant bool flags" design principle.

...

Verbose is better than cryptic. Having the exact same name as something that does something else is pretty cryptic.

It really isn't. That's why we have namespaces (packages, modules, classes) so we don't have to use unique names for everything. It's not "cryptic" that Paris, France is difference from Paris, Texas. Let's not abuse the word "cryptic" for things which are easy to resolve. This is cryptic: https://britishlibrary.typepad.co.uk/digitisedmanuscripts/2015/08/help-us-de... not having two symlink functions in different modules.

...

Then a different name seems also to be in order. symlink/setsymlink seems a lot better to me for example.

"set" adds nothing to the function name that isn't already implied. You might as well just prefix it with any random three characters: symlink # sets a symlink gwqsymlink # sets a symlink Among those who don't read the docs, all you are doing is giving them three extra characters to type when they post to Stackoverflow: "What's the difference between os.symlink and shutil.[set]symlink?" Arguably "forcesymlink" or "force_symlink" would be better, but given that option, I'd flip my preference to a force parameter: +1 shutil.symlink +0 os.symlink with force=False parameter -0.5 shutil.force_symlink -1 shutil.setsymlink -- Steven

Anders Hovmöller

4:50 a.m.

...

On 14 May 2019, at 04:24, Steven D'Aprano <steve@pearwood.info> wrote:

*wink*. Function names are mnemonics, not documentation.

Certainly not with that attitude. But it easily could be. Maybe you would be fine with a language where all function names are just hex values? *wink* (yes I think this was rude but that's my point).

...

mnemonics

Agreed, and like keys in a dict they work badly if there's a collision. / Anders

Rhodri James

10:55 a.m.

On 14/05/2019 05:50, Anders Hovmöller wrote:

...

...
On 14 May 2019, at 04:24, Steven D'Aprano <steve@pearwood.info> wrote:

*wink*. Function names are mnemonics, not documentation.

Certainly not with that attitude. But it easily could be. Maybe you would be fine with a language where all function names are just hex values? *wink* (yes I think this was rude but that's my point).

I'm an assembler programmer. Function names *are* just hex values :-) -- Rhodri James *-* Kynesim Ltd

Richard Damon

11:56 a.m.

New subject: [SPAM?] Re: shutil.symlink to allow non-race replacement of existing link targets

On 5/14/19 6:55 AM, Rhodri James wrote:

...

On 14/05/2019 05:50, Anders Hovmöller wrote:

...
...
On 14 May 2019, at 04:24, Steven D'Aprano <steve@pearwood.info> wrote:

*wink*. Function names are mnemonics, not documentation.

Certainly not with that attitude. But it easily could be. Maybe you would be fine with a language where all function names are just hex values? *wink* (yes I think this was rude but that's my point).

I'm an assembler programmer. Function names *are* just hex values :-)

Actually, even to an assembler programmer, most Function names are symbolic labels. Yes, you CAN write a call to an absolute (or relative) memory address, but that is unusual unless you are writing an in memory patch to an already established program. Do you really write something like: call 0x32453 very often? -- Richard Damon

Anders Hovmöller

5:02 a.m.

...

On 14 May 2019, at 04:24, Steven D'Aprano <steve@pearwood.info> wrote:

It's not "cryptic" that Paris, France is difference from Paris, Texas. Let's not abuse the word "cryptic" for things which are easy to resolve.

I'm sorry I thought you were the one arguing FOR using the same word for two very different things? *wink* The real argument is that it's cryptic because the reader of the code won't know they need to resolve it. That's the pitfall. / Anders

Tom Hale

2:04 p.m.

New subject: shutil.symlink - "avoid constant bool flags"

On 14/5/19 12:38 am, Steven D'Aprano wrote:

...

On Mon, May 13, 2019 at 12:31:08PM +0200, Anders Hovmöller wrote:

...
An optional "overwrite_if_exists=False" flag seems much nicer.

Aside from the argument name being too verbose, that violates the rule of thumb "avoid constant bool flags" design principle.

(Note that this is a *rule of thumb*, not a hard law: there are times that it can and should be violated; its also not a well-known principle and so lots of APIs violate it even when they shouldn't.)

Interestingly, reviewing https://docs.python.org/3/library/os.html, I see *many* optional arguments with a Boolean default value. In fact, both os.link and os.symlink have one. os.link(src, dst, *, src_dir_fd=None, dst_dir_fd=None, follow_symlinks=True) os.symlink(src, dst, target_is_directory=False, *, dir_fd=None) The title of the page https://docs.python.org/library/os.html is: Miscellaneous operating system interfaces It is interesting that a module which provides interfaces seems to so blatantly disregard the "avoid constant bool flags" function interface design principle. Given that they are interfaces, they could have been written in a more pythonesque way. If the POSIX utility ln were to be implemented, there are a few binary options: 1. Hard or soft link 2. Overwrite or raise an error 3. Follow symlinks or link to them If there were a different function for each, that would be a total of 8 functions, with ugly names like "hardlink_overwrite_follow_links" or "softlink_non_overwrite_link_to_links". It seems far more practicable to have only two functions with sensible boolean defaults, with the split being based on the underlying os module function, namely os.link and os.symlink. -- Tom Hale

Greg Ewing

11:10 p.m.

New subject: shutil.symlink - "avoid constant bool flags"

Tom Hale wrote:

...

It seems far more practicable to have only two functions with sensible boolean defaults, with the split being based on the underlying os module function, namely os.link and os.symlink.

Also the os module is designed to follow the C API of the OS as closely as practicable, so that users have a minimal amount to learn. -- Greg

Steven D'Aprano

6:12 p.m.

On Mon, May 13, 2019 at 04:38:08PM +0700, Tom Hale wrote:

...

As suggested by Toshio Kuratomi at https://bugs.python.org/issue36656, I am raising this here for inclusion in the shutil module.

Mimicking POSIX, os.symlink() will raise FileExistsError if the link name to be created already exists.

Seems reasonable. I presume that's the current behaviour.

...

A common use case is overwriting an existing file (often a symlink) with a symlink. Naively, one would delete the file named link_name file if it exists, then call symlink(). This "solution" is already 3 lines of code, and without exception handling it introduces the race condition of a file named link_name being created between unlink and symlink.

Depending on the functionality required, I suggest:

* os.symlink() - the new link name is expected to NOT exist * shutil.symlink() - the new symlink replaces an existing file

...

One issue I see with my suggested code is that the file at temp_link_name could be changed before target is replaced with it. This is mitigated by the randomness introduced by mktemp().

Arguing over the fine details of implementation should probably be done on the bug tracker, between the implementor and whoever is reviewing the code. As far as I'm concerned, the proposed interface is a good one: - os.symlink for the low level function - shutil.symlink for a high-level wrapper that handles the common case of wanting to overwrite an existing file. -- Steven

Serhiy Storchaka

11:43 a.m.

13.05.19 12:38, Tom Hale пише:

...

As suggested by Toshio Kuratomi at https://bugs.python.org/issue36656, I am raising this here for inclusion in the shutil module.

Mimicking POSIX, os.symlink() will raise FileExistsError if the link name to be created already exists.

A common use case is overwriting an existing file (often a symlink) with a symlink. Naively, one would delete the file named link_name file if it exists, then call symlink(). This "solution" is already 3 lines of code, and without exception handling it introduces the race condition of a file named link_name being created between unlink and symlink.

Depending on the functionality required, I suggest:

* os.symlink() - the new link name is expected to NOT exist * shutil.symlink() - the new symlink replaces an existing file

Sorry, but I do not understand what problem do you try to solve. If somebody can create a file named link_name between unlink and symlink, he can also remove and create a file named link_name after symlink.

Steven D'Aprano

4:50 p.m.

On Tue, May 14, 2019 at 02:43:03PM +0300, Serhiy Storchaka wrote:

...

Sorry, but I do not understand what problem do you try to solve. If somebody can create a file named link_name between unlink and symlink, he can also remove and create a file named link_name after symlink.

I don't think that is always correct, although I don't know if it makes a difference to your point or to the risk of this (supposed) race condition. On posix systems, you should be able to use chattr +i to make the file immutable, so that the attacker cannot remove or replace it. Normally only root has the ability to do this, but other users can be granted that capability. (I'm not sure how easy it is to call chattr from Python.) There may also be SELinux controls in place. I don't use SELinux myself so I don't know what precisely. On Windows, permissions are quite different and far more fine-grained than on posix, so I think that there could be scenarios were the attacker can create link_name between the unlink and symlink (the feared race condition) but not delete and replace link_name after it is in place. I'm not sure how relevant these observations are. But I think you make a good point that we need to understand precisely the nature of the problem being solved before we decide how to solve it :-) -- Steven

eryk sun

11:18 p.m.

On 5/14/19, Steven D'Aprano <steve@pearwood.info> wrote:

...

On posix systems, you should be able to use chattr +i to make the file immutable, so that the attacker cannot remove or replace it.

Minor point of clarification. File attributes, and APIs to access them, are not in the POSIX standard. chattr is a Linux command that wraps the filesystem IOCTLs for getting and setting file attributes. There's no chattr system call, so thus far it's not supported in Python's os module. BSD and macOS have chflags, which supports both system- and user-immutable file attributes. Python supports it as os.chflags.

Serhiy Storchaka

7:02 a.m.

14.05.19 19:50, Steven D'Aprano пише:

...

On Tue, May 14, 2019 at 02:43:03PM +0300, Serhiy Storchaka wrote:

...
Sorry, but I do not understand what problem do you try to solve. If somebody can create a file named link_name between unlink and symlink, he can also remove and create a file named link_name after symlink.

I don't think that is always correct, although I don't know if it makes a difference to your point or to the risk of this (supposed) race condition.

On posix systems, you should be able to use chattr +i to make the file immutable, so that the attacker cannot remove or replace it. Normally only root has the ability to do this, but other users can be granted that capability. (I'm not sure how easy it is to call chattr from Python.)

There may also be SELinux controls in place. I don't use SELinux myself so I don't know what precisely.

This looks like two marginal case for including it in the stdlib. Python does not provide API for "chattr +i", so you should use a subprocess for creating an immutable temporary symlink before replacing the target. It would be not easy to test this feature because you need to grant specific capabilities to the Python interpreter. This may be an interesting project on PyPi, but I do not think that we need to include it in the stdlib. Because of little need and because of complex errorprone implementation.

Barry Scott

8:28 a.m.

To replace one symlink with another atomically is possible by using rename() or renameat() something like: os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst ) Use dir_fd's to taste. I'm sure there is a lot more to cover all the corner cases and attack vectors. I'm not sure how immutable file will help with this. Barry

Serhiy Storchaka

10:05 a.m.

16.05.19 11:28, Barry Scott пише:

...

To replace one symlink with another atomically is possible by using rename() or renameat() something like:

os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst )

Somebody can replace tmp_symlink between os.symlink() and os.rename().

Barry Scott

11:08 a.m.

...

On 16 May 2019, at 11:05, Serhiy Storchaka <storchaka@gmail.com> wrote:

16.05.19 11:28, Barry Scott пише:

...
To replace one symlink with another atomically is possible by using rename() or renameat() something like: os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst )

Somebody can replace tmp_symlink between os.symlink() and os.rename().

As I said "I'm sure there is a lot more to cover all the corner cases and attack vectors." I did this: $ ln -s -f foo bar $ strace ln -s -f foo bar and long story short it does the rename from a tmp named symlink. Having a shutil function that does the same logic as GNU coreutils ln -s -f would seem to be a nice addition. However if you have a situation where security is a concern then there is a lot of design work that needs to be done that is surely outside the scope the stdlib as its application specific? Barry

Antoine Pitrou

11:33 a.m.

On Thu, 16 May 2019 13:05:48 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...

16.05.19 11:28, Barry Scott пише:

...
To replace one symlink with another atomically is possible by using rename() or renameat() something like:

os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst )

Somebody can replace tmp_symlink between os.symlink() and os.rename().

Not if tmp_symlink is created in a directory with strict permissions, I guess. Regards Antoine.

Serhiy Storchaka

1:04 p.m.

16.05.19 14:33, Antoine Pitrou пише:

...

On Thu, 16 May 2019 13:05:48 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
16.05.19 11:28, Barry Scott пише:

...
To replace one symlink with another atomically is possible by using rename() or renameat() something like:

os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst )

Somebody can replace tmp_symlink between os.symlink() and os.rename().

Not if tmp_symlink is created in a directory with strict permissions, I guess.

But in such case we do not need complex games with a temporary symlink. Just use os.symlink() and os.unlink() if needed.

Antoine Pitrou

2:05 p.m.

On Thu, 16 May 2019 16:04:36 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...

16.05.19 14:33, Antoine Pitrou пише:

...
On Thu, 16 May 2019 13:05:48 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
16.05.19 11:28, Barry Scott пише:

...
To replace one symlink with another atomically is possible by using rename() or renameat() something like:

os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst )

Somebody can replace tmp_symlink between os.symlink() and os.rename().

Not if tmp_symlink is created in a directory with strict permissions, I guess.

But in such case we do not need complex games with a temporary symlink. Just use os.symlink() and os.unlink() if needed.

I was talking about the *tmp_symlink*, not *dst*.

Serhiy Storchaka

3:05 p.m.

16.05.19 17:05, Antoine Pitrou пише:

...

On Thu, 16 May 2019 16:04:36 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
16.05.19 14:33, Antoine Pitrou пише:

...
On Thu, 16 May 2019 13:05:48 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
16.05.19 11:28, Barry Scott пише:

...
To replace one symlink with another atomically is possible by using rename() or renameat() something like:

os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst )

Somebody can replace tmp_symlink between os.symlink() and os.rename().

Not if tmp_symlink is created in a directory with strict permissions, I guess.

But in such case we do not need complex games with a temporary symlink. Just use os.symlink() and os.unlink() if needed.

I was talking about the *tmp_symlink*, not *dst*.

They both should be on the same file system. The simplest way to achieve this is to create tmp_symlink in the same directory as dst.

Antoine Pitrou

3:22 p.m.

On Thu, 16 May 2019 18:05:39 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...

16.05.19 17:05, Antoine Pitrou пише:

...
On Thu, 16 May 2019 16:04:36 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
16.05.19 14:33, Antoine Pitrou пише:

...
On Thu, 16 May 2019 13:05:48 +0300 Serhiy Storchaka <storchaka@gmail.com> wrote:

...
16.05.19 11:28, Barry Scott пише:

...
To replace one symlink with another atomically is possible by using rename() or renameat() something like:

os.symlink( src, tmp_symlink ) os.rename( tmp_symlink, dst )

Somebody can replace tmp_symlink between os.symlink() and os.rename().

Not if tmp_symlink is created in a directory with strict permissions, I guess.

But in such case we do not need complex games with a temporary symlink. Just use os.symlink() and os.unlink() if needed.

I was talking about the *tmp_symlink*, not *dst*.

They both should be on the same file system. The simplest way to achieve this is to create tmp_symlink in the same directory as dst.

So what?

Tom Hale

2:04 p.m.

Let me prefix all of this by saying that a shutil.symlink_with_overwrite is useful (see below), but limited in scope. Please see my next post on emulating POSIX ln. On 16/5/19 2:02 pm, Serhiy Storchaka wrote:

...

This may be an interesting project on PyPi, but I do not think that we need to include it in the stdlib. Because of little need and because of complex errorprone implementation.

There are two points here. 1) Need. 2) Implementation difficulty. 1. Need The reason I believe that this code should be available in a standard library is that it's unreasonable to expect that the average user who wishes to replace a symlink (or hard link for that matter) to work out the possible ways in which the operation could go wrong, and then correctly prevent those non-obvious failure cases. Yes, the likelihood of link_name being created (or not being removed) between unlink and (sym)link is *extremely* low, but it is NOT zero. Also, the likelihood of a needed symlink being accessed between unlink and link is tiny. Given that the the first case likelihood is so low, most users won't bother to roll their own robust solution, and instead just lazily cross their fingers and hope. Even if they do try to avoid the first race mentioned above, the most obvious solution still contains the 2nd race, as shown in my OP. There is a known non-trivial solution for a known problem which is a type of problem that many people would not think about. In my opinion, shielding users from needing to know and work through all of this is worthwhile and responsible. Indeed, it may be part of a duty of care, following the "no surprises" rule of thumb. 2. Difficulty of implementation Increased difficulty of implementation would mean that the code is more likely be implemented poorly/incorrectly on a roll-your-own basis. To me, I see this as adding weight to the case of inclusion in a library. -- Tom Hale

Tom Hale

2:13 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

Thanks to all who have contributed to the discussion so far. I've noticed that the documentation[1] for both os.link and os.symlink doesn't mention that the "dst" filename should not exist. Also omitted from the documentation of both is any mention of "FileExistsError". Taking a step back, I believe that it would be far more useful for shutil to implement POSIX ln[2] as closely as possible via shutil.link and shutil.symlink. I propose the following behaviour: ============================================================================== shutil.link: * Create hard links, name follows os.link shutil.symlink: * Create symlinks, name follows os.symlink * Takes target_is_directory=False parameter as does os.symlink (Windows has different symlinks depending on the type of the target) Common: * Takes overwrite=False parameter. Updates links atomically. * Takes follow_symlinks=False parameter (as does os.link) * Takes target_or_targets as 1st argument * Behaviour depends on whether 2nd argument is directory (like POSIX ln) * If 2nd argument (destination) is a directory * Multiple targets are allowed * The basename of each target will be linked inside directory * If 2nd argument (destination) is a file * Only one target is allowed Examples: shutil.link(file1, file2, overwrite=True) shutil.symlink(list_of_files, dir_name) =========================================================================== I also propose the following to be added to the documentation for os.(sym)link: Raises FileExistsError if dst exists. Use shutil.(sym)link to atomically overwrite an existing (sym)link. For os.symlink, the dir_fd=None parameter is undocumented. Add analogous documentation as per os.link. =========================================================================== If this proposal has a positive reception (I may even need guidance in determining that), would anyone be willing to mentor me through the code submission process? I have read https://devguide.python.org/stdlibchanges/ but would like to be able to have someone to review and bounce ideas off given this will be my first stdlib contribution. If someone is willing to mentor, please contact me via private mail. [1] https://docs.python.org/library/os.html [2] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ln.html -- Tom Hale

Tom Hale

3:07 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

They say silence is golden... But I'm still looking for some feedback on the below. Cheers! -- Tom Hale On 16 May 2019 21:13:50 Tom Hale <tom@hale.ee> wrote:

...

Thanks to all who have contributed to the discussion so far.

I've noticed that the documentation[1] for both os.link and os.symlink doesn't mention that the "dst" filename should not exist. Also omitted from the documentation of both is any mention of "FileExistsError".

Taking a step back, I believe that it would be far more useful for shutil to implement POSIX ln[2] as closely as possible via shutil.link and shutil.symlink.

I propose the following behaviour:

==============================================================================

shutil.link: * Create hard links, name follows os.link

shutil.symlink: * Create symlinks, name follows os.symlink * Takes target_is_directory=False parameter as does os.symlink (Windows has different symlinks depending on the type of the target)

Common: * Takes overwrite=False parameter. Updates links atomically. * Takes follow_symlinks=False parameter (as does os.link) * Takes target_or_targets as 1st argument * Behaviour depends on whether 2nd argument is directory (like POSIX ln) * If 2nd argument (destination) is a directory * Multiple targets are allowed * The basename of each target will be linked inside directory * If 2nd argument (destination) is a file * Only one target is allowed

Examples:

shutil.link(file1, file2, overwrite=True) shutil.symlink(list_of_files, dir_name)

===========================================================================

I also propose the following to be added to the documentation for os.(sym)link:

Raises FileExistsError if dst exists. Use shutil.(sym)link to atomically overwrite an existing (sym)link.

For os.symlink, the dir_fd=None parameter is undocumented. Add analogous documentation as per os.link.

===========================================================================

If this proposal has a positive reception (I may even need guidance in determining that), would anyone be willing to mentor me through the code submission process?

I have read https://devguide.python.org/stdlibchanges/ but would like to be able to have someone to review and bounce ideas off given this will be my first stdlib contribution.

If someone is willing to mentor, please contact me via private mail.

[1] https://docs.python.org/library/os.html [2] https://pubs.opengroup.org/onlinepubs/9699919799/utilities/ln.html

-- Tom Hale

Steven D'Aprano

3:48 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

On Wed, May 29, 2019 at 10:07:38PM +0700, Tom Hale wrote:

...

They say silence is golden... But I'm still looking for some feedback on the below.

I think the first thing you need to do is respond to Serhiy's objection: Sorry, but I do not understand what problem do you try to solve. If somebody can create a file named link_name between unlink and symlink, he can also remove and create a file named link_name after symlink. If Serhiy is correct, that completely undermines the primary motive for adding this functionality. -- Steven

Barry

9:22 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

...

On 29 May 2019, at 16:48, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Wed, May 29, 2019 at 10:07:38PM +0700, Tom Hale wrote: They say silence is golden... But I'm still looking for some feedback on the below.

I think the first thing you need to do is respond to Serhiy's objection:

Sorry, but I do not understand what problem do you try to solve. If somebody can create a file named link_name between unlink and symlink, he can also remove and create a file named link_name after symlink.

If Serhiy is correct, that completely undermines the primary motive for adding this functionality.

Serhiy, I think, is conflating two things. 1. How to write software robust aginst attack. 2. How to replace a symlink atomically. The only reason 1 is a problem is that the application is not in control of its file space which I would suggest means you already lost. I think the OP wants a soution to 2, which as I suggest can leverage the design work in the implementation of the GNU ln -sf command in python. Barry

...

-- Steven _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Steven D'Aprano

12:49 a.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

On Wed, May 29, 2019 at 10:22:31PM +0100, Barry wrote:

...

Serhiy, I think, is conflating two things. 1. How to write software robust aginst attack. 2. How to replace a symlink atomically.

I don't have an opinion on whether Serhiy is right or wrong.

...

The only reason 1 is a problem is that the application is not in control of its file space which I would suggest means you already lost.

I think the OP wants a soution to 2,

In the bug report Tom linked to initially: https://bugs.python.org/issue36656 he clearly references an attacker overwriting the file, rather than an accidental overwrite. So I think #1 is relevant -- except that Serhiy makes a good point that atomicity doesn't solve #1. I admit to a preference for atomic file operations where possible. Atomic operations are generally better, because they either succeed or fail, they don't half succeed and leave detritus lying around that you have to clean up. That's a good thing. No offense intended to Serhiy... I don't know why the concept is so controversial. I feel that had a core dev just gone ahead and implemented this behaviour either in shutils or os.symlink, nobody would have objected and asked for it to be removed. Its only because Tom has to (a) ask permission and (b) discuss the design first that are we having the debate. Making things atomic may not be a solution to every problem, but nor is it harmful and something we want to avoid. I can understand Serhiy saying "I don't care to implement this myself, and I won't review the PR, let somebody else do it" but I'm not sure why he is objecting to a volunteer willing to do the job. But since Serhiy has objected, Tom has to respond to those objections. Serhiy is one of the most productive and most respected of the core devs, and right or wrong he cannot be ignored. -- Steven

Barry Scott

7:47 a.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

...

On 30 May 2019, at 01:49, Steven D'Aprano <steve@pearwood.info> wrote:

On Wed, May 29, 2019 at 10:22:31PM +0100, Barry wrote:

...
Serhiy, I think, is conflating two things. 1. How to write software robust aginst attack. 2. How to replace a symlink atomically.

I don't have an opinion on whether Serhiy is right or wrong.

...
The only reason 1 is a problem is that the application is not in control of its file space which I would suggest means you already lost.

I think the OP wants a soution to 2,

In the bug report Tom linked to initially:

https://bugs.python.org/issue36656

he clearly references an attacker overwriting the file, rather than an accidental overwrite. So I think #1 is relevant -- except that Serhiy makes a good point that atomicity doesn't solve #1.

I admit to a preference for atomic file operations where possible. Atomic operations are generally better, because they either succeed or fail, they don't half succeed and leave detritus lying around that you have to clean up. That's a good thing.

No offense intended to Serhiy... I don't know why the concept is so controversial. I feel that had a core dev just gone ahead and implemented this behaviour either in shutils or os.symlink, nobody would have objected and asked for it to be removed. Its only because Tom has to (a) ask permission and (b) discuss the design first that are we having the debate.

Making things atomic may not be a solution to every problem, but nor is it harmful and something we want to avoid. I can understand Serhiy saying "I don't care to implement this myself, and I won't review the PR, let somebody else do it" but I'm not sure why he is objecting to a volunteer willing to do the job.

But since Serhiy has objected, Tom has to respond to those objections. Serhiy is one of the most productive and most respected of the core devs, and right or wrong he cannot be ignored.

Sorry, I should have checked back on the OP report, not relied on my memory. Serhiy is right that atomic rename will not fix all security problems. Designing secure software takes a lot of analysis. As you say the atomic rename is valuable in its own right. I'd not put it in os as a implementation based on ln -sf would not be trivial and I think better fits in shutil. Barry

...

-- Steven _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/

Serhiy Storchaka

June 2019

7:06 a.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

30.05.19 00:22, Barry пише:

...

Serhiy, I think, is conflating two things. 1. How to write software robust aginst attack. 2. How to replace a symlink atomically.

Why do you need to replace a symlink atomically? This is a solution, what problem it solves?

Tom Hale

1:08 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

On 1/6/19 2:06 pm, Serhiy Storchaka wrote:

...

30.05.19 00:22, Barry пише:

...
Serhiy, I think, is conflating two things. 1. How to write software robust aginst attack. 2. How to replace a symlink atomically.

Why do you need to replace a symlink atomically? This is a solution, what problem it solves?

Note that the issue is not limited to symlinks, the analogous problem occurs with hard links. Atomicity is generally considered to be a Good Thing. Atomicity eliminates race conditions. Race conditions are sometimes difficult to enumerate, but here are three I can think of which would not occur given an atomic replace: 1) Unnecessarily Inconsistent Post-conditions 2) Where the link must always exist 3) Unhandled exception attacks 1) Unnecessarily Inconsistent Post-conditions ============================================== Atomicity ensures that an operation has a consistent post-condition where that is possible. Greg did a great job of explaining this: On 1/6/19 2:29 pm, Greg Ewing wrote:

...

Process A wants to symlink f1 --> f2, replacing any existing f1.

Process B wants to create f1 if it doesn't already exist, or update it if it does. If f1 is a symlink, the file it's linked to should be updated.

The end result should be that f1 exists and is a symlink to f2.

If the symlink is not atomic, this can happen:

1. Process A sees that f1 already exists and deletes it. 2. Process B sees that f1 does not exist and creates a new file called f1. 3. Process A tries to symlink f1 to f2, which fails because there is now an existing file called f1.

This violates the postcondition, because f1 is not a symlink to f2.

2) Where the link must always exist ==================================== This is an example of atomicity ensuring that a condition remains valid *during* the operation. I gave an example of this in my initial post: On 13/5/19 4:38 pm, Tom Hale wrote:

...

It would be tempting to do:

while True: try: os.symlink(target, link_name) break except FileExistsError: os.remove(link_name)

But this has a race condition when replacing a symlink should should *always* exist, eg:

/lib/critical.so -> /lib/critical.so.1.2

When upgrading by:

symlink('/lib/critical.so.2.0', '/lib/critical.so')

There is a point in time when /lib/critical.so doesn't exist.

The way I know to ensure that the well-known symlink exists at all times is to replace it. 3) Unhandled exception attacks =============================== Most people replacing a link or symlink will naively just unlink the existing link if it exists, then create the replacement. Most people won't think that an exception could occur if somehow the destination is recreated after unlink and before link. Possible DoS: Someone who doesn't have permissions to kill a process, but has write access to a link or symlink could infinite-loop trying to create a new file at the location of the (sym)link. While this would be high CPU load, if there was a signal that a link replacement was imminent (eg the creation of a lockfile) this would be possible to hide. Alternatively, if the link updates were known to occur within a very small time window (eg cron job), this attack also could be feasible. Other objections ================= On 16/5/19 5:05 pm, Serhiy Storchaka wrote:

...

Somebody can replace tmp_symlink between os.symlink() and os.rename().

I raised this in my initial post: On 13/5/19 4:38 pm, Tom Hale wrote:

...

One issue I see with my suggested code is that the file at temp_link_name could be changed before target is replaced with it. This is mitigated by the randomness introduced by mktemp().

While it is far less likely that a file is accessed with a random and unknown name than with an existing known name, I seek input on a solution if this is an unacceptable risk.

My solution reduces risk greatly. I am still open to suggestions to totally eliminate it. Wrap-up ======== It's easier to program when one doesn't constantly need to be aware of edge and corner cases, instead having them handled by the standard library. POSIX dictates a certain behaviour for link() and symlink(), but we have the opportunity to make life easier for programmers via shutil. Atomicity is a venerable problem in computing science. We have a solution which massively reduces the risk of race conditions. The larger picture =================== Atomicity is only ONE point proposed in this post: https://code.activestate.com/lists/python-ideas/56054/ The linked posts steps back and proposes to support POSIX's ln interface: ie, allowing multiple links to be created in a directory with a single invocation. Atomicity is optional in this discussion, but so far, there has been no discussion. In that post I also propose documentation updates (also no response). Concerning the bug report ========================== Please note that I have changed the bug: https://bugs.python.org/issue36656 To no longer refer be classed as "security". If I could change it to reference shutil rather than os, I would. -- Regards, Tom Hale

Inada Naoki

7:25 a.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

On Sat, Jun 1, 2019 at 4:10 PM Serhiy Storchaka <storchaka@gmail.com> wrote:

...

Why do you need to replace a symlink atomically? This is a solution, what problem it solves?

There is another, more common / realistic usage of atomic symlink replacing. When deploy PHP application or static web contents to web servers, symlink is used for "atomic deployment". Roughly speaking, it is like this: ``` rsync -avK v2/ web:contents/v2/ ssh web "cd contents && ln -sf v2 current" # current/ is exposed by web server ``` If "ln -sf" is not atomic and do remove & symlink, web server or php will return 404 error between remove and symlink. I feel this use case is more real world application than "I don't want exception even when other process creates file at the same time". These are some links referring about "atomic deployment": * https://github.com/deployphp/deployer/blob/master/recipe/deploy/symlink.php * https://hackernoon.com/truly-atomic-deployments-with-nginx-and-php-fpm-aed8a... * https://blog.forrest79.net/?p=537 Then, should Python support it? Maybe. Python is used tools like Ansible. While "atomic deploy" is widely used in PHP community, it's welcome that PHPer use Python for tools. So I think it would be nice if shutil.symlink can do same thing `ln -sf` can do. On the other hand, I am not sure we should support "make tmp, rename it to target, or remove tmp if failed" idiom natively everywhere in shutil and pathlib. When writing file or copying file into directly which is exposed by WEB server, this idiom is to avoid exposing partial files to users. But implementing this idiom in everywhere in shutil and pathlib is still controversial. In case of symlink, I'm weak +1 to implement this idiom directly in stdlib. tempfile.mkstemp doesn't support symlink. So people need to use external command `ln -sf` for now. Regards, -- Inada Naoki <songofacandy@gmail.com>

Serhiy Storchaka

7:58 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

04.06.19 10:25, Inada Naoki пише:

...

On Sat, Jun 1, 2019 at 4:10 PM Serhiy Storchaka <storchaka@gmail.com> wrote:

...
Why do you need to replace a symlink atomically? This is a solution, what problem it solves?

There is another, more common / realistic usage of atomic symlink replacing.

When deploy PHP application or static web contents to web servers, symlink is used for "atomic deployment". Roughly speaking, it is like this:

``` rsync -avK v2/ web:contents/v2/ ssh web "cd contents && ln -sf v2 current" # current/ is exposed by web server ```

If "ln -sf" is not atomic and do remove & symlink, web server or php will return 404 error between remove and symlink.

I feel this use case is more real world application than "I don't want exception even when other process creates file at the same time".

Thank you, Inada-san, your example convinced me. Now I agree that it is worth to add shutil.symlink (and maybe shutil.link) which supports the "ln -sf" behavior.

Tom Hale

2:08 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

Based on the below positive feedback, I've created a PR here: https://github.com/python/cpython/pull/14464 Only shutil.symlink is currently implemented. Feedback (and possibly fixes) requested from Windows users. Thanks to all for their ideas, input and constructive criticism. Cheers, -- Tom Hale On 26/6/19 2:58 am, Serhiy Storchaka wrote:

...

04.06.19 10:25, Inada Naoki пише:

...
If "ln -sf" is not atomic and do remove & symlink, web server or php will return 404 error between remove and symlink.

I feel this use case is more real world application than "I don't want exception even when other process creates file at the same time".

Thank you, Inada-san, your example convinced me. Now I agree that it is worth to add shutil.symlink (and maybe shutil.link) which supports the "ln -sf" behavior.

Barry Scott

6:06 p.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

...

On 29 Jun 2019, at 15:08, Tom Hale <tom@hale.ee> wrote:

Based on the below positive feedback, I've created a PR here:

https://github.com/python/cpython/pull/14464

Only shutil.symlink is currently implemented. Feedback (and possibly fixes) requested from Windows users.

Thanks to all for their ideas, input and constructive criticism.

I think you need to use the dirfd version of all file operations to make sure that a symlink in the dir path is not changed why you are doing the multiple file operations. For example in shutil.py:1391 needs to create the tempfile using a dirfd. At shutil.py:1398 can't use use finally: for the cleanup? Barry

...

Cheers,

-- Tom Hale

On 26/6/19 2:58 am, Serhiy Storchaka wrote:

...
04.06.19 10:25, Inada Naoki пише:

...
If "ln -sf" is not atomic and do remove & symlink, web server or php will return 404 error between remove and symlink.

I feel this use case is more real world application than "I don't want exception even when other process creates file at the same time". Thank you, Inada-san, your example convinced me. Now I agree that it is worth to add shutil.symlink (and maybe shutil.link) which supports the "ln -sf" behavior.

Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/5YC546... Code of Conduct: http://python.org/psf/codeofconduct/

Greg Ewing

7:29 a.m.

New subject: Implement POSIX ln via shutil.link and shutil.symlink

...

On Wed, May 29, 2019 at 10:07:38PM +0700, Tom Hale wrote:

If somebody can create a file named link_name between unlink and symlink, he can also remove and create a file named link_name after symlink.

I tbink there are some corner cases that can give different results if the symlink is not atomic. For example: Process A wants to symlink f1 --> f2, replacing any existing f1. Process B wants to create f1 if it doesn't already exist, or update it if it does. If f1 is a symlink, the file it's linked to should be updated. The end result should be that f1 exists and is a symlink to f2. If the symlink is not atomic, this can happen: 1. Process A sees that f1 already exists and deletes it. 2. Process B sees that f1 does not exist and creates a new file called f1. 3. Process A tries to symlink f1 to f2, which fails because there is now an existing file called f1. This violates the postcondition, because f1 is not a symlink to f2. -- Greg

2066

Age (days ago)

2114

Last active (days ago)

List overview

Download

49 comments

14 participants

participants (14)

Anders Hovmöller
Antoine Pitrou
Barry
Barry Scott
eryk sun
Greg Ewing
Inada Naoki
Rhodri James
Richard Damon
Serge Matveenko
Serhiy Storchaka
Stephen J. Turnbull
Steven D'Aprano
Tom Hale

shutil.symlink to allow non-race replacement of existing link targets

tags

participants (14)