[Python-ideas] shutil.symlink to allow non-race replacement of existing link targets

Steven D'Aprano steve at pearwood.info
Mon May 13 13:38:03 EDT 2019

On Mon, May 13, 2019 at 12:31:08PM +0200, Anders Hovmöller wrote:
> > On 13 May 2019, at 11:38, Tom Hale <tom at hale.ee> wrote:
> > 
> > * os.symlink() - the new link name is expected to NOT exist
> > * shutil.symlink() - the new symlink replaces an existing file
> This seems error prone to me. There is nothing about those names that 
> hint at the difference in behavior. 

One of them is in the `os` module, and therefore we should expect that 
it will be a thin wrapper around the OS functionality.

The other is in the `shutil` module, and therefore we should expect that 
it will be a shell utility function of arbitrary complexity.

Being found in different modules should always be treated as a sign that 
the functions may be different. (If they're the same, why do we need 
both?)

> An optional "overwrite_if_exists=False" flag seems much nicer.

Aside from the argument name being too verbose, that violates the 
"avoid constant bool flags" rule of thumb.

(Note that this is a *rule of thumb*, not a hard law: there are times 
that it can and should be violated; it's also not a well-known principle 
and so lots of APIs violate it even when they shouldn't.)

If you have a function which takes a boolean argument to swap between 
two different modes, and the usual calling pattern is to pass a 
constant:

    function(arg, x, y, flag=True)

rather than a flag which is not known until runtime, then it is (usually) 
better to use two distinct functions rather than one function with a 
parameter that swaps modes.

Another way to put it: in general, if you have two modes, you should 
have two functions.

For example:

- str.find and rfind, not str.find with a from_right parameter;

- bisect.insort_left and insort_right, not bisect.insort with a
  from_left parameter;

- statistics.stdev and pstdev, not stdev with a population parameter;

- os.setuid and setgid rather than a change_group parameter;

- zip and itertools.zip_longest, not zip with longest parameter.
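
A few of the pairs above can be seen side by side in a short session. This 
is only an illustrative sketch; the sample data is made up, but every call 
is the standard library function named in the list:

```python
import statistics
from itertools import zip_longest

text = "spam and spam"
first = text.find("spam")    # search from the left  -> 0
last = text.rfind("spam")    # search from the right -> 9

data = [2, 4, 4, 4, 5, 5, 7, 9]
sample_sd = statistics.stdev(data)       # sample standard deviation
population_sd = statistics.pstdev(data)  # population standard deviation

# zip stops at the shortest input; zip_longest pads the shorter one
short = list(zip("ab", "xyz"))
padded = list(zip_longest("ab", "xyz"))
```

In each case the mode is fixed when the code is written, so the choice is 
carried by the function name rather than by an argument.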

This principle doesn't apply when the flag is typically not known until 
runtime. For example, this is unfortunate:

    if get_some_flag(*args):
        result = str.find(spam)
    else:
        result = str.rfind(spam)

but rare. If it were very common, then it would be justified to provide 
a single API with a mode flag switching behaviours.

The idea of "no constant bool arguments" is that if you typically 
know which mode you want at edit-time (that includes runtime in 
interactive environments), the two modes should be distinguished by 
function name rather than by a parameter.
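
As a rough sketch of the distinction (the names find_substring, locate and 
the "search_from_end" key are hypothetical, invented for illustration; only 
str.find and str.rfind are real):

```python
def find_substring(text, sub, from_right=False):
    """Hypothetical single-function API: the flag selects the mode."""
    return text.rfind(sub) if from_right else text.find(sub)

# Edit-time choice: the author knows which mode is wanted while typing,
# so the flag would always be a constant. Two names read better.
first_dot = "a.b.c".find(".")
last_dot = "a.b.c".rfind(".")

# Runtime choice: the mode comes from data that is only known when the
# program runs. Here a flag parameter saves an if/else at every call site.
def locate(text, sub, config):
    return find_substring(text, sub, from_right=config["search_from_end"])
```

The first style suits the common case; the second is only worth providing 
if runtime selection turns out to be the usual calling pattern.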

I expect that, typically, users of this will know ahead of time whether 
they want to overwrite symlinks or not.
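
For concreteness, here is one way the "replace" mode could look as its own 
function next to os.symlink. This is a sketch under assumptions, not the 
proposed implementation: replace_symlink is a hypothetical name, the 
approach (create the link under a temporary name, then rename it over the 
old one) is a common POSIX idiom, and it assumes a platform where symlinks 
can be created and os.replace is atomic.

```python
import os
import tempfile

def replace_symlink(target, link_name):
    """Hypothetical helper: make link_name point at target, replacing
    any existing symlink or file of that name."""
    link_dir = os.path.dirname(os.path.abspath(link_name))
    # Build the new link inside a private temporary directory on the same
    # filesystem, so the final rename does not cross devices.
    tmp_dir = tempfile.mkdtemp(dir=link_dir)
    tmp_link = os.path.join(tmp_dir, "link")
    try:
        os.symlink(target, tmp_link)
        # Rename over the old name; readers see the old or the new link,
        # never a missing one.
        os.replace(tmp_link, link_name)
    finally:
        # Clean up whether or not the rename succeeded.
        if os.path.lexists(tmp_link):
            os.unlink(tmp_link)
        os.rmdir(tmp_dir)
```

A caller who wants the "must not already exist" behaviour keeps using 
os.symlink; a caller who wants replacement uses the other name. Neither 
needs a flag.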

