[Python-ideas] Three ways of paths canonization

Guido van Rossum guido at python.org
Thu Sep 8 12:45:36 EDT 2016

I would prefer it if Path.resolve() resolved symlinks until it hits
something that doesn't exist and then just kept the rest of the path
unchanged. I think this is the equivalent of -m in the mentioned
utility (which I had never heard of).

It looks like os.path.realpath() already works this way.
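
As a rough illustration of that behavior, the sketch below builds a
symlink in a temporary directory and asks os.path.realpath() to resolve
a path whose tail does not exist. The helper name and the directory
names are made up for the example, and it assumes a POSIX system where
creating symlinks is allowed.

```python
import os
import tempfile


def demo_realpath_missing_tail():
    """Hypothetical demo helper, not part of any library."""
    # Resolve the temp dir first, since it may itself sit behind a symlink.
    base = os.path.realpath(tempfile.mkdtemp())
    target = os.path.join(base, "real_dir")
    link = os.path.join(base, "link")
    os.mkdir(target)
    os.symlink(target, link)  # may need extra privileges on Windows

    # "missing/child" does not exist under the symlinked directory.
    resolved = os.path.realpath(os.path.join(link, "missing", "child"))
    return target, resolved


target, resolved = demo_realpath_missing_tail()
# The symlink is followed; the nonexistent tail is carried over unchanged.
print(resolved == os.path.join(target, "missing", "child"))
```

On a POSIX system the two paths should compare equal: the existing
symlink is resolved and the rest of the path is left alone.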

Steve Dower just mentioned to me that the Windows version of
Path.resolve() uses a Windows API that opens the file and then asks
the system for the pathname -- that doesn't work if the file doesn't
exist, so it should be fixed to back off and try the same thing on the
parent, etc.
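
The back-off idea can be sketched in portable terms. The function below
is a hypothetical helper, not the Windows implementation discussed
above: it stands in os.path.lexists() and os.path.realpath() for the
system call, walks up to the nearest existing ancestor, resolves that,
and re-appends the missing components untouched.

```python
import os


def resolve_existing_prefix(path):
    """Hypothetical helper: resolve the longest existing prefix of
    `path` and keep the remaining components unchanged."""
    # Make the path absolute without collapsing ".." lexically.
    head = os.path.join(os.getcwd(), path)
    tail = []
    # Back off to the parent until something exists.
    while not os.path.lexists(head):
        parent, name = os.path.split(head)
        if parent == head:
            break  # reached a root that does not exist; stop backing off
        if name:
            tail.append(name)
        head = parent
    # Resolve only the part that exists, then re-attach the rest.
    return os.path.join(os.path.realpath(head), *reversed(tail))
```

For a path like link/missing/child, where only link exists, this
returns the resolved target of link followed by missing/child.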

We should treat these things as bugs to fix before 3.6 (in 3.6b1 if
possible) and make pathlib non-provisional as of 3.6.0. Probably these
things should be fixed in 3.5.3 as well, since pathlib is still
provisional there.


On Wed, Sep 7, 2016 at 7:47 AM, Stephen J. Turnbull
<turnbull.stephen.fw at u.tsukuba.ac.jp> wrote:
> Serhiy Storchaka writes:
>  > The readlink utility from GNU coreutils has three modes for resolving
>  > file path:
>  >
>  >         -f, --canonicalize
>  >                canonicalize by following every symlink in every
>  >                component of the given name recursively; all but the
>  >                last component must exist
>  >
>  >         -e, --canonicalize-existing
>  >                canonicalize by following every symlink in every
>  >                component of the given name recursively, all
>  >                components must exist
>  >
>  >         -m, --canonicalize-missing
>  >                canonicalize by following every symlink in every
>  >                component of the given name recursively, without
>  >                requirements on components existence
> In Mac OS X (and I suppose other BSDs), realpath(3) implements -e.
> glibc does none of these, instead:
>    GNU extensions
>        If the call fails with either EACCES or ENOENT and
>        resolved_path is not NULL, then the prefix of path that is not
>        readable or does not exist is returned in resolved_path.
> I suppose this nonstandard behavior is controlled by a #define, but
> the Linux manpage doesn't specify it.
>  > Current behavior of posixpath.realpath() matches (besides one minor
>  > detail) `readlink -m`. The behavior of Path.resolve() matches
>  > `readlink -e`.
> This looks like a bug in posixpath, while Path.resolve follows POSIX.
> http://pubs.opengroup.org/onlinepubs/009695399/functions/realpath.html
> sez:
>     Upon successful completion, realpath() shall return a pointer to
>     the resolved name. Otherwise, realpath() shall return a null
>     pointer and set errno to indicate the error, and the contents of
>     the buffer pointed to by resolved_name are undefined.
>     ERRORS
>     The realpath() function shall fail if:
> [...]
>     [ENOENT] A component of file_name does not name an existing file or
>         file_name points to an empty string.
>     [ENOTDIR] A component of the path prefix is not a directory.
> which corresponds to -e.
>  > I have proposed a patch that adds a three-state optional parameter to
>  > posixpath.realpath() and I'm going to provide a similar patch for
>  > Path.resolve(). But I'm not sure this is a good API. Are there better
>  > variants?
> Said parameter will almost always be a constant.  Usually in those
> cases Python prefers to use different functions.  Eg,
>     posixpath.realpath                    -e
>     posixpath.realpath_require_prefix     -f
>     posixpath.realpath_allow_missing      -m
>     posixpath.realpath_gnuext             GNU extension

--Guido van Rossum (python.org/~guido)
