Before removing the provisional status from the pathlib module, we should resolve the issue with Path.resolve(). It corresponds to os.path.realpath(), but behaves differently for a non-existent path. Actually, we can't say that either of these functions is wrong. Both behaviors make sense in different situations.

The readlink utility from GNU coreutils has three modes for resolving a file path:

  -f, --canonicalize
      canonicalize by following every symlink in every component of the given name recursively; all but the last component must exist
  -e, --canonicalize-existing
      canonicalize by following every symlink in every component of the given name recursively, all components must exist
  -m, --canonicalize-missing
      canonicalize by following every symlink in every component of the given name recursively, without requirements on components existence

The current behavior of posixpath.realpath() matches (apart from one minor detail) `readlink -m`. The behavior of Path.resolve() matches `readlink -e`.

I have proposed a patch that adds a three-state optional parameter to posixpath.realpath(), and I'm going to provide a similar patch for Path.resolve(). But I'm not sure this is a good API. Are there better variants?

[1] http://bugs.python.org/issue19717
[2] http://bugs.python.org/issue27002
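To make the difference on a non-existent path concrete, here is a small sketch. It assumes Python 3.6 or later, where Path.resolve() accepts a strict argument; passing strict=True gives the `readlink -e` style behavior described above, while os.path.realpath() with its default arguments stays lenient. The temporary directory and the variable names are only for illustration.

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # A path whose last three components do not exist.
    missing = os.path.join(os.path.realpath(tmp), "no", "such", "file")

    # os.path.realpath() does not complain about missing components
    # and hands the path back (the `readlink -m` style).
    lenient = os.path.realpath(missing)

    # Path.resolve(strict=True) refuses to resolve a path that does
    # not exist (the `readlink -e` style).
    try:
        Path(missing).resolve(strict=True)
    except FileNotFoundError:
        strict_failed = True
    else:
        strict_failed = False

print(lenient == missing, strict_failed)
```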
Serhiy Storchaka writes:
The readlink utility from GNU coreutils has three modes for resolving a file path:
-f, --canonicalize canonicalize by following every symlink in every component of the given name recursively; all but the last component must exist
-e, --canonicalize-existing canonicalize by following every symlink in every component of the given name recursively, all components must exist
-m, --canonicalize-missing canonicalize by following every symlink in every component of the given name recursively, without requirements on components existence
In Mac OS X (and I suppose other BSDs), realpath(3) implements -e. glibc does none of these, instead:

    GNU extensions
    If the call fails with either EACCES or ENOENT and resolved_path is not NULL, then the prefix of path that is not readable or does not exist is returned in resolved_path.

I suppose this nonstandard behavior is controlled by a #define, but the Linux manpage doesn't specify it.
The current behavior of posixpath.realpath() matches (apart from one minor detail) `readlink -m`. The behavior of Path.resolve() matches `readlink -e`.
This looks like a bug in posixpath, while Path.resolve follows POSIX. http://pubs.opengroup.org/onlinepubs/009695399/functions/realpath.html sez:

    RETURN VALUE
    Upon successful completion, realpath() shall return a pointer to the resolved name. Otherwise, realpath() shall return a null pointer and set errno to indicate the error, and the contents of the buffer pointed to by resolved_name are undefined.

    ERRORS
    The realpath() function shall fail if:
    [...]
    [ENOENT] A component of file_name does not name an existing file or file_name points to an empty string.
    [ENOTDIR] A component of the path prefix is not a directory.

which corresponds to -e.
I have proposed a patch that adds a three-state optional parameter to posixpath.realpath(), and I'm going to provide a similar patch for Path.resolve(). But I'm not sure this is a good API. Are there better variants?
Said parameter will almost always be a constant. Usually in those cases Python prefers to use different functions. E.g.,

    posixpath.realpath                  -e
    posixpath.realpath_require_prefix   -f
    posixpath.realpath_allow_missing    -m
    posixpath.realpath_gnuext           GNU extension
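As a rough sketch of what the separate-function spelling could look like, the three readlink modes can be approximated on top of the existing lenient os.path.realpath(). None of these functions exist in the standard library: realpath_require_prefix and realpath_allow_missing are the names suggested above, realpath_strict is a stand-in name invented here for the -e variant, and the GNU-extension variant is left out. Corner cases such as ".." after a missing component are not handled the way readlink handles them.

```python
import errno
import os


def _not_found(path):
    # Build the same kind of error the OS would report for a missing file.
    return FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), path)


def realpath_allow_missing(path):
    """Roughly `readlink -m`: no component has to exist."""
    return os.path.realpath(path)


def realpath_require_prefix(path):
    """Roughly `readlink -f`: all but the last component must exist."""
    resolved = os.path.realpath(path)
    parent = os.path.dirname(resolved)
    if not os.path.isdir(parent):
        raise _not_found(parent)
    return resolved


def realpath_strict(path):
    """Roughly `readlink -e`: every component must exist (hypothetical name)."""
    resolved = os.path.realpath(path)
    if not os.path.exists(resolved):
        raise _not_found(resolved)
    return resolved
```

With this spelling the mode is visible in the function name at each call site, which is the point of the suggestion: the caller rarely needs to pick the mode at run time.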
I would prefer it if Path.resolve() resolved symlinks until it hits something that doesn't exist and then just keep the rest of the path unchanged. I think this is the equivalent of -m in the mentioned utility (which I had never heard of).

It looks like os.path.realpath() already works this way.

Steve Dower just mentioned to me that the Windows version of Path.resolve() uses a Windows API that opens the file and then asks the system for the pathname -- that doesn't work if the file doesn't exist, so it should be fixed to back off and try the same thing on the parent, etc.

We should treat these things as bugs to fix before 3.6 (in 3.6b1 if possible) and make pathlib non-provisional as of 3.6.0. Probably these things should be fixed in 3.5.3 as well, since pathlib is still provisional there.

--Guido

On Wed, Sep 7, 2016 at 7:47 AM, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
[...]
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
-- --Guido van Rossum (python.org/~guido)
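The "back off and try the same thing on the parent" idea from Guido's message can be sketched in a few lines. This is only an illustration, not the pathlib implementation: resolve_lenient is a hypothetical helper name, and it assumes Python 3.6 or later so that Path.resolve(strict=True) is available as the strict building block.

```python
from pathlib import Path


def resolve_lenient(path):
    """Resolve symlinks in the longest existing prefix of `path` and
    append the remaining components unchanged (hypothetical helper)."""
    original = Path(path).absolute()
    current = original
    tail = []  # unresolved components, innermost first
    while True:
        try:
            resolved = current.resolve(strict=True)
        except (FileNotFoundError, NotADirectoryError):
            if current.parent == current:
                # Even the root could not be resolved; give up.
                return original
            # Back off: remember this component and retry on the parent.
            tail.append(current.name)
            current = current.parent
        else:
            return resolved.joinpath(*reversed(tail))
```

Given a symlink `link` pointing at a directory `real`, resolve_lenient("link/missing/x.txt") yields the resolved location of `real` followed by `missing/x.txt`, even though neither `missing` nor `x.txt` exists.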
On Thu, 8 Sep 2016 at 09:46 Guido van Rossum <guido@python.org> wrote:
I would prefer it if Path.resolve() resolved symlinks until it hits something that doesn't exist and then just keep the rest of the path unchanged. I think this is the equivalent of -m in the mentioned utility (which I had never heard of).
It looks like os.path.realpath() already works this way.
Steve Dower just mentioned to me that the Windows version of Path.resolve() uses a Windows API that opens the file and then asks the system for the pathname -- that doesn't work if the file doesn't exist, so it should be fixed to back off and try the same thing on the parent, etc.
We should treat these things as bugs to fix before 3.6 (in 3.6b1 if possible) and make pathlib non-provisional as of 3.6.0. Probably these things should be fixed in 3.5.3 as well, since pathlib is still provisional there.
http://bugs.python.org/issue28031 to track this.

-Brett
On Wed, Sep 7, 2016 at 12:20 PM, Serhiy Storchaka <storchaka@gmail.com> wrote:
Before removing provisional state from the pathlib module, we should resolve the issue with Path.resolve(). It corresponds to os.path.realpath(), but behaves differently in case of non-existent path. Actually we can't say that any of these functions is wrong. Both behaviors make sense in different situations.
Oh, almost missed this email. I won't be able to look into the issue you bring up at the moment.

However, to be honest, I keep coming back to the thought that pathlib should have another provisional cycle. As the new fspath protocol (PEP 519) will be released in 3.6, the experimentation with pathlib has still been very limited. Since pathlib was not compatible with the rest of the stdlib, the limited experimentation that was done previously is not even completely valid, because PEP 519 makes the situation a little different. And there are some rough edges there, which make some things a little awkward.

Removing the provisional status now might lead to the module becoming a mere burden, as third-party variations become significantly better.

-- Koos
-- + Koos Zevenhoven + http://twitter.com/k7hoven +
participants (5)
- Brett Cannon
- Guido van Rossum
- Koos Zevenhoven
- Serhiy Storchaka
- Stephen J. Turnbull