File system path PEP, part 2
Second draft that takes Guido's comments into consideration. The biggest
change is os.fspath() now returns whatever path.__fspath__() returns
instead of restricting it to only str.
Minor changes:
- Renamed the C function to PyOS_FSPath()
- Added an Implementation section with a TODO list
- Bunch of things added to the Rejected Ideas section
----------
PEP: NNN
Title: Adding a file system path protocol
Version: $Revision$
Last-Modified: $Date$
Author: Brett Cannon
On 13 May 2016 at 06:53, Brett Cannon
Second draft that takes Guido's comments into consideration. The biggest change is os.fspath() now returns whatever path.__fspath__() returns instead of restricting it to only str.
Minor changes: - Renamed the C function to PyOS_FSPath() - Added an Implementation section with a TODO list - Bunch of things added to the Rejected Ideas section
+1 for this version from me, as it means we have:
- os.fsencode(obj) as the coerce-to-bytes API
- os.fspath(obj) as the str/bytes hybrid API
- os.fsdecode(obj) as the coerce-to-str API
- os.fspath(pathlib.PurePath(obj)) as the error-on-bytes API
That more strongly nudges people towards "use pathlib if you want to ensure cross-platform friendly path handling", which is an outcome I'm fine with.
Cheers, Nick.
-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
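For readers following along, the four entry points Nick lists can be tried side by side. This is only a sketch, assuming a Python version in which the protocol has landed (3.6 or later); the file name and the variable names are made up for the illustration.

```python
import os
import pathlib

name_str = "data.txt"     # illustrative file name, not from the thread
name_bytes = b"data.txt"

# Coerce-to-bytes API: str and bytes inputs both come back as bytes.
as_bytes = [os.fsencode(name_str), os.fsencode(name_bytes)]

# str/bytes hybrid API: the type of the input is preserved.
hybrid = [os.fspath(name_str), os.fspath(name_bytes)]

# Coerce-to-str API: str and bytes inputs both come back as str.
as_str = [os.fsdecode(name_str), os.fsdecode(name_bytes)]

# Error-on-bytes API: pathlib accepts the str but refuses the bytes.
strict = os.fspath(pathlib.PurePath(name_str))
try:
    pathlib.PurePath(name_bytes)
    bytes_rejected = False
except TypeError:
    bytes_rejected = True
```

The rejection in the last case comes from pathlib itself rather than from os.fspath(), which is what makes the combination usable as a str-only check.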
On Thu, May 12, 2016 at 08:53:12PM +0000, Brett Cannon wrote:
Second draft that takes Guido's comments into consideration. The biggest change is os.fspath() now returns whatever path.__fspath__() returns instead of restricting it to only str.
Counter suggestion:
- __fspath__() method may return either bytes or str (no change from the PEP as it stands now);
- but os.fspath() will only return str;
- and os.fspathb() will only return bytes;
- there is no os function that returns "str or bytes, I don't care which". (If you really need that, call __fspath__ directly.)
Note that this differs from the already rejected suggestion that there should be two dunder methods, __fspath__() and __fspathb__().
Why?
(1) Normally, the caller knows whether they want str or bytes. (That's been my experience, you may disagree.) If so, and they call os.fspath() expecting a str, they won't be surprised by it returning bytes. And vice versa for when you expect a bytes path.
(2) This behaviour will match that of os.{environ[b],getcwd[b],getenv[b]}.
Cons:
(3) Polymorphic code that truly doesn't care whether it gets bytes or str will have a slightly less convenient way of getting it, namely by calling __fspath__() itself, instead of os.fspath().
A few other comments below:
builtins ''''''''
``open()`` [#builtins-open]_ will be updated to accept path objects as well as continue to accept ``str`` and ``bytes``.
I think it is a bit confusing to refer to "path objects", as that seems like you are referring only to pathlib.Path objects. It took me far too long to realise that here you mean generic path-like objects that obey the __fspath__ protocol rather than a specific concrete class. Since the ABC is called "PathLike", I suggest we refer to "path-like objects" rather than "path objects", both in the PEP and in the Python docs for this protocol.
    def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
        """Return the string representation of the path.

        If str or bytes is passed in, it is returned unchanged.
        """
I've already suggested a change to this, above, but independent of that, a minor technical query:
try: return path.__fspath__()
Would I be right in saying that in practice this will actually end up being type(path).__fspath__() to match the behaviour of all(?) other dunder methods? -- Steve
On Fri, May 13, 2016 at 9:00 PM, Steven D'Aprano
Cons: (3) Polymorphic code that truly doesn't care whether it gets bytes or str will have a slightly less convenient way of getting it, namely by calling __fspath__() itself, instead of os.fspath().
I don't like this; it goes against the general principle that dunders are for defining, not calling. Generally, a given dunder method has approximately one call site, eg __reduce__ in pickle.py, and everyone else defines it. (You might call super's dunder in the definition of your own, but that's still defining it, not calling it.) Having an official statement that it's appropriate to call a dunder confuses this. So this isn't a "slightly less convenient way", it's a bad way (IMO), and this is a very strong con. ChrisA
On 13 May 2016 at 21:00, Steven D'Aprano
On Thu, May 12, 2016 at 08:53:12PM +0000, Brett Cannon wrote:
Second draft that takes Guido's comments into consideration. The biggest change is os.fspath() now returns whatever path.__fspath__() returns instead of restricting it to only str.
Counter suggestion:
- __fspath__() method may return either bytes or str (no change from the PEP as it stands now);
- but os.fspath() will only return str;
- and os.fspathb() will only return bytes;
We don't have any use cases for a bytes-only API - only str-or-bytes (in the polymorphic low level functions) and str-only (in newer high level APIs like pathlib). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
- there is no os function that returns "str or bytes, I don't care which". (If you really need that, call __fspath__ directly.)
os.fspath() in the PEP works when given str or bytes directly, but those don't have a __fspath__ method, so directly calling the dunder method is not equivalent to calling os.fspath(). The whole point of having os.fspath() is to pass str or paths (or bytes) and get an appropriate object for low-level functions.
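The non-equivalence described above is easy to check. The following is a small sketch, assuming Python 3.6 or later; to_low_level() is a hypothetical helper standing in for whatever a low-level function would do with its argument on entry.

```python
import os
import pathlib

# Neither built-in string type defines the dunder method.
str_has_dunder = hasattr(str, "__fspath__")
bytes_has_dunder = hasattr(bytes, "__fspath__")


def to_low_level(path):
    """Hypothetical helper: normalise an argument the way a low-level API might."""
    return os.fspath(path)


# os.fspath() handles all three kinds of input uniformly.
inputs = ("spam.txt", b"spam.txt", pathlib.PurePath("spam.txt"))
results = [to_low_level(p) for p in inputs]

# Calling the dunder directly only works for objects that implement it.
try:
    "spam.txt".__fspath__()
    direct_call_ok = True
except AttributeError:
    direct_call_ok = False
```

So code that wants to accept str, bytes and path objects alike has to go through os.fspath(); the dunder alone covers only the last of the three.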
On Fri, May 13, 2016, at 07:00, Steven D'Aprano wrote:
- but os.fspath() will only return str;
- and os.fspathb() will only return bytes;
And raise an exception if __fspath__ returns the other, I suppose. What's the use case for these functions? When would I call them rather than fsdecode and fsencode? (assuming the latter will support the path protocol - I've got more objections if they're not going to) Also, what happens if you pass a string to os.fspath? Statements like "str will not implement __fspath__" have thus far been light on details of what functions will accept "str or path-like-object-whose-__fspath__returns-str" and which ones will only accept the latter (and what, if any, will continue to only accept the former). If no-one's supposed to directly call dunder methods, and os.fspath accepts str, then what difference does it make whether this is implemented by having a special case within os.fspath or by calling str.__fspath__?
On Fri, 13 May 2016 at 04:00 Steven D'Aprano
On Thu, May 12, 2016 at 08:53:12PM +0000, Brett Cannon wrote:
Second draft that takes Guido's comments into consideration. The biggest change is os.fspath() now returns whatever path.__fspath__() returns instead of restricting it to only str.
Counter suggestion:
- __fspath__() method may return either bytes or str (no change from the PEP as it stands now);
- but os.fspath() will only return str;
- and os.fspathb() will only return bytes;
- there is no os function that returns "str or bytes, I don't care which". (If you really need that, call __fspath__ directly.)
Note that this differs from the already rejected suggestion that there should be two dunder methods, __fspath__() and __fspathb__().
Why?
(1) Normally, the caller knows whether they want str or bytes. (That's been my experience, you may disagree.) If so, and they call os.fspath() expecting a str, they won't be surprised by it returning bytes. And vice versa for when you expect a bytes path.
(2) This behaviour will match that of os.{environ[b],getcwd[b],getenv[b]}.
Cons:
(3) Polymorphic code that truly doesn't care whether it gets bytes or str will have a slightly less convenient way of getting it, namely by calling __fspath__() itself, instead of os.fspath().
I prefer what's in the PEP. I get where you're coming from, Steven, but I don't think it will be common enough to worry about. Think of os.fspath() like next() where it truly is a very minor convenience function that happens to special-case str and bytes.
A few other comments below:
builtins ''''''''
``open()`` [#builtins-open]_ will be updated to accept path objects as well as continue to accept ``str`` and ``bytes``.
I think it is a bit confusing to refer to "path objects", as that seems like you are referring only to pathlib.Path objects. It took me far too long to realise that here you mean generic path-like objects that obey the __fspath__ protocol rather than a specific concrete class.
Since the ABC is called "PathLike", I suggest we refer to "path-like objects" rather than "path objects", both in the PEP and in the Python docs for this protocol.
I went back and forth with this in my head while writing the PEP. The problem with making "path-like" mean "objects implementing the PathLike ABC" becomes how do you refer to an argument of a function that accepts anything os.fspath() does (i.e. PathLike, str, and bytes)?
    def fspath(path: t.Union[PathLike, str, bytes]) -> t.Union[str, bytes]:
        """Return the string representation of the path.

        If str or bytes is passed in, it is returned unchanged.
        """
I've already suggested a change to this, above, but independent of that, a minor technical query:
try: return path.__fspath__()
Would I be right in saying that in practice this will actually end up being type(path).__fspath__() to match the behaviour of all(?) other dunder methods?
I wasn't planning on it because for most types the accessing of the method directly off of the type for magic methods is because of some special struct field at the C level that we're pulling from. Since we're not planning to have an equivalent struct field I don't see any need to do the extra work of avoiding the instance participating in method lookup. Obviously if people disagree for some reason then please let me know (maybe for perf by avoiding the overhead of checking for the method on the instance?).
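To make the function under discussion concrete, here is a rough sketch of the behaviour described at this point in the thread: str and bytes are returned unchanged, and anything else is asked for its __fspath__() through ordinary instance attribute lookup. The name fspath_sketch, the order of the checks, the error message and the Demo class are all assumptions made for the illustration; this is not the PEP's reference code.

```python
def fspath_sketch(path):
    """Illustrative stand-in for the draft os.fspath(); not the real implementation."""
    # str and bytes are special-cased and handed back unchanged.
    if isinstance(path, (str, bytes)):
        return path
    try:
        # Instance-based lookup, mirroring the quoted "path.__fspath__()".
        # Note: this also swallows an AttributeError raised inside __fspath__.
        return path.__fspath__()
    except AttributeError:
        raise TypeError(
            "expected str, bytes, or an object with __fspath__, "
            f"not {type(path).__name__}"
        ) from None


class Demo:
    """Hypothetical path-like class used only for this example."""

    def __init__(self, text):
        self._text = text

    def __fspath__(self):
        # Returned as-is: the sketch does not restrict the result to str.
        return self._text
```

With this sketch, fspath_sketch(Demo("x")) gives back the str and fspath_sketch(Demo(b"x")) gives back the bytes, which is the "returns whatever __fspath__() returns" behaviour announced for this draft.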
On 05/13/2016 08:43 AM, Brett Cannon wrote:
a minor technical query:
try: return path.__fspath__()
Would I be right in saying that in practice this will actually end up being type(path).__fspath__() to match the behaviour of all(?) other dunder methods?
I wasn't planning on it because for most types the accessing of the method directly off of the type for magic methods is because of some special struct field at the C level that we're pulling from. Since we're not planning to have an equivalent struct field I don't see any need to do the extra work of avoiding the instance participating in method lookup. Obviously if people disagree for some reason then please let me know (maybe for perf by avoiding the overhead of checking for the method on the instance?).
I would say use `type(x).__fspath__`. I'm not aware of any other __dunder__ method that doesn't access the attribute from the type instead of the instance, and I see no point in making this one different. I know there's a line in the Zen about foolish consistencies, but I suspect there's a corollary about foolish inconsistencies. ;) -- ~Ethan~
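The practical difference between the two lookups only shows up when an instance carries its own __fspath__ attribute. A small sketch follows; both helper functions and both classes are hypothetical and exist only to contrast the two strategies.

```python
class Plain:
    """Hypothetical class with no __fspath__ defined on the type."""


class Proper:
    """Hypothetical class that defines __fspath__ on the type."""

    def __fspath__(self):
        return "from-type"


def lookup_on_instance(obj):
    # Ordinary attribute access: the instance participates in the lookup.
    return obj.__fspath__()


def lookup_on_type(obj):
    # Lookup on the class only, as suggested with type(x).__fspath__.
    return type(obj).__fspath__(obj)


# Case 1: the method exists only as an instance attribute.
patched = Plain()
patched.__fspath__ = lambda: "from-instance"
instance_result = lookup_on_instance(patched)
try:
    lookup_on_type(patched)
    type_lookup_found = True
except AttributeError:
    type_lookup_found = False

# Case 2: an instance attribute shadows the method defined on the class.
shadowed = Proper()
shadowed.__fspath__ = lambda: "from-instance"
shadowed_via_instance = lookup_on_instance(shadowed)
shadowed_via_type = lookup_on_type(shadowed)
```

In both cases the type-based lookup ignores whatever was attached to the instance, which is the behaviour being argued for here.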
On Sat, May 14, 2016 at 2:34 AM, Ethan Furman
I would say use `type(x).__fspath__`. I'm not aware of any other __dunder__ method that doesn't access the attribute from the type instead of the instance, and I see no point in making this one different.
__reduce__ / __reduce_ex__ in pickle.py is accessed with a straightforward getattr() call. It's the ones that are called from deep within the interpreter core (eg __iter__) that are always looked up on the type. ChrisA
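The contrast between a plain getattr() and a lookup done by the interpreter can be observed with __iter__ itself. A minimal sketch, with Bag as a hypothetical class that has no special methods of its own:

```python
class Bag:
    """Hypothetical class with no special methods defined on the type."""


bag = Bag()
# Attach __iter__ to the instance only, never to the class.
bag.__iter__ = lambda: iter([1, 2, 3])

# A plain getattr() call, the style attributed to pickle.py above,
# finds the instance attribute.
found = getattr(bag, "__iter__", None)
via_getattr = list(found())

# The built-in iter() looks the method up on the type and so
# does not see the instance attribute at all.
try:
    iter(bag)
    iter_saw_instance_attr = True
except TypeError:
    iter_saw_instance_attr = False
```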
On Fri, May 13, 2016 at 9:34 AM, Ethan Furman
I would say use `type(x).__fspath__`. I'm not aware of any other __dunder__ method that doesn't access the attribute from the type instead of the instance, and I see no point in making this one different.
Agreed. -- --Guido van Rossum (python.org/~guido)
On Fri, May 13, 2016 at 03:43:29PM +0000, Brett Cannon wrote:
On Fri, 13 May 2016 at 04:00 Steven D'Aprano
wrote:
[...]
- but os.fspath() will only return str; - and os.fspathb() will only return bytes;
I prefer what's in the PEP. I get where you're coming from, Steven, but I don't think it will be common enough to worry about. Think of os.fspath() like next() where it truly is a very minor convenience function that happens to special-case str and bytes.
Okay, I'm satisfied by the various arguments against this idea. [...]
I think it is a bit confusing to refer to "path objects", as that seems like you are referring only to pathlib.Path objects. It took me far too long to realise that here you mean generic path-like objects that obey the __fspath__ protocol rather than a specific concrete class.
Since the ABC is called "PathLike", I suggest we refer to "path-like objects" rather than "path objects", both in the PEP and in the Python docs for this protocol.
I went back and forth with this in my head while writing the PEP. The problem with making "path-like" mean "objects implementing the PathLike ABC" becomes how do you refer to an argument of a function that accepts anything os.fspath() does (i.e. PathLike, str, and bytes)?
On further reflection, I think the right language is to use "path-like" for the union of PathLike, str and bytes. That will, I think, cover the majority of cases: most functions which want to work on a file system path should accept all three. When you want to specify only an object which implements the PathLike ABC, that's called a (virtual) instance of PathLike.
I see this as analogous to "iterable". An iterable is anything which can be iterated over. Sometimes that's an iterator. Sometimes it's not an iterator. We don't have a special term for "iterable which is not an iterator", mostly because it's rare to care about the distinction, but on those rare cases that we do care, we can describe it explicitly.
In this case, "path-like" would be equivalent to iterable: anything that can be used as a file system path. Some path-like objects are actual pathlib.Path objects, some are some other PathLike object, and some are strings or bytes.
[...]
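The two senses of the term can be written down as two different checks. This sketch assumes the ABC is exposed as os.PathLike, as it is in Python 3.6 and later; the PathLikeArg alias, the Record class and both helper functions are invented for the example.

```python
import os
import pathlib
from typing import Union

# Hypothetical alias for the broad sense: anything os.fspath() accepts.
PathLikeArg = Union[str, bytes, os.PathLike]


class Record:
    """Hypothetical class that implements the protocol without inheriting from the ABC."""

    def __fspath__(self):
        return "record.db"


def is_path_like(obj):
    """Broad sense: str, bytes, or an object implementing __fspath__."""
    return isinstance(obj, (str, bytes, os.PathLike))


def is_pathlike_instance(obj):
    """Narrow sense: a (virtual) instance of the PathLike ABC only."""
    return isinstance(obj, os.PathLike)
```

Record never mentions os.PathLike, yet it passes the narrow check because the ABC recognises any class that defines __fspath__; that is the "virtual instance" case mentioned above. A plain str passes only the broad check.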
try: return path.__fspath__()
Would I be right in saying that in practice this will actually end up being type(path).__fspath__() to match the behaviour of all(?) other dunder methods?
I wasn't planning on it because for most types the accessing of the method directly off of the type for magic methods is because of some special struct field at the C level that we're pulling from. Since we're not planning to have an equivalent struct field I don't see any need to do the extra work of avoiding the instance participating in method lookup. Obviously if people disagree for some reason then please let me know (maybe for perf by avoiding the overhead of checking for the method on the instance?).
The reasons I would disagree are: (1) It took me a long time to learn the rule that dunder methods are always called from the class, not the instance, and now I have to learn that there are exceptions? Grrrr argggh. (2) If we ever do change to a C struct field, the behaviour will change. Maybe it's better to emulate the same behaviour from the start? (3) If there's a performance speed up, that's a bonus! -- Steven
On Fri, 13 May 2016 at 10:53 Steven D'Aprano
On Fri, May 13, 2016 at 03:43:29PM +0000, Brett Cannon wrote:
On Fri, 13 May 2016 at 04:00 Steven D'Aprano
wrote: [...]
- but os.fspath() will only return str; - and os.fspathb() will only return bytes;
I prefer what's in the PEP. I get where you're coming from, Steven, but I don't think it will be common enough to worry about. Think of os.fspath() like next() where it truly is a very minor convenience function that happens to special-case str and bytes.
Okay, I'm satisfied by the various arguments against this idea.
I'll add a note in the Rejected Ideas section about this.
[...]
I think it is a bit confusing to refer to "path objects", as that seems like you are referring only to pathlib.Path objects. It took me far too long to realise that here you mean generic path-like objects that obey the __fspath__ protocol rather than a specific concrete class.
Since the ABC is called "PathLike", I suggest we refer to "path-like objects" rather than "path objects", both in the PEP and in the Python docs for this protocol.
I went back and forth with this in my head while writing the PEP. The problem with making "path-like" mean "objects implementing the PathLike ABC" becomes how do you refer to an argument of a function that accepts anything os.fspath() does (i.e. PathLike, str, and bytes)?
On further reflection, I think the right language is to use "path-like" for the union of PathLike, str and bytes. That will, I think, cover the majority of cases: most functions which want to work on a file system path should accept all three. When you want to specify only an object which implements the PathLike ABC, that's called a (virtual) instance of PathLike.
I see this as analogous to "iterable". An iterable is anything which can be iterated over. Sometimes that's an iterator. Sometimes it's not an iterator. We don't have a special term for "iterable which is not an iterator", mostly because it's rare to care about the distinction, but on those rare cases that we do care, we can describe it explicitly.
In this case, "path-like" would be equivalent to iterable: anything that can be used as a file system path. Some path-like objects are actual pathlib.Path objects, some are some other PathLike object, and some are strings or bytes.
That was my general thinking. I'll add a TODO to add an entry in the glossary for "path-like".
try: return path.__fspath__()
Would I be right in saying that in practice this will actually end up being type(path).__fspath__() to match the behaviour of all(?) other dunder methods?
I wasn't planning on it because for most types the accessing of the method directly off of the type for magic methods is because of some special struct field at the C level that we're pulling from. Since we're not planning to have an equivalent struct field I don't see any need to do the extra work of avoiding the instance participating in method lookup. Obviously if people disagree for some reason then please let me know (maybe for perf by avoiding the overhead of checking for the method on the instance?).
The reasons I would disagree are:
(1) It took me a long time to learn the rule that dunder methods are always called from the class, not the instance, and now I have to learn that there are exceptions? Grrrr argggh.
(2) If we ever do change to a C struct field, the behaviour will change. Maybe it's better to emulate the same behaviour from the start?
(3) If there's a performance speed up, that's a bonus!
And Guido agrees with this as well so I'll be updating the PEP shortly after I remember how to do the equivalent from C. :)
On Fri, May 13, 2016 at 8:52 PM, Steven D'Aprano
On Fri, May 13, 2016 at 03:43:29PM +0000, Brett Cannon wrote:
On Fri, 13 May 2016 at 04:00 Steven D'Aprano
wrote: [...]
I think it is a bit confusing to refer to "path objects", as that seems like you are referring only to pathlib.Path objects. It took me far too long to realise that here you mean generic path-like objects that obey the __fspath__ protocol rather than a specific concrete class.
This terminology is indeed a bit difficult, not least because there are 6 different path classes in pathlib. A couple of months ago, I decided to start to call these pathlib objects and path objects, because I did not know what else to call them without doubling the length.
Since the ABC is called "PathLike", I suggest we refer to "path-like objects" rather than "path objects", both in the PEP and in the Python docs for this protocol.
I went back and forth with this in my head while writing the PEP. The problem with making "path-like" mean "objects implementing the PathLike ABC" becomes how do you refer to an argument of a function that accepts anything os.fspath() does (i.e. PathLike, str, and bytes)?
On further reflection, I think the right language is to use "path-like" for the union of PathLike, str and bytes. That will, I think, cover the majority of cases: most functions which want to work on a file system path should accept all three. When you want to specify only an object which implements the PathLike ABC, that's called a (virtual) instance of PathLike.
As I've told Brett before, I think exactly this reasoning would be a good enough reason not to call the ABC PathLike. [...]
Would I be right in saying that in practice this will actually end up being type(path).__fspath__() to match the behaviour of all(?) other dunder methods?
I wasn't planning on it because for most types the accessing of the method directly off of the type for magic methods is because of some special struct field at the C level that we're pulling from. Since we're not planning to have an equivalent struct field I don't see any need to do the extra work of avoiding the instance participating in method lookup. Obviously if people disagree for some reason then please let me know (maybe for perf by avoiding the overhead of checking for the method on the instance?).
The reasons I would disagree are:
(1) It took me a long time to learn the rule that dunder methods are always called from the class, not the instance, and now I have to learn that there are exceptions? Grrrr argggh.
Who knows, maybe the exception made it take a longer time. IIRC the docs say that dunder methods are *not guaranteed* to be called on the instance, because they may be called on the class.
(2) If we ever do change to a C struct field, the behaviour will change. Maybe it's better to emulate the same behaviour from the start?
(3) If there's a performance speed up, that's a bonus!
-- Steven
On Fri, May 13, 2016 at 4:00 AM, Steven D'Aprano
On Thu, May 12, 2016 at 08:53:12PM +0000, Brett Cannon wrote:
Second draft that takes Guido's comments into consideration. The biggest change is os.fspath() now returns whatever path.__fspath__() returns instead of restricting it to only str.
Counter suggestion:
- __fspath__() method may return either bytes or str (no change from the PEP as it stands now);
- but os.fspath() will only return str;
- and os.fspathb() will only return bytes;
- there is no os function that returns "str or bytes, I don't care which". (If you really need that, call __fspath__ directly.)
Note that this differs from the already rejected suggestion that there should be two dunder methods, __fspath__() and __fspathb__().
Why?
(1) Normally, the caller knows whether they want str or bytes. (That's been my experience, you may disagree.) If so, and they call os.fspath() expecting a str, they won't be surprised by it returning bytes. And vice versa for when you expect a bytes path.
(2) This behaviour will match that of os.{environ[b],getcwd[b],getenv[b]}.
Cons:
(3) Polymorphic code that truly doesn't care whether it gets bytes or str will have a slightly less convenient way of getting it, namely by calling __fspath__() itself, instead of os.fspath().
More cons:
- It would be confusing that there'd be no direct correspondence between os.fspath(x) and x.__fspath__(), unlike for most other dunders (next(x) -> x.__next__(), iter(x) -> x.__iter__(), and so on).
- Your examples like os.getcwd() have no input to determine whether to return bytes or str, so for those we use an alternate name to request bytes. However for most os methods the same method handles both, e.g. os.listdir(b'.') will return a list of bytes objects while os.listdir('.') will return a list of strings. os.fspath() firmly falls in the latter camp: it takes an input whose __fspath__() method will return either str or bytes, so that's all the guidance it needs.
Really, if you want bytes, you should use os.fsencode(); if you want strings, use os.fsencode(); if you want to be polymorphic, use os.fspath() and check the type it returns.
I find the case where you'd explicitly want to exclude bytes unusual; Nick already proposed pathlib.Path(os.fspath(x)) for that.
-- --Guido van Rossum (python.org/~guido)
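The two camps described above can be seen directly in the os module. A short sketch, assuming Python 3.6 or later; describe() is a hypothetical helper showing the "call os.fspath() and check the type" pattern.

```python
import os
import pathlib

# No input to inspect, so the bytes variant needs its own name.
cwd_str = os.getcwd()
cwd_bytes = os.getcwdb()

# Here the type of the argument decides the type of the results.
names_str = os.listdir(".")
names_bytes = os.listdir(b".")


def describe(path):
    """Hypothetical polymorphic helper: convert once, then branch on the type."""
    raw = os.fspath(path)
    return "bytes" if isinstance(raw, bytes) else "str"
```

describe() accepts a str, a bytes object or a pathlib path and reports which representation came back, without needing a second, bytes-only function.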
On 05/13/2016 06:21 PM, Guido van Rossum wrote:
Really, if you want bytes, you should use os.fsencode(); if you want strings, use os.fsencode(); if you want to be polymorphic, use os.fspath() and check the type it returns.
Am I severely misunderstanding the API, or did you mean "if you want strings, use os.fsdecode()" here? //arry/
On Fri, May 13, 2016 at 9:33 AM, Larry Hastings
On 05/13/2016 06:21 PM, Guido van Rossum wrote:
Really, if you want bytes, you should use os.fsencode(); if you want strings, use os.fsencode(); if you want to be polymorphic, use os.fspath() and check the type it returns.
Am I severely misunderstanding the API, or did you mean "if you want strings, use os.fsdecode()" here?
encode, decode -- poh-tay-toe, poh-tah-toe. :-) Another slip of the fingers, I did mean os.fsdecode() when you want strings. encode, schmencode... -- --Guido van Rossum (python.org/~guido)
On Fri, May 13, 2016 at 2:00 PM, Steven D'Aprano
Counter suggestion:
- __fspath__() method may return either bytes or str (no change from the PEP as it stands now);
- but os.fspath() will only return str;
- and os.fspathb() will only return bytes;
- there is no os function that returns "str or bytes, I don't care which". (If you really need that, call __fspath__ directly.)
Note that this differs from the already rejected suggestion that there should be two dunder methods, __fspath__() and __fspathb__().
Why?
(1) Normally, the caller knows whether they want str or bytes. (That's been my experience, you may disagree.) If so, and they call os.fspath() expecting a str, they won't be surprised by it returning bytes. And vice versa for when you expect a bytes path.
(2) This behaviour will match that of os.{environ[b],getcwd[b],getenv[b]}.
I would think these have the b suffix because there is no good way to infer which type should be returned. In things like os.path.join or os.path.dirname you pass in the object(s) that determine the return type. In os.fspath, you pass in an object, whose type (str/bytes) or "underlying path string type" (as returned by __fspath__()) determines the return type of fspath. I think this is well in line with os.path functions. -- Koos
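The behaviour of the os.path functions referred to here can be checked as follows. The sketch uses posixpath, the POSIX flavour of os.path, only so that the separators are the same on every platform; the path components are arbitrary.

```python
import os
import posixpath  # POSIX implementation of os.path, for predictable separators

# The argument types determine the return type.
joined_str = posixpath.join("usr", "lib")
joined_bytes = posixpath.join(b"usr", b"lib")
parent_str = posixpath.dirname("usr/lib")
parent_bytes = posixpath.dirname(b"usr/lib")

# Mixing the two types is an error rather than a silent conversion.
try:
    posixpath.join("usr", b"lib")
    mixing_allowed = True
except TypeError:
    mixing_allowed = False

# os.fspath() follows the same rule for plain str and bytes input.
same_type_out = [type(os.fspath(p)) for p in ("usr", b"usr")]
```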
Thanks Brett!
Now one thing is that, despite your suggestion, I had not added myself
as an author in my big pull request. Originally, it was because I
simply forgot to copy and paste it when I split my edits into separate
commits ;-). Sorry about that (not sure if you care though, and I've
been defending the PEP regardless).
Anyway, especially now that my main worry regarding the open questions
has been resolved, I would be more than happy to have my name on it.
So Brett, could you add me as author? (Koos Zevenhoven and
k7hoven@gmail.com will be fine)
It looks like this is finally happening :)
-- Koos
participants (10)
- Brett Cannon
- Chris Angelico
- Ethan Furman
- Guido van Rossum
- Joseph Martinot-Lagarde
- Koos Zevenhoven
- Larry Hastings
- Nick Coghlan
- Random832
- Steven D'Aprano