Dunder method to make object str-like
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better. Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str. This could supersede the __fspath__ "give me the string for this path" protocol, or could stand in parallel with it. Obviously str will have this dunder method, returning self. Most other core types (notably 'object') will not define it. Absence of this method implies that the object cannot be treated as a string. String methods will be defined as accepting string-like objects. For instance, "hello"+foo will succeed if foo is string-like. Downside: Two string-like objects may behave unexpectedly - foo+bar will concatenate strings if either is an actual string, but if both are other string-like objects, depends on the implementation of those objects. Bikeshedding: 1) What should the dunder method be named? __str_coerce__? __stringlike__? 2) Which standard types are sufficiently string-like to be used thus? 3) Should there be a bytes equivalent? 4) Should there be a format string "coerce to str"? "{}".format(x) is equivalent to str(x), but it might be nice to be able to assert that something's stringish already. Open season!
![](https://secure.gravatar.com/avatar/512cfbaf98d63ca4acd57b2df792aec6.jpg?s=120&d=mm&r=g)
On Thu, Apr 07, 2016 at 11:04:56PM +1000, Chris Angelico <rosuav@gmail.com> wrote:
1) What should the dunder method be named? __str_coerce__? __stringlike__?
__tostring__ Oleg. -- Oleg Broytman http://phdru.name/ phd@phdru.name Programmers don't die, they just GOSUB without RETURN.
![](https://secure.gravatar.com/avatar/c0d6734f46d03eee0b0a740f91c451cf.jpg?s=120&d=mm&r=g)
+1 for me. On Thu, Apr 07, 2016 at 11:04:56PM +1000, Chris Angelico <rosuav@gmail.com> wrote:
1) What should the dunder method be named? __str_coerce__? __stringlike__?
The equivalent for ints is __index__, which is one subset of the vast possibilities of integers, yet one that will mostly be served by int-like objects. By that reasoning, the str-like method should correspond to a large - yet wide - use case. The first use case for strings that comes to mind is dict (namespace) keys. I'd rule out __keys__ as not explicit enough. I was thinking __item__, but then again probably not explicit enough. I don't have "the obvious choice" in mind, but I'd like to be one word only.
2) Which standard types are sufficiently string-like to be used thus?
No built-in types, but I'd like the feature for a third-party app, for example, that pretends to be a string.
3) Should there be a bytes equivalent?
This would probably fall in the "nice to have, but no use case backing it up" category. +0 for me.
4) Should there be a format string "coerce to str"? "{}".format(x) is equivalent to str(x), but it might be nice to be able to assert that something's stringish already.
__index__ is meant to return the same thing as __int__, and I think the same restrictions should apply here - we have an object pretending to be a string, so there should *not* be a difference between use_as_string(obj) and use_as_string(str(obj)) - in the same sense that there should not be a difference between use_as_int(obj) and use_as_int(int(obj)). -Emanuel
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016 at 11:46 PM, Émanuel Barry <vgr255@live.ca> wrote:
4) Should there be a format string "coerce to str"? "{}".format(x) is equivalent to str(x), but it might be nice to be able to assert that something's stringish already.
__index__ is meant to return the same thing as __int__, and I think the same restrictions should apply here - we have an object pretending to be a string, so there should *not* be a difference between use_as_string(obj) and use_as_string(str(obj)) - in the same sense that there should not be a difference between use_as_int(obj) and use_as_int(int(obj)).
Absolutely agreed; however, I was thinking of the same restriction: "{!must_be_str}".format(x) would raise an exception if use_as_str(x) raises. But if it succeeds, yes, it's the same as simple str() formatting. ChrisA
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 06:46 AM, Émanuel Barry wrote:
__index__ is meant to return the same thing as __int__, and I think the same restrictions should apply here
Do you mean "the same thing" as in both return 'int', or "the same thing" is in __int__(3.4) == __index__(3.4) ? Because that last one is False.
- we have an object pretending to be a string, so there should *not* be a difference between use_as_string(obj) and use_as_string(str(obj))
Hmmm -- so maybe you are saying that if the results of __str__ and __tostring__ are the same we're fine, just like if the results of __int__ and __index__ are the same we're fine? Otherwise an error is raised. Sounds reasonable. -- ~Ethan~
![](https://secure.gravatar.com/avatar/c0d6734f46d03eee0b0a740f91c451cf.jpg?s=120&d=mm&r=g)
From: Ethan Furman Sent: Thursday, April 07, 2016 11:01 AM To: python-ideas@python.org Subject: Re: [Python-ideas] Dunder method to make object str-like
On 04/07/2016 06:46 AM, Émanuel Barry wrote:
__index__ is meant to return the same thing as __int__, and I think the same restrictions should apply here
Do you mean "the same thing" as in both return 'int', or "the same thing" is in __int__(3.4) == __index__(3.4) ? Because that last one is False.
I badly phrased that, let me try again. In the cases that __index__ is defined (and does not raise), it should return the same thing as __int__.
Hmmm -- so maybe you are saying that if the results of __str__ and __tostring__ are the same we're fine, just like if the results of __int__ and __index__ are the same we're fine? Otherwise an error is raised.
Pretty much; as above, if __tostring__ (in absence of a better name) is defined and does not raise, it should return the same as __str__. -Emanuel
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 16:05, Émanuel Barry <vgr255@live.ca> wrote:
Hmmm -- so maybe you are saying that if the results of __str__ and __tostring__ are the same we're fine, just like if the results of __int__ and __index__ are the same we're fine? Otherwise an error is raised.
Pretty much; as above, if __tostring__ (in absence of a better name) is defined and does not raise, it should return the same as __str__.
OK, that makes sense as a definition of "what __tostring__ does". But I'm struggling to think of an example of code that would legitimately want to *use* it (unlike __index__ where the obvious motivating case was indexing a sequence). And even with a motivating example of use, I don't know that I can think of an example of something other than a string that might *provide* the method. The problem with the "ASCII byte string" example is that if a type provides __tostring__ I'd expect it to work for all values of that type. I'm not clear if "ASCII byte string" is intended to mean "a value of type bytes that only contains ASCII characters", or "a type that subclasses bytes to refuse to allow non-ASCII". The former would imply __tostring__ working sometimes, but not always. The latter seems like a contrived example (although I'm open to someone explaining a real-world use case where it's important). Paul
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 1:26 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 7 April 2016 at 16:05, Émanuel Barry <vgr255@live.ca> wrote:
Hmmm -- so maybe you are saying that if the results of __str__ and __tostring__ are the same we're fine, just like if the results of __int__ and __index__ are the same we're fine? Otherwise an error is raised.
Pretty much; as above, if __tostring__ (in absence of a better name) is defined and does not raise, it should return the same as __str__.
OK, that makes sense as a definition of "what __tostring__ does". But I'm struggling to think of an example of code that would legitimately want to *use* it (unlike __index__ where the obvious motivating case was indexing a sequence). And even with a motivating example of use, I don't know that I can think of an example of something other than a string that might *provide* the method.
The problem with the "ASCII byte string" example is that if a type provides __tostring__ I'd expect it to work for all values of that type. I'm not clear if "ASCII byte string" is intended to mean "a value of type bytes that only contains ASCII characters", or "a type that subclasses bytes to refuse to allow non-ASCII". The former would imply __tostring__ working sometimes, but not always. The latter seems like a contrived example (although I'm open to someone explaining a real-world use case where it's important).
The original use-case was Path objects, which stringify as the human-readable representation of the file system path. If you need to pass a string-or-Path to something that requires a string, you need to convert the Path to a string, but *not* convert other arbitrary objects to strings; calling str(x) would happily convert the integer 1234 into the string "1234", and then you'd go trying to create that file in the current directory. Python does not let you append non-strings to strings unless you write __radd__ manually; this proposal would allow objects to declare to the interpreter "hey, I'm basically a string here", and allow Paths and strings to interoperate. ChrisA
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 08:37 AM, Chris Angelico wrote:
On Fri, Apr 8, 2016 at 1:26 AM, Paul Moore wrote:
OK, that makes sense as a definition of "what __tostring__ does". But I'm struggling to think of an example of code that would legitimately want to *use* it (unlike __index__ where the obvious motivating case was indexing a sequence). And even with a motivating example of use, I don't know that I can think of an example of something other than a string that might *provide* the method.
The original use-case was Path objects, which stringify as the human-readable representation of the file system path. If you need to pass a string-or-Path to something that requires a string, you need to convert the Path to a string, but *not* convert other arbitrary objects to strings; calling str(x) would happily convert the integer 1234 into the string "1234", and then you'd go trying to create that file in the current directory. Python does not let you append non-strings to strings unless you write __radd__ manually; this proposal would allow objects to declare to the interpreter "hey, I'm basically a string here", and allow Paths and strings to interoperate.
The problem with that is that Paths are conceptually not strings, they just serialize to strings, and we have a lot of infrastructure in place to deal with that serialized form. Is there anything, anywhere in the Python ecosystem, that would also benefit from something like an __as_str__ method/attribute? -- ~Ethan~
![](https://secure.gravatar.com/avatar/d91403ac7536133c4afa2525c9d9414c.jpg?s=120&d=mm&r=g)
On 4/7/2016 11:43, Ethan Furman wrote:
The problem with that is that Paths are conceptually not strings, they just serialize to strings, and we have a lot of infrastructure in place to deal with that serialized form.
Does any OS expose access to paths as anything other than the serialized form?
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 16:46, Alexander Walters <tritium-list@sdamon.com> wrote:
On 4/7/2016 11:43, Ethan Furman wrote:
The problem with that is that Paths are conceptually not strings, they just serialize to strings, and we have a lot of infrastructure in place to deal with that serialized form.
Does any OS expose access to paths as anything other than the serialized form?
Does any OS have an object oriented API? At the OS level, you're lucky to get any sort of structured access, so I don't think that "what the OS provides" is the level to look at this at. Languages are what provide abstractions. Lisp provides a path object, I believe. Perl does (although it may well be a 3rd party CPAN library as with a lot of things in Perl). A path abstraction isn't commonly provided, but that doesn't mean it's not useful. But this is of course offtopic for this thread. Paul
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 1:43 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
The problem with that is that Paths are conceptually not strings, they just serialize to strings, and we have a lot of infrastructure in place to deal with that serialized form.
Is there anything, anywhere in the Python ecosystem, that would also benefit from something like an __as_str__ method/attribute?
I'd like to see this used in 2/3 compatibility code. You can make an Ascii class which subclasses bytes, but can be treated as str-compatible in both versions. By restricting its contents to [0,128) it can easily be "safe" in both byte and text contexts, and it'll cover 99%+ of use cases. So if you try to getattr(x, Ascii("foo")), it'll work fine in Py2 (because Ascii is a subclass of str) and in Py3 (because Ascii.__tostring__ returns a valid string), and there's a guarantee that it'll behave the same way (because the string must be ASCII-only). ChrisA
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 16:49, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Apr 8, 2016 at 1:43 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
The problem with that is that Paths are conceptually not strings, they just serialize to strings, and we have a lot of infrastructure in place to deal with that serialized form.
Is there anything, anywhere in the Python ecosystem, that would also benefit from something like an __as_str__ method/attribute?
I'd like to see this used in 2/3 compatibility code. You can make an Ascii class which subclasses bytes, but can be treated as str-compatible in both versions. By restricting its contents to [0,128) it can easily be "safe" in both byte and text contexts, and it'll cover 99%+ of use cases. So if you try to getattr(x, Ascii("foo")), it'll work fine in Py2 (because Ascii is a subclass of str) and in Py3 (because Ascii.__tostring__ returns a valid string), and there's a guarantee that it'll behave the same way (because the string must be ASCII-only).
OK, that makes sense. But when you say "it'll work fine in Python 3" how will that happen? What code needs to call __fromstring__ to make this happen? You mention getattr. Would you expect every builtin and stdlib function that takes a string to be modified to try __fromstring__? That sounds like a pretty big performance hit, as strings are very critical to the interpreter. Even worse, what should open() do? It takes a string as an argument. To support patthlib, it needs to also call __fspath__. I presume you'd also want it to call __fromstring__ so that your Ascii class could be used as an argument to open as well. This is starting to seem incredibly messy to solve a problem that's basically about extending support for Python 2, which is explicitly not something the Python 3 core should be doing... Paul
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016, at 12:11, Paul Moore wrote:
OK, that makes sense. But when you say "it'll work fine in Python 3" how will that happen? What code needs to call __fromstring__ to make this happen? You mention getattr. Would you expect every builtin and stdlib function that takes a string to be modified to try __fromstring__? That sounds like a pretty big performance hit, as strings are very critical to the interpreter.
Isn't it only a performance hit on something that's an exception now? Like, if PyString_Check fails, then call it. I wonder how much could be done in a "blanket" way without having to change individual methods. Like, put the necessary machinery in PyArg_ParseTuple. Or does that borrow a reference? ---- Taking a step back, can someone explain to me in plain english why io and os shouldn't directly support pathlib? All this "well maybe make it a subclass, well maybe make a special protocol stuff can implement" stuff is dancing around that.
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 2:31 AM, Random832 <random832@fastmail.com> wrote:
On Thu, Apr 7, 2016, at 12:11, Paul Moore wrote:
OK, that makes sense. But when you say "it'll work fine in Python 3" how will that happen? What code needs to call __fromstring__ to make this happen? You mention getattr. Would you expect every builtin and stdlib function that takes a string to be modified to try __fromstring__? That sounds like a pretty big performance hit, as strings are very critical to the interpreter.
Isn't it only a performance hit on something that's an exception now? Like, if PyString_Check fails, then call it.
I like this idea; it can be supported by a very small semantic change, namely that this method is not guaranteed to be called on strings or subclasses of strings. For cross-implementation compatibility, I would require that str.__name_needed__() return self, and that well-behaved subclasses not override this; that way, there's no semantic difference. Consider this like the "x is y implies x == y" optimization that crops up here and there; while it's legal to have x.__eq__(x) return False (eg NaN), there are some places where it's assumed to be found anyway (or, to be technically correct, where something looks for "x is y or x == y" rather than simply "x == y"). So this would be "self if isinstance(self, str) else self.__thingy__()". Deep inside CPython, it would simply be failure-mode handling: instead of directly raising TypeError, attempt a treat-as-string conversion, which will raise TypeError if it's not possible. A cursory inspection of the code suggests that the only place that needs to changed is PyUnicode_FromObject in unicodeobject.c. It currently checks if the object is exactly an instance of PyUnicode (aka 'str'), then for a subclass of same, then raises TypeError "Can't convert '%.100s' object to str implicitly". Even the wording of that message leaves it open to more types being implicitly convertible; the only change here is that you can make your type convertible without actually subclassing str. ChrisA ChrisA
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 09:31 AM, Random832 wrote:
Taking a step back, can someone explain to me in plain english why io and os shouldn't directly support pathlib? All this "well maybe make it a subclass, well maybe make a special protocol stuff can implement" stuff is dancing around that.
The protocol is "the how" of supporting pathlib, and, interestingly enough, the easiest way to do so (avoids circular imports, etc., etc,.). -- ~Ethan~
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016, at 13:30, Ethan Furman wrote:
The protocol is "the how" of supporting pathlib, and, interestingly enough, the easiest way to do so (avoids circular imports, etc., etc,.).
If the problem is importing pathlib, what about a "pathlib lite" that can check if an object is a path without importing pathlib? This could be a recipe, a separate module, or part of os. def is_Path(x): return 'pathlib' in sys.modules and isinstance(x, sys.modules['pathlib'].Path) def path_str(x): if isinstance(x, str): return x if isPath(x): return x.path raise TypeError # convenience methods for modules that want to support returning a Path but don't import it def to_Path(x): import pathlib return pathlib.Path(x) # most common case, return a path iff an input argument is a path. def to_Path_maybe(value, *args): if any(is_Path(arg) for arg in args): return to_Path(value) else: return value All this other stuff seems to have an ambition of making these things fully general, such that some other library can be dropped in instead of pathlib without subclassing either str or Path, and I'm not sure what the use case for that is. Pathlib is the battery that's included.
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 2:11 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Even worse, what should open() do? It takes a string as an argument. To support patthlib, it needs to also call __fspath__. I presume you'd also want it to call __fromstring__ so that your Ascii class could be used as an argument to open as well. This is starting to seem incredibly messy to solve a problem that's basically about extending support for Python 2, which is explicitly not something the Python 3 core should be doing...
This would replace __fspath__. There'd be no need for a Path-specific dunder if there's a generic "this can be treated as a string" dunder. ChrisA
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 17:44, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Apr 8, 2016 at 2:11 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Even worse, what should open() do? It takes a string as an argument. To support patthlib, it needs to also call __fspath__. I presume you'd also want it to call __fromstring__ so that your Ascii class could be used as an argument to open as well. This is starting to seem incredibly messy to solve a problem that's basically about extending support for Python 2, which is explicitly not something the Python 3 core should be doing...
This would replace __fspath__. There'd be no need for a Path-specific dunder if there's a generic "this can be treated as a string" dunder.
So the only things that should implement the new protocol would be paths. Otherwise, they could be passed to things that expect a path. Once again, I'm confused. Can someone please explain to me how to decide whether my type should provide the new protocol. And whether my code should check the new protocol. At the moment, I can't answer those questions with the information given in this thread. Paul
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 10:19 Paul Moore <p.f.moore@gmail.com> wrote:
On 7 April 2016 at 17:44, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Apr 8, 2016 at 2:11 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Even worse, what should open() do? It takes a string as an argument. To support patthlib, it needs to also call __fspath__. I presume you'd also want it to call __fromstring__ so that your Ascii class could be used as an argument to open as well. This is starting to seem incredibly messy to solve a problem that's basically about extending support for Python 2, which is explicitly not something the Python 3 core should be doing...
This would replace __fspath__. There'd be no need for a Path-specific dunder if there's a generic "this can be treated as a string" dunder.
So the only things that should implement the new protocol would be paths. Otherwise, they could be passed to things that expect a path.
Once again, I'm confused.
Can someone please explain to me how to decide whether my type should provide the new protocol. And whether my code should check the new protocol. At the moment, I can't answer those questions with the information given in this thread.
I've reached the same conclusion/point myself. The attempt to come up with a pure solution is wreaking havoc with the practicality side of my brain that doesn't understand the rules it's supposed to follow in order to make this work.
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 08:49 AM, Chris Angelico wrote:
On Fri, Apr 8, 2016 at 1:43 AM, Ethan Furman wrote:
The problem with that is that Paths are conceptually not strings, they just serialize to strings, and we have a lot of infrastructure in place to deal with that serialized form.
Is there anything, anywhere in the Python ecosystem, that would also benefit from something like an __as_str__ method/attribute?
I'd like to see this used in 2/3 compatibility code. You can make an Ascii class which subclasses bytes, but can be treated as str-compatible in both versions. By restricting its contents to [0,128) it can easily be "safe" in both byte and text contexts, and it'll cover 99%+ of use cases. So if you try to getattr(x, Ascii("foo")), it'll work fine in Py2 (because Ascii is a subclass of str) and in Py3 (because Ascii.__tostring__ returns a valid string), and there's a guarantee that it'll behave the same way (because the string must be ASCII-only).
Make the Ascii class subclass bytes in 2.x and str in 3.x; the 2.x version's __str__ returns bytes, while it's __unicode__ returns unicode and in 3.x __str__returns str, and __unicode__ doesn't exist. Problem solved, no other special methods needed.* ;) -- ~Ethan~ * crosses fingers hoping to not have missed something obvious ;)
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 08:58 Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Apr 8, 2016 at 1:43 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
The problem with that is that Paths are conceptually not strings, they just serialize to strings, and we have a lot of infrastructure in place to deal with that serialized form.
Is there anything, anywhere in the Python ecosystem, that would also benefit from something like an __as_str__ method/attribute?
I'd like to see this used in 2/3 compatibility code. You can make an Ascii class which subclasses bytes, but can be treated as str-compatible in both versions. By restricting its contents to [0,128) it can easily be "safe" in both byte and text contexts, and it'll cover 99%+ of use cases. So if you try to getattr(x, Ascii("foo")), it'll work fine in Py2 (because Ascii is a subclass of str) and in Py3 (because Ascii.__tostring__ returns a valid string), and there's a guarantee that it'll behave the same way (because the string must be ASCII-only).
But couldn't you also just define a str subclass that checks its argument(s) are only valid ASCII values? What you're proposing is to potentially change all places that operate with a string to now check for a special method which would be a costly change potentially to performance as well as propagating this concept everywhere a string-like object is expected. I think it's important to realize that the main reason we are considering this special method concept is to make it easier to introduce in third-party code which doesn't have pathlib or for people who don't want to import pathlib just to convert a pathlib.PurePath object to a string. Otherwise we would simply have pathlib.fspath() or something that checked if its argument was a subclass of pathlib.PurePath and if so call str() on it, else double-check that argument was an instance of str and keep it simple. But instead we are trying to do the practical thing and come up with a common method name that people can be sure will exist on pathlib.PurePath and any other third-party path library. I think trying to generalize to "string-like but not __str__()" is a premature optimization that has not exposed enough use-cases to warrant such a deep change to the language.
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 3:31 AM, Brett Cannon <brett@python.org> wrote:
But couldn't you also just define a str subclass that checks its argument(s) are only valid ASCII values? What you're proposing is to potentially change all places that operate with a string to now check for a special method which would be a costly change potentially to performance as well as propagating this concept everywhere a string-like object is expected.
Fair enough. That was a spur-of-the-moment thought. To be honest, this proposal is a massive generalization from, ultimately, a single use-case.
I think it's important to realize that the main reason we are considering this special method concept is to make it easier to introduce in third-party code which doesn't have pathlib or for people who don't want to import pathlib just to convert a pathlib.PurePath object to a string.
And this should make it easier for third-party code to be functional without even being aware of the Path object. There are two basic things that code will be doing with paths: passing them unchanged to standard library functions (eg open()), and combining them with strings. The first will work by definition; the second will if paths can implicitly upcast to strings. In contrast, a function or method to convert a path to a string requires conscious effort on the part of any function that needs to do such manipulation, which correspondingly means version compatibility checks. Libraries will need to release a new version that's path-compatible, even though they don't actually gain any functionality. ChrisA
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016 at 11:28 AM, Chris Angelico <rosuav@gmail.com> wrote:
Fair enough. That was a spur-of-the-moment thought. To be honest, this proposal is a massive generalization from, ultimately, a single use-case.
yes, but now that you say: ...
And this should make it easier for third-party code to be functional without even being aware of the Path object. There are two basic things that code will be doing with paths: passing them unchanged to standard library functions (eg open()), and combining them with strings. The first will work by definition; the second will if paths can implicitly upcast to strings.
so this is really a way to make the whole path!=string thing easier -- we don't want to simply call str() on anything that might be a path, because that will work on anything, whether it's the least bit pathlike at all, but this would introduce a new kind of __str__, so that only things for which it makes sense would "Just work": str + something Would only concatenate to a new string if somethign was an object that couuld "losslessly" be considered a string? but if we use the __index__ example, that is specifically "an integer that can be ued as an index", not "something that can be losslessly converted to an integer". so why not the __fspath__ protocol anyway? Unless there are all sorts of other use cases you have in mind? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 11:45 AM, Chris Barker wrote:
On Thu, Apr 7, 2016 at 11:28 AM, Chris Angelico wrote:
And this should make it easier for third-party code to be functional without even being aware of the Path object. There are two basic things that code will be doing with paths: passing them unchanged to standard library functions (eg open()), and combining them with strings. The first will work by definition; the second will if paths can implicitly upcast to strings.
so this is really a way to make the whole path!=string thing easier -- we don't want to simply call str() on anything that might be a path, because that will work on anything, whether it's the least bit pathlike at all, but this would introduce a new kind of __str__, so that only things for which it makes sense would "Just work":
str + something
Would only concatenate to a new string if somethign was an object that couuld "losslessly" be considered a string?
Which is mildly attractive.
but if we use the __index__ example, that is specifically "an integer that can be ued as an index", not "something that can be losslessly converted to an integer".
__index__ was originally created to support indexing, but has morphed over time to mean "something that can be losslessly converted to an integer". -- ~Ethan~
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016 at 11:59 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
__index__ was originally created to support indexing, but has morphed over time to mean "something that can be losslessly converted to an integer".
Ahh! very good precedent, then -- where else is this used? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/2828041405aa313004b6549acf918228.jpg?s=120&d=mm&r=g)
On 4/7/2016 3:00 PM, Chris Barker wrote:
On Thu, Apr 7, 2016 at 11:59 AM, Ethan Furman <ethan@stoneleaf.us <mailto:ethan@stoneleaf.us>> wrote:
__index__ was originally created to support indexing, but has morphed over time to mean "something that can be losslessly converted to an integer".
Ahh! very good precedent, then -- where else is this used?
It's used by hex, oct, and bin, at least:
class Foo: ... def __index__(self): return 42 ... hex(Foo()) '0x2a' oct(Foo()) '0o52' bin(Foo()) '0b101010'
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 4:45 AM, Chris Barker <chris.barker@noaa.gov> wrote:
On Thu, Apr 7, 2016 at 11:28 AM, Chris Angelico <rosuav@gmail.com> wrote:
And this should make it easier for third-party code to be functional without even being aware of the Path object. There are two basic things that code will be doing with paths: passing them unchanged to standard library functions (eg open()), and combining them with strings. The first will work by definition; the second will if paths can implicitly upcast to strings.
so this is really a way to make the whole path!=string thing easier -- we don't want to simply call str() on anything that might be a path, because that will work on anything, whether it's the least bit pathlike at all, but this would introduce a new kind of __str__, so that only things for which it makes sense would "Just work":
str + something
Would only concatenate to a new string if somethign was an object that couuld "losslessly" be considered a string?
but if we use the __index__ example, that is specifically "an integer that can be ued as an index", not "something that can be losslessly converted to an integer".
That's two ways of describing the same thing. When you call __index__, you either get back an integer with the exact same meaning as the original object, or you get an exception. In contrast, calling __int__ may result in an integer which is similar, but not equal, to the original - for instance, int(3.2) is 3. That's a lossy conversion, which __index__ will refuse to do. Similarly, str() can be quite lossy and non-strict on arbitrary objects. With this conversion, it will always either return an exactly-equivalent string, or raise.
so why not the __fspath__ protocol anyway?
Unless there are all sorts of other use cases you have in mind?
Not actually in mind, but there have been multiple people raising concerns that __fspath__ is too specific for what it achieves. ChrisA
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 11:29 Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Apr 8, 2016 at 3:31 AM, Brett Cannon <brett@python.org> wrote:
But couldn't you also just define a str subclass that checks its argument(s) are only valid ASCII values? What you're proposing is to potentially change all places that operate with a string to now check for a special method which would be a costly change potentially to performance as well as propagating this concept everywhere a string-like object is expected.
Fair enough. That was a spur-of-the-moment thought. To be honest, this proposal is a massive generalization from, ultimately, a single use-case.
I think it's important to realize that the main reason we are considering this special method concept is to make it easier to introduce in third-party code which doesn't have pathlib or for people who don't want to import pathlib just to convert a pathlib.PurePath object to a string.
And this should make it easier for third-party code to be functional without even being aware of the Path object. There are two basic things that code will be doing with paths: passing them unchanged to standard library functions (eg open()), and combining them with strings. The first will work by definition; the second will if paths can implicitly upcast to strings.
In contrast, a function or method to convert a path to a string requires conscious effort on the part of any function that needs to do such manipulation, which correspondingly means version compatibility checks.
Libraries will need to release a new version that's path-compatible, even though they don't actually gain any functionality.
But they will with yours as well. At best you could get this into Python 3.6 and then propagate this implicit string-like conversion functionality throughout the language and stdlib, but any implicitness won't be backported either as it's implicit. So while you're saying you don't like the explicitness required to backport this, your solution doesn't help with that either as you will still need to do the exact same method lookup for any code that wants to work pre-3.6. And if using path objects takes off and new APIs come up that don't take a string for paths, then the explicit conversion can be left out -- and perhaps removed in converted APIs -- while your implicit conversion will forever be baked into Python itself.
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 16:37, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Apr 8, 2016 at 1:26 AM, Paul Moore <p.f.moore@gmail.com> wrote:
On 7 April 2016 at 16:05, Émanuel Barry <vgr255@live.ca> wrote:
Hmmm -- so maybe you are saying that if the results of __str__ and __tostring__ are the same we're fine, just like if the results of __int__ and __index__ are the same we're fine? Otherwise an error is raised.
Pretty much; as above, if __tostring__ (in absence of a better name) is defined and does not raise, it should return the same as __str__.
OK, that makes sense as a definition of "what __tostring__ does". But I'm struggling to think of an example of code that would legitimately want to *use* it (unlike __index__ where the obvious motivating case was indexing a sequence). And even with a motivating example of use, I don't know that I can think of an example of something other than a string that might *provide* the method.
The problem with the "ASCII byte string" example is that if a type provides __tostring__ I'd expect it to work for all values of that type. I'm not clear if "ASCII byte string" is intended to mean "a value of type bytes that only contains ASCII characters", or "a type that subclasses bytes to refuse to allow non-ASCII". The former would imply __tostring__ working sometimes, but not always. The latter seems like a contrived example (although I'm open to someone explaining a real-world use case where it's important).
The original use-case was Path objects, which stringify as the human-readable representation of the file system path. If you need to pass a string-or-Path to something that requires a string, you need to convert the Path to a string, but *not* convert other arbitrary objects to strings; calling str(x) would happily convert the integer 1234 into the string "1234", and then you'd go trying to create that file in the current directory. Python does not let you append non-strings to strings unless you write __radd__ manually; this proposal would allow objects to declare to the interpreter "hey, I'm basically a string here", and allow Paths and strings to interoperate.
But the proposal for paths is to have a *specific* method that says "give me a string representing a filesystem path from this object". An "interpret this object as a string" wouldn't be appropriate for the cases where I'd want to do "give me a string representing a filesystem path". And that's where I get stuck, as I can't think of an example where I *would* want the more general option. For a number of reasons: 1. I can't think of a real-world example of when I'd *use* such a facility 2. I can't think of a real-world example of a type that might *provide* such a facility 3. I can't see how something so general would be of benefit. The __index__ and __fspath__ special methods have clear, focused reasons for existing. The proposed __tostring__ protocol seems too general to have a purpose. Paul
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Apr 7, 2016 8:57 AM, "Paul Moore" <p.f.moore@gmail.com> wrote: [...]
But the proposal for paths is to have a *specific* method that says "give me a string representing a filesystem path from this object". An "interpret this object as a string" wouldn't be appropriate for the cases where I'd want to do "give me a string representing a filesystem path". And that's where I get stuck, as I can't think of an example where I *would* want the more general option. For a number of reasons:
1. I can't think of a real-world example of when I'd *use* such a facility 2. I can't think of a real-world example of a type that might *provide* such a facility 3. I can't see how something so general would be of benefit.
Numpy and friends would implement this if it existed, for things like converting numpy strings to python strings. But I can't think of the cases where this would be useful either. The reason that it's useful to have __index__, and that it would be useful to have __fspath__, is that in both cases there are a bunch of interfaces defined as part of the core language that need to implement the consumer side of the protocol, so a core language protocol is the only real way to go. I'm not thinking of a lot of analogous APIs in core python take specifically-strings? Most of the important ones are str methods, so regular self-dispatch already handles it. Or I guess str.__add__(someobj), but that's already handled by binop dispatch. -n
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016, at 15:47, Nathaniel Smith wrote:
The reason that it's useful to have __index__, and that it would be useful to have __fspath__, is that in both cases there are a bunch of interfaces defined as part of the core language that need to implement the consumer side of the protocol, so a core language protocol is the only real way to go. I'm not thinking of a lot of analogous APIs in core python take specifically-strings?
Well, there's getattr and the like. I'm not sure why you'd want to pass a not-really-a-string (though, an ASCII bytes is the one thing I _can_ think of*, and didn't the inability to pass unicode strings to a bunch of APIs cause no end of trouble for Python 2 users?), but then I'm not sure why you'd want to pass anything but an int (or a python 2 int/long) to list indexing. *Maybe a RUE-string that subclasses bytes and uses UTF-8.
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 14:04, Chris Angelico <rosuav@gmail.com> wrote:
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
What would this be used for? Other than Path, what types would be viable candidates for being "string like"? I can't think of a use case for this feature. For that matter, what constitutes a "lossless conversion to str"? If you mean "doesn't lose information", then integers could easily be said to have such a conversion - but that doesn't seem right (we don't want things to start auto-converting numbers to strings). I'm not sure I understand the point. Paul
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016, at 09:41, Paul Moore wrote:
On 7 April 2016 at 14:04, Chris Angelico <rosuav@gmail.com> wrote:
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
What would this be used for? Other than Path, what types would be viable candidates for being "string like"? I can't think of a use case for this feature.
How about ASCII bytes strings? ;)
For that matter, what constitutes a "lossless conversion to str"?
What's __index__ for?
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 15:07, Random832 <random832@fastmail.com> wrote:
On Thu, Apr 7, 2016, at 09:41, Paul Moore wrote:
On 7 April 2016 at 14:04, Chris Angelico <rosuav@gmail.com> wrote:
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
What would this be used for? Other than Path, what types would be viable candidates for being "string like"? I can't think of a use case for this feature.
How about ASCII bytes strings? ;)
OK, I guess. I can't see why it's a significant enough use case to justify a new protocol, though.
For that matter, what constitutes a "lossless conversion to str"?
What's __index__ for?
I don't follow. It's for indexing, which requires an integer. How does that relate to the question of what constitutes a lossless conversion to str? Paul
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016, at 11:01, Paul Moore wrote:
For that matter, what constitutes a "lossless conversion to str"?
What's __index__ for?
I don't follow. It's for indexing, which requires an integer.
Sure, but why isn't int() good enough? For the same reason you only want the kinds of objects that implement __index__ (and not, say, a float or a string that happens to be numeric) for indexing, you only want the kinds of objects that implement this method for certain purposes.
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 17:16, Random832 <random832@fastmail.com> wrote:
On Thu, Apr 7, 2016, at 11:01, Paul Moore wrote:
For that matter, what constitutes a "lossless conversion to str"?
What's __index__ for?
I don't follow. It's for indexing, which requires an integer.
Sure, but why isn't int() good enough? For the same reason you only want the kinds of objects that implement __index__ (and not, say, a float or a string that happens to be numeric) for indexing, you only want the kinds of objects that implement this method for certain purposes.
You're making my point here. A "lossless conversion to str" isn't a good enough definition of which things should implement the new protocol. I'm trying to get someone to tell me what criteria I should use to decide if my type should implement the new protocol. At the moment, I have no idea. Paul
![](https://secure.gravatar.com/avatar/d6b9415353e04ffa6de5a8f3aaea0553.jpg?s=120&d=mm&r=g)
On 4/7/2016 12:16 PM, Random832 wrote:
On Thu, Apr 7, 2016, at 11:01, Paul Moore wrote:
Someone (lost is quotes)
What's __index__ for?
I don't follow. It's for indexing, which requires an integer.
Sure, but why isn't int() good enough?
Good question. The reason to add __index__ was to allow indexing with some things other that ints while not allowing just anything that can be converted to int (with or without loss). The reason for the restriction is to prevent subtle bugs. Expanding the domain of indexing with __index__ instead of int() was a judgment call, not a logical necessity.
For the same reason you only want the kinds of objects that implement __index__ (and not, say, a float or a string that happens to be numeric) for indexing, you only want the kinds of objects that implement this method for certain purposes.
I understand this, but I am going to challenge the analogy. An index really is a int -- a count of items of the sequence from either end (where the right end can be thought of as an invisible End_ofSequence item). A path, on the other hand, is not really a string. A path is a sequence of nodes in the filesystem graph. Converting structured data, in this case paths, to strings, is just a least-common-denominator way of communicating structured data between different languages. To me, the default proposal to expand the domain of open and other path functions is to call str on the path arg, either always or as needed. We should then ask "why isn't str() good enough"? Most bad args for open will immediately result in a file-not-found exception. But the os.path functions would not. Do the possible bugs and violation of python philosophy out-weigh the simplicity of the proposal? -- Terry Jan Reedy
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016 at 11:48 AM, Terry Reedy <tjreedy@udel.edu> wrote:
To me, the default proposal to expand the domain of open and other path functions is to call str on the path arg, either always or as needed. We should then ask "why isn't str() good enough"? Most bad args for open will immediately result in a file-not-found exception. But the os.path functions would not. Do the possible bugs and violation of python philosophy out-weigh the simplicity of the proposal?
Thanks -- this is exactly the question at hand. it's gotten a bit caught up in all the discussion of what to call the magic method, should there be a top-level function, etc. but this is the only question on the table that isn't just bikeshedding. personally, I think not -- but I"m very close to the fence. Though no, most calls to open() would not fail -- most calls to open(path, 'r') would but most calls to open(path, 'w') would succeed and produce some really weird filenames -- but so what? that's the question -- are these subtle and hard to find bugs we want to prevent? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 11:59 AM, Chris Barker wrote:
Though no, most calls to open() would not fail -- most calls to open(path, 'r') would but most calls to open(path, 'w') would succeed and produce some really weird filenames -- but so what?
that's the question -- are these subtle and hard to find bugs we want to prevent?
If we are trying to fix issues, why would we leave the door open to other, more subtle, bugs? -- ~Ethan~
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016 at 12:06 PM, Ethan Furman <ethan@stoneleaf.us> wrote:
that's the question -- are these subtle and hard to find bugs we want to
prevent?
If we are trying to fix issues, why would we leave the door open to other, more subtle, bugs?
we're not trying to fix issues -- we're trying to make Path compatible with the stdlib and other libs that use strings as path. Or do you mean that not having Path subclass str is trying to fix issues, in which case, yes, I suppose, but you need to stop somewhere, or we'll have a statically typed language... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 12:10 PM, Chris Barker wrote:
On Thu, Apr 7, 2016 at 12:06 PM, Ethan Furman wrote:
If we are trying to fix issues, why would we leave the door open to other, more subtle, bugs?
we're not trying to fix issues -- we're trying to make Path compatible with the stdlib and other libs that use strings as path.
Yeah, that's the issue. ;) -- ~Ethan~
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 8 April 2016 at 04:48, Terry Reedy <tjreedy@udel.edu> wrote:
To me, the default proposal to expand the domain of open and other path functions is to call str on the path arg, either always or as needed. We should then ask "why isn't str() good enough"? Most bad args for open will immediately result in a file-not-found exception.
Not when you're *creating* files and directories. Using "str(path)" as the protocol means these all become valid operations: open(1.0, "w") open(object, "w") open(object(), "w") open(str, "w") open(input, "w") Everything implements __str__ or __repr__, so *everything* becomes acceptable as an argument to filesystem mutating operations, instead of those operations bailing out immediately complaining they've been asked to do something that doesn't make any sense. I strongly encourage folks interested in the fspath protocol design debate to read the __index__ PEP: https://www.python.org/dev/peps/pep-0357/ Start from the title: "Allowing Any Object to be Used for Slicing" The protocol wasn't designed in the abstract: it had a concrete goal of allowing objects other than builtins to be usable in the "x:y:z" slicing syntax. Those objects weren't hypothetical either: the rationale spells out "In NumPy, for example, there are 8 different integer scalars corresponding to unsigned and signed integers of 8, 16, 32, and 64 bits. These type-objects could reasonably be used as integers in many places where Python expects true integers but cannot inherit from the Python integer type because of incompatible memory layouts. There should be some way to be able to tell Python that an object can behave like an integer." The PEP also spells out what's wrong with the "just use int(obj)" alternative: "It is not possible to use the nb_int (and __int__ special method) for this purpose because that method is used to *coerce* objects to integers. It would be inappropriate to allow every object that can be coerced to an integer to be used as an integer everywhere Python expects a true integer. For example, if __int__ were used to convert an object to an integer in slicing, then float objects would be allowed in slicing and x[3.2:5.8] would not raise an error as it should." Extending the use of the protocol to other contexts (such as sequence repetition and optimised lookups on range objects) was then taken up on a case by case basis, but the protocol semantics themselves were defined by that original use case of "allow NumPy integers to be used when slicing sequences". The equivalent motivating use case here is "allow pathlib objects to be used with the open() builtin, os module functions, and os.path module functions". The open() builtin handles str paths, integers (file descriptors), and bytes-like objects (pre-encoded paths) The os and os.path functions handle some combination of those 3 depending on the specific function Working directly with file descriptors is relatively rare, so we can leave that as a special case. Similarly, working directly with bytes-like objects introduces cross-platform portability problems, and also changes the output type of many operations, so we'll keep that as a special case, too. That leaves the text representation, and the question of defining equivalents to "operator.index" and its underlying __index__ protocol. My suggestion of os.fspath as the conversion function is based on: - "path" being too generic (we have sys.path, os.path, and the PATH envvar as potential sources of confusion) - "fspath" being similar to "os.fsencode" and "os.fsdecode", which are the operations for converting a filesystem path in text form to and from its bytes-like object form - os and os.path being two of the main consumers of the proposed protocol - os being a builtin module that underpins most filesystem operations anyway, so folks shouldn't be averse to importing it in code that wants to consume the new protocol Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/7256049d410aa28c240527e1a779799e.jpg?s=120&d=mm&r=g)
On Apr 09 2016, Nick Coghlan <ncoghlan-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
The equivalent motivating use case here is "allow pathlib objects to be used with the open() builtin, os module functions, and os.path module functions".
Why isn't this use case perfectly solved by: def open(obj, *a): # If pathlib isn't imported, we can't possibly receive # a Path object. pathlib = sys.modules.get('pathlib', None): if pathlib is not None and isinstance(obj, pathlib.Path): obj = str(obj) return real_open(obj, *a)
That leaves the text representation, and the question of defining equivalents to "operator.index" and its underlying __index__ protocol.
well, or the above (as far as I can see) :-). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/10/2016 02:24 PM, Nikolaus Rath wrote:
On Apr 09 2016, Nick Coghlan wrote:
The equivalent motivating use case here is "allow pathlib objects to be used with the open() builtin, os module functions, and os.path module functions".
Why isn't this use case perfectly solved by:
def open(obj, *a): # If pathlib isn't imported, we can't possibly receive # a Path object. pathlib = sys.modules.get('pathlib', None): if pathlib is not None and isinstance(obj, pathlib.Path): obj = str(obj) return real_open(obj, *a)
pathlib is the primary motivator, but there's no reason to shut everyone else out. By having a well-defined protocol and helper function we not only make our own lives easier but we also make the lives of third-party libraries and experimenters easier. -- ~Ethan~
![](https://secure.gravatar.com/avatar/7256049d410aa28c240527e1a779799e.jpg?s=120&d=mm&r=g)
On Apr 10 2016, Ethan Furman <ethan-gcWI5d7PMXnvaiG9KC9N7Q@public.gmane.org> wrote:
On 04/10/2016 02:24 PM, Nikolaus Rath wrote:
On Apr 09 2016, Nick Coghlan wrote:
The equivalent motivating use case here is "allow pathlib objects to be used with the open() builtin, os module functions, and os.path module functions".
Why isn't this use case perfectly solved by:
def open(obj, *a): # If pathlib isn't imported, we can't possibly receive # a Path object. pathlib = sys.modules.get('pathlib', None): if pathlib is not None and isinstance(obj, pathlib.Path): obj = str(obj) return real_open(obj, *a)
pathlib is the primary motivator, but there's no reason to shut everyone else out.
By having a well-defined protocol and helper function we not only make our own lives easier but we also make the lives of third-party libraries and experimenters easier.
To me this sounds like catering to a hypothetical audience that may want to do hypothetical things. If you start with the above, and people complain that their favorite non-pathlib path library is not supported by the stdlib, you can still add a protocol. But if you add a protocol right away, you're stuck with the complexity for a very long time even if almost no one actually uses it. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Mon, Apr 11, 2016, at 10:43, Nikolaus Rath wrote:
To me this sounds like catering to a hypothetical audience that may want to do hypothetical things. If you start with the above, and people complain that their favorite non-pathlib path library is not supported by the stdlib, you can still add a protocol. But if you add a protocol right away, you're stuck with the complexity for a very long time even if almost no one actually uses it.
And there's nothing stopping people from subclassing from Path (should it be PurePath?), or monkey-patching pathlib. In this case, "isinstance(foo, Path) returns true" is the protocol. Also, where is the .path attribute? It's documented <https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.path>, but...
pathlib.Path(".").path Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'PosixPath' object has no attribute 'path'
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/11/2016 07:57 AM, Random832 wrote:
Also, where is the .path attribute? It's documented <https://docs.python.org/3/library/pathlib.html#pathlib.PurePath.path>, but...
pathlib.Path(".").path Traceback (most recent call last): File "<stdin>", line 1, in <module> AttributeError: 'PosixPath' object has no attribute 'path'
pathlib is provisional -- `.path` has been committed and will be available with the next releases (assuming we don't change it out for the protocol version). -- ~Ethan~
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/11/2016 07:43 AM, Nikolaus Rath wrote:
On Apr 10 2016, Ethan Furman wrote:
By having a well-defined protocol and helper function we not only make our own lives easier but we also make the lives of third-party libraries and experimenters easier.
To me this sounds like catering to a hypothetical audience that may want to do hypothetical things. If you start with the above, and people complain that their favorite non-pathlib path library is not supported by the stdlib, you can still add a protocol. But if you add a protocol right away, you're stuck with the complexity for a very long time even if almost no one actually uses it.
The part of "make our own lives easier" is not a hypothetical audience. Making my own library (antipathy) work seamlessly with pathlib and DirEntry is not hypothetical. -- ~Ethan~
![](https://secure.gravatar.com/avatar/7256049d410aa28c240527e1a779799e.jpg?s=120&d=mm&r=g)
On Apr 11 2016, Ethan Furman <ethan-gcWI5d7PMXnvaiG9KC9N7Q@public.gmane.org> wrote:
On 04/11/2016 07:43 AM, Nikolaus Rath wrote:
On Apr 10 2016, Ethan Furman wrote:
By having a well-defined protocol and helper function we not only make our own lives easier but we also make the lives of third-party libraries and experimenters easier.
To me this sounds like catering to a hypothetical audience that may want to do hypothetical things. If you start with the above, and people complain that their favorite non-pathlib path library is not supported by the stdlib, you can still add a protocol. But if you add a protocol right away, you're stuck with the complexity for a very long time even if almost no one actually uses it.
The part of "make our own lives easier" is not a hypothetical audience.
My assumption was that "own" refers to core developers here, while..
Making my own library (antipathy) work seamlessly with pathlib and DirEntry is not hypothetical.
..here you seem to be wearing your third-party maintainer hat :-). As far as I can see, implementing a protocol instead of adding a few isinstance checks is more likely to make the life of a CPython developer harder than easier. Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/11/2016 02:09 PM, Nikolaus Rath wrote:
On Apr 11 2016, Ethan Furman <ethan-gcWI5d7PMXnvaiG9KC9N7Q@public.gmane.org> wrote:
On 04/11/2016 07:43 AM, Nikolaus Rath wrote:
On Apr 10 2016, Ethan Furman wrote:
By having a well-defined protocol and helper function we not only make our own lives easier but we also make the lives of third-party libraries and experimenters easier.
To me this sounds like catering to a hypothetical audience that may want to do hypothetical things. If you start with the above, and people complain that their favorite non-pathlib path library is not supported by the stdlib, you can still add a protocol. But if you add a protocol right away, you're stuck with the complexity for a very long time even if almost no one actually uses it.
The part of "make our own lives easier" is not a hypothetical audience.
My assumption was that "own" refers to core developers here, while..
It does.
Making my own library (antipathy) work seamlessly with pathlib and DirEntry is not hypothetical.
..here you seem to be wearing your third-party maintainer hat :-).
I was. :)
As far as I can see, implementing a protocol instead of adding a few isinstance checks is more likely to make the life of a CPython developer harder than easier.
I disagree. And the protocol idea was not mine, so apparently other core-devs also disagree (or think it's worth it, regardless). Even without the protocol I would think we'd still make a separate function to check the input. -- ~Ethan~
![](https://secure.gravatar.com/avatar/7256049d410aa28c240527e1a779799e.jpg?s=120&d=mm&r=g)
On Apr 11 2016, Ethan Furman <ethan-gcWI5d7PMXnvaiG9KC9N7Q@public.gmane.org> wrote:
As far as I can see, implementing a protocol instead of adding a few isinstance checks is more likely to make the life of a CPython developer harder than easier.
I disagree. And the protocol idea was not mine, so apparently other core-devs also disagree (or think it's worth it, regardless).
I haven't found any email explaining why a protocol would make things easier than the isinstance() approach (and I read most of the threads both here and on -dev), so I was assuming that the core-devs in question don't disagree but haven't considered the second approach. But this will be my last mail advocating it. If there's still no interest, then there probably is a reason for it :-). Best, -Nikolaus -- GPG encrypted emails preferred. Key id: 0xD113FCAC3C4E599F Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F »Time flies like an arrow, fruit flies like a Banana.«
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Tue, 12 Apr 2016 at 10:18 Nikolaus Rath <Nikolaus@rath.org> wrote:
On Apr 11 2016, Ethan Furman < ethan-gcWI5d7PMXnvaiG9KC9N7Q@public.gmane.org> wrote:
As far as I can see, implementing a protocol instead of adding a few isinstance checks is more likely to make the life of a CPython developer harder than easier.
I disagree. And the protocol idea was not mine, so apparently other core-devs also disagree (or think it's worth it, regardless).
I haven't found any email explaining why a protocol would make things easier than the isinstance() approach (and I read most of the threads both here and on -dev), so I was assuming that the core-devs in question don't disagree but haven't considered the second approach.
I disagree with the idea. :) Type checking tends to be too strict as it prevents duck typing. And locking ourselves to only what's in the stdlib is way too restrictive when there are alternative path libraries out there already. Plus it's short-sighted to assume no one will ever come up with a better path library, and so tying us down to only pathlib.PurePath explicitly would be a mistake long-term when specifying a single method keeps things flexible (this is the same reason __index__() exists and we simply don't say indexing has to be with a subclass of int or something). -Brett
![](https://secure.gravatar.com/avatar/f3ba3ecffd20251d73749afbfa636786.jpg?s=120&d=mm&r=g)
On 13 April 2016 at 03:33, Brett Cannon <brett@python.org> wrote:
On Tue, 12 Apr 2016 at 10:18 Nikolaus Rath <Nikolaus@rath.org> wrote:
On Apr 11 2016, Ethan Furman <ethan-gcWI5d7PMXnvaiG9KC9N7Q@public.gmane.org> wrote:
As far as I can see, implementing a protocol instead of adding a few isinstance checks is more likely to make the life of a CPython developer harder than easier.
I disagree. And the protocol idea was not mine, so apparently other core-devs also disagree (or think it's worth it, regardless).
I haven't found any email explaining why a protocol would make things easier than the isinstance() approach (and I read most of the threads both here and on -dev), so I was assuming that the core-devs in question don't disagree but haven't considered the second approach.
I disagree with the idea. :) Type checking tends to be too strict as it prevents duck typing. And locking ourselves to only what's in the stdlib is way too restrictive when there are alternative path libraries out there already. Plus it's short-sighted to assume no one will ever come up with a better path library, and so tying us down to only pathlib.PurePath explicitly would be a mistake long-term when specifying a single method keeps things flexible (this is the same reason __index__() exists and we simply don't say indexing has to be with a subclass of int or something).
In addition to these general API design points, there's also a key pragmatic point, which is that we probably want importlib._bootstrap_external to be able to use the new API, which means it needs to work before any non-builtin and non-frozen modules are available. That's easy with a protocol, hard with a concrete class. Most folks don't need to worry about "How we do make this code work when the interpeter isn't fully configured yet?", but we don't always have that luxury :) As for why it wasn't specifically discussed before now, "protocols are preferable to dependencies on concrete classes" is such an entrenched design pattern in Python by this point that we tend to take it for granted, so unless someone specifically asks for an explanation, we're unlikely to spell it out again. (Thanks for doing that, by the way - I suspect you're far from the only one that was puzzled by the apparent omission). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 07:07 AM, Random832 wrote:
What's __index__ for?
__index__ is a way to get an int from an int-like object without losing information; so it fails with values like 3.4, but should succeed with values like Fraction(4, 2). __int__ is a way to convert the value to an int, so 3.4 becomes 3 (and and the 4/10's is lost). -- ~Ethan~
![](https://secure.gravatar.com/avatar/d6b9415353e04ffa6de5a8f3aaea0553.jpg?s=120&d=mm&r=g)
On 4/7/2016 11:03 AM, Ethan Furman wrote:
On 04/07/2016 07:07 AM, Random832 wrote:
What's __index__ for?
__index__ is a way to get an int from an int-like object without losing information; so it fails with values like 3.4, but should succeed with values like Fraction(4, 2).
__int__ is a way to convert the value to an int, so 3.4 becomes 3 (and and the 4/10's is lost).
Why is that a problem? Loss is not the issue. And as someone else pointed out, int('4') does not lose information, but seq['1'] is prohibited. Why is passing integer strings as indexes bad? The answer to the latter, I believe, is that Guido is against treating numbers and string representations of numbers as interchangable. And the answer to both, I believe, is that the downside of flexibility is ease of creating buggy code, especially code where bugs do not immediately raise something, or ease of creating confusing code that is hard to maintain. -- Terry Jan Reedy
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 5:06 AM, Terry Reedy <tjreedy@udel.edu> wrote:
On 4/7/2016 11:03 AM, Ethan Furman wrote:
On 04/07/2016 07:07 AM, Random832 wrote:
What's __index__ for?
__index__ is a way to get an int from an int-like object without losing information; so it fails with values like 3.4, but should succeed with values like Fraction(4, 2).
__int__ is a way to convert the value to an int, so 3.4 becomes 3 (and and the 4/10's is lost).
Why is that a problem? Loss is not the issue. And as someone else pointed out, int('4') does not lose information, but seq['1'] is prohibited. Why is passing integer strings as indexes bad?
The answer to the latter, I believe, is that Guido is against treating numbers and string representations of numbers as interchangable. And the answer to both, I believe, is that the downside of flexibility is ease of creating buggy code, especially code where bugs do not immediately raise something, or ease of creating confusing code that is hard to maintain.
Hence my wording of "string-like". Anything can be converted to a string, but only certain objects are sufficiently string-like to be implicitly treated as strings. But ultimately it's all the same concept. ChrisA
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 12:06 PM, Terry Reedy wrote:
On 4/7/2016 11:03 AM, Ethan Furman wrote:
On 04/07/2016 07:07 AM, Random832 wrote:
What's __index__ for?
__index__ is a way to get an int from an int-like object without losing information; so it fails with values like 3.4, but should succeed with values like Fraction(4, 2).
__int__ is a way to convert the value to an int, so 3.4 becomes 3 (and and the 4/10's is lost).
Why is that a problem?
It isn't. I was explaining the difference between __int__ and __index__. -- ~Ethan~
![](https://secure.gravatar.com/avatar/334b870d5b26878a79b2dc4cfcc500bc.jpg?s=120&d=mm&r=g)
Random832 writes:
How about ASCII bytes strings [as a use case]? ;)
Very good point, and not at all funny. In fact, this is pretty well useless except to try to propagate the Python 2 model of str as maybe-encoded-maybe-not into Python 3 "when it's safe". But that's way less than "99%+" of use cases. Anything where you combine it with bytes has to result in bytes, or prepare for the combining operation to raise if the bytes contain any non-ASCII. Ditto str. Either way, you've not really gained much safety for the programmer, although maybe the user would prefer a crashed program to corrupt data or mojibake on screen. But they'd rather the program worked as intended, so the programmer has to do pretty much the same amount of thinking and coding around the semantic differences between Python 2 and Python 3 AFAICS. The only place I can see this StringableASCIIBytes *frequently* being really "safe" is scripting where you pass a string literal representing a path to an os function, or print a string literal. But guess what, that already works if you just use a str literal! I'm also not really sure that, given the flexibility and ubiquity of string representations, that there really are a lot of use cases whose use of __lossless_i_promise_str__ are mutually compatible. Eg, if we use this for pathlib.Path and for urllib_tng.URL, are those really compatible in the sense that we're happy handing an urllib_tng.URL to open() and have it try to operate on that? If not, we need separate dunders for this purpose. And ditto for all those hypothetical future uses (which are unlikely to be as compatible as filesystem paths and URIs, or URI paths).
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 3:51 PM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Anything where you combine it with bytes has to result in bytes, or prepare for the combining operation to raise if the bytes contain any non-ASCII.
More helpfully, the Ascii object would raise *before* that - it would raise on construction if it had any non-ASCII in it. Combining an Ascii with a bytes results in a perfectly well-formed bytes; combining an Ascii with a unicode results in a perfectly well-formed unicode. But I'm not sure how (a) useful and (b) feasible this is. The only real use-case I know of is Path objects, and there may be similar tasks around numpy or pandas, but I don't know for sure. ChrisA
![](https://secure.gravatar.com/avatar/0a2191a85455df6d2efdb22c7463c304.jpg?s=120&d=mm&r=g)
On 07.04.2016 15:04, Chris Angelico wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
I must be missing something... we already have a method for this: .__str__() -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 07 2016)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016, at 10:10, M.-A. Lemburg wrote:
I must be missing something... we already have a method for this: .__str__()
The point is to have a method that objects that "shouldn't" be used as strings in some contexts *won't* have, as floats (let alone strings) don't have __index__, even though they do have __int__.
![](https://secure.gravatar.com/avatar/5ce43469c0402a7db8d0cf86fa49da5a.jpg?s=120&d=mm&r=g)
On 2016-04-07 15:10, M.-A. Lemburg wrote:
On 07.04.2016 15:04, Chris Angelico wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
I must be missing something... we already have a method for this: .__str__()
It's for making string-like objects into strings. __str__ isn't suitable because, ints, for example, have a __str__ method, but they aren't string-like.
![](https://secure.gravatar.com/avatar/0a2191a85455df6d2efdb22c7463c304.jpg?s=120&d=mm&r=g)
On 07.04.2016 16:58, MRAB wrote:
On 2016-04-07 15:10, M.-A. Lemburg wrote:
On 07.04.2016 15:04, Chris Angelico wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
I must be missing something... we already have a method for this: .__str__()
It's for making string-like objects into strings.
__str__ isn't suitable because, ints, for example, have a __str__ method, but they aren't string-like.
Depends on what you define as "string-like", I guess :-) We have abstract base classes for such tests, but there's nothing which would define "string-like" as ABC. Before trying to define a test via a special method, I think it's better to define what exactly you mean by "string-like". Something along the lines of numbers.Number, but for strings. To make an object string-like, you then test for the ABC and then call .__str__() to get the string representation as string. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 07 2016)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 1:26 AM, M.-A. Lemburg <mal@egenix.com> wrote:
On 07.04.2016 16:58, MRAB wrote:
On 2016-04-07 15:10, M.-A. Lemburg wrote:
On 07.04.2016 15:04, Chris Angelico wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
I must be missing something... we already have a method for this: .__str__()
It's for making string-like objects into strings.
__str__ isn't suitable because, ints, for example, have a __str__ method, but they aren't string-like.
Depends on what you define as "string-like", I guess :-)
We have abstract base classes for such tests, but there's nothing which would define "string-like" as ABC. Before trying to define a test via a special method, I think it's better to define what exactly you mean by "string-like".
Something along the lines of numbers.Number, but for strings.
To make an object string-like, you then test for the ABC and then call .__str__() to get the string representation as string.
The trouble with the ABC is that it implies a large number of methods; to declare that your object is string-like, you have to define a boatload of interactions with other types, and then decide whether str+Path yields a Path or a string. By defining __tostring__ (or whatever it gets called), you effectively say "hey, go ahead and cast me to str any time you need a str". Instead of defining all those methods, you simply define one thing, and then you can basically just treat it as a string thereafter. While you could do this by simply creating a whole lot of methods and making them all return strings, you'd still have the fundamental problem that you can't be sure that this is "safe". How do you distinguish between "object that can be treated as a string" and "object that knows how to add itself to a string"? Hence, this. ChrisA
![](https://secure.gravatar.com/avatar/0a2191a85455df6d2efdb22c7463c304.jpg?s=120&d=mm&r=g)
On 07.04.2016 17:42, Chris Angelico wrote:
On Fri, Apr 8, 2016 at 1:26 AM, M.-A. Lemburg <mal@egenix.com> wrote:
On 07.04.2016 16:58, MRAB wrote:
On 2016-04-07 15:10, M.-A. Lemburg wrote:
On 07.04.2016 15:04, Chris Angelico wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
I must be missing something... we already have a method for this: .__str__()
It's for making string-like objects into strings.
__str__ isn't suitable because, ints, for example, have a __str__ method, but they aren't string-like.
Depends on what you define as "string-like", I guess :-)
We have abstract base classes for such tests, but there's nothing which would define "string-like" as ABC. Before trying to define a test via a special method, I think it's better to define what exactly you mean by "string-like".
Something along the lines of numbers.Number, but for strings.
To make an object string-like, you then test for the ABC and then call .__str__() to get the string representation as string.
The trouble with the ABC is that it implies a large number of methods; to declare that your object is string-like, you have to define a boatload of interactions with other types, and then decide whether str+Path yields a Path or a string.
Not necessarily. In fact, a string.String ABC could have only a single method: .__str__() defined.
By defining __tostring__ (or whatever it gets called), you effectively say "hey, go ahead and cast me to str any time you need a str". Instead of defining all those methods, you simply define one thing, and then you can basically just treat it as a string thereafter.
As I said: it's all a matter of defining what a "string-like" object is supposed to mean. With the above definition of string.String, you'd have exactly what you want.
While you could do this by simply creating a whole lot of methods and making them all return strings, you'd still have the fundamental problem that you can't be sure that this is "safe". How do you distinguish between "object that can be treated as a string" and "object that knows how to add itself to a string"? Hence, this.
I suppose the .__add__() method of Path objects would implement the necessary check isinstance(other_obj, string.String) to find out whether it's safe to assume that other_obj.__str__() returns the str version of the "string-like" object other_obj. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 07 2016)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 2:27 AM, M.-A. Lemburg <mal@egenix.com> wrote:
Not necessarily. In fact, a string.String ABC could have only a single method: .__str__() defined.
That wouldn't be very useful; object.__str__ exists and is functional, so EVERY object would count as a string. The point of "string-like" is that it can be treated as a string, not just that it can be converted to one. This is exactly parallel to the difference between __index__ and __int__; floats can be converted to int (and will truncate), but cannot be *treated* as ints. ChrisA
![](https://secure.gravatar.com/avatar/0a2191a85455df6d2efdb22c7463c304.jpg?s=120&d=mm&r=g)
On 07.04.2016 18:46, Chris Angelico wrote:
On Fri, Apr 8, 2016 at 2:27 AM, M.-A. Lemburg <mal@egenix.com> wrote:
Not necessarily. In fact, a string.String ABC could have only a single method: .__str__() defined.
That wouldn't be very useful; object.__str__ exists and is functional, so EVERY object would count as a string.
No, only those objects that register with the ABC would be considered string-like, not all objects implementing the .__str__() method. In regular Python, only str() objects would register with strings.String. Path objects could also register to be treated as "string-like" object. Just like numeric types are only considered part of the ABCs under numbers, if they register with these. Perhaps the name strings.String sounds confusing, so perhaps strings.StringLike or strings.FineToConvertToAStringIfNeeded would be better :-)
The point of "string-like" is that it can be treated as a string, not just that it can be converted to one. This is exactly parallel to the difference between __index__ and __int__; floats can be converted to int (and will truncate), but cannot be *treated* as ints.
Right, and that's what you can define via an ABC. Those abstract base classes are not to be confused with introspecting methods on objects - they help optimize this kind of test, but most importantly help express the combination of providing an interface by exposing methods with the semantics of how those methods are expected to be used. -- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Experts (#1, Apr 07 2016)
Python Projects, Coaching and Consulting ... http://www.egenix.com/ Python Database Interfaces ... http://products.egenix.com/ Plone/Zope Database Interfaces ... http://zope.egenix.com/
::: We implement business ideas - efficiently in both time and costs ::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/ http://www.malemburg.com/
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Fri, Apr 8, 2016 at 4:42 AM, M.-A. Lemburg <mal@egenix.com> wrote:
On 07.04.2016 18:46, Chris Angelico wrote:
On Fri, Apr 8, 2016 at 2:27 AM, M.-A. Lemburg <mal@egenix.com> wrote:
Not necessarily. In fact, a string.String ABC could have only a single method: .__str__() defined.
That wouldn't be very useful; object.__str__ exists and is functional, so EVERY object would count as a string.
No, only those objects that register with the ABC would be considered string-like, not all objects implementing the .__str__() method. In regular Python, only str() objects would register with strings.String.
Oh, gotcha. In that case, it would still be pretty much the same as the __fspath__ proposal, except that instead of creating a dunder method (to implement a protocol), you register with an ABC. ChrisA
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 11:43 M.-A. Lemburg <mal@egenix.com> wrote:
On 07.04.2016 18:46, Chris Angelico wrote:
On Fri, Apr 8, 2016 at 2:27 AM, M.-A. Lemburg <mal@egenix.com> wrote:
Not necessarily. In fact, a string.String ABC could have only a single method: .__str__() defined.
That wouldn't be very useful; object.__str__ exists and is functional, so EVERY object would count as a string.
No, only those objects that register with the ABC would be considered string-like, not all objects implementing the .__str__() method. In regular Python, only str() objects would register with strings.String.
Path objects could also register to be treated as "string-like" object.
Just like numeric types are only considered part of the ABCs under numbers, if they register with these.
Perhaps the name strings.String sounds confusing, so perhaps strings.StringLike or strings.FineToConvertToAStringIfNeeded would be better :-)
The point of "string-like" is that it can be treated as a string, not just that it can be converted to one. This is exactly parallel to the difference between __index__ and __int__; floats can be converted to int (and will truncate), but cannot be *treated* as ints.
Right, and that's what you can define via an ABC. Those abstract base classes are not to be confused with introspecting methods on objects - they help optimize this kind of test, but most importantly help express the combination of providing an interface by exposing methods with the semantics of how those methods are expected to be used.
To make MAL's proposal concrete: class StringLike(abc.ABC): @abstractmethod def __str__(self): """Return the string representation of something.""" StringLike.register(pathlib.PurePath) # Any 3rd-party library can do the same. You could also call the class StringablePath or something and get the exact same concept across where you are using the registration abilities of ABCs to semantically delineate when a class's __str__() returns a usable file path. The drawback is that this isn't easily backported like `path.__ospath__() if hasattr(path, '__ospath__') else path` for libraries that don't necessarily have access to pathlib but want to be compatible with accepting path objects.
![](https://secure.gravatar.com/avatar/5dde29b54a3f1b76b2541d0a4a9b232c.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016 at 11:59 AM, Brett Cannon <brett@python.org> wrote:
class StringLike(abc.ABC):
@abstractmethod def __str__(self): """Return the string representation of something."""
StringLike.register(pathlib.PurePath) # Any 3rd-party library can do the same.
You could also call the class StringablePath or something and get the exact same concept across where you are using the registration abilities of ABCs to semantically delineate when a class's __str__() returns a usable file path.
The drawback is that this isn't easily backported like `path.__ospath__() if hasattr(path, '__ospath__') else path` for libraries that don't necessarily have access to pathlib but want to be compatible with accepting path objects.
and a plus is that it's compatible with type hinting -- is that the future of Python??? -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 12:03 Chris Barker <chris.barker@noaa.gov> wrote:
On Thu, Apr 7, 2016 at 11:59 AM, Brett Cannon <brett@python.org> wrote:
class StringLike(abc.ABC):
@abstractmethod def __str__(self): """Return the string representation of something."""
StringLike.register(pathlib.PurePath) # Any 3rd-party library can do the same.
You could also call the class StringablePath or something and get the exact same concept across where you are using the registration abilities of ABCs to semantically delineate when a class's __str__() returns a usable file path.
The drawback is that this isn't easily backported like `path.__ospath__() if hasattr(path, '__ospath__') else path` for libraries that don't necessarily have access to pathlib but want to be compatible with accepting path objects.
and a plus is that it's compatible with type hinting -- is that the future of Python???
So the other way to do this is to combine the proposals: class BasePath(abc.ABC): @abstractmethod def __ospath__(self): """Return the file system path, serialized as a string.""" Then pathlib.PurePath can inherit from this ABC and anyone else can as well or be registered as doing so. Then in typing.py you can have: class Path(extra=pathlib.BasePath): __slots__ = () PathLike = Union[str, Path] Then any third-party library can register with the ABC and get the typing correctly (assuming I didn't botch the type specification). Guido also had some protocol proposal a while back that I think he floated here, but I don't think the discussion really went anywhere as it was an early idea.
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 11:59 AM, Brett Cannon wrote:
To make MAL's proposal concrete:
class StringLike(abc.ABC):
@abstractmethod def __str__(self): """Return the string representation of something."""
StringLike.register(pathlib.PurePath) # Any 3rd-party library can do the same.
You could also call the class StringablePath or something and get the exact same concept across where you are using the registration abilities of ABCs to semantically delineate when a class's __str__() returns a usable file path.
I think I might like this better than a new magic method.
The drawback is that this isn't easily backported like `path.__ospath__() if hasattr(path, '__ospath__') else path` for libraries that don't necessarily have access to pathlib but want to be compatible with accepting path objects.
I don't understand. -- ~Ethan~
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 12:11 Ethan Furman <ethan@stoneleaf.us> wrote:
On 04/07/2016 11:59 AM, Brett Cannon wrote:
To make MAL's proposal concrete:
class StringLike(abc.ABC):
@abstractmethod def __str__(self): """Return the string representation of something."""
StringLike.register(pathlib.PurePath) # Any 3rd-party library can do the same.
You could also call the class StringablePath or something and get the exact same concept across where you are using the registration abilities of ABCs to semantically delineate when a class's __str__() returns a usable file path.
I think I might like this better than a new magic method.
The drawback is that this isn't easily backported like `path.__ospath__() if hasattr(path, '__ospath__') else path` for libraries that don't necessarily have access to pathlib but want to be compatible with accepting path objects.
I don't understand.
How do you make Python 3.3 code work with this when the ABC will simply not be available to you unless you're running Python 3.4.3, 3.5.2, or 3.6.0 (under the assumption that the ABC is put in pathlib and backported thanks to its provisional status)? The ternary operator one-liner is backwards-compatible while the ABC is only forward-compatible.
![](https://secure.gravatar.com/avatar/3dd475b8aaa5292d74cb0c3f76c3f196.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016, at 15:17, Brett Cannon wrote:
How do you make Python 3.3 code work with this when the ABC will simply not be available to you unless you're running Python 3.4.3, 3.5.2, or 3.6.0 (under the assumption that the ABC is put in pathlib and backported thanks to its provisional status)? The ternary operator one-liner is backwards-compatible while the ABC is only forward-compatible.
If it's not available to you, then it's not available to anyone to register either, so you obviously act as if no objects are StringLike if you get an ImportError when trying to use it. Isn't this just the standard dance for using *any* function that's new to a new version of Python? try: from pathlib import StringLike # if it's in pathlib why is it called StringLike? def is_StringLike(x): return isinstance(x, StringLike) except ImportError: def is_StringLike(x): return False
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 12:36 Random832 <random832@fastmail.com> wrote:
On Thu, Apr 7, 2016, at 15:17, Brett Cannon wrote:
How do you make Python 3.3 code work with this when the ABC will simply not be available to you unless you're running Python 3.4.3, 3.5.2, or 3.6.0 (under the assumption that the ABC is put in pathlib and backported thanks to its provisional status)? The ternary operator one-liner is backwards-compatible while the ABC is only forward-compatible.
If it's not available to you, then it's not available to anyone to register either, so you obviously act as if no objects are StringLike if you get an ImportError when trying to use it. Isn't this just the standard dance for using *any* function that's new to a new version of Python?
Yes, but the lack of a magic method is not as severe as a lack of an ABC you will be using in an isinstance() check.
try: from pathlib import StringLike # if it's in pathlib why is it called StringLike?
I called it StringLike because I was replying to Chris' proposal of a generic string-like protocol (which wouldn't live in pathlib).
def is_StringLike(x): return isinstance(x, StringLike) except ImportError: def is_StringLike(x): return False
That would cut out all third-party libraries no matter what their Python version support was. My point is that if I wanted this to work in Python 3.3.x, Python 3.4.2, or Python 3.5.1 then the ABC solution is out as the ABC won't exist. The magic method, though, would still work with the one-liner all the way back to Python 2.5 when conditional expressions were added to the language. For instance, let's say I'm the author of a library that uses file paths that wants to support Python 3.3 and newer. How do I add support for using pathlib.Path and Ethan's path library? With the magic method solution I can use: def ospath(path): return path.__ospath__() if hasattr(path, '__ospath__') else path If I really wanted to I could just embed that wherever I want to work with paths. Now how about the ABC? # In pathlib. class StringPath(abc.ABC): @abstractmethod def __str__(self): ... StringPath.register(pathlib.PurePath, str) # Maybe not cover str? # In my library trying to support Python 3.3 and newer, pathlib and Ethan's path library. try: from importlib import StringPath except ImportError: StringPath = None def ospath(path): if StringPath is None: if isinstance(path, StringPath): return str(path) # What am I supposed to do here? Now you could set `StringPath = object`, but that starts to negate the point of not subclassing strings as you're now accepting anything that defines __str__() in that case unless you're on a version of Python "new" enough to have pathlib w/ the ABC defined. And if you go some other route, what would you want to do if StringPath wasn't available? So the ABC vs magic method discussion comes down to whether we think third-party libraries will use whatever approach is decided upon and whether they care about how good the support is for Python 3.3.x, 3.4.2, and 3.5.1.
![](https://secure.gravatar.com/avatar/ebf132362b622423ed5baca2988911b8.jpg?s=120&d=mm&r=g)
On Apr 7, 2016, at 3:17 PM, Brett Cannon <brett@python.org> wrote:
The ternary operator one-liner is backwards-compatible while the ABC is only forward-compatible.
I like the idea of doing both. Make a __fspath__ method and pathlib.fspath that uses it, and then have an ABC that checks for the existence of __fspath__. ----------------- Donald Stufft PGP: 0x6E3CBCE93372DCFA // 7C6B 7C5D 5E2B 6356 A926 F04F 6E3C BCE9 3372 DCFA
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 12:17 PM, Brett Cannon wrote:
How do you make Python 3.3 code work with this when the ABC will simply not be available to you unless you're running Python 3.4.3, 3.5.2, or 3.6.0 (under the assumption that the ABC is put in pathlib and backported thanks to its provisional status)? The ternary operator one-liner is backwards-compatible while the ABC is only forward-compatible.
__os_path__ (or whatever it's called) also won't be available on those earlier versions -- so I'm not seeing that the ABC route as any worse. Am I missing something? -- ~Ethan~
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 20:43, Ethan Furman <ethan@stoneleaf.us> wrote:
On 04/07/2016 12:17 PM, Brett Cannon wrote:
How do you make Python 3.3 code work with this when the ABC will simply not be available to you unless you're running Python 3.4.3, 3.5.2, or 3.6.0 (under the assumption that the ABC is put in pathlib and backported thanks to its provisional status)? The ternary operator one-liner is backwards-compatible while the ABC is only forward-compatible.
__os_path__ (or whatever it's called) also won't be available on those earlier versions -- so I'm not seeing that the ABC route as any worse.
Am I missing something?
You can check for __os_path__ even if it doesn't exist (hasattr takes the name as a string), but you can't check for the ABC if it doesn't exist (isinstance takes the actual ABC as the argument). Paul
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 01:21 PM, Paul Moore wrote:
On 7 April 2016 at 20:43, Ethan Furman wrote:
On 04/07/2016 12:17 PM, Brett Cannon wrote:
How do you make Python 3.3 code work with this when the ABC will simply not be available to you unless you're running Python 3.4.3, 3.5.2, or 3.6.0 (under the assumption that the ABC is put in pathlib and backported thanks to its provisional status)? The ternary operator one-liner is backwards-compatible while the ABC is only forward-compatible.
__os_path__ (or whatever it's called) also won't be available on those earlier versions -- so I'm not seeing that the ABC route as any worse.
Am I missing something?
You can check for __os_path__ even if it doesn't exist (hasattr takes the name as a string), but you can't check for the ABC if it doesn't exist (isinstance takes the actual ABC as the argument).
True, but: - Python 3.3 isn't going to check for __os_path__ - Python 3.3 isn't going to check for an ABC In other words, Python 3.3* simply isn't going to work with pathlib, so I continue to be confused with why Brett brought it up. -- ~Ethan~ * In case there's any confusion: by "not work" I mean the stdlib is not going to correctly interpret a pathlib.Path in 3.3 and earlier.
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 21:28, Ethan Furman <ethan@stoneleaf.us> wrote:
In case there's any confusion: by "not work" I mean the stdlib is not going to correctly interpret a pathlib.Path in 3.3 and earlier.
I think the confusion is over *who* will be checking the protocol or the ABC. You're correct that the stdlib will not do so in earlier versions. I've been assuming that is obvious (and I suspect Brett has too). Talk about using the check for older versions is basically around the possibility that 3rd party libraries might do so. I think it's unlikely that they'll bother (but I'm commenting on the suggestion because I'd like it to be easy to do if anyone does want to bother, against my expectations). I have the feeling Brett might think that it's somewhat more likely. If all we're thinking about is a way for the stdlib to work with strings and pathlib objects seamlessly, while also allowing 3rd party path classes to register to be treated the same way, then yes, there's no real difference between an ABC and a protocol (well, to register with the ABC, 3rd party code would need to conditionally import the ABC and make sure not to fail if the ABC can't be found, but that's just some boilerplate). Frankly, if it wasn't for the fact that you have stated that you'll add support for the protocol to your path library, I'd be surprised if *any* 3rd party code changed as a result of this discussion. There's been no comment from the authors of path.py or pylib (the only other 2 path objects I know of). And the only comments I've heard from authors of libraries that consume paths is "I don't see any reason why I'd bother". So as long as you're happy with the final form of the proposal, I see little reason to worry about how other 3rd party code might use it. Paul
![](https://secure.gravatar.com/avatar/e8600d16ba667cc8d7f00ddc9f254340.jpg?s=120&d=mm&r=g)
On Thu, 7 Apr 2016 at 13:46 Paul Moore <p.f.moore@gmail.com> wrote:
On 7 April 2016 at 21:28, Ethan Furman <ethan@stoneleaf.us> wrote:
In case there's any confusion: by "not work" I mean the stdlib is not going to correctly interpret a pathlib.Path in 3.3 and earlier.
I think the confusion is over *who* will be checking the protocol or the ABC. You're correct that the stdlib will not do so in earlier versions. I've been assuming that is obvious (and I suspect Brett has too).
Yep, you're right. I was never worried about making the stdlib work since that's a fully controlled environment that we can update at once. What I'm worried about is any third-party library that has an API that takes a path as an argument that may need to be updated to support pathlib -- or any other path library -- and doesn't want to directly rely on Python 3.6 for support.
Talk about using the check for older versions is basically around the possibility that 3rd party libraries might do so. I think it's unlikely that they'll bother (but I'm commenting on the suggestion because I'd like it to be easy to do if anyone does want to bother, against my expectations). I have the feeling Brett might think that it's somewhat more likely.
Yes, that's my hope as third-party libraries are the ones that have older Python version compatibility to care about (the stdlib is obviously always latest-and-greatest).
If all we're thinking about is a way for the stdlib to work with strings and pathlib objects seamlessly, while also allowing 3rd party path classes to register to be treated the same way, then yes, there's no real difference between an ABC and a protocol (well, to register with the ABC, 3rd party code would need to conditionally import the ABC and make sure not to fail if the ABC can't be found, but that's just some boilerplate).
My point is the boilerplate is minimized for third-party libraries in the instance of the magic method vs the ABC, but otherwise they accomplish the same thing.
Frankly, if it wasn't for the fact that you have stated that you'll add support for the protocol to your path library, I'd be surprised if *any* 3rd party code changed as a result of this discussion. There's been no comment from the authors of path.py or pylib (the only other 2 path objects I know of). And the only comments I've heard from authors of libraries that consume paths is "I don't see any reason why I'd bother". So as long as you're happy with the final form of the proposal, I see little reason to worry about how other 3rd party code might use it.
I'm trying to be a bit more optimistic on the uptake. :)
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Apr 7, 2016 2:32 PM, "Brett Cannon" <brett@python.org> wrote:
[...]
Frankly, if it wasn't for the fact that you have stated that you'll add support for the protocol to your path library, I'd be surprised if *any* 3rd party code changed as a result of this discussion. There's been no comment from the authors of path.py or pylib (the only other 2 path objects I know of). And the only comments I've heard from authors of libraries that consume paths is "I don't see any reason why I'd bother". So as long as you're happy with the final form of the proposal, I see little reason to worry about how other 3rd party code might use it.
I'm trying to be a bit more optimistic on the uptake. :)
If data points are useful, numpy just merged a PR for supporting pathlib paths: https://github.com/numpy/numpy/pull/6660 It's a bit awkward given the current api, but someone did care enough to take the trouble. -n
![](https://secure.gravatar.com/avatar/de311342220232e618cb27c9936ab9bf.jpg?s=120&d=mm&r=g)
On 04/07/2016 02:26 PM, Brett Cannon wrote:
Yep, you're right. I was never worried about making the stdlib work since that's a fully controlled environment that we can update at once. What I'm worried about is any third-party library that has an API that takes a path as an argument that may need to be updated to support pathlib -- or any other path library -- and doesn't want to directly rely on Python 3.6 for support.
Ah, I understand (finally!). Okay, let's go with the protocol then. If it makes sense to also have an ABC I'm fine with that. -- ~Ethan~
![](https://secure.gravatar.com/avatar/b8efb5dcb59ea1048c4763b7507d3343.jpg?s=120&d=mm&r=g)
I like this double-underscore path attribute better than my suggestion almost two weeks ago in the other thread, but which did not get any reactions: On Sun, Mar 27, 2016 at 5:40 PM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
On Sun, Mar 27, 2016 at 1:23 AM, Koos Zevenhoven <k7hoven@gmail.com> wrote:
I assume you meant to type pathlib.Path.path, so that Path("...").path == str(Path("...")). That's a good start, and I'm looking forward to Serhiy's patch for making the stdlib accept Paths. But if Path will not subclass str, we also need new stdlib functions that *return* Paths.
Actually, now that .path is not out yet, would it make sense to call it Path.str or Path.strpath instead, and introduce the same thing on DirEntry and guarantee a str (instead of str or bytes as DirEntry.path now does)? Maybe that would lead to fewer broken implementations in third-party libraries too?
But maybe it could be called `__pathname__`, since 'names' are commonly thought of as strings. This, implemented across the stdlib, would obviously be better than the status quo. Anyway, you are now discussing that remind me a lot of my thoughts last week. I hope everyone realizes that, while this would still require effort from the maintainers of every library that deals with paths, this would not allow libraries to start returning pathlib objects from functions without either breaking backwards compatibility of their APIs or adding duplicate functions. We will end up with a mess of some libraries accepting path objects and some not, and some that screw up their DirEntry compatibility regarding bytestring paths or that break their existing pure bytestring compatibility that they didn't know of. All this could take a while and give people an impression of inconsistency in Python. -Koos
![](https://secure.gravatar.com/avatar/351a10f392414345ed67a05e986dc4dd.jpg?s=120&d=mm&r=g)
We have abstract base classes for such tests, but there's nothing which would define "string-like" as ABC. Before trying to define a test via a special method, I think it's better to define what exactly you mean by "string-like".
Something along the lines of numbers.Number, but for strings.
To make an object string-like, you then test for the ABC and then call .__str__() to get the string representation as string.
Does ABC is easy to use from C? There should be easy way to (1) define "string-like" class in C and (2) use "string-like" object from C.
![](https://secure.gravatar.com/avatar/d995b462a98fea412efa79d17ba3787a.jpg?s=120&d=mm&r=g)
On 7 April 2016 at 16:26, M.-A. Lemburg <mal@egenix.com> wrote:
We have abstract base classes for such tests, but there's nothing which would define "string-like" as ABC. Before trying to define a test via a special method, I think it's better to define what exactly you mean by "string-like".
As a more general point here, if the point is simply to group a set of types together for use by "user" code (whether application code or 3rd party libraries) then that's precisely what the ABC machinery is for. It doesn't require a core change or a PEP or anything other than agreement between the users of the facility to define an ABC for *anything*, even something as vague as "string-like" - if all parties concerned agree that something is "string-like" when it is an instance of the StringLike ABC, then that's a done deal. You can add a "to_string()" method to the ABC, which can be as simple as def to_string(obj): return str(obj) and let types for which that's not appropriate override it. The only real reason for a new "protocol" (a dunder method) is if the core interpreter or low-level parts of the stdlib need to participate. In that case, the ABC mechanisms may not be available/appropriate or may introduce unacceptable overhead (e.g., in something like dict lookup) But in general, we have a mechanism for things like this, why are people inventing home-grown solutions rather than using them? (In the case of pathlib and __fspath__, it's because of the low-level implications of adding support to open() and places like importlib - but that's *also* why the solution should remain focused on the specific problem, and not become over-generalised). Paul
![](https://secure.gravatar.com/avatar/5ce43469c0402a7db8d0cf86fa49da5a.jpg?s=120&d=mm&r=g)
On 2016-04-07 14:04, Chris Angelico wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
This could supersede the __fspath__ "give me the string for this path" protocol, or could stand in parallel with it.
Obviously str will have this dunder method, returning self. Most other core types (notably 'object') will not define it. Absence of this method implies that the object cannot be treated as a string.
String methods will be defined as accepting string-like objects. For instance, "hello"+foo will succeed if foo is string-like.
Downside: Two string-like objects may behave unexpectedly - foo+bar will concatenate strings if either is an actual string, but if both are other string-like objects, depends on the implementation of those objects.
Bikeshedding:
1) What should the dunder method be named? __str_coerce__? __stringlike__?
2) Which standard types are sufficiently string-like to be used thus?
3) Should there be a bytes equivalent?
4) Should there be a format string "coerce to str"? "{}".format(x) is equivalent to str(x), but it might be nice to be able to assert that something's stringish already.
Open season!
__as_str__?
![](https://secure.gravatar.com/avatar/db5b03704c129196a4e9415e55413ce6.jpg?s=120&d=mm&r=g)
I can see this going horribly wrong when it comes to things like concatenation. I've used JS far too many times to be excited for things like this. -- Ryan [ERROR]: Your autotools build scripts are 200 lines longer than your program. Something’s wrong. http://kirbyfan64.github.io/ On Apr 7, 2016 8:05 AM, "Chris Angelico" <rosuav@gmail.com> wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
This could supersede the __fspath__ "give me the string for this path" protocol, or could stand in parallel with it.
Obviously str will have this dunder method, returning self. Most other core types (notably 'object') will not define it. Absence of this method implies that the object cannot be treated as a string.
String methods will be defined as accepting string-like objects. For instance, "hello"+foo will succeed if foo is string-like.
Downside: Two string-like objects may behave unexpectedly - foo+bar will concatenate strings if either is an actual string, but if both are other string-like objects, depends on the implementation of those objects.
Bikeshedding:
1) What should the dunder method be named? __str_coerce__? __stringlike__?
2) Which standard types are sufficiently string-like to be used thus?
3) Should there be a bytes equivalent?
4) Should there be a format string "coerce to str"? "{}".format(x) is equivalent to str(x), but it might be nice to be able to assert that something's stringish already.
Open season! _______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Thu, Apr 7, 2016 at 11:04 PM, Chris Angelico <rosuav@gmail.com> wrote:
This is a spin-off from the __fspath__ discussion on python-dev, in which a few people said that a more general approach would be better.
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
This could supersede the __fspath__ "give me the string for this path" protocol, or could stand in parallel with it.
In the light of all the arguments put forward in this thread and elsewhere, I'm withdrawing this proposal in favour of __fspath__. If additional use-cases are found for a "string-like object", this can be revived, but otherwise, it's too general and insufficiently useful. ChrisA
![](https://secure.gravatar.com/avatar/bcfe2359f5f43407a4a039827f736a14.jpg?s=120&d=mm&r=g)
Chris Angelico wrote:
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
Obviously str will have this dunder method, returning self. Most other core types (notably 'object') will not define it. Absence of this method implies that the object cannot be treated as a string.
do you expect int to define it? -- By ZeD
![](https://secure.gravatar.com/avatar/d67ab5d94c2fed8ab6b727b62dc1b213.jpg?s=120&d=mm&r=g)
On Sat, Apr 9, 2016 at 3:58 PM, Vito De Tullio <vito.detullio@gmail.com> wrote:
Chris Angelico wrote:
Proposal: Objects should be allowed to declare that they are "string-like" by creating a dunder method (analogously to __index__ for integers) which implies a loss-less conversion to str.
Obviously str will have this dunder method, returning self. Most other core types (notably 'object') will not define it. Absence of this method implies that the object cannot be treated as a string.
do you expect int to define it?
Nope! An integer cannot be treated as a string. ChrisA
participants (22)
-
Alexander Walters
-
Brett Cannon
-
Chris Angelico
-
Chris Barker
-
Donald Stufft
-
Eric V. Smith
-
Ethan Furman
-
INADA Naoki
-
Koos Zevenhoven
-
M.-A. Lemburg
-
MRAB
-
Nathaniel Smith
-
Nick Coghlan
-
Nikolaus Rath
-
Oleg Broytman
-
Paul Moore
-
Random832
-
Ryan Gonzalez
-
Stephen J. Turnbull
-
Terry Reedy
-
Vito De Tullio
-
Émanuel Barry