That's some post. Thanks a lot for collecting all that stuff.

On 29.03.2016 16:42, Koos Zevenhoven wrote:
That even occurred to me after we talked about the p-string (mainly because I am working in this field, so I basically need both file paths and URIs). Again, that you say you thought about it too perhaps means it's worth discussing :).
:)
Yes, these are concerns that should be considered if/when deciding whether to make URI/URLs a subclass of Path or the other way around, or something else. Anyway, since Path(...) already instantiates different subclasses based on the situation, having it instantiate a URI in some cases would not be completely unnatural.
You are right of course. I thought too narrowly in this case.
As suggested by Stephen, I've been looking into RFC 3986 as a whole, and instantiating both URIs and fs paths from p-strings does not seem completely impossible. Some points below (you can skip past them if you have to, there's more general discussion at the end):
Well done.
- Only some URIs (or even URLs) can be reliably distinguished from file paths. However, those that contain '://' could be automatically turned into URI objects by p-strings [or Path(...)]. I suspect that would cover the majority of use cases.
I agree. 'http' and 'https' would make up the majority of schemes when it comes to the Web. 'ftp', 'ssh' and 'mailto' might follow.
(The unambiguous cases would be exactly those URIs that contain an 'authority' component -- these always begin with 'scheme://' while others don't.)
- If we want to allow URIs without an 'authority' component, like 'mailto:someone@domain.com', they should be explicitly instantiated as URI objects.
- Some terminology: There are indeed 'URI's and 'relative references'. Relative references are essentially the URI-equivalent of relative paths. Then there are 'URI references' which can be either 'URIs' or 'relative references' (kinda like if you consider general paths that can be absolute or relative paths, as is done in pathlib).
- Instantiating relative URI references with Path(...) or p-strings may cause issues, because they might get turned into Windows paths and the like. It does seem like this could be worked around by, for instance, making another class like "RelativePath" or "RelativeRef", but there are some questions about when/how these should be instantiated. This may lead to a need for slight backwards incompatibilities if implemented within pathlib.
- "Queries" like '?this=that' after the path component have a special role in URIs, but in file system paths they can be parts of the file (or even directory) name. This might again be ambiguous when using relative paths / references. This could perhaps be dealt with by requiring more explicit handling when joining relative paths / references together.
- "Fragments" like '#what'. This is essentially the same issue as with queries above and should be solved the same way. Anyway, both may be present at the same time.
- '..' and '.' in relative paths / references. In URIs, there's a difference between 'scheme://foo/bar/' and 'scheme://foo/bar'. Merging the relative reference './baz' to the former gives 'scheme://foo/bar/baz' while merging it to the latter gives 'scheme://foo/baz', because the last segment of the base path is dropped unless it ends with a slash. I kinda wish the same thing was the standard with filesystem paths too.
All of this makes me think that it MIGHT be better to leave the decision of whether it's a *real path*, a *URL* or a *URL path* to the user. Not sure if we can handle this lazily but I CAN imagine some "confusion" *sounding like PEP 428*. If we can find an unambiguous solution, that'll be awesome and would simplify a lot.
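To make the query/fragment and './baz' points above concrete, here is a small sketch using only what the standard library already offers (urllib.parse and pathlib). It uses 'http' rather than a made-up scheme because urljoin only resolves relative references for schemes it knows; the example strings are otherwise arbitrary.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

# RFC 3986 merging: the last segment of the base path is dropped
# unless the base ends with '/'.
with_slash = urljoin('http://foo/bar/', './baz')
without_slash = urljoin('http://foo/bar', './baz')
print(with_slash)      # http://foo/bar/baz
print(without_slash)   # http://foo/baz

# The same text means different things as a URI reference and as a path.
text = 'page.html?this=that#what'
ref = urlsplit(text)
print(ref.path, ref.query, ref.fragment)   # page.html this=that what

# As a filesystem path, '?' and '#' are just part of the file name.
name = PurePosixPath(text).name
print(name)            # page.html?this=that#what
```

So the same string yields either a three-part reference or a single file name, which is why guessing the kind of object from the text alone is risky for relative references.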
- Percent encoding of URIs: quite obvious -- should not be done before it is unambiguous that we deal with a URI. Perhaps it should be done only when the resource is accessed or when the URI is exported to a plain str or bytes etc. I suppose this is a matter of what we would want in the repr.
Good point. URIs should be able to handle both inputs. So, we would need to decide on a canonical form.
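A rough sketch of what "handle both inputs" could mean, using quote and unquote from urllib.parse. The helper name canonical is purely hypothetical, and choosing the percent-encoded form as the canonical one is an assumption made only for illustration.

```python
from urllib.parse import quote, unquote

def canonical(path: str) -> str:
    """Hypothetical helper: decode first, then encode, so that an
    already-encoded input and a raw input end up identical."""
    return quote(unquote(path), safe='/')

raw = '/my docs/résumé.txt'
encoded = '/my%20docs/r%C3%A9sum%C3%A9.txt'

print(canonical(raw))       # /my%20docs/r%C3%A9sum%C3%A9.txt
print(canonical(encoded))   # /my%20docs/r%C3%A9sum%C3%A9.txt
```

Note the catch: a raw file name that happens to contain something like '%20' would be decoded here as well, which is exactly why encoding should wait until it is clear that we are dealing with a URI.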
- I may still have missed or forgotten something.
So, also with paths, especially relative ones, a library should "resist the temptation to guess", and carry around all the information until the context becomes unambiguous. For instance, when merging a relative reference with an explicit URI, the ambiguities about ?query and #fragment and about resolving the merged path disappear.
Another point for "let the user decide".
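For illustration, the '://' rule discussed above combined with "don't guess otherwise" might look roughly like this. The function parse_location is hypothetical and not part of pathlib or of any concrete proposal; it returns a plain urlsplit() result for the unambiguous case and a PurePath for everything else.

```python
import re
from pathlib import PurePath
from typing import Union
from urllib.parse import SplitResult, urlsplit

# RFC 3986 scheme syntax, followed by '//' (i.e. an authority component).
_SCHEME_AUTHORITY = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*://')

def parse_location(text: str) -> Union[SplitResult, PurePath]:
    """Hypothetical: treat text as a URI only if it starts with 'scheme://'.

    Anything else (including 'mailto:...' and relative references) is left
    as a path, so the caller has to ask for a URI explicitly.
    """
    if _SCHEME_AUTHORITY.match(text):
        return urlsplit(text)
    return PurePath(text)

print(type(parse_location('https://mysite.com/somepage.html')).__name__)
print(type(parse_location('mailto:someone@domain.com')).__name__)
print(type(parse_location('/etc/hosts')).__name__)
```

The second and third calls both give a path object; whether 'mailto:...' should be a URI is then the user's explicit decision rather than the library's guess.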
Another thought: requesting URLs. Basically the same as p'/etc/hosts'.write_text(secret). It's really important to have a dead simple library which is able to work with URLs. So, if I could do:
Good idea. When I suggested extending Paths (and p-strings) to work with URLs, I indeed meant that it would be an instance of (a subclass of) Path, so that you do the same as with filesystem path objects:
p'https://mysite.com/somepage.html'.read_text()
or
(p'https://mysite.com' / page).read_text()
Ah, I see. Well, that's one approach. On the other hand, I can imagine a lot of people willing to do a "PUT", "DELETE" or "POST" (and the rather unknown other ones). It seems to me that a one-to-one mapping would be easier here instead of retrofitting. Although read_text might come in handy as an alias for "GET". :) That is when you don't care if you read locally or remotely. So, I can see room for this.
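Just to sketch how the two views could coexist: a one-to-one mapping of HTTP methods plus read_text as a thin alias for "GET". The URL class and its build_request/send methods below are made up for illustration; only urllib.request.Request, urlopen and urllib.parse.quote are real standard library calls.

```python
from urllib.parse import quote
from urllib.request import Request, urlopen

class URL:
    """Hypothetical URL type; not part of pathlib or any proposal."""

    def __init__(self, text):
        self._text = text

    def __truediv__(self, segment):
        # Join one path segment, percent-encoding it at this point.
        return URL(self._text.rstrip('/') + '/' + quote(segment, safe=''))

    def __str__(self):
        return self._text

    def build_request(self, method='GET', data=None):
        # One-to-one mapping: the caller names the HTTP method.
        return Request(self._text, data=data, method=method)

    def send(self, method='GET', data=None):
        # Performs the actual network request and returns the body as bytes.
        with urlopen(self.build_request(method, data)) as response:
            return response.read()

    def read_text(self, encoding='utf-8'):
        # Path-like alias for a plain "GET".
        return self.send('GET').decode(encoding)

page = URL('https://mysite.com') / 'somepage.html'
print(page)                                            # https://mysite.com/somepage.html
print(page.build_request('PUT', b'x').get_method())    # PUT
```

With something like this, page.read_text() would read remotely the same way a Path reads locally, while page.send('DELETE') stays available for people who need the other methods.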
But who knows what we might end up with if we go down this path. And I mean a metaphorical path here, not necessarily Path :).
Let's see where this path leads us. ;)

Best,
Sven