The 'Working with Path objects: p-strings?' thread spawned a discussion about how URLs (or, more generally, URIs) and Paths should work together. I suggest we move that discussion to this new thread. The concept is 'explained' below in this email and its quotes, but a little bit of discussion happened in the other thread too.

While I think that the decisions about p-strings (or a-strings for addresses, or whatever they should be) should keep URIs in mind, it is premature to add the Path+URI fusion to the stdlib. I agree with Paul Moore that this URL stuff should be on PyPI first. It could even be a library that monkey-patches pathlib to accept URIs, or a URI library that instantiates Path objects when appropriate. Then there could be a smooth transition into the stdlib some day.

See all the stuff below:

On Tue, Mar 29, 2016 at 11:44 AM, Sven R. Kunze <srkunze@mail.de> wrote:
On 27.03.2016 00:51, Koos Zevenhoven wrote:
OT:
To be honest, I do think it feels like URLs are becoming (or have become) just as important as paths, and that pathlib.Path should in the future work with URLs just like it now works with Windows and POSIX paths. The difference between "http://domain.xyz/" and "C:\\" is not huge. I also think there should be a Python type (stdlib or builtin) which handles JSON objects more nicely than dicts do and has its own literal.
That even occurred to me after we talked about the p-string (mainly because I am working in this field, so I basically need both file paths and URIs).
Again, that you say you thought about it too perhaps means it's worth discussing :).
Just for the record: "Path" might not be the most correct wording. There is a "file://" scheme which identifies local files. So, paths are basically a subset of URLs, functionality-wise. Thus, a better/more generic name would be "URL", "URI", "Link" or the like, in order to avoid confusing later generations. However, I think I could live with Path.
Yes, these are concerns that should be considered if/when deciding whether to make URIs/URLs a subclass of Path or the other way around, or something else. Anyway, since Path(...) already instantiates different subclasses based on the situation, having it instantiate a URI in some cases would not be completely unnatural.

As suggested by Stephen, I've been looking into RFC 3986 as a whole, and instantiating both URIs and fs paths from p-strings does not seem completely impossible. Some points below (you can skip past them if you have to, there's more general discussion at the end):

- Only some URIs (or even URLs) can be reliably distinguished from file paths. However, those that contain '://' could be automatically turned into URI objects by p-strings [or Path(...)]. I suspect that would cover the majority of use cases. (The unambiguous cases would be exactly those URIs that contain an 'authority' component -- these always begin with 'scheme://' while others don't.)

- If we want to allow URIs without an 'authority' component, like 'mailto:someone@domain.com', they should be explicitly instantiated as URI objects.

- Some terminology: There are indeed 'URIs' and 'relative references'. Relative references are essentially the URI equivalent of relative paths. Then there are 'URI references', which can be either 'URIs' or 'relative references' (kinda like if you consider general paths that can be absolute or relative paths, as is done in pathlib).

- Instantiating relative URI references with Path(...) or p-strings may cause issues, because they might get turned into Windows paths and the like. It does seem like this could be worked around by, for instance, making another class like "RelativePath" or "RelativeRef", but there are some questions about when/how these should be instantiated. This may lead to a need for slight backwards incompatibilities if implemented within pathlib.
- "Queries" like '?this=that' after the path component have a special role in URIs, but in file system paths they can be parts of the file (or even directory) name. This might again be ambiguous when using relative paths / references. This could perhaps be dealt with by requiring more explicit handling when joining relative paths / references together.

- "Fragments" like '#what'. This is essentially the same issue as with queries above and should be solved the same way. Anyway, both may be present at the same time.

- '..' and '.' in relative paths / references. In URIs, there's a difference between 'scheme://foo/bar/' and 'scheme://foo/bar'. Merging the relative reference './baz' to the former gives 'scheme://foo/bar/baz' while merging it to the latter gives 'scheme://foo/baz', because the last segment of the base is dropped unless it ends with a slash. I kinda wish the same thing was the standard with filesystem paths too.

- Percent encoding of URIs: quite obvious -- should not be done before it is unambiguous that we are dealing with a URI. Perhaps it should be done only when the resource is accessed or when the URI is exported to a plain str or bytes etc. I suppose this is a matter of what we would want in the repr.

- I may still have missed or forgotten something.

So, also with paths, especially relative ones, a library should "resist the temptation to guess", and carry around all the information until the context becomes unambiguous. For instance, when merging a relative reference with an explicit URI, the ambiguities about ?query and #fragment and about resolving the merged path disappear.
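To make a few of the points above concrete, here is a rough sketch using only what the stdlib has today (urllib.parse and pathlib). The helper name classify() and the regex are just assumptions for illustration, not a proposed API; the rest are existing stdlib calls. Note that urljoin only resolves references for schemes it knows about, so the example uses 'http' instead of a made-up 'scheme'.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import quote, urljoin, urlsplit

# Hypothetical heuristic: only strings starting with 'scheme://' are
# treated as URIs; everything else stays a filesystem path.
_SCHEME_AUTHORITY = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def classify(text):
    """Return 'uri' for unambiguous 'scheme://...' strings, else 'path'."""
    return "uri" if _SCHEME_AUTHORITY.match(text) else "path"


# Unambiguous URI vs. things that must not be guessed at.
kind_http = classify("http://domain.xyz/")            # 'uri'
kind_drive = classify("C:\\")                         # 'path'
kind_mailto = classify("mailto:someone@domain.com")   # 'path' (needs explicit URI)

# Trailing slash on the base changes how a relative reference is merged.
merged_with_slash = urljoin("http://foo/bar/", "./baz")    # 'http://foo/bar/baz'
merged_without_slash = urljoin("http://foo/bar", "./baz")  # 'http://foo/baz'

# '?query' and '#fragment' are structure in a URI...
parts = urlsplit("http://foo/bar?this=that#what")
# ...but just part of the file name in a filesystem path.
name = PurePosixPath("bar?this=that#what").name

# Percent encoding applied late, only when exporting as a URI string.
encoded = quote("/my file.txt")  # '/my%20file.txt'
```

The 'mailto:' case comes out as 'path' here on purpose: without '://' the string is ambiguous, so it would have to be instantiated explicitly as a URI.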
Another thought: requesting URLs. Basically the same as p'/etc/hosts'.write_text(secret). It's really important to have a dead simple library which is able to work with URLs. So, if I could do:
Good idea. When I suggested extending Paths (and p-strings) to work with URLs, I indeed meant that it would be an instance of (a subclass of) Path, so that you do the same as with filesystem path objects:

    p'https://mysite.com/somepage.html'.read_text()

or

    (p'https://mysite.com' / page).read_text()

But who knows what we might end up with if we go down this path. And I mean a metaphorical path here, not necessarily Path :).

Whatever it is, it probably can't be added to the stdlib right away. Still, we could take some measures regarding the language and stdlib now, to prepare for the future.

-Koos
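For what it's worth, the read_text() idea above can be roughly approximated today without p-strings or any change to pathlib. The class name UrlPath is purely hypothetical, and this is only a sketch of the '/' joining and read_text() behaviour, not a real Path subclass; it does no percent encoding and no query/fragment handling.

```python
from urllib.request import urlopen


class UrlPath:
    """Hypothetical stand-in for a URL-aware Path-like object."""

    def __init__(self, url):
        self._url = url

    def __truediv__(self, segment):
        # Append a segment, making sure exactly one '/' separates the parts.
        base = self._url if self._url.endswith("/") else self._url + "/"
        return UrlPath(base + str(segment).lstrip("/"))

    def __str__(self):
        return self._url

    def __repr__(self):
        return f"UrlPath({self._url!r})"

    def read_text(self, encoding="utf-8"):
        # Fetch the resource and decode it, mirroring Path.read_text().
        with urlopen(self._url) as response:
            return response.read().decode(encoding)


page = "somepage.html"
target = UrlPath("https://mysite.com") / page
```

With this, `target.read_text()` would perform the actual request, which is exactly the kind of thing that could live on PyPI first.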