URLs/URIs + pathlib.Path + literal syntax = ? - Python-ideas

March 29, 2016

The 'Working with Path objects: p-strings?' thread spawned a
discussion about how URLs (or more generally URIs) and Paths should
work together. I suggest we move that discussion to this new thread.
The concept is 'explained' below in this email and quotes, but a
little bit of discussion happened in the other thread too.

While I think that the decisions about p-strings (or a-strings for
addresses or whatever they should be) should keep URIs in mind, it is
premature to add the Path+URI fusion into the stdlib. I agree with
Paul Moore that this URL stuff should be on PyPI first. It could even
be a library that monkey patches pathlib to accept URIs. Or a URI
library that instantiates Path objects when appropriate. Then there
could be a smooth transition into the stdlib some day.

See all the stuff below:

On Tue, Mar 29, 2016 at 11:44 AM, Sven R. Kunze <srkunze@mail.de> wrote:
...
On 27.03.2016 00:51, Koos Zevenhoven wrote:
...
OT:
To be honest, I do think it feels like URLs are becoming (or have become) just as important as paths, and that pathlib.Path should in the future work with URLs just like it now works with Windows and POSIX paths. The difference between "http://domain.xyz/" and "C:\\" is not huge. I also think there should be a Python type (stdlib or builtin), which handles JSON objects nicer than dicts do and has its own literal
That even occurred to me after we talked about the p-string (mainly because I am working in this field, so I basically need both file paths and URIs).
Again, that you say you thought about it too perhaps means it's worth
discussing :).
...
Just for the record: "Path" might not be the most correct wording. There is a "file://" scheme which identifies locally located files. So, paths are basically a subset of URLs, functionality-wise. Thus, a better/more generic name would be "URL", "URI", "Link" or the like, in order to avoid confusing later generations. However, I think I could live with Path.
Yes, these are concerns that should be considered if/when deciding
whether to make URI/URLs a subclass of Path or the other way around,
or something else. Anyway, since Path(...) already instantiates
different subclasses based on the situation, having it instantiate a
URI in some cases would not be completely unnatural.
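
For reference, here is a small illustration of the existing behaviour mentioned above, using only the pure (non-I/O) classes from pathlib. The example paths are made up for illustration.

```python
import os
from pathlib import PurePath, PurePosixPath, PureWindowsPath

# PurePath(...) does not return a PurePath instance as such; it picks a
# platform-specific subclass at instantiation time.
generic = PurePath("some/dir/file.txt")
expected_cls = PureWindowsPath if os.name == "nt" else PurePosixPath
print(type(generic).__name__)

# A specific flavour can also be requested explicitly on any platform.
win = PureWindowsPath("C:\\data\\file.txt")
print(win.drive)  # C:
```

A URI-aware subclass would be one more branch in this kind of dispatch, although how pathlib would choose it is exactly the open question.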

As suggested by Stephen, I've been looking into RFC 3986 as a whole,
and it seems that instantiating both URIs and filesystem paths from
p-strings is not completely impossible. Some points below (you
can skip past them if you have to, there's more general discussion at
the end):

- Only some URIs (or even URLs) can be reliably distinguished from
file paths. However, those that contain '://' could be automatically
turned into URI objects by p-strings [or Path(...)]. I suspect that
would cover the majority of use cases.

(The unambiguous cases would be exactly those URIs that contain an
'authority' component -- these always begin with 'scheme://' while
others don't)

- If we want to allow URIs without an 'authority' component, like
'mailto:someone@domain.com', they should be explicitly instantiated as
URI objects.

- Some terminology: There are indeed 'URI's and 'relative references'.
Relative references are essentially the URI-equivalent of relative
paths. Then there are 'URI references' which can be either 'URIs' or
'relative references' (kinda like if you consider general paths that
can be absolute or relative paths, as is done in pathlib).

- Instantiating relative URI references with Path(...) or p-strings
may cause issues, because they might get turned into Windows paths and
the like. It does seem like this could be worked around, for
instance by making another class like "RelativePath" or "RelativeRef",
but there are some questions about when/how these should be
instantiated. This may lead to a need for slight backwards
incompatibilities if implemented within pathlib.

- "Queries" like '?this=that' after the path component have a special
role in URIs, but in file system paths they can be parts of the file
(or even directory) name. This might again be ambiguous when using
relative paths / references. This could perhaps be dealt with by
requiring more explicit handling when joining relative paths /
references together.

- "Fragments" like '#what'. This is essentially the same issue as with
queries above and should be solved the same way. Anyway, both may be
present at the same time.

- '..' and '.' in relative paths / references. In URIs, there's a
difference between 'scheme://foo/bar/' and 'scheme://foo/bar'. Merging
the relative reference './baz' to the former gives 'scheme://foo/bar/baz'
while merging it to the latter gives 'scheme://foo/baz'. I kinda
wish the same thing was the standard with filesystem paths too.

- Percent encoding of URIs: quite obvious -- should not be done before
it is unambiguous that we deal with a URI. Perhaps it should be done
only when the resource is accessed or when the URI is exported to a
plain str or bytes etc. I suppose this is a matter of what we would want
in the repr.

- I may still have missed or forgotten something.
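
Two of the points above (the '://' test and the query/fragment ambiguity) can be sketched with the standard library. This is only a rough sketch: the helper name has_scheme_and_authority and the sample strings are assumptions made for illustration, not part of pathlib or of any proposal.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# RFC 3986 scheme syntax followed by '//', i.e. an authority component follows.
_SCHEME_AUTHORITY = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*://")


def has_scheme_and_authority(text):
    """Hypothetical helper: True if text starts with 'scheme://'."""
    return _SCHEME_AUTHORITY.match(text) is not None


candidates = [
    "http://domain.xyz/",
    "C:\\",
    "mailto:someone@domain.com",
    "docs/readme.txt",
]
for candidate in candidates:
    print(candidate, has_scheme_and_authority(candidate))

# The same string means different things as a URI reference and as a path.
ambiguous = "report?this=that#what"
as_reference = urlsplit(ambiguous)
as_path = PurePosixPath(ambiguous)
print(as_reference.path, as_reference.query, as_reference.fragment)
print(as_path.name)
```

Only the first candidate is recognised. The 'mailto:' URI falls through, which is why such URIs would need explicit instantiation.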

So, also with paths, especially relative ones, a library should
"resist the temptation to guess", and carry around all the information
until the context becomes unambiguous. For instance, when merging a
relative reference with an explicit URI, the ambiguities about ?query
and #fragment and about resolving the merged path disappear.
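
As an illustration of merging against an explicit URI, the existing urllib.parse.urljoin can be compared with pathlib joining. The 'http' scheme is used here because urljoin only resolves relative references for schemes it knows about; the host and path names are placeholders.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# With a trailing slash, 'bar/' is kept as the base directory.
with_slash = urljoin("http://foo/bar/", "./baz")
print(with_slash)  # http://foo/bar/baz

# Without it, the last segment 'bar' is replaced.
without_slash = urljoin("http://foo/bar", "./baz")
print(without_slash)  # http://foo/baz

# Once the base is known to be a URI, ?query and #fragment are unambiguous.
merged = urljoin("http://foo/bar/", "baz?this=that#what")
print(merged)  # http://foo/bar/baz?this=that#what

# pathlib drops the trailing slash, so both bases join the same way.
same = PurePosixPath("/foo/bar/") / "baz" == PurePosixPath("/foo/bar") / "baz"
print(same)  # True
```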
...
Another thought: requesting URLs. Basically the same as p'/etc/hosts'.write_text(secret). It's really important to have a dead simple library which is able to work with URLs. So, if I could do:
p'https://mysite.com/{page}'.get()
Good idea. When I suggested extending Paths (and p-strings) to work
with URLs, I indeed meant that it would be an instance of (a subclass
of) Path, so that you do the same as with filesystem path objects:

    p'https://mysite.com/somepage.html'.read_text()

or

    (p'https://mysite.com' / page).read_text()
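
Since p-strings do not exist, here is a rough sketch of what such an object could look like as an ordinary class. Everything here is hypothetical: the class name URLPath, its methods, and the choice to percent-encode a segment when it is joined are assumptions for illustration, not an existing or proposed API.

```python
from urllib.parse import quote
from urllib.request import urlopen


class URLPath:
    """Hypothetical URL object with a small pathlib-like surface."""

    def __init__(self, url):
        self._url = url

    def __truediv__(self, segment):
        # Append one path segment, percent-encoding it at join time.
        encoded = quote(str(segment), safe="")
        return URLPath(self._url.rstrip("/") + "/" + encoded)

    def __str__(self):
        return self._url

    def read_text(self, encoding="utf-8"):
        # Performs real I/O: fetches the resource and decodes the body.
        with urlopen(self._url) as response:
            return response.read().decode(encoding)


page = "somepage.html"
url = URLPath("https://mysite.com") / page
print(url)  # https://mysite.com/somepage.html
```

Calling url.read_text() would then fetch the page, in the same spirit as Path.read_text() for files.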

But who knows what we might end up with if we go down this path. And I
mean a metaphorical path here, not necessarily Path :). Whatever it
is, it probably can't be added to the stdlib right away. Still, we
could take some measures regarding the language and stdlib now, to
prepare for the future.

-Koos
