[Python-ideas] URLs/URIs + pathlib.Path + literal syntax = ?
Sven R. Kunze
srkunze at mail.de
Tue Mar 29 11:54:51 EDT 2016
That's some post. Thanks a lot for collecting all that stuff.
On 29.03.2016 16:42, Koos Zevenhoven wrote:
>> That even occurred to me after we talked about the p-string (mainly because I am working in this field, so I basically need both file paths and URIs).
> Again, that you say you thought about it too perhaps means it's worth
> discussing :).
:)
> Yes, these are concerns that should be considered if/when deciding
> whether to make URI/URLs a subclass of Path or the other way around,
> or something else. Anyway, since Path(...) already instantiates
> different subclasses based on the situation, having it instantiate a
> URI in some cases would not be completely unnatural.
You are right, of course. I was thinking too narrowly in this case.
> As suggested by Stephen, I've been looking into RFC 3986 as a whole,
> and it seems that making instantiating both URIs and fs paths from
> p-strings does not seem completely impossible. Some points below (you
> can skip past them if you have to, there's more general discussion at
> the end):
Well done.
> - Only some URIs (or even URLs) can be reliably distinguished from
> file paths. However, those that contain '://' could be automatically
> turned into URI objects by p-strings [or Path(...)]. I suspect that
> would cover the majority of use cases.
I agree. 'http' and 'https' would make up the majority of schemes when it
comes to the Web. 'ftp', 'ssh' and 'mailto' might follow.
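
A rough sketch of the '://' check, just to make it concrete. The names
looks_like_uri and parse are made up for illustration, and the regex is my
reading of the scheme syntax in RFC 3986, so treat it as an assumption rather
than a proposal for how pathlib should do it:

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlsplit

# Scheme syntax as I read RFC 3986, followed by '//' (start of an authority).
_SCHEME_AUTHORITY = re.compile(r'[A-Za-z][A-Za-z0-9+.-]*://')


def looks_like_uri(text):
    """Return True only for the unambiguous 'scheme://' case."""
    return _SCHEME_AUTHORITY.match(text) is not None


def parse(text):
    """Hypothetical dispatcher: URI parts for 'scheme://...', a pure path otherwise."""
    if looks_like_uri(text):
        return urlsplit(text)
    return PurePosixPath(text)


print(looks_like_uri('https://mysite.com/page'))    # True
print(looks_like_uri('/etc/hosts'))                 # False
print(looks_like_uri('mailto:someone@domain.com'))  # False: no authority component
```

Anything without an authority, like the mailto example, falls through to the
path branch here and would have to be instantiated explicitly.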
> (The unambiguous cases would be exactly those URIs that contain an
> 'authority' component -- these always begin with 'scheme://' while
> others don't)
>
> - If we want to allow URIs without an 'authority' component, like
> 'mailto:someone at domain.com', they should be explicitly instantiated as
> URI objects.
>
> - Some terminology: There are indeed 'URI's and 'relative references'.
> Relative references are essentially the URI-equivalent of relative
> paths. Then there are 'URI references' which can be either 'URIs' or
> 'relative references' (kinda like if you consider general paths that
> can be absolute or relative paths, as is done in pathlib).
>
> - Instantiating relative URI references with Path(...) or p-strings
> may cause issues, because they might get turned into Windows paths and
> the like. It does seem like this could be worked around by for
> instance making another class like "RelativePath" or "RelativeRef",
> but there are some questions about when/how these should be
> instantiated. This may lead to a need for slight backwards
> incompatibilities if implemented within pathlib.
>
> - "Queries" like '?this=that' after the path component have a special
> role in URIs, but in file system paths they can be parts of the file
> (or even directory) name. This might again be ambiguous when using
> relative paths / references. This could perhaps be dealt with by
> requiring more explicit handling when joining relative paths /
> references together.
>
> - "Fragments" like '#what'. This is essentially the same issue as with
> queries above and should be solved the same way. Anyway, both may be
> present at the same time.
>
> - '..' and '.' in relative paths / references. In URIs, there's a
> difference between 'scheme://foo/bar/' and 'scheme://foo/bar'. Merging
> the relative reference './baz' to the former gives 'scheme://foo/bar/baz'
> while merging it to the latter gives 'scheme://foo/baz'. I kinda
> wish the same thing was the standard with filesystem paths too.
All of this makes me think that it MIGHT be better to leave the decision
of whether it's a *real path*, a *URL* or a *URL path* to the user.
Not sure if we can handle this lazily but I CAN imagine some "confusion"
*sounding like PEP 428*. If we can find an unambiguous solution,
that'll be awesome and would simplify a lot.
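
To illustrate where URI and filesystem semantics diverge, here is a small
comparison using only urllib.parse and pathlib from the standard library. I
used 'http' instead of 'scheme' because, as far as I can tell, urljoin only
applies the merging rules for schemes it knows about:

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin, urlsplit

# A trailing slash on the base changes the result of merging './baz'.
with_slash = urljoin('http://foo/bar/', './baz')
without_slash = urljoin('http://foo/bar', './baz')
print(with_slash)     # http://foo/bar/baz
print(without_slash)  # http://foo/baz

# Filesystem-style joining ignores the trailing slash on the base.
fs_with_slash = PurePosixPath('/foo/bar/') / 'baz'
fs_without_slash = PurePosixPath('/foo/bar') / 'baz'
print(fs_with_slash, fs_without_slash)  # /foo/bar/baz /foo/bar/baz

# '?' and '#' delimit query and fragment in a URI,
# but they are ordinary characters in a file name.
uri = urlsplit('http://foo/report?this=that#what')
print(uri.path, uri.query, uri.fragment)  # /report this=that what
name = PurePosixPath('report?this=that#what').name
print(name)  # report?this=that#what
```

So the same string really does mean different things depending on which of the
two worlds it lives in.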
> - Percent encoding of URIs: quite obvious -- should not be done before
> it is unambiguous that we deal with a URI. Perhaps it should be done
> only when the resource is accessed or when the URI is exported to a
> plain str or bytes etc. I suppose this is a matter of what we would want
> in the repr.
Good point. URIs should be able to handle both inputs (percent-encoded and
not). So, we would need to decide on a canonical form.
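
One possible canonical form, as a minimal sketch: decode first, then encode
again, so that both kinds of input end up the same. canonical_path is a
made-up helper name, and this only uses quote/unquote from urllib.parse:

```python
from urllib.parse import quote, unquote


def canonical_path(path):
    """Hypothetical helper: decode, then re-encode, so both input styles converge."""
    return quote(unquote(path))


raw = '/some path/file.txt'
encoded = '/some%20path/file.txt'
print(canonical_path(raw))      # /some%20path/file.txt
print(canonical_path(encoded))  # /some%20path/file.txt
```

The catch is that a file whose name literally contains '%20' would be treated
as already encoded, which is exactly why this should wait until we know we are
dealing with a URI.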
> - I may still have missed or forgotten something.
>
> So, also with paths, especially relative ones, a library should
> "resist the temptation to guess", and carry around all the information
> until the context becomes unambiguous. For instance, when merging a
> relative reference with an explicit URI, the ambiguities about ?query
> and #fragment and about resolving the merged path disappear.
Another point for "let the user decide".
>> Another thought: requesting URLs. Basically the same as p'/etc/hosts'.write_text(secret). It's really important to have a dead simple library which is able to work with URLs. So, if I could do:
>>
>> p'https://mysite.com/{page}'.get()
>>
> Good idea. When I suggested extending Paths (and p-strings) to work
> with URLs, I indeed meant that it would be an instance of (a subclass
> of) Path, so that you do the same as with filesystem path objects:
>
> p'https://mysite.com/somepage.html'.read_text()
>
> or
>
> (p'https://mysite.com' / page).read_text()
Ah, I see. Well, that's one approach.
On the other hand, I can imagine a lot of people wanting to do a "PUT",
"DELETE" or "POST" (and the other, lesser-known methods). It seems to me
that a one-to-one mapping would be easier here than retrofitting.
Although read_text might come in handy as an alias for "GET". :)
That is when you don't care if you read locally or remotely. So, I can
see room for this.
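
Roughly what I have in mind for the one-to-one mapping, sketched on top of
urllib.request. HttpUrl and all of its method names are hypothetical; nothing
like this exists in pathlib, and a real design would need headers, error
handling and so on:

```python
from urllib.request import Request, urlopen


class HttpUrl:
    """Hypothetical URL object; not part of pathlib or any existing library."""

    def __init__(self, url):
        self.url = url

    def __truediv__(self, segment):
        # Mimic Path's '/' operator by appending one path segment.
        return HttpUrl(self.url.rstrip('/') + '/' + segment.lstrip('/'))

    def build_request(self, method, data=None):
        # Building is kept separate from sending, so it can be inspected offline.
        return Request(self.url, data=data, method=method)

    def send(self, method, data=None):
        return urlopen(self.build_request(method, data))

    def get(self):
        return self.send('GET')

    def post(self, data):
        return self.send('POST', data)

    def put(self, data):
        return self.send('PUT', data)

    def delete(self):
        return self.send('DELETE')

    def read_text(self, encoding='utf-8'):
        # Alias for GET, for code that does not care whether it reads
        # locally or remotely.
        with self.get() as response:
            return response.read().decode(encoding)
```

With that, (HttpUrl('https://mysite.com') / page).read_text() and .get() can
live side by side, and the other verbs do not have to be squeezed into
file-like method names.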
> But who knows what we might end up with if we go down this path. And I
> mean a metaphorical path here, not necessarily Path :).
Let's see where this path leads us. ;)
Best,
Sven