[Python-ideas] URLs/URIs + pathlib.Path + literal syntax = ?

Tue Mar 29 11:07:42 EDT 2016

On 29/03/2016 16:42, Koos Zevenhoven wrote:
> The 'Working with Path objects: p-strings?' thread spawned a
> discussion about how URLs (or more generally URIs) and Paths should
> work together. I suggest we move that discussion to this new thread.
> The concept is 'explained' below in this email and quotes, but a
> little bit of discussion happened in the other thread too.
> 
> While I think that the decisions about p-strings (or a-strings for
> addresses or whatever they should be) should keep URIs in mind, it is
> premature to add the Path+URI fusion into the stdlib. I agree with
> Paul Moore that this URL stuff should be on PyPI first. It could even
> be a library that monkey-patches pathlib to accept URIs. Or a URI
> library that instantiates Path objects when appropriate. Then there
> could be a smooth transition into the stdlib some day.
> 
> See all the stuff below:
> 
> On Tue, Mar 29, 2016 at 11:44 AM, Sven R. Kunze <srkunze at mail.de> wrote:
>>
>> On 27.03.2016 00:51, Koos Zevenhoven wrote:
>>>
>>> OT:
>>>
>>> To be honest, I do think it feels like URLs are becoming (or have become) just as important as paths, and that pathlib.Path should in the future work with URLs just like it now works with Windows and POSIX paths. The difference between "http://domain.xyz/" and "C:\\" is not huge. I also think there should be a Python type (stdlib or builtin), which handles JSON objects nicer than dicts do and has its own literal
>>>
>>
>> That even occurred to me after we talked about the p-string (mainly because I am working in this field, so I basically need both file paths and URIs).
>>
> 
> Again, that you say you thought about it too perhaps means it's worth
> discussing :).
> 
>>
>> Just for the record: "Path" might not be the most correct wording. There is a "file://" scheme which identifies locally located files. So, functionality-wise, paths are basically a subset of URLs. Thus, a better/more generic name would be "URL", "URI", "Link" or the like in order to avoid confusing later generations. However, I think I could live with Path.
>>
> 
> Yes, these are concerns that should be considered if/when deciding
> whether to make URI/URLs a subclass of Path or the other way around,
> or something else. Anyway, since Path(...) already instantiates
> different subclasses based on the situation, having it instantiate a
> URI in some cases would not be completely unnatural.
> 
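
For what it's worth, the subclass dispatch mentioned above can be seen
with the pure path classes that already exist. This is only a sketch of
current pathlib behaviour; nothing in it is a proposed URI API, and the
example paths are made up.

```python
import os
from pathlib import PurePath, PurePosixPath, PureWindowsPath

# PurePath() picks a concrete flavour based on the running platform.
generic = PurePath("some/dir/file.txt")
expected_cls = PureWindowsPath if os.name == "nt" else PurePosixPath
print(type(generic).__name__, type(generic) is expected_cls)

# A flavour can also be requested explicitly, on any platform.
win = PureWindowsPath("C:\\Users\\koos")
posix = PurePosixPath("/home/koos")
print(win.drive, win.parts)
print(posix.root, posix.parts)
```

A URI flavour would be one more branch in that choice, which is why the
idea does not look completely alien.
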
> As suggested by Stephen, I've been looking into RFC 3986 as a whole,
> and it seems that instantiating both URIs and fs paths from
> p-strings is not completely impossible. Some points below (you
> can skip past them if you have to, there's more general discussion at
> the end):
> 
> - Only some URIs (or even URLs) can be reliably distinguished from
> file paths. However, those that contain '://' could be automatically
> turned into URI objects by p-strings [or Path(...)]. I suspect that
> would cover the majority of use cases.
> 
> (The unambiguous cases would be exactly those URIs that contain an
> 'authority' component -- these always begin with 'scheme://' while
> others don't)
> 
> - If we want to allow URIs without an 'authority' component, like
> 'mailto:someone at domain.com', they should be explicitly instantiated as
> URI objects.
> 
> - Some terminology: There are indeed 'URI's and 'relative references'.
> Relative references are essentially the URI-equivalent of relative
> paths. Then there are 'URI references' which can be either 'URIs' or
> 'relative references' (kinda like if you consider general paths that
> can be absolute or relative paths, as is done in pathlib).
> 
> - Instantiating relative URI references with Path(...) or p-strings
> may cause issues, because they might get turned into Windows paths and
> the like. It does seem like this could be worked around by, for
> instance, making another class like "RelativePath" or "RelativeRef",
> but there are some questions about when/how these should be
> instantiated. This may lead to a need for slight backwards
> incompatibilities if implemented within pathlib.
> 
> - "Queries" like '?this=that' after the path component have a special
> role in URIs, but in file system paths they can be parts of the file
> (or even directory) name. This might again be ambiguous when using
> relative paths / references. This could perhaps be dealt with by
> requiring more explicit handling when joining relative paths /
> references together.
> 
> - "Fragments" like '#what'. This is essentially the same issue as with
> queries above and should be solved the same way. Anyway, both may be
> present at the same time.
> 
> - '..' and '.' in relative paths / references. In URIs, there's a
> difference between 'scheme://foo/bar/' and 'scheme://foo/bar'. Merging
> the relative reference './baz' to the former gives 'scheme://foo/bar/baz'
> while merging it to the latter gives 'scheme://foo/baz'. I kinda
> wish the same thing was the standard with filesystem paths too.
> 
> - Percent encoding of URIs: quite obvious -- should not be done before
> it is unambiguous that we deal with a URI. Perhaps it should be done
> only when the resource is accessed or when the URI is exported to a
> plain str or bytes etc. I suppose this is a matter of what we would want
> in the repr.
> 
> - I may still have missed or forgotten something.
> 
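
A few of the points in the list can be tried out with what the stdlib
has today. The sketch below is an illustration only:
has_authority_prefix is a hypothetical helper (not part of pathlib or
urllib), and the example strings are invented. It uses a regex for the
'scheme://' test, urllib.parse.urlsplit for the query and fragment, and
urllib.parse.quote for percent-encoding.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import quote, urlsplit

# RFC 3986 scheme syntax followed by '//' (the start of an authority).
_SCHEME_AUTHORITY = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://")


def has_authority_prefix(text):
    """Hypothetical check: does the text start with 'scheme://'?"""
    return _SCHEME_AUTHORITY.match(text) is not None


for candidate in (
    "http://domain.xyz/",
    "C:\\Users",
    "mailto:someone@domain.com",
    "docs/readme.txt",
):
    print(candidate, has_authority_prefix(candidate))

# As a URI, '?' and '#' start the query and fragment components...
parts = urlsplit("http://domain.xyz/dir/file?this=that#what")
print(parts.path, parts.query, parts.fragment)

# ...but in a POSIX path they are ordinary characters of the file name.
print(PurePosixPath("dir/file?this=that#what").name)

# Percent-encoding only makes sense once the text is known to be a URI.
print(quote("/dir/my file.txt"))
```

Only the first candidate passes the check; the Windows path, the
mailto: URI and the relative path all need to be handled explicitly.
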
> So, also with paths, especially relative ones, a library should
> "resist the temptation to guess", and carry around all the information
> until the context becomes unambiguous. For instance, when merging a
> relative reference with an explicit URI, the ambiguities about ?query
> and #fragment and about resolving the merged path disappear.
> 
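
To illustrate that last point: once the base is known to be a URI,
reference resolution is well defined. The sketch uses
urllib.parse.urljoin with invented 'http://foo/...' URLs (as far as I
know, urljoin only resolves relative references for schemes it
recognises, so a made-up 'scheme://' would not work here) and compares
the result with PurePosixPath, which ignores the trailing slash.

```python
from pathlib import PurePosixPath
from urllib.parse import urljoin

# The trailing slash on the base decides where './baz' ends up.
with_slash = urljoin("http://foo/bar/", "./baz")
without_slash = urljoin("http://foo/bar", "./baz")
print(with_slash)     # http://foo/bar/baz
print(without_slash)  # http://foo/baz

# '..' is resolved against the base in the same way.
parent = urljoin("http://foo/bar/qux", "../baz")
print(parent)

# pathlib gives the same result with or without the trailing slash.
joined_a = PurePosixPath("/foo/bar/") / "baz"
joined_b = PurePosixPath("/foo/bar") / "baz"
print(joined_a, joined_b)
```
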
>> Another thought: requesting URLs. Basically the same as p'/etc/hosts'.write_text(secret). It's really important to have a dead simple library which is able to work with URLs. So, if I could do:
>>
>> p'https://mysite.com/{page}'.get()
>>
> 
> Good idea. When I suggested extending Paths (and p-strings) to work
> with URLs, I indeed meant that it would be an instance of (a subclass
> of) Path, so that you do the same as with filesystem path objects:
> 
>     p'https://mysite.com/somepage.html'.read_text()
> 
> or
> 
>     (p'https://mysite.com' / page).read_text()
> 
> But who knows what we might end up with if we go down this path. And I
> mean a metaphorical path here, not necessarily Path :). Whatever it
> is, it probably can't be added to the stdlib right away. Still, we
> could take some measures regarding the language and stdlib now, to
> prepare for the future.

Yes, but then there is a scope problem: are we providing just the parsing,
or also convenience methods to access the resource?

E.g., you suggested:

url('http://foo.com').get()

For an FTP URL, what would you do?

SSH?

Why would Path have them and not HTTP? Why HTTP and not FTP? Why FTP
and not mailto:?

And if we do implement get() for HTTP, then with urllib? Or requests? But
then what about HTTP/2? What about asyncio?

This needs to be sorted out first.

Although I do think URLs are very important, as I'm a web dev,
integrating p'http://foo.com'.get() seems dangerous. We don't know how
the web is going to move, and it's moving fast, while the stdlib is slow.
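
To make the scope question concrete, here is a minimal sketch in which
parsing works for every scheme but access is an explicit, per-scheme
opt-in. Everything in it (READERS, reader, read_text) is hypothetical
and not an existing or proposed stdlib API; only a 'file' reader is
registered, using urllib.request.url2pathname to get back a local path.

```python
import tempfile
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import url2pathname

# Hypothetical registry: scheme name -> function returning the text.
READERS = {}


def reader(scheme):
    """Hypothetical decorator that registers a reader for one scheme."""
    def register(func):
        READERS[scheme] = func
        return func
    return register


@reader("file")
def _read_file(parts):
    # Convert the URI path back to a local path, then read it.
    return Path(url2pathname(parts.path)).read_text()


def read_text(uri):
    parts = urlsplit(uri)  # parsing does not depend on the scheme
    handler = READERS.get(parts.scheme)
    if handler is None:
        raise NotImplementedError(
            f"no reader registered for scheme {parts.scheme!r}"
        )
    return handler(parts)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp).resolve() / "hello.txt"
    target.write_text("hi")
    print(read_text(target.as_uri()))

try:
    read_text("ftp://example.org/readme.txt")
except NotImplementedError as exc:
    print(exc)
```

With something like this, HTTP, FTP or SSH support could live in
third-party packages that register their own readers, and the stdlib
would not have to pick a client library.
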

> 
> -Koos